Friday, October 18, 2013

Dialog

Our previous readings have largely focused on one-off or one-sided interactions.  This week we will think about dialogue, where the agent and human interact continuously using language.  The reading for Tuesday's class will be a review paper covering the POMDP approach to dialogue systems, along with a philosophical paper by Paul Grice that has influenced much of the thinking about dialogue.
For Thursday's class, we will read Adam Vogel's recent work about the emergence of Gricean maxims from DEC-POMDP dialogue models:

Please post on the blog a comment of about 250 words pointing out one advantage and one disadvantage of the POMDP approach to dialogue.  Post this response by 5pm on Sunday evening.  Then by 7am Tuesday morning, post a response to someone else's comment on the blog.

32 comments:

  1. Here are my thoughts on one strength and one weakness of the POMDP approach to dialogue.

    Let's start with a strength. The POMDP's use of belief states allows it to track distributions over multiple states in parallel, and therefore to consider dialogue possibilities over multiple states at once. To see why this matters, contrast it with a deterministic system that commits to a single state at each time step. If the user suddenly changes topic, or pushes the dialogue in a direction better represented by a state the system did not choose, the deterministic system has to backtrack through its steps to find the state it should have chosen. With the stochastic approach the POMDP uses, the probability distribution over all states is always maintained, so when the user reports a problem, the probability of the most likely state simply drops and a new state (now with a higher probability) is chosen. This is only made possible by tracking the belief distribution over all states in parallel.


    A weakness of the POMDP approach appears when we apply it to real-world situations of great complexity. Modeling a real-life scenario as a set of state-action matrices is not a trivial problem. If you model the world faithfully, with many states and actions in your state-action space, then POMDP computation becomes very complex and expensive. To exacerbate the problem, real-life dialogue applications require real-time responses (it would not be much of a dialogue if one participant had to wait forever for a reply). To sidestep this problem, one may need approximation techniques, such as the N-best or Bayesian network approaches presented in this paper.
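
    To make the parallel tracking concrete, here is a minimal sketch of one step of Bayesian belief monitoring in Python. The nested-dict transition model T[s][a][s_next] and observation model O[s_next][a][o] are hypothetical toy structures of my own, not anything specified in the paper:

        # One step of belief monitoring for a toy dialogue POMDP.
        # b'(s') is proportional to O(o | s', a) * sum_s T(s' | s, a) * b(s).
        def belief_update(belief, action, observation, T, O, states):
            new_belief = {}
            for s_next in states:
                predicted = sum(T[s][action][s_next] * belief[s] for s in states)
                new_belief[s_next] = O[s_next][action][observation] * predicted
            norm = sum(new_belief.values())  # normalizing constant
            return {s: p / norm for s, p in new_belief.items()}

    Under this update, a user's complaint is just another observation: it down-weights the current best state and probability mass shifts to an alternative, with no explicit backtracking step.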

  2. They mention a few strengths, all of which seem compelling, but I'm not completely convinced that the POMDP is *the* model we should use for dialogue systems (see: weakness).

    (strength) One feature of the POMDP that makes it a natural fit for a dialogue system is that the belief state effectively models the uncertainty inherent in natural (spoken) language. As the paper mentions, the task of processing speech into words is still quite difficult - they claim that error rates are in the range of 15-30%, which is quite high. However, the POMDP can handle this quite elegantly, as it is implicitly assumed that the model does not know what state it is actually in. Observations are used to inform the belief model, so it is a natural fit to maintain a distribution over possible states, given that we can't perform speech->text as effectively as we would like. Compared to a model without 'built-in' uncertainty, this is a big advantage, as the alternative model would require some serious modifications in order to handle the complexities of speech errors.

    (weakness) The paper identifies several weaknesses of POMDPs as applied to dialogue systems that are specific to the practicality of implementing a POMDP (i.e. the state-action space being too large, training being difficult since finding test users is not easy, etc.). However, I am not convinced that dialogue should be modeled by a Markovian model. At the beginning of the paper, the authors explicitly assume that "dialogue evolves as a Markov process" (p.1). Dialogue appears, at a glance, non-Markovian. Assuming the Markov property for dialogue allows us to limit the state-action space, which is a significant "win", but I think that ultimately the model will be insufficient for modeling dialogue. My argument rests primarily on anecdotal evidence about how dialogue works - in general, human dialogue seems to rest on far more than simply the previous utterance. Despite the fact that the model described, and specifically the 3-tuple formulation of each state, incorporates dialogue history, I do not think this is sufficient to model dialogue. I stumbled across a 2005 SIGdial paper by some folks at Microsoft: http://research.microsoft.com/en-us/um/people/timpaek/Papers/sigdial2005.pdf in which they analyze precisely this question and conclude that the Markovian assumption does not benefit dialogue systems. I'm curious what other folks in the class think about this, or if I'm the only one (perhaps I will try to come up with alternative assumptions one might make that limit the state-action space but fit more naturally with the structure of dialogue).

    Replies
    1. I think the appropriateness of using a Markov model depends on how you define a state. If you define the state as only the current utterance, then I agree: the previous state/utterance does not give enough information. However, if, as in the Young paper, you include the dialog history in the state, then for me the Markov assumption seems more reasonable. I see this as similar to the approach described in the QUD paper from the guest lecture by Scott, which relied on the idea of a common ground between discourse participants.

    2. It makes me a little uncomfortable too, but so long as the history is encoded in the state I don't see any reason that it's *inaccurate*. It might contribute to the intractability of the problem by blowing up the state space, though.

    3. As I understand it, including the entire history in the state is a way to sidestep the non-Markovian nature of dialog. If your state includes the history, then predictions of the next state can use information from the entire dialog history. This makes their model only semi-Markovian: it is Markovian in world state, but it is _not_ Markovian in dialog observations.

      Here's an idea: a lot of dialog, though, seems Markovian _so long as you understand the context in which it's said_. (How true is this?) Compare to our models of motion. Motion is non-Markovian when one considers only {X, Y}: you need the history in order to get velocity and acceleration and predict next steps. However, expanding the state space to {X, Y, Vx, Vy, Ax, Ay} makes motion Markovian. What if a similar transformation could be applied to dialog? A dialog token would be an observation that informs our understanding of the 'groundings under discussion', but would not itself be carried along as part of the state. (Alternatively, only consider recent dialog history, or, crazy idea: could you have _fractional_ membership in state?) This would be a huge step towards making their model more tractable, and I think a system could get very far with this simplification.
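
      To make the motion analogy concrete, here is a tiny sketch (toy dynamics of my own, not from any paper): with state = (x, y) alone, prediction needs history, but augmenting the state with velocity makes a one-step update sufficient.

          # Markovian update for the augmented state (x, y, vx, vy):
          # the next state depends only on the current state and the
          # action (accelerations ax, ay), with no history required.
          def step_augmented(state, ax, ay, dt=1.0):
              x, y, vx, vy = state
              return (x + vx * dt, y + vy * dt, vx + ax * dt, vy + ay * dt)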

    4. "Here's an idea: a lot of dialog though, seems Markovian _so long as you understand the context of when it's said_. (How true is this?) "

      "Context" is kind of vague here. Perhaps dialog is Markovian if the current QUD or the whole stack of QUDs is part of the state. But carrying around the entire QUD in the state is still kind of "semi-Markovian." This is because the participants might be discussing some sub-question and eventually they resolve it, and then it becomes fair game to discuss the outer question was introduced some amount of time beforehand. It seems like we can't have these kinds of nested discussions unless we have the whole QUD stack in the state which would be equivalent to having "the entire dialog history."

    5. I think that human dialog relies on both the environment and previous states, and both are captured by the belief states in the POMDP. Other elements, such as culture, should also be included among the environmental elements of the POMDP - and it is worse if they are not. The POMDP is already so complex that accuracy may be traded off against computational convenience; the other models are worth looking into.

  3. One strength of the POMDP representation is its ability to quickly and easily switch between most likely states, without error correction or backtracking. This allows rapid self-correction and promotes better, more accurate interactions. This ability is a direct result of the POMDP's representation of multiple belief states, which are updated as a distribution on each new user input. As long as the number of belief states is sufficiently limited, this update and transition between most likely belief states should be reasonably rapid and make for better usability.
    This leads to one of the disadvantages of the POMDP representation. To make the problem tractable in real-world applications, the number of belief states must be heavily pruned. The alternative is to store all belief states, which would make each update step slower, resulting in much poorer user interaction, especially if the goal is dialogue with a human, which depends on near-immediate responses. Unfortunately, this heavy pruning could remove the true belief state, which would then prevent the rapid and automatic correction that seems to be such a major strength of the POMDP approach. You may have to give up one of the major strengths of the POMDP approach to make it tractable in the real world. While this scenario is unlikely (since you prune out unlikely branches), the POMDP approach would have difficulty recovering a pruned branch that turned out to be the true state, as the sketch below suggests.
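
    As a sketch of the pruning step being discussed (my own toy illustration, not the paper's implementation):

        # Keep only the n most probable states and renormalize.
        # If the true state is pruned here, later corrections cannot recover it,
        # because its probability mass has been silently redistributed.
        def prune_to_n_best(belief, n):
            top = sorted(belief.items(), key=lambda kv: kv[1], reverse=True)[:n]
            norm = sum(p for _, p in top)
            return {s: p / norm for s, p in top}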

  4. A major disadvantage of POMDPs is the difficulty of solving them as the state space increases. MDPs, and POMDPs especially, suffer from the 'curse of dimensionality': as the number of actions and states increases, the space to be planned over grows exponentially. This is not promising for systems where real-time conversation must be practical. There are promising ways to limit the size of the spaces, and this paper shows that compressing the space involves underlying trade-offs between detail and state-space size. This is a fundamental problem that will always follow the use of POMDPs. Increasing the vocabulary and dialogue history will probably continue to require tricks to limit the dimensionality issue.

    The advantage of a POMDP is that language understanding doesn't have to be modeled heuristically; it can be modeled probabilistically instead. As shown in this paper, the belief-state space can also incorporate the user's intentions, which shows promise for creating dialogue agents that might better understand ambiguous sentences.

  5. One advantage is that the POMDP framework provides the ability to maintain a belief distribution over all states, so that the system can effectively pursue all possible dialogue paths in parallel. It chooses its next action based on the probability distribution across all states, not just the most likely state. There is no requirement for backtracking or specific error-correction dialogues. The probability of the current most likely state is reduced when the user points out a better state or signals a problem, and the system updates the probabilities and shifts to another state with higher probability.

    However, as we learned in the previous paper about POMDPs, a POMDP requires memory of previous actions and observations to reduce the ambiguities of the state of the world. The state-action space of the real world is extremely large, and standard POMDP methods do not scale to the complexity needed to represent a real-world dialogue system. The numbers of states, actions and observations can each easily be very large even in a moderately sized system. The paper points to this space complexity several times, and it may be the most difficult issue for POMDP-based dialogue to solve. The paper factors the state into the user's goal, the intent of the most recent user utterance and the dialogue history, which significantly reduces the POMDP model complexity, but it still needs the N-best approach and the factored Bayesian network approach to support tractable real-world systems. Although the space complexity is reduced, the accuracy and time complexity may become somewhat worse.
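
    To illustrate the factoring idea, here is a toy sketch that assumes full independence between the three components, which is stronger than the conditional structure the paper actually uses:

        # Store |G| + |U| + |H| numbers (three marginals) instead of a joint
        # table of |G| * |U| * |H| entries over (goal, utterance, history).
        def joint_belief(b_goal, b_utterance, b_history, g, u, h):
            return b_goal[g] * b_utterance[u] * b_history[h]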

  6. Advantage:
    In order to make progress in a dialogue, the system needs to figure out the user's goal, which is hidden behind the user's utterances. In addition, when operating in noisy environments, the speech recognizer is likely to introduce errors, resulting in a deviation between the user's real utterance and the result of speech recognition. Therefore, neither the user's goal nor the utterance is directly observable. A POMDP is particularly suitable for this kind of problem: by using belief-state tracking, it is able to capture the uncertainty in observations and maintain a probability distribution over multiple possible states, thus eliminating the requirement for backtracking when the current most probable state is down-weighted by new observations.

    Disadvantage:
    Due to the enormous size of the belief state space, POMDPs do not scale well to the complexity of real-world problems. Since the aim of a dialogue system is to interact with humans in real time, approximate solution methods for POMDPs are needed. As indicated in the review paper, certain techniques can be employed to reduce the size of the state space, for example factoring the state by making certain independence assumptions, or only keeping track of the most probable N states.

    Replies
    1. I was wondering about the robot's understanding of the speaker's goal. If we use techniques that leave out some states, how will the robot's understanding change?

  7. This paper lists several strengths of using a POMDP framework for modeling spoken dialogue systems. The most apparent is the ability of a POMDP to model uncertainty over states. In spoken dialogue systems the agent (or robot) is not fully aware of the user's intentions. The SDS only receives information (observations) in the form of sentences, and must infer the hidden current state. A POMDP is fully capable of doing so by maintaining a belief state - a distribution over states - updated from each observation and knowledge of the system's last action. Also, intuitively, it makes sense to model a dialogue using an MDP framework (provided the dialogue's history is kept track of). Here the Markovian assumption follows from the logical structure of a typical dialogue. For example, a conversation in which the first user comments on a phrase that was uttered several minutes ago by the second user would confuse the second user.

    There are also several negative aspects of using a POMDP framework for spoken dialogue systems. Most notable is the fact that the algorithms required to calculate solutions for the POMDP are in many cases intractable for an SDS. Because of this, in order to turn the SDS problem into a workable POMDP model, either simplifying assumptions need to be made or domain-specific knowledge has to be added. While it seems that SDS systems still achieve good results, the algorithms that are needed must be carefully implemented and, in some sense, hand-crafted for each situation.

  8. The paper introduces three major advantages of the POMDP model. One of them is: by maintaining a belief distribution over all states, the system can easily pursue all possible dialogue paths in parallel, choosing its next action not based on the most likely state but on the probability distribution across all states. Compared with methods that use backtracking or specific error-correction dialogues, this approach allows more powerful dialogue policies to be applied.

    A POMDP is defined as a tuple consisting of: states, actions, a transition probability, an expected reward, observations, an observation probability, a geometric discount factor, and an initial belief state. One disadvantage of the POMDP is that exactly computing the probability of the next state from the last state and action is intractable, so approximations and tractable algorithms for performing belief monitoring and policy optimisation have been introduced. Approximations can be applied to the policy and to the spoken dialogue model; methods for improving the dialogue model parameters include expectation maximisation, expectation propagation and reinforcement learning. However, as the degree of approximation increases, there may be a trade-off in the accuracy of the algorithm: if the model maintains many belief states, the calculation becomes intractable, while if more belief states are neglected, the major benefits of the POMDP are weakened.
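
    For reference, that tuple can be written down directly; a minimal sketch in which the field names are my own shorthand:

        from dataclasses import dataclass
        from typing import Callable, Sequence

        @dataclass
        class POMDP:
            states: Sequence        # S
            actions: Sequence       # A
            transition: Callable    # P(s' | s, a)
            reward: Callable        # r(s, a), expected immediate reward
            observations: Sequence  # O
            obs_prob: Callable      # P(o | s', a)
            gamma: float            # geometric discount factor
            b0: dict                # initial belief state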

  9. I think using a Markovian assumption for a dialogue system seems useful in modelling the problem; as usual, nothing is strictly Markovian. With this in mind, the advantages of modelling the system as a POMDP are clear: more noise robustness, and the ability to backtrack in a conversation the way a human would. What makes this exciting is the ability to be robust even with the errors in speech processing taken into account.
    The disadvantages are many, but they deal with how large the state space is, how difficult it is to evaluate such a system, or how the heuristics provided to reduce the state space can go wrong. I guess any machine algorithm would look bad with high-dimensional data, so I am willing to look past most of these problems, as they can be improved upon. The one remaining issue may be the Markovian assumption, which I am comfortable making (right now), though I'm not sure it is precise.

    Replies
    1. I agree with Nakul that "nothing is strictly Markovian". So the POMDP-based system should be able to deal with the noise that comes from the real world and be robust enough to produce good results. I also strongly agree with the idea that it can handle errors in speech recognition: an SDS can improve its recognition of speech based on history information and future utterances.

    2. I find it interesting when I see a paper using tricks like keeping track of dialogue history to enforce the Markovian assumption. While theoretically the model is still Markovian, I feel a bit bothered by the fact that the history from the "beginning of time" is being passed along. This seems to fundamentally violate the spirit of any Markovian assumption the model wishes to make.

    3. I'm not sure it violates it; it just extends it into something that is not ideal. We make the Markovian assumption to make problem solving easier, but storing histories makes our problem harder again. I think the key issue is modelling our problem correctly with the POMDP idea, which requires intuition right now, but there has to be a way to figure out what depth of history is ideal for a given type of problem. In RL and ML this is solved using a forgetting factor in online domains.

  10. A major advantage of the POMDP approach to modeling dialogue is that it can handle speech recognition errors more easily. Since the model is based on belief states, uncertainty about the actual state you are in is part of the model itself; the relatively high probability of speech recognition errors is internal to the model rather than an exogenous factor. Additionally, a POMDP is better able to handle corrections when an error is detected. Since it keeps a probability distribution over all states, not just the single actual/most likely state, it is easier to adjust the probabilities of the next state with new information. The probability of being in each state is continuously updated with each new input, so new information is readily incorporated into the model and improvements are made in real time.
    On the other hand, a significant challenge to implementing a POMDP is how computationally expensive it can be. In practice, the SDS state space is extremely large, so computing and storing information over the entire state space is impossible. As a result, approximations must be used, and there is a real trade-off between accuracy and efficiency in the algorithm. For example, using the N-best approach, only the N most likely states are updated and remembered. This greatly reduces computation time, but breaks down if the actual state is not on the list.

    Replies
    1. The paper introduces an approximation that merges the N-best approach and the factored Bayesian network approach, in which only an N-best list of values in the factored model is updated. Yes, it would be a problem if the actual correct state is not one of the N-best states. As one of the advantages mentioned in the paper, when this problem happens the user can signal it, the probability of the current most likely state is reduced, and the focus simply switches to another state. But if the correct state is not there at all, the system still goes wrong. I think one solution may be to store all the possible states somewhere while computing probabilities only for the N-best states. When the user points out the problem, the system could read back all the states and re-compute. That computation would be very expensive, but it avoids endless errors, and there is no need to consider all states for all actions - only all states for the one action. And of course, the subsequent computations will be more accurate, so the cost should come down over time.

  11. I see one advantage of POMDPs as the fact that they are not trying to model language directly. In reading the Grice paper and thinking about language, it becomes more and more clear that language is incredibly indirect. Thus, it is important to have a model that treats the conveyed information as primary, rather than the language itself. While this is a major advantage, I do disagree with calling the hidden state an 'interpretation' of the sentence. The word to me implies two things: that there is ambiguity in the content of what is said, and moreover, that the 'meaning' is secondary - an interpretation follows from an expression. In reality we should be treating the expression as primary, where the natural language, tone of voice, the agent's model of its fellow interlocutor's behavior, world knowledge, the given context, cultural norms, and so on all evidence the expression that the fellow interlocutor wishes to convey.

    Numerous people have noted the disadvantage of state-space explosion. I see this not as a problem so much as a symptom of a poor world model. I run the risk of repeating myself by saying so often that POMDPs seem poorly suited to this task, but hopefully I have developed an interesting idea or two in the following. Namely, models in which state transitions play such a heavy role seem ill-suited to the problem of natural language understanding. It seems the state can be one of two things (or both, I suppose): the expression the interlocutor wishes to convey, or the actual state of the environment. In the POMDP setting, the agent's actions influence state transitions (consider the transition function T: S x A -> S), yet in both of these cases the agent's actions do not serve to advance the state in any way (with the exception of speech acts in the latter case). Both the former and (perhaps to a lesser extent?) the latter are more akin to a (cooperative) game of Mastermind, where the agent is attempting to divine (through natural language) the expression or arrangement of knowledge that the interlocutor wishes to convey.

  12. Advantage:
    Modeling a belief state, rather than a state, allows for complex uncertainty. The states themselves could be represented by a traditional MDP with a bigger state space (allowing for some discretization), but the more important point is that uncertainty is the default way to think about a POMDP. The observation model is also distinct, because belief monitoring happens after the action is chosen and the subsequent observation is made in the new state.

    The importance of modeling this uncertainty is that in the case of (inevitable) errors -- at the input (speech recognition) or understanding (disambiguation) stage -- the system can seamlessly switch to a different alternative. Indeed, much of what Young et al. focus on in evaluation is the superior performance of POMDP dialogue systems under uncertainty.

    Disadvantage:
    POMDPs seem to me to be a good candidate for understanding the state of a conversation -- except perhaps for their general intractability -- but the model doesn't tell you anything about how the state should be modeled in the first place. This seems like (one of several [1]) important sub-problems which might lead to interesting implications. Some maxims (in Grice's lexicon) are handled by the natural language generator. Others -- relation, I would say -- should follow from the state of the conversation. Clearly the plain facts of a conversation, and so its state, will inform the natural language generation as well. But implementing relation depends on modeling the user's goals, and so on searching for something to advance them. These are clearly outside the scope of natural language generation, and it would be interesting to see a framework for encoding the state of a conversation in light of conversational principles such as Grice's maxims.


    [1] Another important one: defining the reward function.

  13. An advantage of the POMDP approach to dialogue is that it offers a flexible model that can be improved through additional knowledge and can be made robust to noisy communication and unclear statements by modeling uncertainty in the "belief state." It avoids requiring people to construct large flow-charts that attempt to model an entire dialog. Such flow-charts are brittle, and it's nearly impossible for their constructors to take into account all the different possible error states. The POMDP model, with its built-in uncertainty, is more likely to recover from bad states and is able to adjust beliefs to account for noisy communication. For example, a user could repeat a misunderstood command several times and the POMDP system should be able to adjust its understanding of the command.

    A primary disadvantage of using POMDPs in particular to solve this problem is that there is no clear reward function. POMDPs are most commonly associated with solving games which have very clear rewards (i.e. "if you get to the end of the maze you get 5 points"), but dialogue does not have an obvious quantifiable reward. The paper mentions you could potentially ask users to state whether the system achieved the goal they wanted when entering the dialog, but this is a low-resolution reward, and users might be polite or have low standards and say it was successful when it might easily be improved. So while it's clear there are advantages to POMDPs once they are optimized with the right parameters, the lack of a clear reward structure makes it difficult to optimize the parameters in the first place.
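
    For concreteness, one common shape for such a reward in the POMDP dialogue literature is a per-turn cost plus a terminal success bonus; a hedged sketch, with numbers that are purely illustrative:

        # Small per-turn cost encourages brevity; the terminal bonus fires only
        # on the user's yes/no exit signal, which is exactly the low-resolution,
        # politeness-biased feedback described above.
        def dialogue_reward(dialogue_over, user_confirmed_success,
                            per_turn_cost=-1.0, success_bonus=20.0):
            reward = per_turn_cost
            if dialogue_over and user_confirmed_success:
                reward += success_bonus
            return reward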

    Replies
    1. (Re: disadvantage) This is a really interesting point. A legitimate contribution to dialogue at any point in time may be any of many possible utterances - as you said, there is typically no singular, unique response that the robot should be striving for. Many specific applications of dialogue systems might allow for unique answers, as with the UBots that assembled IKEA furniture: when the bot realizes it is incapable of proceeding, it should ask for assistance in a way that enables it to proceed. However, in most dialogue scenarios there is no obvious 'correct' response - any natural language response that conforms to some set of maxims governing what an appropriate response might be (Grice's maxims are a good start), perhaps augmented by a set of maxims specific to robot-human task-driven dialogue, seems legitimate (e.g. a maxim that encourages the robot to 'act'/'speak' in such a way that progress is made toward the task by some metric, be it knowledge acquisition or perhaps physical progress). Perhaps we could hypothesize about a set of maxims that would provide a reasonable structure for the reward function of a POMDP applied to natural language systems.

    2. Short note: many commercial assistance systems have "Was this answer helpful? Yes/No." at the end. For that one domain, this should suffice as a reward function. Of course it isn't global, but as Dave notes, many domains, when considered specifically, do have rewards.

    3. It seems like a large part of the reward calculation needs to come from the user's end. In real dialogue interactions, if someone asks for help and is successfully helped, they give some quick form of acknowledgment such as "Ah, yes", "Thanks", etc., while if the goal was not achieved there is some sign of giving up or a correction. When I am asked for help on anything (TA hours, etc.), I'm not really sure I have helped until I receive some response, either positive or negative; until then I'm unsure whether I have truly reached the user's goal state. That seems like a reasonable burden to put on the user; the trick is convincing users to treat dialogue with a machine the same way they treat dialogue with a human.

    4. Young et al. did address this question at the end of the review paper, as well as the potential drawbacks of requesting users to provide feedback (false "yes" due to politeness, false "no" because users have unrealistic expectations, etc.). Requesting more detailed feedback (e.g., when a user answers "no", asking them to choose between "the information provided was too vague" and "the information provided was complete nonsense") might be more helpful, although it puts a further burden on users.

    5. I thought repetitions of a user's request could be a negative signal, as could a conversation that does not end nicely, such as the user hanging up in the middle. It would also be nice if humans could judge parts of a conversation as misleading, so that the labeled conversations could be used to provide negative feedback.

    6. Young et al. mention (in section V.A) that rewards can be inferred via inverse reinforcement learning, by observing human-human conversations and inferring the rewards behind them; this scheme could work well for learning rewards and reward functions.

  14. As speech recognition still has a significant error rate, an advantage of a POMDP-based SDS is that it can take the unreliability of the input into account and provide checking and recovery mechanisms. I did RoboCup@Home in the past and encountered a problem that matches this situation exactly. In one test, the robot had to take several orders from a person and deliver drinks. The command template is "Give <person> the <drink>". The person would say:
    "Give me the coke"
    "Give Michael the pepsi"
    And the robot would repeat the command to let the person confirm it. But due to speech recognition problems, the robot mistakenly recognized "Michael" as "Michel", which sound quite similar. So it would say:
    "You said Michel want the pepsi, right?"
    What made it worse is that the person didn't hear this clearly, mistakenly took it as a correct repeat, and replied "yes" to the robot. Later he realized this and wanted to recover from the error. For a conventional deterministic flowchart-based system, the designers would have to anticipate this state and manually code it into the system. Unfortunately, I didn't think of this situation, there were no states dealing with it, and our robot failed that single test. A POMDP-based dialog system, by contrast, is able to recover from an error like this when the user simply says a few more words to the robot.

    The disadvantage I worry about is the intractability of the POMDP state space when applying it to the real world. The system uses approximation techniques like N-best or Bayesian networks to prune the number of states and make the problem tractable, but we don't know whether the needed states have been pruned away. For a test like RoboCup@Home, quite precise responses between robot and human are required.

  15. This comment has been removed by the author.

  16. The first advantage of using POMDP-based SDSs over conventional deterministic flowchart-based systems is that you can model belief distributions over spoken inputs, which incorporates the inherent errors of speech recognition. The entire state space is then considered simultaneously when inferring the next action. This provides robustness against errors made by speech recognizers in real-world settings, since the system does not need to backtrack to a previous point in a flowchart to correct an error.
    Also, since it is a POMDP, we can use reinforcement learning to train policies, given appropriate reward functions.
    One of the disadvantages of POMDP-based systems is that the history of the conversation is incorporated into the state. Although adding the conversation history to the state space is essential to preserving the Markov assumption, it limits the use of past conversations in dialogues because of the POMDP's innate intractability. If you try to incorporate complex conversations that occurred in the past, the belief space grows exponentially. You have to limit the length of the conversation, or find an approximation algorithm to bring the problem down to size. Therefore, to increase the complexity of conversation, a carefully designed training algorithm is needed. The paper mentions several solutions to this, but it is still an ongoing effort.
    The system also requires simulated users to be modeled, either because learning is not fast enough to do online or because it is simply too hard to recruit enough people to train the model. Ideally the system would learn by itself, interacting only with real users.
    The other disadvantage lies in modeling the reward function for reinforcement learning. In real-world situations it is not easy to extract a reliable reward. Even if you let users judge the entire conversation, they are oftentimes not sincere in responding to this type of question. The authors of this paper suggest using biometric measures to infer users' emotional states; however, inferring emotions from biometric data is itself ongoing research that has not been fully established.
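
    To make the simulated-user point concrete, here is a schematic of such a training loop; every class and method name is a hypothetical placeholder, not a real API:

        # Train a dialogue policy against a simulated user by reinforcement
        # learning. Purely schematic: 'policy' and 'simulated_user' stand in
        # for whatever learner and user model a real system would supply.
        def train(policy, simulated_user, episodes=10000):
            for _ in range(episodes):
                belief = policy.initial_belief()
                done = False
                while not done:
                    action = policy.choose(belief)
                    observation, reward, done = simulated_user.respond(action)
                    next_belief = policy.update_belief(belief, action, observation)
                    policy.learn(belief, action, reward, next_belief)  # e.g. a Q-update
                    belief = next_belief
            return policy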
