Friday, October 18, 2013

Dialog

Our previous readings have largely focused on one-off or one-sided interactions.  This week we will think about dialogue, where the agent and human interact continuously using language.  The reading for Tuesday's class will be a review paper covering the POMDP approach to dialogue systems, along with a philosophical paper by Paul Grice that has influenced much of the thinking about dialogue.
For Thursday's class, we will read Adam Vogel's recent work about the emergence of Gricean maxims from DEC-POMDP dialogue models:

Please post on the blog a comment of about 250 words pointing out one advantage and one disadvantage of the POMDP approach to dialogue.  Post this response by 5pm on Sunday evening.  Then by 7am Tuesday morning, post a response to someone else's comment on the blog.

32 comments:

  1. Here are my thoughts on one strength and one weakness of the POMDP approach to dialogue.

    Let's start with a strength. The POMDP's use of belief states allows it to track distributions over multiple states in parallel, and therefore to consider dialogue possibilities over multiple states at once. To see why this matters, contrast it with a deterministic system that commits to a single state at each time step. If the user suddenly changes topic, or pushes the dialogue in a direction better represented by a state the system did not choose, the deterministic system has to backtrack through its steps to find the state it should have chosen. With the stochastic approach the POMDP uses, the probability distribution over all states is always maintained, so when the user reports a problem, the probability of the most likely state simply drops and a new state (now with a higher probability) is chosen. This is only made possible by tracking the belief distribution over all states in parallel.


    A weakness of the POMDP approach appears when we apply it to real-world situations of great complexity. Modeling a real-life scenario as a set of state-action matrices is not a trivial problem. If you model the world faithfully, with many states and actions in your state-action space, then POMDP computation becomes very complex and expensive. To exacerbate the problem, real-life dialogue applications require real-time responses (it would not be much of a dialogue if one participant had to wait forever for a reply). To sidestep this problem, one may need approximation techniques, such as the N-best or Bayesian network approaches presented in this paper.
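
    To make the parallel tracking concrete, here is a minimal sketch of one step of Bayesian belief monitoring in Python. The nested-dict transition model T[s][a][s_next] and observation model O[s_next][a][o] are hypothetical toy structures of my own, not anything specified in the paper:

        # One step of belief monitoring for a toy dialogue POMDP.
        # b'(s') is proportional to O(o | s', a) * sum_s T(s' | s, a) * b(s).
        def belief_update(belief, action, observation, T, O, states):
            new_belief = {}
            for s_next in states:
                predicted = sum(T[s][action][s_next] * belief[s] for s in states)
                new_belief[s_next] = O[s_next][action][observation] * predicted
            norm = sum(new_belief.values())  # normalizing constant
            return {s: p / norm for s, p in new_belief.items()}

    Under this update, a user's complaint is just another observation: it down-weights the current best state and probability mass shifts to an alternative, with no explicit backtracking step.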

  2. They mention a few strengths, all of which seem compelling, but I'm not completely convinced that the POMDP is *the* model we should use for dialogue systems (see: weakness).

    (strength) One feature of the POMDP that makes it a natural fit for a dialogue system is that the belief state effectively models the uncertainty inherent in natural (spoken) language. As the paper mentions, the task of processing speech into words is still quite difficult - they claim that error rates are in the range of 15-30%, which is quite high. However, the POMDP can handle this quite elegantly, as it is implicitly assumed that the model does not know what state it is actually in. Observations are used to inform the belief model, so it is a natural fit to maintain a distribution over possible states, given that we can't perform speech->text as effectively as we would like. Compared to a model without 'built-in' uncertainty, this is a big advantage, as the alternative model would require some serious modifications in order to handle the complexities of speech errors.

    (weakness) The paper identifies several weaknesses of POMDPs as applied to dialogue systems that are specific to the practicality of implementing a POMDP (i.e. the state-action space being too large, training being difficult since finding test users is not easy, etc.). However, I am not convinced that dialogue should be modeled by a Markovian model. At the beginning of the paper, the authors explicitly assume that "dialogue evolves as a Markov process" (p.1). Dialogue appears, at a glance, non-Markovian. Assuming the Markov property for dialogue allows us to limit the state-action space, which is a significant "win", but I think that ultimately the model will be insufficient for modeling dialogue. My argument rests primarily on anecdotal evidence about how dialogue works - in general, human dialogue seems to rest on far more than simply the previous utterance. Despite the fact that the model described, and specifically the 3-tuple formulation of each state, incorporates dialogue history, I do not think this is sufficient to model dialogue. I stumbled across a 2005 SIGdial paper by some folks at Microsoft: http://research.microsoft.com/en-us/um/people/timpaek/Papers/sigdial2005.pdf in which they analyze precisely this question and conclude that the Markovian assumption does not benefit dialogue systems. I'm curious what other folks in the class think about this, or if I'm the only one (perhaps I will try to come up with alternative assumptions one might make that limit the state-action space but fit more naturally with the structure of dialogue).

    Replies
    1. I think the appropriateness of using a Markov model depends on how you define a state. If you define the state as only the current utterance, then I agree: the previous state/utterance does not give enough information. However, if, as in the Young paper, you include the dialog history in the state, then for me the Markov assumption seems more reasonable. I see this as similar to the approach described in the QUD paper from the guest lecture by Scott, which relied on the idea of a common ground between discourse participants.

    2. It makes me a little uncomfortable too, but so long as the history is encoded in the state I don't see any reason that it's *inaccurate*. It might contribute to the intractability of the problem by blowing up the state space, though.

    3. As I understand it, including the entire history in the state is a way to sidestep the non-Markovian nature of dialog. If your state includes the history, then predictions of the next state can use information from the entire dialog history. This makes their model only semi-Markovian: it is Markovian in world state, but it is _not_ Markovian in dialog observations.

      Here's an idea: a lot of dialog, though, seems Markovian _so long as you understand the context in which it's said_. (How true is this?) Compare to our models of motion. Motion is non-Markovian when one considers only {X, Y}: you need the history in order to get velocity and acceleration and predict next steps. However, expanding the state space to {X, Y, Vx, Vy, Ax, Ay} makes motion Markovian. What if a similar transformation could be applied to dialog? A dialog token would be an observation that informs our understanding of the 'groundings under discussion', but would not itself be carried along as part of the state. (Alternatively, only consider recent dialog history, or, crazy idea: could you have _fractional_ membership in state?) This would be a huge step towards making their model more tractable, and I think a system could get very far with this simplification.
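
      To make the motion analogy concrete, here is a tiny sketch (toy dynamics of my own, not from any paper): with state = (x, y) alone, prediction needs history, but augmenting the state with velocity makes a one-step update sufficient.

          # Markovian update for the augmented state (x, y, vx, vy):
          # the next state depends only on the current state and the
          # action (accelerations ax, ay), with no history required.
          def step_augmented(state, ax, ay, dt=1.0):
              x, y, vx, vy = state
              return (x + vx * dt, y + vy * dt, vx + ax * dt, vy + ay * dt)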

    4. "Here's an idea: a lot of dialog though, seems Markovian _so long as you understand the context of when it's said_. (How true is this?) "

      "Context" is kind of vague here. Perhaps dialog is Markovian if the current QUD or the whole stack of QUDs is part of the state. But carrying around the entire QUD in the state is still kind of "semi-Markovian." This is because the participants might be discussing some sub-question and eventually they resolve it, and then it becomes fair game to discuss the outer question was introduced some amount of time beforehand. It seems like we can't have these kinds of nested discussions unless we have the whole QUD stack in the state which would be equivalent to having "the entire dialog history."

    5. I think that human dialog relies on both the environment and previous states, and both are captured by the belief states in the POMDP. Other elements, such as culture, should also be included among the environmental elements of the POMDP - and it is worse if they are not. The POMDP is already so complex that accuracy may be traded off against computational convenience; the other models are worth looking into.

  3. One strength of the POMDP representation is its ability to quickly and easily switch between most likely states, without error correction or backtracking. This allows rapid self-correction and promotes better, more accurate interactions. This ability is a direct result of the POMDP's representation of multiple belief states, which are updated as a distribution on each new user input. As long as the number of belief states is sufficiently limited, this update and transition between most likely belief states should be reasonably rapid and make for better usability.
    This leads to one of the disadvantages of the POMDP representation. To make the problem tractable in real-world applications, the number of belief states must be heavily pruned. The alternative is to store all belief states, which would make each update step slower, resulting in much poorer user interaction, especially if the goal is dialogue with a human, which depends on near-immediate responses. Unfortunately, this heavy pruning could remove the true belief state, which would then prevent the rapid and automatic correction that seems to be such a major strength of the POMDP approach. You may have to give up one of the major strengths of the POMDP approach to make it tractable in the real world. While this scenario is unlikely (since you prune out unlikely branches), the POMDP approach would have difficulty recovering a pruned branch that turned out to be the true state, as the sketch below suggests.
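
    As a sketch of the pruning step being discussed (my own toy illustration, not the paper's implementation):

        # Keep only the n most probable states and renormalize.
        # If the true state is pruned here, later corrections cannot recover it,
        # because its probability mass has been silently redistributed.
        def prune_to_n_best(belief, n):
            top = sorted(belief.items(), key=lambda kv: kv[1], reverse=True)[:n]
            norm = sum(p for _, p in top)
            return {s: p / norm for s, p in top}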

  4. A major disadvantage of POMDPs is the difficulty of solving them as the state space increases. MDPs, and POMDPs especially, suffer from the 'curse of dimensionality': as the number of actions and states increases, the space to be planned over grows exponentially. This is not promising for systems where real-time conversation must be practical. There are promising ways to limit the size of the spaces, and this paper shows that compressing the space involves underlying trade-offs between detail and state-space size. This is a fundamental problem that will always follow the use of POMDPs. Increasing the vocabulary and dialogue history will probably continue to require tricks to limit the dimensionality issue.

    The advantage of a POMDP is that language understanding doesn't have to be modeled heuristically; it can be modeled probabilistically instead. As shown in this paper, the belief-state space can also incorporate the user's intentions, which shows promise for creating dialogue agents that might better understand ambiguous sentences.

  5. One advantage is that the POMDP framework provides the ability to maintain a belief distribution over all states, so that the system can effectively pursue all possible dialogue paths in parallel. It chooses its next action based on the probability distribution across all states, not just the most likely state. There is no requirement for backtracking or specific error-correction dialogues. The probability of the current most likely state is reduced when the user points out a better state or signals a problem, and the system updates the probabilities and shifts to another state with higher probability.

    However, as we learned in the previous paper about POMDPs, a POMDP requires memory of previous actions and observations to reduce the ambiguities of the state of the world. The state-action space of the real world is extremely large, and standard POMDP methods do not scale to the complexity needed to represent a real-world dialogue system. The numbers of states, actions and observations can each easily be very large even in a moderately sized system. The paper points to this space complexity several times, and it may be the most difficult issue for POMDP-based dialogue to solve. The paper factors the state into the user's goal, the intent of the most recent user utterance and the dialogue history, which significantly reduces the POMDP model complexity, but it still needs the N-best approach and the factored Bayesian network approach to support tractable real-world systems. Although the space complexity is reduced, the accuracy and time complexity may become somewhat worse.
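
    To illustrate the factoring idea, here is a toy sketch that assumes full independence between the three components, which is stronger than the conditional structure the paper actually uses:

        # Store |G| + |U| + |H| numbers (three marginals) instead of a joint
        # table of |G| * |U| * |H| entries over (goal, utterance, history).
        def joint_belief(b_goal, b_utterance, b_history, g, u, h):
            return b_goal[g] * b_utterance[u] * b_history[h]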

  6. Advantage:
    In order to make progress in a dialogue, the system needs to figure out the user's goal, which is hidden behind the user's utterances. In addition, when operating in noisy environments, the speech recognizer is likely to introduce errors, resulting in a deviation between the user's real utterance and the result of speech recognition. Therefore, neither the user's goal nor the utterance is directly observable. A POMDP is particularly suitable for this kind of problem: by using belief-state tracking, it is able to capture the uncertainty in observations and maintain a probability distribution over multiple possible states, thus eliminating the requirement for backtracking when the current most probable state is down-weighted by new observations.

    Disadvantage:
    Due to the enormous size of the belief state space, POMDPs do not scale well to the complexity of real-world problems. Since the aim of a dialogue system is to interact with humans in real time, approximate solution methods for POMDPs are needed. As indicated in the review paper, certain techniques can be employed to reduce the size of the state space, for example factoring the state by making certain independence assumptions, or only keeping track of the most probable N states.

    Replies
    1. I was wondering about the robot's understanding of the speaker's goal. If we use techniques that leave out some states, how will the robot's understanding change?

  7. This paper lists several strengths of using a POMDP framework for modeling spoken dialogue systems. The most apparent is the ability of a POMDP to model uncertainty over states. In spoken dialogue systems the agent (or robot) is not fully aware of the user's intentions. The SDS only receives information (observations) in the form of sentences, and must infer the hidden current state. A POMDP is fully capable of doing so by maintaining a belief state - a distribution over states - updated from each observation and knowledge of the system's last action. Also, intuitively, it makes sense to model a dialogue using an MDP framework (provided the dialogue's history is kept track of). Here the Markovian assumption follows from the logical structure of a typical dialogue. For example, a conversation in which the first user comments on a phrase that was uttered several minutes ago by the second user would confuse the second user.

    There are also several negative aspects of using a POMDP framework for spoken dialogue systems. Most notable is the fact that the algorithms required to calculate solutions for the POMDP are in many cases intractable for an SDS. Because of this, in order to turn the SDS problem into a workable POMDP model, either simplifying assumptions need to be made or domain-specific knowledge has to be added. While it seems that SDS systems still achieve good results, the algorithms that are needed must be carefully implemented and, in some sense, hand-crafted for each situation.

  8. The paper introduces three major advantages of the POMDP model. One of them is: by maintaining a belief distribution over all states, the system can easily pursue all possible dialogue paths in parallel, choosing its next action not based on the most likely state but on the probability distribution across all states. Compared with methods that use backtracking or specific error-correction dialogues, this approach allows more powerful dialogue policies to be applied.

    A POMDP is defined as a tuple consisting of: states, actions, a transition probability, an expected reward, observations, an observation probability, a geometric discount factor, and an initial belief state. One disadvantage of the POMDP is that exactly computing the probability of the next state from the last state and action is intractable, so approximations and tractable algorithms for performing belief monitoring and policy optimisation have been introduced. Approximations can be applied to the policy and to the spoken dialogue model; methods for improving the dialogue model parameters include expectation maximisation, expectation propagation and reinforcement learning. However, as the degree of approximation increases, there may be a trade-off in the accuracy of the algorithm: if the model maintains many belief states, the calculation becomes intractable, while if more belief states are neglected, the major benefits of the POMDP are weakened.
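
    For reference, that tuple can be written down directly; a minimal sketch in which the field names are my own shorthand:

        from dataclasses import dataclass
        from typing import Callable, Sequence

        @dataclass
        class POMDP:
            states: Sequence        # S
            actions: Sequence       # A
            transition: Callable    # P(s' | s, a)
            reward: Callable        # r(s, a), expected immediate reward
            observations: Sequence  # O
            obs_prob: Callable      # P(o | s', a)
            gamma: float            # geometric discount factor
            b0: dict                # initial belief state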

  9. I think using a Markovian assumption for a dialogue system seems useful in modelling the problem; as usual, nothing is strictly Markovian. With this in mind, the advantages of modelling the system as a POMDP are clear: more noise robustness, and the ability to backtrack in a conversation the way a human would. What makes this exciting is the ability to be robust even with the errors in speech processing taken into account.
    The disadvantages are many, but they deal with how large the state space is, how difficult it is to evaluate such a system, or how the heuristics provided to reduce the state space can go wrong. I guess any machine algorithm would look bad with high-dimensional data, so I am willing to look past most of these problems, as they can be improved upon. The one remaining issue may be the Markovian assumption, which I am comfortable making (right now), though I'm not sure it is precise.

    Replies
    1. I agree with Nakul that "nothing is strictly Markovian". So the POMDP-based system should be able to deal with the noise that comes from the real world and be robust enough to produce good results. I also strongly agree with the idea that it can handle errors in speech recognition: an SDS can improve its recognition of speech based on history information and future utterances.

    2. I find it interesting when I see a paper using tricks like keeping track of dialogue history to enforce the Markovian assumption. While theoretically the model is still Markovian, I feel a bit bothered by the fact that the history from the "beginning of time" is being passed along. This seems to fundamentally violate the spirit of any Markovian assumption the model wishes to make.

    3. I'm not sure it violates it; it just extends it into something that is not ideal. We make the Markovian assumption to make problem solving easier, but storing histories makes our problem harder again. I think the key issue is modelling our problem correctly with the POMDP idea, which requires intuition right now, but there has to be a way to figure out what depth of history is ideal for a given type of problem. In RL and ML this is solved using a forgetting factor in online domains.

  10. A major advantage of the POMDP approach to modeling dialogue is that it can handle speech recognition errors more easily. Since the model is based on belief states, uncertainty about the actual state you are in is part of the model itself; the relatively high probability of speech recognition errors is internal to the model rather than an exogenous factor. Additionally, a POMDP is better able to handle corrections when an error is detected. Since it keeps a probability distribution over all states, not just the single actual/most likely state, it is easier to adjust the probabilities of the next state with new information. The probability of being in each state is continuously updated with each new input, so new information is readily incorporated into the model and improvements are made in real time.
    On the other hand, a significant challenge to implementing a POMDP is how computationally expensive it can be. In practice, the SDS state space is extremely large, so computing and storing information over the entire state space is impossible. As a result, approximations must be used, and there is a real trade-off between accuracy and efficiency in the algorithm. For example, using the N-best approach, only the N most likely states are updated and remembered. This greatly reduces computation time, but breaks down if the actual state is not on the list.

    Replies
    1. The paper introduces an approximation that merges the N-best approach and the factored Bayesian network approach, in which only an N-best list of values in the factored model is updated. Yes, it would be a problem if the actual correct state is not one of the N-best states. As one of the advantages mentioned in the paper, when this problem happens the user can signal it, the probability of the current most likely state is reduced, and the focus simply switches to another state. But if the correct state is not there at all, the system still goes wrong. I think one solution may be to store all the possible states somewhere while computing probabilities only for the N-best states. When the user points out the problem, the system could read back all the states and re-compute. That computation would be very expensive, but it avoids endless errors, and there is no need to consider all states for all actions - only all states for the one action. And of course, the subsequent computations will be more accurate, so the cost should come down over time.

  11. I see one advantage of POMDPs as the fact that they are not trying to model language directly. In reading the Grice paper and thinking about language, it becomes more and more clear that language is incredibly indirect. Thus, it is important to have a model that treats the conveyed information as primary, rather than the language itself. While this is a major advantage, I do disagree with calling the hidden state an 'interpretation' of the sentence. The word to me implies two things: that there is ambiguity in the content of what is said, and moreover, that the 'meaning' is secondary - an interpretation follows from an expression. In reality we should be treating the expression as primary, where the natural language, tone of voice, the agent's model of its fellow interlocutor's behavior, world knowledge, the given context, cultural norms, and so on all evidence the expression that the fellow interlocutor wishes to convey.

    Numerous people have noted the disadvantage of state-space explosion. I see this not as a problem so much as a symptom of a poor world model. I run the risk of repeating myself by saying so often that POMDPs seem poorly suited to this task, but hopefully I have developed an interesting idea or two in the following. Namely, models in which state transitions play such a heavy role seem ill-suited to the problem of natural language understanding. It seems the state can be one of two things (or both, I suppose): the expression the interlocutor wishes to convey, or the actual state of the environment. In the POMDP setting, the agent's actions influence state transitions (consider the transition function T: S x A -> S), yet in both of these cases the agent's actions do not serve to advance the state in any way (with the exception of speech acts in the latter case). Both the former and (perhaps to a lesser extent?) the latter are more akin to a (cooperative) game of Mastermind, where the agent is attempting to divine (through natural language) the expression or arrangement of knowledge that the interlocutor wishes to convey.

  12. Advantage:
    Modeling a belief state, rather than a state, allows for complex uncertainty. The states themselves could be represented by a traditional MDP with a bigger state space (allowing for some discretization), but the more important point is that uncertainty is the default way to think about a POMDP. The observation model is also distinct, because belief monitoring happens after the action is chosen and the subsequent observation is made in the new state.

    The importance of modeling this uncertainty is that in the case of (inevitable) errors -- at the input (speech recognition) or understanding (disambiguation) stage -- the system can seamlessly switch to a different alternative. Indeed, much of what Young et al. focus on in evaluation is the superior performance of POMDP dialogue systems under uncertainty.

    Disadvantage:
    POMDPs seem to me to be a good candidate for understanding the state of a conversation -- except perhaps for their general intractability -- but the model doesn't tell you anything about how the state should be modeled in the first place. This seems like (one of several [1]) important sub-problems which might lead to interesting implications. Some maxims (in Grice's lexicon) are handled by the natural language generator. Others -- relation, I would say -- should follow from the state of the conversation. Clearly the plain facts of a conversation, and so its state, will inform the natural language generation as well. But implementing relation depends on modeling the user's goals, and so on searching for something to advance them. These are clearly outside the scope of natural language generation, and it would be interesting to see a framework for encoding the state of a conversation in light of conversational principles such as Grice's maxims.


    [1] Another important one: defining the reward function.

  13. An advantage of the POMDP approach to dialogue is that it offers a flexible model that can be improved through additional knowledge and can be made robust to noisy communication and unclear statements by modeling uncertainty in the "belief state." It avoids requiring people to construct large flow-charts that attempt to model an entire dialog. Such flow-charts are brittle, and it's nearly impossible for their constructors to take into account all the different possible error states. The POMDP model, with its built-in uncertainty, is more likely to recover from bad states and is able to adjust beliefs to account for noisy communication. For example, a user could repeat a misunderstood command several times and the POMDP system should be able to adjust its understanding of the command.

    A primary disadvantage of using POMDPs in particular to solve this problem is that there is no clear reward function. POMDPs are most commonly associated with solving games which have very clear rewards (i.e. "if you get to the end of the maze you get 5 points"), but dialogue does not have an obvious quantifiable reward. The paper mentions you could potentially ask users to state whether the system achieved the goal they wanted when entering the dialog, but this is a low-resolution reward, and users might be polite or have low standards and say it was successful when it might easily be improved. So while it's clear there are advantages to POMDPs once they are optimized with the right parameters, the lack of a clear reward structure makes it difficult to optimize the parameters in the first place.
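
    For concreteness, one common shape for such a reward in the POMDP dialogue literature is a per-turn cost plus a terminal success bonus; a hedged sketch, with numbers that are purely illustrative:

        # Small per-turn cost encourages brevity; the terminal bonus fires only
        # on the user's yes/no exit signal, which is exactly the low-resolution,
        # politeness-biased feedback described above.
        def dialogue_reward(dialogue_over, user_confirmed_success,
                            per_turn_cost=-1.0, success_bonus=20.0):
            reward = per_turn_cost
            if dialogue_over and user_confirmed_success:
                reward += success_bonus
            return reward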

    Replies
    1. (Re: disadvantage) This is a really interesting point. A legitimate contribution to dialogue at any point in time may be any of many possible utterances - as you said, there is typically no singular, unique response that the robot should be striving for. Many specific applications of dialogue systems might allow for unique answers, as with the UBots that assembled IKEA furniture: when the bot realizes it is incapable of proceeding, it should ask for assistance in a way that enables it to proceed. However, in most dialogue scenarios there is no obvious 'correct' response - any natural language response that conforms to some set of maxims governing what an appropriate response might be (Grice's maxims are a good start), perhaps augmented by a set of maxims specific to robot-human task-driven dialogue, seems legitimate (e.g. a maxim that encourages the robot to 'act'/'speak' in such a way that progress is made toward the task by some metric, be it knowledge acquisition or perhaps physical progress). Perhaps we could hypothesize about a set of maxims that would provide a reasonable structure for the reward function of a POMDP applied to natural language systems.

    2. Short note: many commercial assistance systems have "Was this answer helpful? Yes/No." at the end. For that one domain, this should suffice as a reward function. Of course it isn't global, but as Dave notes, many domains, when considered specifically, do have rewards.

    3. It seems like a large part of the reward calculation needs to come from the user's end. In real dialogue interactions, if someone asks for help and is successfully helped, they give some quick form of acknowledgment such as "Ah, yes", "Thanks", etc., while if the goal was not achieved there is some sign of giving up or a correction. When I am asked for help on anything (TA hours, etc.), I'm not really sure I have helped until I receive some response, either positive or negative; until then I'm unsure whether I have truly reached the user's goal state. That seems like a reasonable burden to put on the user; the trick is convincing users to treat dialogue with a machine the same way they treat dialogue with a human.

    4. Young et al. did address this question at the end of the review paper, as well as the potential drawbacks of requesting users to provide feedback (false "yes" due to politeness, false "no" because users have unrealistic expectations, etc.). Requesting more detailed feedback (e.g., when a user answers "no", asking them to choose between "the information provided was too vague" and "the information provided was complete nonsense") might be more helpful, although it puts a further burden on users.

    5. I thought repetitions of a user's request could be a negative signal, as could a conversation that does not end nicely, such as the user hanging up in the middle. It would also be nice if humans could judge parts of a conversation as misleading, so that the labeled conversations could be used to provide negative feedback.

    6. Young et al. mention (in section V.A) that rewards can be inferred via inverse reinforcement learning, by observing human-human conversations and inferring the rewards behind them; this scheme could work well for learning rewards and reward functions.

  14. As speech recognition still has a significant error rate, an advantage of a POMDP-based SDS is that it can take the unreliability of the input into account and provide checking and recovery mechanisms. I did RoboCup@Home in the past and encountered a problem that matches this situation exactly. In one test, the robot had to take several orders from a person and deliver drinks. The command template is "Give <person> the <drink>". The person would say:
    "Give me the coke"
    "Give Michael the pepsi"
    And the robot would repeat the command to let the person confirm it. But due to speech recognition problems, the robot mistakenly recognized "Michael" as "Michel", which sound quite similar. So it would say:
    "You said Michel want the pepsi, right?"
    What made it worse is that the person didn't hear this clearly, mistakenly took it as a correct repeat, and replied "yes" to the robot. Later he realized this and wanted to recover from the error. For a conventional deterministic flowchart-based system, the designers would have to anticipate this state and manually code it into the system. Unfortunately, I didn't think of this situation, there were no states dealing with it, and our robot failed that single test. A POMDP-based dialog system, by contrast, is able to recover from an error like this when the user simply says a few more words to the robot.

    The disadvantage I worry about is the intractability of the POMDP state space when applying it to the real world. The system uses approximation techniques like N-best or Bayesian networks to prune the number of states and make the problem tractable, but we don't know whether the needed states have been pruned away. For a test like RoboCup@Home, quite precise responses between robot and human are required.

  15. This comment has been removed by the author.

  16. The first advantage of using POMDP-based SDSs over conventional deterministic flowchart-based systems is that you can model belief distributions over spoken inputs, which incorporates the inherent errors of speech recognition. The entire state space is then considered simultaneously when inferring the next action. This provides robustness against errors made by speech recognizers in real-world settings, since the system does not need to backtrack to a previous point in a flowchart to correct an error.
    Also, since it is a POMDP, we can use reinforcement learning to train policies, given appropriate reward functions.
    One of the disadvantages of POMDP-based systems is that the history of the conversation is incorporated into the state. Although adding the conversation history to the state space is essential to preserving the Markov assumption, it limits the use of past conversations in dialogues because of the POMDP's innate intractability. If you try to incorporate complex conversations that occurred in the past, the belief space grows exponentially. You have to limit the length of the conversation, or find an approximation algorithm to bring the problem down to size. Therefore, to increase the complexity of conversation, a carefully designed training algorithm is needed. The paper mentions several solutions to this, but it is still an ongoing effort.
    The system also requires simulated users to be modeled, either because learning is not fast enough to do online or because it is simply too hard to recruit enough people to train the model. Ideally the system would learn by itself, interacting only with real users.
    The other disadvantage lies in modeling the reward function for reinforcement learning. In real-world situations it is not easy to extract a reliable reward. Even if you let users judge the entire conversation, they are oftentimes not sincere in responding to this type of question. The authors of this paper suggest using biometric measures to infer users' emotional states; however, inferring emotions from biometric data is itself ongoing research that has not been fully established.
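
    To make the simulated-user point concrete, here is a schematic of such a training loop; every class and method name is a hypothetical placeholder, not a real API:

        # Train a dialogue policy against a simulated user by reinforcement
        # learning. Purely schematic: 'policy' and 'simulated_user' stand in
        # for whatever learner and user model a real system would supply.
        def train(policy, simulated_user, episodes=10000):
            for _ in range(episodes):
                belief = policy.initial_belief()
                done = False
                while not done:
                    action = policy.choose(belief)
                    observation, reward, done = simulated_user.respond(action)
                    next_belief = policy.update_belief(belief, action, observation)
                    policy.learn(belief, action, reward, next_belief)  # e.g. a Q-update
                    belief = next_belief
            return policy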
