Tuesday, November 12, 2013

Language and Gesture

There is a lot going on in the reading for this week.  As you read the paper, focus on understanding what is given to the algorithm and what is being inferred by the algorithm.  What is the input and what is the output?  What are the analogs to the input and output in the language understanding domain?

Write a blog comment of about 200 words answering these questions.   Please post it by 7am on Thursday morning.

16 comments:

  1. In “Generating Legible Motion,” given an input goal, a robot should perform an action that legibly expresses the intention behind it. Generating legible actions is similar to generating specific help requests in Tellex et al. When robots get stuck and want help from humans, they should generate help messages for humans. Generic requests like “help me” are not legible in that they don’t convey much information. On the other hand, a message like “give the robot the white leg on the table” is legible because humans can easily understand what the robot wants.

    In “Legibility and Predictability of Robot Motion,” Dragan et al. also introduce the predictability of robot actions. Given a goal that is known by other robots or humans, a robot should perform the action that achieves the goal most efficiently. Generating very succinct sentences is similar to performing predictable actions in the language generation domain. Some ambiguities in the sentence can be resolved by prior information about the goal.

    ReplyDelete
  2. In the Tellex paper, the input to language generation is an action the robot would like to perform but can't, involving a verb and an object, and the output is the sentence considered most likely to correspond to the physical objects and actions involved. This is done by searching over sentences and evaluating them using the grounding model to find the most probable one, given a model of how observers understand groundings.

    In the Dragan paper, the input is a target in space and the output is a trajectory that is “legible”, which is to say one that makes its goal as clear as possible, as early as possible, to a human user. This is done by iteratively improving a trajectory to increase its legibility according to the observer model built in the paper.

    The methods are quite parallel in that both begin by modeling how an observer would reason from some concrete thing, like a spoken command or a movement trajectory, to an abstract concept, like a grounding or a legibility value. Having built this model of the observer, they use it to do the reverse: start with the abstract and create the concrete by trying to maximize its value to the observer.

    ReplyDelete
  3. The papers "Legibility and Predictability of Robot Motion" and "Generating Legible Motion" deal with demonstrating grasps of objects in a manner that is predictable to a human observer.

    The input is solely the grasps to be made of a bottle. The output is an arm movement that not only picks up the bottle, but also adequately conveys which object it intends to pick up.

    The parallels between this and language understanding are quite clear. One deals with conveying understanding to a human partner through spoken language, and the other through body language. The studies in both papers attempt to address how well the communication was understood.

    ReplyDelete
  4. The “Legibility and Predictability” paper proposes a model for robots to evaluate how legible a motion trajectory is. The paper shows that legibility and predictability can be contradictory. Since we are learning legible motion, I only discuss legibility here. The input is the portion of a trajectory observed so far; the output is the goal inferred by the observer. The method described in the paper adds the human’s interpretation of actions as input and selects the goal that gives the maximum probability for the trajectory. It uses a cost function such that when the cost is minimal, the goal is the most likely. The analog for the input in language understanding is a robot asking for a table leg; the output is the human worker inferring the robot’s intent from among several goals.

    The “Generating Legible Motion” paper describes a way to generate legible motion based on a functional gradient optimization technique. The input contains an initial trajectory whose score we improve over later iterations. The trajectory is parameterized as a vector of waypoint configurations. The output is a trajectory optimized against the expected cost C derived from users’ expectations. Functional gradient ascent makes the process very fast. The analog for the input in language understanding is the robot asking for a table leg: when it adds more detail, such as “white,” the legibility score increases.
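
    A rough sketch of the cost-based goal inference described here (the function name and all numbers are illustrative, and a uniform prior over goals is assumed):

```python
import math

def goal_probabilities(cost_so_far, cost_to_go, optimal_total):
    # Observer's inference over candidate goals: a goal is likely when
    # the observed motion plus the best remaining path is close to the
    # optimal path straight to that goal.
    #   cost_so_far      -- cost of the observed partial trajectory
    #   cost_to_go[g]    -- optimal cost from the current point to goal g
    #   optimal_total[g] -- optimal cost from the start directly to goal g
    scores = {g: math.exp(optimal_total[g] - cost_so_far - cost_to_go[g])
              for g in optimal_total}
    z = sum(scores.values())
    return {g: s / z for g, s in scores.items()}
```

    With Euclidean path lengths as costs, a motion that has already curved toward one goal drives that goal's probability up relative to the alternatives.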

    ReplyDelete
  5. In "Legibility and Predictability of Robot Motion", Dragan, Lee, and Srinivasa formalize the concepts of legibility and predictability in the domain of robotic motion. Additionally, they provide a means of evaluating the degree to which a particular grasp motion is either legible or predictable. There isn't an obviously outlined algorithm in this paper - the robot is given a cost function C, which is intended to model "what the observer expects the robot to optimize", as well as a goal, G. The paper discusses two separate motion evaluation/generation methods - one for legible motion, and one for predictable motion (these are effectively the 'output' in this paper).

    In "Generating Legible Motion", the formalizations of legibility and predictability are expanded upon - here, we get full-blown methods of generating legible actions (and, in the constrained optimization case, reasonably predictable ones, too). The input is a goal and a model of how "(in)efficient" trajectories are, and the output is a *legible* grasping motion (w.r.t. G, the goal, and the observer, accounted for by C).

    The parallel at a high level is that the goal of legibility targets performing actions that are easily and quickly understood by an observer, something which is also sought after in language generation. The means of evaluating a particular motion's legibility is quite similar to evaluating how effective a language generation system was in communicating information.

    ReplyDelete
  6. In their two papers, Dragan et al. focus on the distinction between legible and predictable actions for human-robot collaboration. Legibility is a measure of how easy it is to determine the robot’s goal, and predictability is how closely the action matches the human’s expectation. Within a certain space of acceptable predictability, they show that maximizing legibility provides the best results. The input given is the environment and a task, and the output is the action or path that optimizes efficiency for collaboration. In the language understanding domain, this is analogous to obeying the Gricean maxims.

    ReplyDelete
  7. Tellex: takes a request for help in symbolic form (some set of unfulfilled
    predicates on the state) and outputs natural language / a sentence. It infers
    the most probable sentence by estimating the parameters most likely to have
    the correct groundings.

    Dragan: takes a goal (or I think an initial trajectory) and outputs a refined
    "legible" path subject to a certain trust region.

    The analogous structure appears to me to be the tradeoff between what they call
    legibility and predictability. A predictable motion merely achieves the
    goal, but a legible motion achieves the goal and in the process communicates
    it to an observer. Similarly, in the language domain, the S1 metric leads to
    phrases which describe the desired object without regard for *communicating* it.
    The S2 metric, like the legible motion, achieves the goal (description in the
    language case) while also considering its communication with a model of human
    language understanding.
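
    A toy version of the S2-style choice sketched here (the scoring table, names, and utterances are invented for illustration, not from the paper):

```python
def s2_choose(utterances, groundings, listener_score, target):
    # Pick the utterance that maximizes the probability a modeled
    # listener assigns to the intended target grounding, normalized
    # over all candidate groundings (an S2-style speaker).
    def p_target(u):
        total = sum(listener_score(u, g) for g in groundings)
        return listener_score(u, target) / total
    return max(utterances, key=p_target)
```

    On a toy scene with a white and a black leg, "the leg" fits both groundings equally, so the S2 speaker prefers the longer but less ambiguous "the white leg on the table".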

    ReplyDelete
  8. In Dragan’s paper, the goal is to infer expected motion given a goal (for predictability) and to infer the goal from segments of motion (for legibility). The predictability algorithm is given a goal and a cost function as input, and produces movement from the start to the goal that minimizes the cost function. The legibility algorithm takes a portion of a path and a cost function and produces the most likely goal. This shows stark similarities with the Inverse Semantics paper, with predictability mapping to determining the most likely action given a sentence, and legibility relating to producing a sentence that is most likely to yield the desired action. This highlights the difficulty of producing a request that is both easy to generate and easy to understand, with requests such as “Help me” in the Tellex et al. paper seeming remarkably similar to the robot reaching directly for a goal in the example: the request was easy to produce, but inferring the desired result is difficult. Tellex et al.’s paper strikes a balance between legibility and predictability in the domain of language generation.

    ReplyDelete
  9. Legibility and predictability of motion:
    The input is the ease with which an observer can accurately guess the actual goal of an action. The output is the path the robot should take in order to appear more predictable. In relation to the language understanding domain, or more specifically the study where robots asked human subjects for table legs, the input is how well a human can understand a robot’s request for help, while the output is the human’s ability to correctly give help.
    Generating legible motion:
    The algorithm focuses on creating motion trajectories that are more understandable than merely predictable. This means balancing inputs around predictability and efficiency. The output is a motion trajectory that the robot can follow. The analogous input and output in the language understanding domain are the same as before: testing how well human observers are able to understand and fulfill requests.

    ReplyDelete
  10. Both the Dragan et al. paper about generating legible motion and the Tellex et al. work on asking for help using inverse semantics look at improving actions such that the user is less confused (or less uncertain) about them. Both papers model this uncertainty using a probabilistic model of humans and their understanding of trajectory or language actions. For the Dragan et al. work, the input can be a "predictable" trajectory for a task plus a legibility functional (that can be optimized) to generate a legible trajectory. The authors also make sure the functional is not over-optimized to the point that trajectories become unintuitive: trajectories are only pushed toward legibility at locations/waypoints where predictability remains high enough.

    ReplyDelete
    The analogy between the predictability/legibility of an action and the
    understandability/generation of natural language is strong.

    Predictability: The papers define a predictable action as one which could be inferred
    by an agent (human or robotic) from foreknowledge of a goal and the set of actions
    available. A predictable action is one which is rational from the perspective
    of the goal -- which minimizes the cost (by whatever metric... the papers suggest
    time to completion or length of path) and completes the goal. This corresponds
    directly to understanding natural language. The grounding (understanding) function
    maximizes the likelihood of the understanding based upon foreknowledge of the
    command stated. This foreknowledge is the important part.

    Legibility: The papers make a point to contrast *legibility* with *predictability* where
    a legible action is one in which the goal (not known a priori) can be inferred
    simply from observation of the action. An agent trying to generate a legible action
    faces a very similar task to one trying to generate natural language. The language
    or action chosen for legibility should be one which maximizes the likelihood that
    the recipient of the communication (be it verbal or physical) will properly infer
    the information meant to be transmitted.

    The insights of these papers seem to ring true, also, in that human-human communication
    is a mix of verbal and nonverbal "utterances". It should then follow that analogous concepts govern the
    creation and understanding of goal-driven communication no matter whether the language
    be spoken or acted.

    ReplyDelete
  12. In “Legibility and Predictability of Robot Motion”, the authors constructed legibility and predictability models of robot arm manipulation. They used these models to score how legible and how predictable a given arm trajectory is, and conducted a user study to show how well the models fit real humans’ perception.
    In “Generating Legible Motion”, beyond creating metrics for measuring the two properties of motion, the authors tried to create trajectories based on those properties. They take the existing CHOMP algorithm for motion planning and modify it to use legibility as the optimization criterion.
    In comparison with the language generation scenario in Tellex’s “Asking for Help Using Inverse Semantics” paper, legibility in natural language can be explained as a person’s ability (judgement) to infer the goal state given the text, where the goal state may be a desired action to perform. In the case of motion, legibility was a person’s judgement in inferring the goal position given the trajectory so far.
    In Tellex et al.’s work, the first generation metric (S1), which only models the environment, penalized the length of the generated text. S2 took the inverse approach: the more the text explains (even as it grows longer), the higher the S2 score. This can be seen as analogous to the predictability and legibility metrics, which respectively do and do not penalize the length of a trajectory.

    ReplyDelete
  13. In "Legibility and Predictability of Robot Motion", the authors develop a formalism to mathematically define and distinguish predictability and legibility of motion. For legibility, the input to the algorithm is the trajectory, and the output is the goal the observer can infer from that trajectory. For the language understanding analog, the input might be a series of actions such as reaching for an apple, and the output should be the human not only handing an apple to the robot, but handing over the right one.

    In "Generating Legible Motion", the authors' goal is to make the robot generate legible motion. The input to the algorithm is a trajectory drawn from a Hilbert space of trajectories, and the optimization uses a cost C, where lower cost means a more efficient trajectory. The input also includes the parameter β, identifiable by its effect on legibility as measured with users, and the output is a legible motion.

    ReplyDelete
  14. A little late, but... The Dragan papers attempt to do something similar to the Tellex paper: show that a robot can cooperate with a human by modeling its actions based on how a human would understand them. It's an important observation, though it's pretty much the accepted wisdom of a few generations of dance and mime teachers. The authors do come up with a formalism to describe the difference between a direct action and one intended to be understood by others ("predictable" and "legible"). But the domain they are working in is quite constrained and somewhat artificial. It seems probable that the formalism will not work well in an environment or situation with fewer constraints. Those dance and mime teachers would have a lot to say on the subject of making an expressive gesture, and very little of it is captured by the simple trajectory of the hand reaching out to grasp something.

    There is a tradition, in those movement-oriented theatre disciplines, of mask work, where you remove the expressive capacity of the face from the equation. The idea is to get the dancer to think about exactly the problem of how to make the arms, legs, and body express an intention without gaze or facial expression. What you learn is that trajectory is certainly part of the solution, but it is far from being all of it, perhaps not even the dominant part.

    The approaches do all rely on the dancer/mime/actor/whatever being able to understand how another person would view his or her actions. Maybe that's the contribution of these papers: to acknowledge that there are ways of modeling just a small portion of a companion's understanding of your words and actions, rather than insisting that your companion have a complete understanding of human nature. If you can get useful improvements from an inadequate model, who's to say the model is inadequate?

    ReplyDelete
  15. In "Legibility and Predictability of Robot Motion", the authors formalize legibility and predictability in the context of motion. The algorithm learns a cost function and minimizes it while maximizing the predictability or legibility score. The input is the goal in the grasping environment, and the output is a trajectory that minimizes the cost function.
    In the "Generating Legible Motion" paper, Dragan et al. propose an algorithm that generates legible, intent-expressive motion. The input to the algorithm is an initial trajectory, and the algorithm iteratively improves its score by functional optimization of legibility.
    So the analog to Tellex's inverse semantics paper is that these approaches all model the environment and the observer in order to improve trajectories, or to help people become clearer about the help being requested.
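
    A minimal sketch of that iterative improvement, using finite differences in place of the paper's analytic functional gradient (the score function, step sizes, and waypoint representation here are placeholders, not the paper's implementation):

```python
import copy

def improve_trajectory(waypoints, score, steps=10, lr=0.05, eps=1e-4):
    # Hill-climb a trajectory's score by finite-difference gradient
    # ascent, keeping the start and goal waypoints fixed. `score` is
    # any legibility surrogate supplied by the caller.
    pts = [list(p) for p in waypoints]
    for _ in range(steps):
        new = copy.deepcopy(pts)
        for i in range(1, len(pts) - 1):       # endpoints stay fixed
            for d in range(len(pts[i])):
                bumped = copy.deepcopy(pts)
                bumped[i][d] += eps
                grad = (score(bumped) - score(pts)) / eps
                new[i][d] += lr * grad
        pts = new
    return pts
```

    With a toy score that rewards exaggerating waypoints to one side, the interior waypoints drift toward the higher-scoring region while the endpoints stay put, which mirrors the paper's picture of exaggerated-but-goal-preserving motion.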

    ReplyDelete
  16. Given are a starting state S and a goal G_R within a Hilbert trajectory space, and inferred is a trajectory from S to G_R. This inference is achieved by scoring against a cost function (distance being the most significant component), and that result is then compared to a cost function that combines the classical cost with the probability of correct inference by a viewer.

    The analog to the language generation domain:
    Given some desired goal G (e.g. table leg in hand), output the text that communicates that goal (e.g. "please put the table leg in the robot's hand"). In the Tellex paper, this inference is most successfully achieved with the exact parallel of the 'legible' metric: find the sentence that is least ambiguously understood by the listener.

    ReplyDelete