Friday, November 15, 2013

Applications

This week the class will focus on applications.  We will read three papers, one from the supply/warehouse environment, one from a factory floor environment, and one from a household environment.  Each paper describes a robot designed for the corresponding environment.   These papers give a sense of the complexity of an end-to-end robotic system, and also some visions of the ways in which robots will impact the world. 

11/19/13 (Tues.)
11/21/13 (Thurs.)

By Sunday night at 5pm, post a comment discussing how language can fit into these systems.  Try to identify a problem that language input or output could solve.  Then discuss a technique you might use to solve it. 

By Monday night at 5pm, post a reply to someone else's comment.  Ask a question; expand on an idea; suggest a related citation.

39 comments:

  1. In IkeaBot, the authors use a new object-oriented language, ABPL, for representing symbolic planning problems. IkeaBot's planning model uses ABPL for its raw input and output data. The symbolic planner can check that the preconditions of each action are satisfied and reason about the postconditions of future actions, relying only on the blueprint generated by the geometric preplanner. ABPL is based on an object-oriented approach to data structure design: it allows the user to organize objects within the environment rather than express every element without object-oriented structure, as in PDDL, and it lets the user define object types that can be assigned properties and sub-objects. One problem that arises with language input is symmetry: for example, when we want the robot to assemble a chair with four legs, those legs can be attached in any order. One way to handle this is with "Groups" in ABPL. Groups let the planner reason about this type of symmetry more efficiently: after referencing one member of a group, the planner knows the same rule applies to the other group members.
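
    As a rough illustration of the symmetry point (this is not actual ABPL syntax; the part and socket names are made up), treating the four legs as one interchangeable group collapses the redundant orderings a naive planner would otherwise consider:

        from itertools import permutations

        # Hypothetical chair-assembly problem: four identical legs, four sockets.
        legs = ["leg1", "leg2", "leg3", "leg4"]
        sockets = ["front_left", "front_right", "back_left", "back_right"]

        # Without a symmetry group, a naive planner distinguishes every assignment
        # of specific legs to specific sockets: 4! = 24 equivalent plans.
        print(len(list(permutations(legs))))  # 24

        # With a "group" of interchangeable legs, the planner only decides the
        # order in which sockets get filled by *some* leg from the group.
        grouped_plan = [("attach", "legs_group", socket) for socket in sockets]
        print(grouped_plan)  # one canonical plan instead of 24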

    In Seth Teller's paper, the authors describe the development of a multi-ton robotic forklift. The communication between people and the robot here is different from IkeaBot: people use real natural language and gestures to guide the robot. The authors use a multimodal tablet that enables a supervisor to use speech and pen-based gestures to tell the forklift what the task is; the interface is based on spoken commands and gestures made on a handheld tablet computer. Supervisors break complex commands into sub-tasks so that the robot can finish complex pallet-handling tasks. Through the tablet, the robot can recognize spoken commands and gestures, and the SUMMIT library handles speech recognition for summoning. Accepting spoken commands reduces the supervisor's burden and lets the robot carry out higher-level directives. However, the problem with spoken commands right now is that the supervisor can only use a limited set of commands to direct the robot's movement. One way the system works around this is the circling gesture: as the authors mention, the user interface echoes the gesture as a closed shape, and the gesture is context dependent.

    ReplyDelete
  2. [Forklift] There are a number of ways in which the integration of natural language could benefit the forklift system. As is discussed in the closing sections of the forklift paper, one could imagine removing the need for a GPS-delineated map by equipping the forklift with the ability to generate topological maps in an on-line way (e.g. via a narrated tour, as the paper suggests, or through an annotated gesture approach - this wouldn't require actually moving with the forklift, which might be convenient in some domains). Another extension would be replacing the default "I'm stuck" behavior mentioned in section II with a more robust response, similar to the inverse semantics system demonstrated with the youBots (e.g. utterances could indicate why the forklift is stuck, or how exactly the operator can help). Additionally, we could imagine an augmented system that learns from demonstration (with natural language annotations) - a human operator could perform a task and provide narration during the task in such a way that the forklift is then able to reason about more complicated task execution (this might allow for the execution of the "expert level tasks" mentioned in the paper). Finally, we could use natural language as a means of defining high level tasks in terms of low level tasks (i.e. task composition).

    [IkeaBot] Beyond the extensions already made with the IkeaBot system (i.e. the inverse semantics), it seems like the proposed planning language (ABPL) provides a host of opportunities for natural language 'injections' to this system. Most prominently, natural language descriptions of type definitions, groups and their properties, high/low level tasks and their preconditions, or really any other form of planning knowledge could be derived from a human coordinator's language. This might be useful in achieving the generality suggested in the Conclusion. Furthermore, the coordination of the task could be expanded to include a human participant of a certain degree of laziness/ability. For instance, a person might be willing to help out with portions of the assembly task that are trivial for a human but difficult for a robot (e.g. an entirely new gripper was required to achieve the screwing operation, and we can imagine other manipulation limitations arising in future furniture assembly scenarios). Then, there would be a constant dialogue between the robotic task force and a human participant about which tasks the human is willing to help out with.

    ReplyDelete
    Replies
    1. We can definitely use the techniques from Walter et al. (2013) to build a semantic map learned from natural language descriptions. Could you also explain a little bit more about using natural language as a means of defining a high-level task? I suggest using natural language to provide a declarative sentence rather than a descriptive sentence to give a high-level command to the robot, so that the robot can figure out a detailed plan by itself. Humans could also annotate each primitive action so that robots could use these actions in their planning stage.

      Delete
    2. Re: using natural language as a means of defining a high-level task

      I was mainly interested in establishing the preconditions/subgoals for higher-level tasks. I think this parallels the thought you had in your other comment about providing narrations of action sequences for a particular furniture construction goal. Since the ABPL language possesses the infrastructure to reason about subgoals, we can add new subgoal/goal relations on the fly (and not need to bake all of them into the system from the get-go). Ultimately I think this boils down to a similar effect as "the operator provides natural language assistance in failure cases" - or at least the way I was envisioning things: if the planner fails, we can construct novel precondition/action relations (effectively subgoals) on the fly and retain that knowledge in future runs (e.g. "screw the table leg" requires the precondition of having a robot with a Torq Gripper - thus, we could imagine that a human operator could dictate these preconditions to the robot fleet, so that any screwing job is immediately allocated to a Torq Gripper bot).

      Delete
  3. The forklift system would greatly benefit from increased interaction with language. If more complex commands could be interpreted, it may be possible to reach a point where people only need to be trained on the end goal of warehouse organization and not on the usage of robots to achieve that end goal. Gestures could be interpreted alongside commands for greater clarity, such as the command “pick that up” when pointing to something.
    Expanding this further, and this also applies to the IkeaBot system, larger sets of procedures can be packaged into simple commands. I imagine both systems having some sort of “API” which is executed through speech. For example, the short command “prepare today’s air shipments in there” could trigger an entire abstracted set of procedures based around “air”, “today”, and “there”. Language essentially becomes an alternative UI for customizing the robots. This is accomplished to a small degree by the papers, but the robot functionalities could be extended further given intricate speech inputs.
    Given the limited environments that they are in, this is entirely possible once relevant understanding is pre-programmed into the robots. Given a SHRDLU-esque approach, it may even be possible to “program” newer commands on the fly through speech by mapping complex actions into the limited set of simple actions given the environment.
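
    As a sketch of the kind of keyword-triggered "speech API" I have in mind (the keywords, handler, and gesture context below are invented, not taken from either paper):

        from datetime import date

        # Toy "speech API": keywords in a recognized utterance trigger an
        # abstracted procedure, with arguments filled in from context.

        def prepare_shipments(mode, day, staging_area):
            return f"staging {mode} shipments for {day} in {staging_area}"

        KEYWORD_HANDLERS = {
            frozenset({"prepare", "shipments"}): prepare_shipments,
        }

        def interpret(utterance, gesture_context):
            words = set(utterance.lower().replace("'", " ").split())
            for keywords, handler in KEYWORD_HANDLERS.items():
                if keywords <= words:
                    mode = "air" if "air" in words else "ground"
                    day = date.today() if "today" in words else None
                    area = gesture_context if "there" in words else "default bay"
                    return handler(mode, day, area)
            return "sorry, I don't know that command"

        # "there" would be resolved from a gesture made alongside the speech.
        print(interpret("prepare today's air shipments in there", gesture_context="bay 3"))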

    ReplyDelete
  4. Teller's paper on the construction of a voice-commandable forklift addresses the inconvenience of having the user input every task as a series of subtasks for the forklift. The more subtasks the user has to input through voice and gestures, the more room for error there is. The researchers propose to build a system that would identify the necessary subtasks to perform every time a command is given to it.

    I think incorporating a dialogue system may help build an accurate subtask identification module. We have seen systems that can identify subtasks without dialogue with the user, such as the backward-chaining planner in SHRDLU. However, SHRDLU's block environment involves only simple physics, while the forklift has to operate in complicated, real-life settings. Therefore a task given to the forklift might have many ways of being broken down into subtasks. If dialogue could be carried out with the user, using techniques such as POMDPs to guess the user state, the machine may be able to learn what types of subtasks each user would want it to perform in order to complete a request.

    A simple scenario to illustrate the idea would be if the forklift realized it needed to remove some garbage before it could go to where it was instructed to go. It would ask the user where to put the garbage, and over time it would learn the user's preference for future situations.
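
    As a toy illustration of that last scenario (much simpler than a full POMDP; the user, location, and threshold are all invented), the forklift could keep a per-user tally of where it was told to put obstructing garbage and only stop asking once it is confident:

        from collections import Counter

        # Per-user counts of where they told the forklift to put garbage.
        preferences = {"alice": Counter()}

        def ask_or_predict(user, ask_human, min_history=3, confidence=0.8):
            """Predict the user's preferred dump location once we have seen
            enough consistent answers; otherwise ask and record the answer."""
            history = preferences[user]
            total = sum(history.values())
            if total >= min_history:
                location, count = history.most_common(1)[0]
                if count / total >= confidence:
                    return location         # confident enough: skip the question
            answer = ask_human()             # dialogue turn: "Where should this go?"
            history[answer] += 1
            return answer

        # The first three times the forklift asks; the fourth time it predicts.
        for _ in range(4):
            print(ask_or_predict("alice", ask_human=lambda: "scrap corner"))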

    ReplyDelete
    Replies
    1. I really like the idea of using language for breaking down subtasks. It adds two improvements: it removes the need to break commands down into subtasks by hand, and it allows commands to be given directly without the use of the tablet. I wonder if something like the IkeaBot way of planning, using just the components without knowing the end goal, could be used in reverse. For the forklift, the robot would know the end goal and be trying to break it down into subgoals.

      Delete
  5. The Teller et al. paper describes a forklift that is able to perform rudimentary pallet manipulation outdoors in an unprepared environment. It uses a tablet to recognize spoken commands and sketched gestures. However, the spoken commands are limited to a small set of utterances directing movement. The Knepper et al. paper takes manually generated geometric/CAD files as input. The input provides a list of part types as well as the number of instances of each part. Each part type is specified with 'file', 'center' and 'holes', and each hole defines 'diameter', 'position', 'direction' and an optional 'pass_through'. In the Bollini et al. paper, the BakeBot is able to collect recipes online in plain-text format, parse them into a sequence of baking primitives, and execute them for the benefit of its human partners. If the BakeBot encounters an unsupported baking primitive, it asks the human for help executing it.
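
    As a rough sketch of what one entry in that geometric input might look like (the field names follow the paper's description, but the concrete part and values here are made up):

        # Hypothetical part specification: a mesh 'file', a 'center' (of mass),
        # and a list of 'holes', each with 'diameter', 'position', 'direction',
        # and an optional 'pass_through' flag.
        table_leg = {
            "file": "leg.stl",            # CAD mesh for this part type (invented name)
            "center": (0.0, 0.0, 0.2),    # center of mass in the part frame, meters
            "holes": [
                {
                    "diameter": 0.006,             # 6 mm fastener hole
                    "position": (0.0, 0.0, 0.4),   # hole location in the part frame
                    "direction": (0.0, 0.0, 1.0),  # axis along which a fastener enters
                    "pass_through": False,         # optional: does the hole pierce the part?
                },
            ],
        }
        parts = {"leg": {"spec": table_leg, "count": 4}}  # part types plus instance counts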


    For the Teller et al. paper, since the forklift may work outdoors and may be affected by external noise, I am curious what will happen if the spoken commands are not clearly conveyed to the robot. Moreover, if the forklift executes a command incorrectly and is stopped by the user, I think it will still execute that same command incorrectly in the future. Alternatively, I think we can use supervised data to train the system so that it is able to convert commands into specific low-level forms. If a command is not clear, the robot should ask questions to clarify it (using the design from a previous paper we read). For each command execution, the system would record whether it was correct: if so, add the record to the supervised training data; otherwise, record the mistake. In this way, the same mistake may not happen again for the same command, and the correctness of the system will improve as the training data grows.

    ReplyDelete
  6. Language can easily fit into the forklift system, since it interacts heavily with humans in its environment. Voice commands instead of circling are the first thing that would be highly beneficial, since the current voice recognition seems to only support summoning. The G3 framework would be an excellent way of extending this. Beyond simple commands, you could also strengthen the robot's vocal output to allow for better interaction with humans attempting to take control, as well as asking people to move if they are in the way. This would allow the robot to ask a person in its way or crossing its path to move in a given direction, rather than simply stopping whenever a person is within a certain proximity. This system could be built much like the builder robots' help requests from the previous paper. These two major improvements would be highly beneficial in the forklift's domain.
    The IkeaBot domain seems to benefit much less from language, since it is mostly robots interacting with robots based on a furniture schematic. Since some minor human assistance was required to be successful in all trials, help requests similar to those in the Tellex et al. paper would likely be beneficial. Beyond that, the only other major use of language I could see would be to have building instructions given verbally, rather than requiring a schematic for construction, which might be helpful but seems incredibly difficult to achieve.

    ReplyDelete
    Replies
    1. These are very good points. I originally thought that language would benefit the IkeaBot greatly as well, but it may be true that customizing the bot to handle requests would not be worth it simply because of the use case. Given the narrower use case of the IkeaBot, pre-programmed sequences or a visual UI would make more sense.

      I agree with your comments on the forklift system, however. Since the robots are regularly commanded by people who may not be trained specifically for this job, it makes sense to increase accessibility through language understanding. It didn't occur to me to think about robot speech output either, but perhaps this is more efficiently handled through simpler lights and sounds.

      Delete
    2. I think there's still a lot of room for improvement in the IkeaBot situation. Requests for help could be further clarified using a dialog system. Commands could be given verbally instead of in instructions. On a basic level, it could do things like pausing its 60s timer if it hears the human say, 'Hold on a moment'. In a more complicated fashion, imagine the human is working in parallel with the robots and would like to request help in the same way the robots do now. The human says, 'hey, can you hold this a second', or 'can you bring me the screwdriver', and the robot is able to respond and execute based on those. This could have immediate practical applications: if, say, you have back problems and have severe difficulty bending down, robots like these could pick things up when you drop them or when you verbally request help. I think the cost is a bit prohibitive for such a single unique use case, but once the robot is able to operate a bit more generally, that's entirely practical and useful. I think there's a lot of room for growth even within this small domain of furniture assembly, and even crazier growth in home assistance more generally.

      Delete
  7. A clear opportunity for adding language to the forklift robot is replacing the tablet-drawn gestures with actual gestures. Drawing regions of interest on a tablet is obviously not how humans naturally signal localities; we point at things. To recognize the "language" of pointing we would need the robot to be aware of the positioning and posture of the operator's body. One relatively simple approach would be to equip the operator with an accelerometer embedded in a glove, so that the robot could read the orientation and positioning of the operator's hand directly. But this is also not ideal, because we don't don special equipment to communicate normally. To be more seamless, the robot would need to model the user's body using depth readings, but that's tricky because a pointing gesture looks very different depending on the angle you're looking at it from. But presumably we could train a classifier to figure out that there's a protrusion from the upper torso jutting out in one direction for a few feet, if we model the human body as a simple stick figure.
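
    As a minimal geometric sketch of what you could do once the body is modeled (assuming a skeleton tracker already gives us 3D shoulder and hand positions; everything here is a made-up stand-in, not anything from the paper):

        import numpy as np

        def pointing_target(shoulder, hand, ground_z=0.0):
            """Cast a ray from the shoulder through the hand and intersect it
            with a flat ground plane to estimate the pointed-at location."""
            shoulder, hand = np.asarray(shoulder, float), np.asarray(hand, float)
            direction = hand - shoulder
            if direction[2] >= 0:            # pointing level or upward: no ground hit
                return None
            t = (ground_z - shoulder[2]) / direction[2]
            return shoulder + t * direction  # (x, y, ground_z) of the indicated spot

        # e.g. shoulder at 1.4 m, hand slightly forward and lower:
        print(pointing_target([0.0, 0.0, 1.4], [0.3, 0.1, 1.2]))  # ~[2.1, 0.7, 0.0]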

    The ability of the IKEABot to figure out how to assemble a piece of furniture without instructions is impressive, but may not be feasible for sufficiently complicated pieces. It would relate more naturally to how humans approach this problem if we built language understanding systems to understand the human-written instructions that often come with furniture. There is literature on this that we have read, by Branavan, on using RL to understand manuals with a relatively high degree of accuracy.

    ReplyDelete
    Replies
    1. I really like the idea of the gesture recognition for more seamless instructions for the forklift. The only downside is that it requires the operator to be much closer to the forklift to instruct it. However, you could simply pair this with the tablet interface for a decent remote system while massively improving the instruction interface for close-range interactions.

      Delete
    2. Depending on what kind of sensing hardware is being used I'm not sure if you'd have to be really close. You'd obviously have to have a line of sight to the forklift, but I bet you could find a motion capture/3D scan system that would allow you to be at a safe distance.

      Delete
  8. The forklift could be easier to use if it had the ability to understand more complex natural language sentences. Then users wouldn't need training to know which words the system understands, and they could just talk to the forklift without using the tablet interface. Telling the forklift what to do is a much easier way for people to interact with it than using gesture interfaces. And asking for help would let users figure out what the system wants when it needs external help to proceed.

    The IKEA bots could plan better with an ability to understand human language. A table could come with instructions, or a user could tell them what to do. With instructions, the planner could constrain the search space, and possibly ambiguous assemblies could be resolved by language. Since the search only approximates the optimal solution, incorrect plans arise from time to time; users could then tell the robots when their plan isn't quite right. And making help requests would help them finish assembling faster.

    ReplyDelete
    Replies
    1. The question I am curious about is whether, by combining context-free grammars with other machine learning methods, we could translate complex sentences accurately. In the G3 paper, the authors use grounding methods for robots to learn, which seems similar to this situation. Why did this project not apply those methods? Is it because we lack accurate enough hardware to transfer from voice to text?

      Delete
  9. Teller's paper describes a design and implementation strategy that involved consultation with the intended users. The mechanisms used include hierarchical task-level autonomy, gestures and annunciation of intent, and continuous detection of shouted warnings, the last two of which involve language. Language is used to describe the robot's current state and actions, such as "paused". This makes it possible for the robot to act in close proximity to people. The examples in the paper are about states, intent, and information from people; most commands are short, simple sentences that the robot needs to understand. The authors are collecting a larger corpus to help the robot learn human language better.

    The IkeaBot paper introduces a system in which a team of heterogeneous robots collaborates on an assembly task. The system takes in geometry data in the form of a set of CAD files consisting of three parts: file, center, and holes. It then generates ABPL, an object-oriented symbolic planning specification language. Language instructions given by people before and during execution could help the robots improve their planning. What is more, if a robot is uncertain about something, it can ask for human help. This is a great improvement that can help robots work in unfamiliar conditions.

    ReplyDelete
  10. Language can be used to make these systems more general and improve coordination. One extension for the forklift case is, as in last week’s paper, to have the robot ask for specific help instead of just getting stuck and waiting for a person to decide what to do. Also, they could give more informative announcements about their intentions and understand specific shouted commands, instead of just stopping automatically every time. Furthermore, instead of being given commands by circling objects on the tablet, the human user could give natural language commands.
    For the IkeaBots, I am not entirely certain how they communicate with each other, and whether speaking to each other would be useful to help them collaborate. Just as last week's papers on legibility aimed to let humans know what a robot is doing, and the forklift announces its intentions, multiple IkeaBots need to know what the others are doing and planning to do. Natural language could fill that role, although it might not be the most efficient way for robots to communicate with each other. However, it would allow a human collaborator to be easily integrated.

    ReplyDelete
    Replies
    1. For the forklift, I agree that supporting natural language commands would make the system more powerful, but using the tablet may save a lot of work, though it has limitations. I also could not understand how the IkeaBots coordinate their work. What I can imagine is that their communication relies heavily on the system design. Expressing a meaning is different from understanding it: a robot may understand a command's meaning, but when it tries to express it again, the meaning may change. I am wondering if the robots are able to communicate with each other the way they communicate with humans. It seems quite difficult, because they all use the same system design; if one robot can solve a problem, I think all of them should be able to solve it too. Moreover, if one of them gets stuck, it may result in chaos for all the robots.

      Delete
    2. This comment has been removed by the author.

      Delete
    3. I think it is a good idea to apply natural language to the IkeaBots. As you said, it makes it easier for humans to know what they can do to help the robot. In the first paper, the authors use a tablet to communicate with the robot, but I do not think that is a good way for robot-to-robot communication; a tablet is easy for a human to use, but maybe not for a robot. The spoken-command technique might be a way to communicate with the IkeaBots, but the problem for now is that only limited commands can be used. So my question is whether we could apply gestures to robots as well.

      Delete
  11. I think both systems share a need for disseminating plans and goals to the robots that could ideally be parsed from natural language commands or text. The obvious example in the IKEA bot system would be for the robots to plan from the horrible, horrible instructions included with the disassembled furniture. In the paper, the plan was written in a manner that could generate a set of required conditions about the objects present in the room. Because IKEA's instructions are generally sparsely detailed for most furniture, it might prove impossible to plan from beginning to end without human intervention. I think a grounded language interface would be capable of handling most language provided in an instruction set. However, I think that an autonomous system would struggle to label the objects without human input.

    I think the demonstrated situation for the forklift, operating independently at forward operating bases, implies that it would be regularly put in dangerous situations. If accomplishing forklift tasks is so important, but the environment is too dangerous, I think that you would need to rely on language to solve a lot of possible failure scenarios. Having clear commands and ways to describe clearing debris, restacking pallets, and moving objects outside of pallets would be crucial.

    ReplyDelete
    Replies
    1. Both domains need help only when something is out of the ordinary, like debris in the path that was not mentioned or the table top being upside down. It just seems that the robot forklift would run into these failure scenarios more often, being in a hostile environment. Also, it would be interesting to see how a narration of the assembly helps the bot assemble or prune the search space, the way narration plus SLAM helps build maps.

      Delete
  12. In the Teller et al. paper, users can send voice commands to a robotic forklift via a tablet device. However, only a small set of utterances directing movement is allowed, such as 'come to receiving'. As in Tellex's work, natural language commands in this system could be expanded to give more specific commands to the robot, containing spatial relationships and more detailed descriptions. Another use of natural language is in stopping the robot's motion: when a person near the robot shouts, the robot is built to stop its motion entirely. However, the authors note that only the volume of the shout is used, not its content. The authors also mention that the system lacks the ability to process higher-level tasks. Using planning strategies and a natural-language-to-formal-language framework such as SPF or Mooney's, it should be possible for the system to take a declarative sentence such as 'I want all the pallets to be moved to the receiving area.' and act accordingly.
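
    As a rough sketch of the kind of meaning representation such a framework might produce for that sentence (the predicate and constant names are invented here, not actual SPF output or anything from the forklift system):

        # Hypothetical logical form for "I want all the pallets to be moved
        # to the receiving area."
        logical_form = ("forall", "x",
                        ("implies", ("pallet", "x"),
                                    ("goal", ("at", "x", "receiving_area"))))

        def expand_to_commands(known_pallets):
            """Ground the universally quantified goal into one low-level
            move command per currently known pallet."""
            return [("move", pallet, "receiving_area") for pallet in known_pallets]

        print(expand_to_commands(["pallet_3", "pallet_7"]))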

    In Knepper et al.'s work, the system takes geometric/CAD data and autonomously calculates plans for multiple robots to assemble a piece of IKEA furniture. In Tellex's work, the G3 system was used to generate natural language help requests so the robots could recover from their failures. I would like to know whether we can have humans annotate furniture assembly sequences and use that data to further plan a human-robot collaboration scenario. Even in Tellex's work, humans were only asked to help the robot when there was a failure. Since in most real-world cases the robots are limited in their capability to fully assemble IKEA furniture, I would like to know whether we can model human-robot collaboration plans from natural language descriptions of furniture assembly procedures.

    ReplyDelete
    Replies
    1. RE: "I would like to know if we can model a human robot collaboration plans from natural language sequences of furniture assembly procedures."

      Are you envisioning a team of humans narrating their actions while constructing a piece of furniture, and using the narration (or both the narration and action transcript, e.g. a video) as input to SPF to be fed to the IKEA bots?

      Delete
    2. Yes, instead of the robot autonomously calculating the assembly sequence by itself, I was thinking of humans annotating assembly sequences in natural language, so we could capture human-annotated assembly plans (translated into formal language), along with their grounded meanings, for later use.

      Delete
  13. The Teller et al. work describes a robotic forklift that is controlled by a tablet using gestures and limited speech input. The robot has path planners, an object detection framework, and obstacle avoidance features. It uses its task-planning and path-planning outputs to recognize when no further action is possible and asks for human intervention. This is one place where they could add natural language output, with the robot expressing where it is stuck. Another, as mentioned, is giving verbal commands to the robot using G3, as has already been demonstrated in Tellex et al. 2011. The paper itself suggests the interesting idea of giving narrated tours to the robot instead of marked GPS maps, but I am not sure which is easier versus which is more human-like.
    The Knepper et al. work has KUKA youBots assembling furniture, given CAD files with information about centers of mass and the holes where parts fit. Planning is done in multiple steps: first parts are fit together so that sub-assemblies are created and rated as plausible, and then these sub-assemblies are joined further, reducing degrees of freedom, until all components are used up and no holes remain. This is given to a symbolic planner to actually plan actions for the bots. Here is where NL could be used: maybe the search space can be pruned by a narration, so that the planner already knows what the good sub-assemblies are. It has already been demonstrated that language generation and asking for help by robots in these scenarios can help the robot solve complex tasks.
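
    As a toy sketch of that pruning idea (the narration, part names, and matching rule are all invented), candidate sub-assemblies whose parts are never mentioned together in any narrated step could be dropped before symbolic planning:

        narration = [
            "first attach the four legs to the table top",
            "then screw the brace onto the legs",
        ]

        candidates = [
            {"leg", "table_top"},    # mentioned together in step 1
            {"brace", "leg"},        # mentioned together in step 2
            {"brace", "table_top"},  # never narrated together -> prune
        ]

        def supported_by_narration(candidate, narration):
            """Keep a candidate sub-assembly only if some narrated step
            mentions every part it contains."""
            return any(all(part.replace("_", " ") in step for part in candidate)
                       for step in narration)

        pruned = [c for c in candidates if supported_by_narration(c, narration)]
        print(pruned)  # keeps the first two candidates, drops the third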

    ReplyDelete
    Replies
    1. It's a good idea to use NL narration to identify good sub-assemblies. But I'm not sure that using natural language narration can really prune the search space, because inferring groundings from language is itself time consuming. So it may actually hurt overall efficiency.

      Delete
  14. Forklift Domain:

    Something that is interesting to me about this sort of supervised navigation task (despite only being mentioned once, close to the end of the paper) is the idea of a human-guided tour to orient a robotic agent to the layout of a space. This came up much earlier in the semester with the video from MIT where a human guides a robot around a region as a tour guide would a human. I'm not sure why, but this idea really struck a chord with me. It seems that this task addresses two disparate problems at once.

    1) Having a robot autonomously learn about an environment takes a long time and is prone to error, especially if the environment is complicated or, as in the forklift domain, areas of the world are specifically correlated with a class of task (e.g. a "loading" and an "unloading" zone for held objects). Having a human guide would allow the robot to learn much more quickly and with greater fidelity.

    2) An idea which has come up both in this paper and in last week's legibility paper is that human/robot cooperation is limited not just by technological failings, but by human discomfort around robotic assistants. Something as simple as teaching the robot to navigate the environment could serve as an "introduction" or "icebreaker" of sorts between the human and the robot, especially if the robot can make very clear when it understands, when it is unsure, and when something previously unclear has been clarified.

    After a guided tour, the human is much more familiar with the way the robot behaves and interacts, and the robot is much more familiar with its environment.

    Ikea Domain:

    It's sorely tempting to say "The robots should ask for help when they're confused!" but I've got a sort of deja vu feeling about that idea that I can't shake.

    I guess I'll stick with the "asking for help" task and relate it to something other than physical construction scenarios in which the robots are stuck. Instead, perhaps language can be suited to the task of deciding based on the unassembled pieces what the final form of the constructed object should be.

    In this case, maybe a robot, when faced with many possible "matings" of holes and pieces, could ask clarifying questions of the human, such as

    "Is the object I'm trying to build taller than 2 feet?"

    The answer to which would rule out or include an entire class of possible constructions by a common trait. In this way, natural language could be used to expedite that search process.
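
    As a small sketch of how that could expedite the search (the candidate objects, their heights, and the thresholds are all made up), the robot could even prefer the question that splits its current candidate set most evenly:

        # Hypothetical candidate final forms, each with a height in feet.
        candidates = {"stool": 1.5, "chair": 3.0, "bookshelf": 6.0, "side_table": 1.8}

        def split_fraction(threshold, candidates):
            """Fraction of candidates taller than the threshold; 0.5 is the
            most informative split for a yes/no question."""
            return sum(1 for h in candidates.values() if h > threshold) / len(candidates)

        # Ask about the threshold whose split is closest to half-and-half.
        best = min([1.0, 2.0, 4.0], key=lambda t: abs(split_fraction(t, candidates) - 0.5))
        print(f"Is the object I'm trying to build taller than {best} feet?")

        # Suppose the human answers "no": prune everything taller than that.
        answer_yes = False
        candidates = {k: h for k, h in candidates.items() if (h > best) == answer_yes}
        print(candidates)  # only the short candidates remain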

    ReplyDelete
    Replies
    1. Instead of asking for help from a human user in a nearby room, maybe the robot could attempt to find the solution to the problem through a web search. Assuming the answer is not clearly findable in the provided instructions, maybe people have already answered it online. A web search, coupled with some Branavan-style tutorial parsing of forum entries, could possibly provide answers to help the robots plan.

      Delete
    2. Perhaps these kinds of clarifications could also have a physical component and lead to a kind of collaboration. Perhaps the robot would pick up one of the legs, line it up with a hole on the bottom of the table, and say "Does this leg go here?" or more simply "Does this look right?" And then it could use that information to prune the search pretty quickly.

      Delete
  15. We've already discussed the advances in both the forklift domain (commands) and the Ikea domain (help requests). Still, language could augment these systems in many ways. I think both of these systems would benefit greatly from a dialog system. If a command is unclear, the robot could say, for example, 'Which tire pallet?' in order to clarify the ambiguity. The Ikea bots could understand requests said to them ("I don't understand what you want.") and then reply appropriately. Then there's also language applied to knowledge acquisition, and language used as actions. The knowledge acquisition problem starts to be addressed by the Walter 13 paper about tour following, wherein a map of the environment is built and improved using natural language. The action usage of language is addressed in the follow-up Ikea papers, where the robots request help when they get stuck. However, it might make more sense for language to be one choice of action, rather than a fallback mechanism. That is, whenever a robot is choosing its next action, it might choose between moving along its trajectory, changing trajectories, or speaking. Consider a human standing in the path of the robot: it could deviate from the path and go around them, or it could ask the human to move, which might require less energy expenditure. In my mind, this is a generalization of the 'language as error throwing' paradigm of the Ikea bots, where they only speak if they get stuck.
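
    As a minimal sketch of treating "speak" as just another action scored by expected cost (all of the costs and the compliance probability here are invented):

        def expected_cost(action, detour_length_m=6.0, p_human_complies=0.8):
            if action == "follow_path":
                return float("inf")              # blocked by the human: not executable
            if action == "detour":
                return 1.0 * detour_length_m     # ~1 unit of energy per meter of detour
            if action == "ask_to_move":
                wait_cost = 2.0                  # cost of pausing while the human reacts
                fallback = expected_cost("detour")  # if they don't move, detour anyway
                return (p_human_complies * wait_cost
                        + (1 - p_human_complies) * (wait_cost + fallback))

        actions = ["follow_path", "detour", "ask_to_move"]
        best = min(actions, key=expected_cost)
        print(best, expected_cost(best))  # ask_to_move is cheapest under these numbers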

    There are a number of specific problems that this work would address. With regard to the dialog systems, even the updated Ikea system cannot further clarify its requests for help, leading to scenarios where it just gets stuck because neither robot nor human knows what to do next, or they are unable to do it. Having language inform knowledge would allow the system to understand novel names for things and potentially allow for small generalizations in the robots' capabilities.

    ReplyDelete
    Replies
    1. The idea of telling humans what to do in order to (in your example) save energy is enticing, but I'm not sure that it's a trade-off people would be willing to make -- at least not if it appeared explicit. I think you would want to have the robot say something which gives the human the information needed to make the relevant choice, rather than having the robot choose for itself.

      Delete
  16. In Teller's paper, users can use speech and pen-based gestures to assign tasks to the forklift. A tablet is used to recognize spoken commands and gestures, but the spoken commands are limited to a small set of utterances, so the user has to break complex tasks down into sub-commands, which is burdensome. If the forklift could understand complex natural language commands, users could direct it much more easily and naturally. The system can also signal that it is stuck and needs human intervention, so it would be better to use a dialogue system to let people operate the forklift. A human could abandon the current task with a verbal instruction, or clarify which tire pallet the forklift needs to get. For "shout-to-stop" commands, a dialog system could also make things clearer by letting the human explain to the robot why it should stop after shouting, so the robot could learn to stop by itself in the same situation in the future if needed.

    In Knepper's paper, an autonomous robotic system is proposed to assemble a piece of IKEA furniture. It takes a set of CAD files as the input to a geometric preplanner, which produces ABPL, an object-oriented planning language, and the symbolic planner then generates a sequence of primitive actions for the robots to execute. So language could be used to generate the states of the planning domain, as a supplementary tool to the geometric preplanner. For example, for a new piece of IKEA furniture with minor differences, a human could tell the robot the differences in language in order to generate the goal. Also, the robots may fail during execution, so language can be used to make requests to humans for help, which is what the previous Tellex paper was about.

    ReplyDelete
    Replies
    1. I think shouted stop commands are more for safety than efficiency. The forklift should be very careful so that it does not hurt anyone. Clarifying should increase the efficacy of the forklift without decreasing its safety. As a baseline, the forklift could always stop when it hears shouting, and use clarifying commands to infer what to do next when it is told to go on.

      Delete
    2. Agree with the above about stop commands being more for safety -- a proposal: when the robotic agent detects a shouted command (even if it cannot properly parse the phrase shouted), perhaps it could examine its environment state in order to determine possible causes for the alarm and, in the future, avoid states which are similar. In this way, perhaps it could slowly learn which situations are risky enough that a human would become loudly alarmed.

      Delete
  17. This comment has been removed by the author.

    ReplyDelete
  18. In both papers, speech could be incorporated both to augment the robot's knowledge of the world and to interact with humans (e.g. ask for help). We have already seen videos of the Ikea youBots asking humans to help with particular tasks, which seems to be quite effective when used with the G3 framework. Such a system could be further extended to having the robot explain what steps it took. This way it could quickly fill in a human supervisor on its recent actions.

    On the other hand, adding speech as input would be extremely effective in Teller's paper. For example, using a framework like G3, humans could give the robot much higher-level subgoals to complete, such as "Place the pallet on the truck". This would avoid requiring the human to specify all of the several subgoals required for such an action. In addition, if a dialogue system is also included, more advanced commands (e.g. "do that again") could be possible, which would avoid repetition. Finally, another useful system would be one in which sub-goals can be grounded in natural language. This would allow a human to "create" a sub-goal that is specific to the task at hand. For me, this last bit is reminiscent of SHRDLU being able to understand block compositions.

    ReplyDelete
  19. Forklift:

    It would be interesting to look at whether it's possible to specify language
    tasks at a higher level than some of the ones I've seen (pick up *this* pallet,
    go to *this* truck), and refine as necessary with dialogue. This more closely
    resembles the relationship that a human manager of forklifts would have with
    an operator: "please unload the last truck that came in." (Confusion ensues,
    resolved by discussion.) It would be even more interesting to take it to the
    next level by specifying policies with language: "whenever a truck comes in,
    unload it." This, too, could be refined, and maybe confirmed before an
    important action was performed.

    I think I would try to approach both of these in a way which mapped the higher
    level command to existing commands, so "unload the truck" to:

    Pallet A -> Shelf A
    Pallet B -> Shelf B
    ...

    This would require some additional knowledge about how to enumerate pallets,
    but a lot of the work is already implemented -- each sub-goal is theoretically
    already dealt with, in addition to the lower level perception processes (e.g.
    knowing where "the truck" is in the first place.)

    I think the set of higher level commands in the forklift domain is relatively
    limited (to be fair, so is my knowledge about forklifts), but in more complex
    domains it would be more practical to learn how to break down commands into
    their subgoals. You might do this with some kind of supervised corpus, or
    something more online, since we're figuring on dialog anyways.
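
    As a toy continuation of the pallet-to-shelf mapping above (the perception and
    shelf-bookkeeping functions here are placeholders I am inventing, not parts of
    the actual system):

        # Toy decomposition of "unload the truck" into existing low-level commands.

        def detect_pallets_on(truck):
            return ["pallet_A", "pallet_B", "pallet_C"]    # pretend perception output

        def next_free_shelf(shelves):
            return shelves.pop(0)                          # pretend storage bookkeeping

        def unload(truck, shelves):
            """Expand the high-level command into (pick up X, place X on Y)
            sub-goals that the existing system already knows how to execute."""
            plan = []
            for pallet in detect_pallets_on(truck):
                shelf = next_free_shelf(shelves)
                plan.append(("pick_up", pallet, truck))
                plan.append(("place", pallet, shelf))
            return plan

        for step in unload("truck_1", ["shelf_A", "shelf_B", "shelf_C"]):
            print(step)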

    ReplyDelete