Friday, November 15, 2013

Applications

This week the class will focus on applications.  We will read three papers, one from the supply/warehouse environment, one from a factory floor environment, and one from a household environment.  Each paper describes a robot designed for the corresponding environment.   These papers give a sense of the complexity of an end-to-end robotic system, and also some visions of the ways in which robots will impact the world. 

11/19/13 (Tues.)
11/21/13 (Thurs.)

By Sunday night at 5pm, post a comment discussing how language can fit into these systems.  Try to identify a problem that language input or output could solve.  Then discuss a technique you might use to solve it. 

By Monday night at 5pm, post a reply to someone else's comment.  Ask a question; expand on an idea; suggest a related citation.

39 comments:

  1. In IkeaBot, the authors use a new object-oriented language, ABPL, for representing symbolic planning problems. IkeaBot's planning model uses ABPL for its raw input and output data. The symbolic planner can check that the preconditions of each action are satisfied and reason about the postconditions of future actions, relying only on the blueprint generated by the geometric preplanner. ABPL is based on an object-oriented approach to data structure design: it allows the user to organize objects within the environment rather than express every element without object-oriented structure, as in PDDL, and it lets the user define object types that can be assigned properties and sub-objects. One problem that arises with language input is symmetry: for example, when we want the robot to assemble a chair with four legs, those legs can be attached in any order. One way to handle this is with "Groups" in ABPL. Groups let the planner reason about this type of symmetry more efficiently: after referencing one member of a group, the planner knows the same rule applies to the other group members.
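
    As a rough illustration of the symmetry point (this is not actual ABPL syntax; the part and socket names are made up), treating the four legs as one interchangeable group collapses the redundant orderings a naive planner would otherwise consider:

        from itertools import permutations

        # Hypothetical chair-assembly problem: four identical legs, four sockets.
        legs = ["leg1", "leg2", "leg3", "leg4"]
        sockets = ["front_left", "front_right", "back_left", "back_right"]

        # Without a symmetry group, a naive planner distinguishes every assignment
        # of specific legs to specific sockets: 4! = 24 equivalent plans.
        print(len(list(permutations(legs))))  # 24

        # With a "group" of interchangeable legs, the planner only decides the
        # order in which sockets get filled by *some* leg from the group.
        grouped_plan = [("attach", "legs_group", socket) for socket in sockets]
        print(grouped_plan)  # one canonical plan instead of 24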

    In Seth Teller's paper, the authors describe the development of a multi-ton robotic forklift. The communication between people and the robot here is different from IkeaBot: people use real natural language and gestures to guide the robot. The authors use a multimodal tablet that enables a supervisor to use speech and pen-based gestures to tell the forklift what the task is; the interface is based on spoken commands and gestures made on a handheld tablet computer. Supervisors break complex commands into sub-tasks so that the robot can finish complex pallet-handling tasks. Through the tablet, the robot can recognize spoken commands and gestures, and the SUMMIT library handles speech recognition for summoning. Accepting spoken commands reduces the supervisor's burden and lets the robot carry out higher-level directives. However, the problem with spoken commands right now is that the supervisor can only use a limited set of commands to direct the robot's movement. One way the system works around this is the circling gesture: as the authors mention, the user interface echoes the gesture as a closed shape, and the gesture is context dependent.

    ReplyDelete
  2. [Forklift] There are a number of ways in which the integration of natural language could benefit the forklift system. As is discussed in the closing sections of the forklift paper, one could imagine removing the need for a GPS-delineated map by equipping the forklift with the ability to generate topological maps in an on-line way (e.g. via a narrated tour, as the paper suggests, or through an annotated gesture approach - this wouldn't require actually moving with the forklift, which might be convenient in some domains). Another extension would be replacing the default "I'm stuck" behavior mentioned in section II with a more robust response, similar to the inverse semantics system demonstrated with the youBots (e.g. utterances could indicate why the forklift is stuck, or how exactly the operator can help). Additionally, we could imagine an augmented system that learns from demonstration (with natural language annotations) - a human operator could perform a task and provide narration during the task in such a way that the forklift is then able to reason about more complicated task execution (this might allow for the execution of the "expert level tasks" mentioned in the paper). Finally, we could use natural language as a means of defining high level tasks in terms of low level tasks (i.e. task composition).

    [IkeaBot] Beyond the extensions already made with the IkeaBot system (i.e. the inverse semantics), it seems like the proposed planning language (ABPL) provides a host of opportunities for natural language 'injections' to this system. Most prominently, natural language descriptions of type definitions, groups and their properties, high/low level tasks and their preconditions, or really any other form of planning knowledge could be derived from a human coordinator's language. This might be useful in achieving the generality suggested in the Conclusion. Furthermore, the coordination of the task could be expanded to include a human participant of a certain degree of laziness/ability. For instance, a person might be willing to help out with portions of the assembly task that are trivial for a human but difficult for a robot (e.g. an entirely new gripper was required to achieve the screwing operation, and we can imagine other manipulation limitations arising in future furniture assembly scenarios). Then, there would be a constant dialogue between the robotic task force and a human participant about which tasks the human is willing to help out with.

    ReplyDelete
    Replies
    1. We can definitely use the techniques from Walter et al. (2013) to build a semantic map learned from natural language descriptions. Could you also explain a little bit more about using natural language as a means of defining a high-level task? I suggest using natural language to provide a declarative sentence rather than a descriptive sentence to give a high-level command to the robot, so that the robot can figure out a detailed plan by itself. Humans could also annotate each primitive action so that robots could use these actions in their planning stage.

      Delete
    2. Re: using natural language as a means of defining a high-level task

      I was mainly interested in establishing the preconditions/subgoals for higher-level tasks. I think this parallels the thought you had in your other comment about providing narrations of action sequences for a particular furniture construction goal. Since the ABPL language possesses the infrastructure to reason about subgoals, we can add new subgoal/goal relations on the fly (and not need to bake all of them into the system from the get-go). Ultimately I think this boils down to a similar effect as "the operator provides natural language assistance in failure cases" - or at least the way I was envisioning things: if the planner fails, we can construct novel precondition/action relations (effectively subgoals) on the fly and retain that knowledge in future runs (e.g. "screw the table leg" requires the precondition of having a robot with a Torq Gripper - thus, we could imagine that a human operator could dictate these preconditions to the robot fleet, so that any screwing job is immediately allocated to a Torq Gripper bot).

      Delete
  3. The forklift system would greatly benefit from increased interaction with language. If more complex commands could be interpreted, it may be possible to reach a point where people only need to be trained on the end goal of warehouse organization and not on the usage of robots to achieve that end goal. Gestures could be interpreted alongside commands for greater clarity, such as the command “pick that up” when pointing to something.
    Expanding this further, and this also applies to the IkeaBot system, larger sets of procedures can be packaged into simple commands. I imagine both systems having some sort of “API” which is executed through speech. For example, the short command “prepare today’s air shipments in there” could trigger an entire abstracted set of procedures based around “air”, “today”, and “there”. Language essentially becomes an alternative UI for customizing the robots. This is accomplished to a small degree by the papers, but the robot functionalities could be extended further given intricate speech inputs.
    Given the limited environments that they are in, this is entirely possible once relevant understanding is pre-programmed into the robots. Given a SHRDLU-esque approach, it may even be possible to “program” newer commands on the fly through speech by mapping complex actions into the limited set of simple actions given the environment.
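
    As a sketch of the kind of keyword-triggered "speech API" I have in mind (the keywords, handler, and gesture context below are invented, not taken from either paper):

        from datetime import date

        # Toy "speech API": keywords in a recognized utterance trigger an
        # abstracted procedure, with arguments filled in from context.

        def prepare_shipments(mode, day, staging_area):
            return f"staging {mode} shipments for {day} in {staging_area}"

        KEYWORD_HANDLERS = {
            frozenset({"prepare", "shipments"}): prepare_shipments,
        }

        def interpret(utterance, gesture_context):
            words = set(utterance.lower().replace("'", " ").split())
            for keywords, handler in KEYWORD_HANDLERS.items():
                if keywords <= words:
                    mode = "air" if "air" in words else "ground"
                    day = date.today() if "today" in words else None
                    area = gesture_context if "there" in words else "default bay"
                    return handler(mode, day, area)
            return "sorry, I don't know that command"

        # "there" would be resolved from a gesture made alongside the speech.
        print(interpret("prepare today's air shipments in there", gesture_context="bay 3"))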

    ReplyDelete
  4. Teller's paper on the construction of a voice-commandable forklift addresses the inconvenience of having the user input every task as a series of subtasks for the forklift. The more subtasks the user has to input through voice and gestures, the more room for error there is. The researchers propose to build a system that would identify the necessary subtasks to perform every time a command is given to it.

    I think incorporating a dialogue system may help build an accurate subtask identification module. We have seen systems that can identify subtasks without dialogue with the user, such as the backward-chaining planner in SHRDLU. However, SHRDLU's block environment involves only simple physics, while the forklift has to operate in complicated, real-life settings. Therefore a task given to the forklift might have many ways of being broken down into subtasks. If dialogue could be carried out with the user, using techniques such as POMDPs to guess the user state, the machine may be able to learn what types of subtasks each user would want it to perform in order to complete a request.

    A simple scenario to illustrate the idea would be if the forklift realized it needed to remove some garbage before it could go to where it was instructed to go. It would ask the user where to put the garbage, and over time it would learn the user's preference for future situations.
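
    As a toy illustration of that last scenario (much simpler than a full POMDP; the user, location, and threshold are all invented), the forklift could keep a per-user tally of where it was told to put obstructing garbage and only stop asking once it is confident:

        from collections import Counter

        # Per-user counts of where they told the forklift to put garbage.
        preferences = {"alice": Counter()}

        def ask_or_predict(user, ask_human, min_history=3, confidence=0.8):
            """Predict the user's preferred dump location once we have seen
            enough consistent answers; otherwise ask and record the answer."""
            history = preferences[user]
            total = sum(history.values())
            if total >= min_history:
                location, count = history.most_common(1)[0]
                if count / total >= confidence:
                    return location         # confident enough: skip the question
            answer = ask_human()             # dialogue turn: "Where should this go?"
            history[answer] += 1
            return answer

        # The first three times the forklift asks; the fourth time it predicts.
        for _ in range(4):
            print(ask_or_predict("alice", ask_human=lambda: "scrap corner"))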

    ReplyDelete
    Replies
    1. I really like the idea of using language for breaking down subtasks. It adds two improvements: it removes the need to break commands down into subtasks by hand, and it allows commands to be given directly without the use of the tablet. I wonder if something like the IkeaBot way of planning, using just the components without knowing the end goal, could be used in reverse. For the forklift, the robot would know the end goal and be trying to break it down into subgoals.

      Delete
  5. The Teller et al. paper describes a forklift that is able to perform rudimentary pallet manipulation outdoors in an unprepared environment. It uses a tablet to recognize spoken commands and sketched gestures. However, the spoken commands are limited to a small set of utterances directing movement. The Knepper et al. paper takes manually generated geometric/CAD files as input. The input provides a list of part types as well as the number of instances of each part. Each part type is specified with 'file', 'center' and 'holes', and each hole defines 'diameter', 'position', 'direction' and an optional 'pass_through'. In the Bollini et al. paper, the BakeBot is able to collect recipes online in plain-text format, parse them into a sequence of baking primitives, and execute them for the benefit of its human partners. If the BakeBot encounters an unsupported baking primitive, it asks the human for help executing it.
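
    As a rough sketch of what one entry in that geometric input might look like (the field names follow the paper's description, but the concrete part and values here are made up):

        # Hypothetical part specification: a mesh 'file', a 'center' (of mass),
        # and a list of 'holes', each with 'diameter', 'position', 'direction',
        # and an optional 'pass_through' flag.
        table_leg = {
            "file": "leg.stl",            # CAD mesh for this part type (invented name)
            "center": (0.0, 0.0, 0.2),    # center of mass in the part frame, meters
            "holes": [
                {
                    "diameter": 0.006,             # 6 mm fastener hole
                    "position": (0.0, 0.0, 0.4),   # hole location in the part frame
                    "direction": (0.0, 0.0, 1.0),  # axis along which a fastener enters
                    "pass_through": False,         # optional: does the hole pierce the part?
                },
            ],
        }
        parts = {"leg": {"spec": table_leg, "count": 4}}  # part types plus instance counts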


    For the Teller et al. paper, since the forklift may work outdoors and may be affected by external noise, I am curious what will happen if the spoken commands are not clearly conveyed to the robot. Moreover, if the forklift executes a command incorrectly and is stopped by the user, I think it will still execute that same command incorrectly in the future. Alternatively, I think we can use supervised data to train the system so that it is able to convert commands into specific low-level forms. If a command is not clear, the robot should ask questions to clarify it (using the design from a previous paper we read). For each command execution, the system would record whether it was correct: if so, add the record to the supervised training data; otherwise, record the mistake. In this way, the same mistake may not happen again for the same command, and the correctness of the system will improve as the training data grows.

    ReplyDelete
  6. Language can easily fit into the forklift system, since it interacts heavily with humans in its environment. Voice commands instead of circling are the first thing that would be highly beneficial, since the current voice recognition seems to only support summoning. The G3 framework would be an excellent way of extending this. Beyond simple commands, you could also strengthen the robot's vocal output to allow for better interaction with humans attempting to take control, as well as asking people to move if they are in the way. This would allow the robot to ask a person in its way or crossing its path to move in a given direction, rather than simply stopping whenever a person is within a certain proximity. This system could be built much like the builder robots' help requests from the previous paper. These two major improvements would be highly beneficial in the forklift's domain.
    The IkeaBot domain seems to benefit much less from language, since it is mostly robots interacting with robots based on a furniture schematic. Since some minor human assistance was required to be successful in all trials, help requests similar to those in the Tellex et al. paper would likely be beneficial. Beyond that, the only other major use of language I could see would be to have building instructions given verbally, rather than requiring a schematic for construction, which might be helpful but seems incredibly difficult to achieve.

    ReplyDelete
    Replies
    1. These are very good points. I originally thought that language would benefit the IkeaBot greatly as well, but it may be true that customizing the bot to handle requests would not be worth it simply because of the use case. Given the narrower use case of the IkeaBot, pre-programmed sequences or a visual UI would make more sense.

      I agree with your comments on the forklift system, however. Since the robots are regularly commanded by people who may not be trained specifically for this job, it makes sense to increase accessibility through language understanding. It didn't occur to me to think about robot speech output either, but perhaps this is more efficiently handled through simpler lights and sounds.

      Delete
    2. I think there's still a lot of room for improvement in the IkeaBot situation. Requests for help could be further clarified using a dialog system. Commands could be given verbally instead of in instructions. On a basic level, it could do things like pausing its 60s timer if it hears the human say, 'Hold on a moment'. In a more complicated fashion, imagine the human is working in parallel with the robots and would like to request help in the same way the robots do now. The human says, 'hey, can you hold this a second', or 'can you bring me the screwdriver', and the robot is able to respond and execute based on those. This could have immediate practical applications: if, say, you have back problems and have severe difficulty bending down, robots like these could pick things up when you drop them or when you verbally request help. I think the cost is a bit prohibitive for such a single unique use case, but once the robot is able to operate a bit more generally, that's entirely practical and useful. I think there's a lot of room for growth even within this small domain of furniture assembly, and even crazier growth in home assistance more generally.

      Delete
  7. A clear opportunity for adding language to the forklift robot is replacing the tablet-drawn gestures with actual gestures. Drawing regions of interest on a tablet is obviously not how humans naturally signal localities; we point at things. To recognize the "language" of pointing we would need the robot to be aware of the positioning and posture of the operator's body. One relatively simple approach would be to equip the operator with an accelerometer embedded in a glove, so that the robot could read the orientation and positioning of the operator's hand directly. But this is also not ideal, because we don't don special equipment to communicate normally. To be more seamless, the robot would need to model the user's body using depth readings, but that's tricky because a pointing gesture looks very different depending on the angle you're looking at it from. But presumably we could train a classifier to figure out that there's a protrusion from the upper torso jutting out in one direction for a few feet, if we model the human body as a simple stick figure.
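
    As a minimal geometric sketch of what you could do once the body is modeled (assuming a skeleton tracker already gives us 3D shoulder and hand positions; everything here is a made-up stand-in, not anything from the paper):

        import numpy as np

        def pointing_target(shoulder, hand, ground_z=0.0):
            """Cast a ray from the shoulder through the hand and intersect it
            with a flat ground plane to estimate the pointed-at location."""
            shoulder, hand = np.asarray(shoulder, float), np.asarray(hand, float)
            direction = hand - shoulder
            if direction[2] >= 0:            # pointing level or upward: no ground hit
                return None
            t = (ground_z - shoulder[2]) / direction[2]
            return shoulder + t * direction  # (x, y, ground_z) of the indicated spot

        # e.g. shoulder at 1.4 m, hand slightly forward and lower:
        print(pointing_target([0.0, 0.0, 1.4], [0.3, 0.1, 1.2]))  # ~[2.1, 0.7, 0.0]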

    The ability of the IKEABot to figure out how to assemble a piece of furniture without instructions is impressive, but may not be feasible for sufficiently complicated pieces. It would relate more naturally to how humans approach this problem if we built language understanding systems to understand the human-written instructions that often come with furniture. There is literature on this that we have read, by Branavan, on using RL to understand manuals with a relatively high degree of accuracy.

    ReplyDelete
    Replies
    1. I really like the idea of the gesture recognition for more seamless instructions for the forklift. The only downside is that it requires the operator to be much closer to the forklift to instruct it. However, you could simply pair this with the tablet interface for a decent remote system while massively improving the instruction interface for close-range interactions.

      Delete
    2. Depending on what kind of sensing hardware is being used I'm not sure if you'd have to be really close. You'd obviously have to have a line of sight to the forklift, but I bet you could find a motion capture/3D scan system that would allow you to be at a safe distance.

      Delete
  8. The forklift could be easier to use if it had the ability to understand more complex natural language sentences. Then users wouldn't need training to know which words the system understands, and they could just talk to the forklift without using the tablet interface. Telling the forklift what to do is a much easier way for people to interact with it than using gesture interfaces. And asking for help would let users figure out what the system wants when it needs external help to proceed.

    The IKEA bots could plan better with an ability to understand human language. A table could come with instructions, or a user could tell them what to do. With instructions, the planner could constrain the search space, and possibly ambiguous assemblies could be resolved by language. Since the search only approximates the optimal solution, incorrect plans arise from time to time; users could then tell the robots when their plan isn't quite right. And making help requests would help them finish assembling faster.

    ReplyDelete
    Replies
    1. The question I am curious about is whether, by combining context-free grammars with other machine learning methods, we could translate complex sentences accurately. In the G3 paper, the authors use grounding methods for robots to learn, which seems similar to this situation. Why did this project not apply those methods? Is it because we lack accurate enough hardware to transfer from voice to text?

      Delete
  9. Teller's paper describes a design and implementation strategy that involved consultation with the intended users. The mechanisms used include hierarchical task-level autonomy, gestures and annunciation of intent, and continuous detection of shouted warnings, the last two of which involve language. Language is used to describe the robot's current state and actions, such as "paused". This makes it possible for the robot to act in close proximity to people. The examples in the paper are about states, intent, and information from people; most commands are short, simple sentences that the robot needs to understand. The authors are collecting a larger corpus to help the robot learn human language better.

    The IkeaBot paper introduces a system in which a team of heterogeneous robots collaborates on an assembly task. The system takes in geometry data in the form of a set of CAD files consisting of three parts: file, center, and holes. It then generates ABPL, an object-oriented symbolic planning specification language. Language instructions given by people before and during execution could help the robots improve their planning. What is more, if a robot is uncertain about something, it can ask for human help. This is a great improvement that can help robots work in unfamiliar conditions.

    ReplyDelete
  10. Language can be used to make these systems more general and improve coordination. One extension for the forklift case is, as in last week’s paper, to have the robot ask for specific help instead of just getting stuck and waiting for a person to decide what to do. Also, they could give more informative announcements about their intentions and understand specific shouted commands, instead of just stopping automatically every time. Furthermore, instead of being given commands by circling objects on the tablet, the human user could give natural language commands.
    For the IkeaBots, I am not entirely certain how they communicate with each other, and whether speaking to each other would be useful to help them collaborate. Just as last week's papers on legibility aimed to let humans know what a robot is doing, and the forklift announces its intentions, multiple IkeaBots need to know what the others are doing and planning to do. Natural language could fill that role, although it might not be the most efficient way for robots to communicate with each other. However, it would allow a human collaborator to be easily integrated.

    ReplyDelete
    Replies
    1. For the forklift, I agree that supporting natural language commands would make the system more powerful, but using the tablet may save a lot of work, though it has limitations. I also could not understand how the IkeaBots coordinate their work. What I can imagine is that their communication relies heavily on the system design. Expressing a meaning is different from understanding it: a robot may understand a command's meaning, but when it tries to express it again, the meaning may change. I am wondering if the robots are able to communicate with each other the way they communicate with humans. It seems quite difficult, because they all use the same system design; if one robot can solve a problem, I think all of them should be able to solve it too. Moreover, if one of them gets stuck, it may result in chaos for all the robots.

      Delete
    2. This comment has been removed by the author.

      Delete
    3. I think it is a good idea to apply natural language to the IkeaBots. As you said, it makes it easier for humans to know what they can do to help the robot. In the first paper, the authors use a tablet to communicate with the robot, but I do not think that is a good way for robot-to-robot communication; a tablet is easy for a human to use, but maybe not for a robot. The spoken-command technique might be a way to communicate with the IkeaBots, but the problem for now is that only limited commands can be used. So my question is whether we could apply gestures to robots as well.

      Delete
  11. I think both systems share a need for disseminating plans and goals to the robots that could ideally be parsed from natural language commands or text. The obvious example in the IKEA bot system would be for the robots to plan from the horrible, horrible instructions included with the disassembled furniture. In the paper, the plan was written in a manner that could generate a set of required conditions about the objects present in the room. Because IKEA's instructions are generally sparsely detailed for most furniture, it might prove impossible to plan from beginning to end without human intervention. I think a grounded language interface would be capable of handling most language provided in an instruction set. However, I think that an autonomous system would struggle to label the objects without human input.

    I think the demonstrated situation for the forklift, operating independently at forward operating bases, implies that it would be regularly put in dangerous situations. If accomplishing forklift tasks is so important, but the environment is too dangerous, I think that you would need to rely on language to solve a lot of possible failure scenarios. Having clear commands and ways to describe clearing debris, restacking pallets, and moving objects outside of pallets would be crucial.

    ReplyDelete
    Replies
    1. Both domains need help only when something is out of the ordinary, like debris in the path that was not mentioned or the table top being upside down. It just seems that the robot forklift would run into these failure scenarios more often, being in a hostile environment. Also, it would be interesting to see how a narration of the assembly helps the bot assemble or prune the search space, the way narration plus SLAM helps build maps.

      Delete
  12. In the Teller et al. paper, users can send voice commands to a robotic forklift via a tablet device. However, only a small set of utterances directing movement is allowed, such as 'come to receiving'. As in Tellex's work, natural language commands in this system could be expanded to give more specific commands to the robot, containing spatial relationships and more detailed descriptions. Another use of natural language is in stopping the robot's motion: when a person near the robot shouts, the robot is built to stop its motion entirely. However, the authors note that only the volume of the shout is used, not its content. The authors also mention that the system lacks the ability to process higher-level tasks. Using planning strategies and a natural-language-to-formal-language framework such as SPF or Mooney's, it should be possible for the system to take a declarative sentence such as 'I want all the pallets to be moved to the receiving area.' and act accordingly.
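
    As a rough sketch of the kind of meaning representation such a framework might produce for that sentence (the predicate and constant names are invented here, not actual SPF output or anything from the forklift system):

        # Hypothetical logical form for "I want all the pallets to be moved
        # to the receiving area."
        logical_form = ("forall", "x",
                        ("implies", ("pallet", "x"),
                                    ("goal", ("at", "x", "receiving_area"))))

        def expand_to_commands(known_pallets):
            """Ground the universally quantified goal into one low-level
            move command per currently known pallet."""
            return [("move", pallet, "receiving_area") for pallet in known_pallets]

        print(expand_to_commands(["pallet_3", "pallet_7"]))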

    In Knepper et al.'s work, the system takes geometric/CAD data and autonomously calculates plans for multiple robots to assemble a piece of IKEA furniture. In Tellex's work, the G3 system was used to generate natural language help requests so the robots could recover from their failures. I would like to know whether we can have humans annotate furniture assembly sequences and use that data to further plan a human-robot collaboration scenario. Even in Tellex's work, humans were only asked to help the robot when there was a failure. Since in most real-world cases the robots are limited in their capability to fully assemble IKEA furniture, I would like to know whether we can model human-robot collaboration plans from natural language descriptions of furniture assembly procedures.

    ReplyDelete
    Replies
    1. RE: "I would like to know if we can model a human robot collaboration plans from natural language sequences of furniture assembly procedures."

      Are you envisioning a team of humans narrating their actions while constructing a piece of furniture, and using the narration (or both the narration and action transcript, e.g. a video) as input to SPF to be fed to the IKEA bots?

      Delete
    2. Yes, instead of the robot autonomously calculating the assembly sequence by itself, I was thinking of humans annotating assembly sequences in natural language, so we could capture human-annotated assembly plans (translated into formal language), along with their grounded meanings, for later use.

      Delete
  13. The Teller et al. work describes a robotic forklift that is controlled by a tablet using gestures and limited speech input. The robot has path planners, an object detection framework, and obstacle avoidance features. It uses its task-planning and path-planning outputs to recognize when no further action is possible and asks for human intervention. This is one place where they could add natural language output, with the robot expressing where it is stuck. Another, as mentioned, is giving verbal commands to the robot using G3, as has already been demonstrated in Tellex et al. 2011. The paper itself suggests the interesting idea of giving narrated tours to the robot instead of marked GPS maps, but I am not sure which is easier versus which is more human-like.
    The Knepper et al. work has KUKA youBots assembling furniture, given CAD files with information about centers of mass and the holes where parts fit. Planning is done in multiple steps: first parts are fit together so that sub-assemblies are created and rated as plausible, and then these sub-assemblies are joined further, reducing degrees of freedom, until all components are used up and no holes remain. This is given to a symbolic planner to actually plan actions for the bots. Here is where NL could be used: maybe the search space can be pruned by a narration, so that the planner already knows what the good sub-assemblies are. It has already been demonstrated that language generation and asking for help by robots in these scenarios can help the robot solve complex tasks.
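
    As a toy sketch of that pruning idea (the narration, part names, and matching rule are all invented), candidate sub-assemblies whose parts are never mentioned together in any narrated step could be dropped before symbolic planning:

        narration = [
            "first attach the four legs to the table top",
            "then screw the brace onto the legs",
        ]

        candidates = [
            {"leg", "table_top"},    # mentioned together in step 1
            {"brace", "leg"},        # mentioned together in step 2
            {"brace", "table_top"},  # never narrated together -> prune
        ]

        def supported_by_narration(candidate, narration):
            """Keep a candidate sub-assembly only if some narrated step
            mentions every part it contains."""
            return any(all(part.replace("_", " ") in step for part in candidate)
                       for step in narration)

        pruned = [c for c in candidates if supported_by_narration(c, narration)]
        print(pruned)  # keeps the first two candidates, drops the third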

    ReplyDelete
    Replies
    1. It's a good idea to use NL narration to identify good sub-assemblies. But I'm not sure that using natural language narration can really prune the search space, because inferring groundings from language is itself time consuming. So it may actually hurt overall efficiency.

      Delete
  14. Forklift Domain:

    Something that is interesting to me about this sort of supervised navigation task (despite only being mentioned once, close to the end of the paper) is the idea of a human-guided tour to orient a robotic agent to the layout of a space. This came up much earlier in the semester with the video from MIT where a human guides a robot around a region as a tour guide would a human. I'm not sure why, but this idea really struck a chord with me. It seems that this task addresses two disparate problems at once.

    1) Having a robot autonomously learn about an environment takes a long time and is prone to error, especially if the environment is complicated or, as in the forklift domain, areas of the world are specifically correlated with a class of task (e.g. a "loading" and an "unloading" zone for held objects). Having a human guide would allow the robot to learn much more quickly and with greater fidelity.

    2) An idea which has come up both in this paper and in last week's legibility paper is that human/robot cooperation is limited not just by technological failings, but by human discomfort around robotic assistants. Something as simple as teaching the robot to navigate the environment could serve as an "introduction" or "icebreaker" of sorts between the human and the robot, especially if the robot can make very clear when it understands, when it is unsure, and when something previously unclear has been clarified.

    After a guided tour, the human is much more familiar with the way the robot behaves and interacts, and the robot is much more familiar with its environment.

    Ikea Domain:

    It's sorely tempting to say "The robots should ask for help when they're confused!" but I've got a sort of deja vu feeling about that idea that I can't shake.

    I guess I'll stick with the "asking for help" task and relate it to something other than physical construction scenarios in which the robots are stuck. Instead, perhaps language can be suited to the task of deciding based on the unassembled pieces what the final form of the constructed object should be.

    In this case, maybe a robot, when faced with many possible "matings" of holes and pieces, could ask clarifying questions of the human, such as

    "Is the object I'm trying to build taller than 2 feet?"

    The answer to which would rule out or include an entire class of possible constructions by a common trait. In this way, natural language could be used to expedite that search process.
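
    As a small sketch of how that could expedite the search (the candidate objects, their heights, and the thresholds are all made up), the robot could even prefer the question that splits its current candidate set most evenly:

        # Hypothetical candidate final forms, each with a height in feet.
        candidates = {"stool": 1.5, "chair": 3.0, "bookshelf": 6.0, "side_table": 1.8}

        def split_fraction(threshold, candidates):
            """Fraction of candidates taller than the threshold; 0.5 is the
            most informative split for a yes/no question."""
            return sum(1 for h in candidates.values() if h > threshold) / len(candidates)

        # Ask about the threshold whose split is closest to half-and-half.
        best = min([1.0, 2.0, 4.0], key=lambda t: abs(split_fraction(t, candidates) - 0.5))
        print(f"Is the object I'm trying to build taller than {best} feet?")

        # Suppose the human answers "no": prune everything taller than that.
        answer_yes = False
        candidates = {k: h for k, h in candidates.items() if (h > best) == answer_yes}
        print(candidates)  # only the short candidates remain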

    ReplyDelete
    Replies
    1. Instead of asking for help from a human user in a nearby room, maybe the robot could attempt to find the solution to the problem through a web search. Assuming the answer is not clearly findable in the provided instructions, maybe people have already answered it online. A web search, coupled with some Branavan-style tutorial parsing of forum entries, could possibly provide answers to help the robots plan.

      Delete
    2. Perhaps these kinds of clarifications could also have a physical component and lead to a kind of collaboration. Perhaps the robot would pick up one of the legs, line it up with a hole on the bottom of the table, and say "Does this leg go here?" or more simply "Does this look right?" And then it could use that information to prune the search pretty quickly.

      Delete
  15. We've already discussed the advances in both the forklift domain (commands) and the Ikea domain (help requests). Still, language could augment these systems in many ways. I think both of these systems would benefit greatly from a dialog system. If a command is unclear, the robot could say, for example, 'Which tire pallet?' in order to clarify the ambiguity. The Ikea bots could understand requests said to them ("I don't understand what you want.") and then reply appropriately. Then there's also language applied to knowledge acquisition, and language used as actions. The knowledge acquisition problem starts to be addressed by the Walter 13 paper about tour following, wherein a map of the environment is built and improved using natural language. The action usage of language is addressed in the follow-up Ikea papers, where the robots request help when they get stuck. However, it might make more sense for language to be one choice of action, rather than a fallback mechanism. That is, whenever a robot is choosing its next action, it might choose between moving along its trajectory, changing trajectories, or speaking. Consider a human standing in the path of the robot: it could deviate from the path and go around them, or it could ask the human to move, which might require less energy expenditure. In my mind, this is a generalization of the 'language as error throwing' paradigm of the Ikea bots, where they only speak if they get stuck.
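
    As a minimal sketch of treating "speak" as just another action scored by expected cost (all of the costs and the compliance probability here are invented):

        def expected_cost(action, detour_length_m=6.0, p_human_complies=0.8):
            if action == "follow_path":
                return float("inf")              # blocked by the human: not executable
            if action == "detour":
                return 1.0 * detour_length_m     # ~1 unit of energy per meter of detour
            if action == "ask_to_move":
                wait_cost = 2.0                  # cost of pausing while the human reacts
                fallback = expected_cost("detour")  # if they don't move, detour anyway
                return (p_human_complies * wait_cost
                        + (1 - p_human_complies) * (wait_cost + fallback))

        actions = ["follow_path", "detour", "ask_to_move"]
        best = min(actions, key=expected_cost)
        print(best, expected_cost(best))  # ask_to_move is cheapest under these numbers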

    There are a number of specific problems that this work would address. With regard to the dialog systems, even the updated Ikea system cannot further clarify its requests for help, leading to scenarios where it just gets stuck because neither robot nor human knows what to do next, or they are unable to do it. Having language inform knowledge would allow the system to understand novel names for things and potentially allow for small generalizations in the robots' capabilities.

    ReplyDelete
    Replies
    1. The idea of telling humans what to do in order to (in your example) save energy is enticing, but I'm not sure that it's a trade-off people would be willing to make -- at least not if it appeared explicit. I think you would want to have the robot say something which gives the human the information needed to make the relevant choice, rather than having the robot choose for itself.

      Delete
  16. In Teller's paper, users can use speech and pen-based gestures to assign tasks to the forklift. A tablet is used to recognize spoken commands and gestures, but the spoken commands are limited to a small set of utterances, so the user has to break complex tasks down into sub-commands, which is burdensome. If the forklift could understand complex natural language commands, users could direct it much more easily and naturally. The system can also signal that it is stuck and needs human intervention, so it would be better to use a dialogue system to let people operate the forklift. A human could abandon the current task with a verbal instruction, or clarify which tire pallet the forklift needs to get. For "shout-to-stop" commands, a dialog system could also make things clearer by letting the human explain to the robot why it should stop after shouting, so the robot could learn to stop by itself in the same situation in the future if needed.

    In Knepper's paper, an autonomous robotic system is proposed to assemble a piece of IKEA furniture. It takes a set of CAD files as the input to a geometric preplanner, which produces ABPL, an object-oriented planning language, and the symbolic planner then generates a sequence of primitive actions for the robots to execute. So language could be used to generate the states of the planning domain, as a supplementary tool to the geometric preplanner. For example, for a new piece of IKEA furniture with minor differences, a human could tell the robot the differences in language in order to generate the goal. Also, the robots may fail during execution, so language can be used to make requests to humans for help, which is what the previous Tellex paper was about.

    ReplyDelete
    Replies
    1. I think shouted stop commands are more for safety than efficiency. The forklift should be very careful so that it does not hurt anyone. Clarifying should increase the efficacy of the forklift without decreasing its safety. As a baseline, the forklift could always stop when it hears shouting, and use clarifying commands to infer what to do next when it is told to go on.

      Delete
    2. Agree with the above about stop commands being more for safety -- a proposal: when the robotic agent detects a shouted command (even if it cannot properly parse the phrase shouted), perhaps it could examine its environment state in order to determine possible causes for the alarm and, in the future, avoid states which are similar. In this way, perhaps it could slowly learn which situations are risky enough that a human would become loudly alarmed.

      Delete
  17. This comment has been removed by the author.

    ReplyDelete
  18. In both papers, speech could be incorporated both to augment the robot's knowledge of the world and to interact with humans (e.g. ask for help). We have already seen videos of the Ikea youBots asking humans to help with particular tasks, which seems to be quite effective when used with the G3 framework. Such a system could be further extended to having the robot explain what steps it took. This way it could quickly fill in a human supervisor on its recent actions.

    On the other hand, adding speech as input would be extremely effective in Teller's paper. For example, using a framework like G3, humans could give the robot much higher-level subgoals to complete, such as "Place the pallet on the truck". This would avoid requiring the human to specify all of the several subgoals required for such an action. In addition, if a dialogue system is also included, more advanced commands (e.g. "do that again") could be possible, which would avoid repetition. Finally, another useful system would be one in which sub-goals can be grounded in natural language. This would allow a human to "create" a sub-goal that is specific to the task at hand. For me, this last bit is reminiscent of SHRDLU being able to understand block compositions.

    ReplyDelete
  19. Forklift:

    It would be interesting to look at whether it's possible to specify language
    tasks at a higher level than some of the ones I've seen (pick up *this* pallet,
    go to *this* truck), and refine as necessary with dialogue. This more closely
    resembles the relationship that a human manager of forklifts would have with
    an operator: "please unload the last truck that came in." (Confusion ensues,
    resolved by discussion.) It would be even more interesting to take it to the
    next level by specifying policies with language: "whenever a truck comes in,
    unload it." This, too, could be refined, and maybe confirmed before an
    important action was performed.

    I think I would try to approach both of these in a way which mapped the higher
    level command to existing commands, so "unload the truck" to:

    Pallet A -> Shelf A
    Pallet B -> Shelf B
    ...

    This would require some additional knowledge about how to enumerate pallets,
    but a lot of the work is already implemented -- each sub-goal is theoretically
    already dealt with, in addition to the lower level perception processes (e.g.
    knowing where "the truck" is in the first place.)

    I think the set of higher level commands in the forklift domain is relatively
    limited (to be fair, so is my knowledge about forklifts), but in more complex
    domains it would be more practical to learn how to break down commands into
    their subgoals. You might do this with some kind of supervised corpus, or
    something more online, since we're figuring on dialog anyways.
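
    As a toy continuation of the pallet-to-shelf mapping above (the perception and
    shelf-bookkeeping functions here are placeholders I am inventing, not parts of
    the actual system):

        # Toy decomposition of "unload the truck" into existing low-level commands.

        def detect_pallets_on(truck):
            return ["pallet_A", "pallet_B", "pallet_C"]    # pretend perception output

        def next_free_shelf(shelves):
            return shelves.pop(0)                          # pretend storage bookkeeping

        def unload(truck, shelves):
            """Expand the high-level command into (pick up X, place X on Y)
            sub-goals that the existing system already knows how to execute."""
            plan = []
            for pallet in detect_pallets_on(truck):
                shelf = next_free_shelf(shelves)
                plan.append(("pick_up", pallet, truck))
                plan.append(("place", pallet, shelf))
            return plan

        for step in unload("truck_1", ["shelf_A", "shelf_B", "shelf_C"]):
            print(step)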

    ReplyDelete