- Terry Winograd. Procedures as a Representation for Data in a Computer Program for Understanding Natural Language. PhD thesis, Massachusetts Institute of Technology, 1971. (pages 1-52)
- How is SHRDLU representing word meanings? How is it combining word meanings together to understand entire sentences?
- What are the strengths and weaknesses of Winograd's approach for representing word meaning? Especially think about running on a real robot instead of simulation. For additional context, skim Harnad's paper on the symbol grounding problem:
- Stevan Harnad. The symbol grounding problem. Physica D, 43:335–346, 1990.
- What techniques and algorithms are necessary to support the types of linguistic behaviour that appear in Section 1.3? Brainstorm!
1. I'm looking for 500 words total, not 500 words for each question.
[1] SHRDLU represents words with a procedural knowledge representation system that combines semantics and structure (syntax) with reasoning – three constituents which, according to Winograd, must be integrated in order to achieve language understanding (p. 18). Word meanings are modeled by an underlying bed of "semantic networks" (p. 22), coupled with a deductive system that helps to disambiguate word meanings and infer reasonable parses. For instance, Winograd suggests that the parser is capable of calling on these procedures to determine if a candidate parse is sensical (or not) as opposed to blindly parsing a sentence into any possible grammatical sentence (e.g. the parse of "Time flies like an arrow" that might be followed by, "Fruit flies like a banana"). Presumably, the system uses an inference technique to determine that certain word meanings, parses, or sentence interpretations are nonsensical (or at least highly improbable) such as the example on p. 49, in which the human agent requests that SHRDLU "put the blue pyramid on the block in the box" - in this case, there are two blue pyramids, one of which is already in a box. Thus, the system infers that the human agent is referring to the blue pyramid that is not already inside of the box.
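The blue-pyramid example can be sketched as a simple candidate-filtering procedure. This is a minimal hypothetical illustration in Python, not Winograd's actual mechanism; the scene representation and helper names are invented for the sketch.

```python
# Hypothetical sketch: resolving "the blue pyramid" in "put the blue
# pyramid on the block in the box" by discarding candidates for which
# the requested action would be a no-op.

def resolve_referent(candidates, already_satisfies_goal):
    """Keep only candidates for which the action would change something."""
    sensible = [c for c in candidates if not already_satisfies_goal(c)]
    # A unique survivor is the referent; otherwise ask for clarification.
    return sensible[0] if len(sensible) == 1 else None

# Two blue pyramids; one is already in the box, so the request must
# mean the other one.
pyramids = [
    {"name": "pyramid-A", "in_box": True},
    {"name": "pyramid-B", "in_box": False},
]
target = resolve_referent(pyramids, lambda p: p["in_box"])
```

Returning `None` when zero or several candidates survive is what would trigger a clarifying question like "which pyramid do you mean?".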
[2] One thing that I was particularly impressed with was the ability to define new classes of objects given a natural language description of the object class (see: defining "steeple" as a "stack which contains two cubes and a pyramid" on p. 49). To me, this enables the computer agent with a potentially robust and dynamic vocabulary – I would be curious to see if SHRDLU could then learn compound definitions, such as a CASTLE consisting of four "steeples" placed next to each other. One weakness might be that, because it is a simulation, the labels or "symbols" that stand in for the names of each object are capable of directly referring to objects in the simulated world. If we were to extend this simulation to the real world, we can't refer to objects as specifically anymore, given that all we have is symbol manipulation. Since the objects existed in the simulation, and thus were representable by symbols, this was not a problem. However, for an actual object (existing in the actual world that a robot would inhabit), we might run into SGP/merry-go-round (Symbol Grounding Problem) related issues if we begin looking up the definitions for objects.
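The "steeple" definition (and the hypothetical compound CASTLE) can be sketched as predicates registered at runtime. This is an invented illustration of the idea, not SHRDLU's PLANNER representation; stacks are simplified to lists of shape names.

```python
# Hypothetical sketch: new object classes learned as predicates, so
# compound classes (CASTLE) can be built out of learned ones (steeple).

definitions = {}

def define(name, predicate):
    """Register a new class as a test over a scene structure."""
    definitions[name] = predicate

def is_a(name, structure):
    return definitions[name](structure)

# "a steeple is a stack which contains two cubes and a pyramid"
define("steeple", lambda stack: stack.count("cube") == 2
                                and stack.count("pyramid") == 1
                                and len(stack) == 3)

# A compound definition in terms of the earlier one:
# four steeples placed next to each other.
define("castle", lambda stacks: len(stacks) == 4
                                and all(is_a("steeple", s) for s in stacks))
```

Because `castle` calls back into `is_a`, it automatically benefits from any refinement of the `steeple` definition, which is the kind of compositionality the comment is asking about.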
[3] Given that Section 1.3 was organized around the varying features of SHRDLU, it might make sense to tackle how each task/feature was accomplished. The most significant demonstrations were (1) Object Recognition (was this part of the system? Or were objects hard coded? I'm pretty sure they were hard coded, right?), (2) Identifying when more information is needed, (3) Example based reasoning, (4) Theorem forming, (5) Logical connectives, (6) Spatial and Relational Reasoning, (7) Self-Model & Memory, (8) Resolving Ambiguities (using other systems – syntax/semantics/deduction), (9) Quantifiers, (10) Adding Definitions. This may not be a completely exhaustive list, but these features stood out the most to me. Adding definitions might involve creating an additional semantic network, or perhaps would rely on the mechanism from (3), the Example Based Reasoning (form a model of the example and use it to create novel examples). (2) would likely involve identifying an ambiguity and not having an obvious mechanism for resolving the ambiguity (e.g. "which cube?" when all possible cube references make sense in a given context). I've gone over my word limit so I'll leave the rest for other folks to take a whack at.
[1] SHRDLU uses a combination of word definitions and grammatical syntax to understand full sentences. It appears the program uses a non-statistical classification scheme for relating words to its understanding of the scene and its internal model. Each word is stored with a set of syntactic features - how its use works with other words - and an actual definition that can be interpreted by code.
To understand complete sentences, it breaks sentences into smaller groups of words, which are understood through grammatical syntax. Semantic features are used to categorize phrases, which allow the system to select relevant meanings from overloaded words, particularly pronouns. The words are mapped to very specific components and actions in the computer’s model. Small amounts of ambiguity are solved based on historical context.
[2] The words in Winograd’s system must be mapped directly to components, descriptions and actions in the system’s world. The classification and understanding of words is likely quite particular, and sensitive to how the person defining words chooses to define them.
To resolve the issue of grounding complexity, the word list is quite small (~200 words), which allows the system to solve queries rather quickly but limits the complexity of the input.
[3] One feature we noticed was the ability to add new definitions to specific words. In this case, the word ’like’. The definitions added were classifiers for objects in the scene. This didn’t give the word any particular meaning, but the classifiers made it possible for the system to know that some objects were liked and others were not.
Another thing we noticed was that different levels of the system parse whether an action is sensible and possible. It might be linguistically impossible ("can the table pick up blocks"), or programmatically impossible ("stack up two pyramids"). In many cases, helpful feedback can be provided.
The system also separates the block arranging logic and the queriable logic. That is, when asked the question, "Can a pyramid support a pyramid" it responds, "I DON’T KNOW". Then when asked to stack two pyramids, it fails and says, "I CAN’T". It checks if any examples are provided in the scene, not if it can actually simulate such a move. So a question that asks if something impossible can be done could never be answered in the negative. It would always be answered with "I DON’T KNOW"
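The distinction described here, answering "can X support Y?" from examples in the scene rather than by simulating the move, can be sketched as below. This is a hypothetical toy model, not SHRDLU's deductive system; the `failures` record is an addition illustrating how the loop the comment says is missing could be closed.

```python
# Hypothetical sketch: questions are answered from evidence, not physics.

def can_support(scene, failures, below, above):
    """Answer "can <below> support <above>?" from observed evidence."""
    if (below, above) in scene:
        return "YES"            # an example already exists in the scene
    if (below, above) in failures:
        return "I CAN'T"        # a previous attempt at this failed
    return "I DON'T KNOW"       # no evidence either way

scene = {("box", "pyramid")}    # the scene contains a pyramid in a box
failures = set()

answer1 = can_support(scene, failures, "pyramid", "pyramid")
failures.add(("pyramid", "pyramid"))   # the robot tries to stack and fails
answer2 = can_support(scene, failures, "pyramid", "pyramid")
```

Without the `failures` set, the question can indeed never be answered in the negative, which is exactly the behavior the comment points out.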
[1]
SHRDLU was flexible in that it "understood" that words have multiple meanings and used the context of words to deduce the meanings of ambiguous words (or it would ask questions if the word's meaning/phrasing was too ambiguous). For example, the use of pronouns can be highly ambiguous at times since a pronoun references an object mentioned beforehand, so SHRDLU would go through previous commands to determine a pronoun's meaning.
Furthermore, it could deduce human perception of a given word by other words used in context; for example, from the expressions "Is there a green block on the table?" "What color is it?", SHRDLU would interpret "it" as the table since the block's color was already given by the communicator. Understanding the meaning and attributes of one word (like acknowledging that the color green was attributed to the block) gives context to what an ambiguous word's meaning may be, and thus helps SHRDLU understand the entire sentence.
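A toy version of this pronoun heuristic: scan recently mentioned entities and pick the one the question would actually be informative about. This is an invented sketch, not Winograd's algorithm; the candidate list and `is_informative` test are assumptions.

```python
# Hypothetical sketch: resolve "it" by preferring an entity about which
# the question is not redundant (don't ask the color of an object whose
# color was just stated).

def resolve_it(candidates, is_informative):
    """Pick the first recently-mentioned entity worth asking about."""
    for entity in candidates:
        if is_informative(entity):
            return entity
    return None  # no candidate -> ask a clarifying question

# "Is there a green block on the table?" "What color is it?"
# The block's color is already known, so "it" should be the table.
candidates = [
    {"name": "block", "color": "green"},
    {"name": "table", "color": None},
]
answer = resolve_it(candidates, lambda e: e["color"] is None)
```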
[2]
One of the big strengths I saw was the system's ability to learn new objects based on what it created (e.g. the steeple). Even though the system was in a small controlled environment with limited objects, the potential to define new objects based on the combination of previously defined objects could potentially allow such a system to be expanded into a larger environment with objects that weren't hardcoded.
Another strength was the system's ability to go through prior commands to infer a word's meaning. For example, when asked to "grasp the pyramid", SHRDLU did not know which pyramid to grasp since a specific pyramid was not referenced prior to that command; however, when asked "what is the pyramid supported by", SHRDLU deduced that the pyramid was the one seen in the last transaction. It's a relatively simple concept, but a very important one since human vernacular can be very ambiguous at times.
Another cool strength was its deductive system and how it was able to answer questions after performing a task. SHRDLU knew a "box could support a pyramid" since it found an example indicating that a box could indeed support a pyramid. However, it did not know if a "pyramid could support a pyramid" since there was no example in the scene. When prompted to "stack two pyramids", it learned that it could not stack two pyramids. However, if asked again after it failed, it would most likely repeat that it "does not know" since one failure does not necessarily mean the task is impossible.
[3]
Going back to the stacking two pyramids example--even after the system tried and failed to support a pyramid on top of another pyramid, it most likely would not definitively state that a pyramid could not support a pyramid. Even though a failed attempt could be included in the system's logic in order to increase its knowledge and ability to answer particular questions, there are too many factors to consider and one failed attempt is not conclusive.
(1)
SHRDLU has a collection of definitions, called a dictionary, for individual words. For simple words, meanings are defined in a straightforward way as semantic definitions. For more complex words, SHRDLU can call on arbitrary amounts of computation to integrate their meaning into the sentence.
Since a word may have different meanings under different categories, SHRDLU uses semantic features and any aspect of the sentence, the discourse, or the world to select the correct meaning for the word. It also keeps a record of the current environment to resolve ambiguities and determine the meaning of references in discourse.
(2)
From what I have seen, one of the biggest advantages of this approach is that it greatly reduces the work that needs to be done by the robot. After the dictionary, word network, and other preparatory structures are created manually, the robot only needs to do some selection work without much computation. I think this can greatly reduce response time and memory consumption.
The quality of its results relies heavily on the dictionary. Creating a complete dictionary and the corresponding semantic features calls for a great deal of manual work. Also, it can only be used in a single domain, and its scalability is not very good.
(3)
Object recognition. I think to make it actually work, the robot must at least be able to identify different objects and have an “understanding” of the surroundings.
Caching. Since we need to keep track of the whole environment, we need a good caching method to record as much information as we can within limited memory.
Learning techniques. As we can see from the examples, the program should be able to learn from previous dialogs and apply new knowledge to subsequent communications.
Control techniques. If we want to implement a robot in reality, we must grant it the ability to grasp or control real objects.
1) The word meanings themselves are stored in a "semantics language", which seems like a symbolic representation, such that word meanings can be connected and inferred. To understand word meanings SHRDLU is, as already mentioned by the others, using a combination of "semantic networks". This is done by making connections between words: e.g. "block" is a physical object, and the word "support" in "The red block supports the pyramid." has a physical meaning, but the word "bloc" has a political meaning, and the meaning of "support" in "The red bloc supports Egypt." is not exactly physical; association of the word "support" with the other words in the sentence resolves this confusion. Moreover, grammatical heuristics with memory are involved in understanding references made by pronouns and other words like "one" and "too". Additionally, in a small way (maybe as a heuristic) it understands that a pronoun should not always refer to just the immediately previous object discussed. To understand whole sentences it does a parse and checks for semantic correctness of the parsed structure as per its understanding of its environment, stored meanings, and heuristics.
2) Strengths: It can resolve ambiguities with connections, e.g. "support" with "block" vs. "support" with "bloc". It asks questions to resolve ambiguities, e.g. question 24, about objects on top of the green cube. As previously mentioned it can also learn new word meanings ("steeple") and remember new features of a known object ("like").
Weakness: Some of the weaknesses have been discussed. The symbol grounding problem, where the system is only learning from previously described symbols, and doing symbol manipulation. It does not seem to have this hybrid approach between connectionism and symbolic representations that Harnad describes. The lower level problem solving, like object recognition (and unrelatedly trajectory mapping and grasping) are completely overlooked. The system has only one model of a cube and it is not presumed that it can identify a cube from any perspective. It already knows that an object is a cube symbolically. This training/ teaching is impossible in real world, where generalization exists at lower levels of cognition, e.g. we can recognize that Hobbes from Calvin and Hobbes is a tiger, even if he is walking on two legs and handing out life lessons.
3) To solve such a problem, rudimentarily, a system needs objects and their definitions/associations stored, and the ability to search across these pairs. Logical correctness is tested and implemented based on defined spatial rules. Even when it infers a particular meaning of a word, both meanings are already stored, and resolving depends upon the semantic meaning of other words in the sentence. What I intend to say is that this is just a better heuristic problem solver, one with more rules than the traditional solver.
SHRDLU represents its word meanings through procedures rather than through symbol definitions, avoiding the infinite regress you might get with symbols defined only in terms of symbols. I didn't read far enough to understand what the inputs and outputs of these procedures were. The sentences themselves are broken down, using grammatical rules that also seem to be represented as procedures, into pieces that can then be parsed with its collection of word procedures.
The effort to ground SHRDLU's symbols with both a real "understanding" of the nouns and verbs it manipulates, as well as the acknowledgement that meaning is a fairly complex concept, is the major strength of his approach, and it is puzzling that so little of the subsequent work in language sought to extend it. Presumably this is partially explained by the downsides, which must be dominated by the practical difficulties in giving machines what would amount to "visceral" knowledge of anything in particular. Perhaps we should be having them discuss flavors of electricity, or something they can be said to understand at a basic level. Beyond that, symbol grounding will require a robot to have an acquaintance with the world about which it will speak (and listen), and we don't really have robots adept enough to do that in any but the most constrained environments.
Another downside would be in the training. Statistical models of language have instant access to tens of billions of words that can be associated, analyzed, scanned, and processed in many different ways. Because the statistical associations in language are strong, you can go a long way with such models, and the appeal of making progress attracts researchers and research dollars. Training a real-world program on a problem space such as SHRDLU's will require a great deal of robot up-time, as well as researcher attention. (Training a toddler, a processor controlling machinery apparently designed for such tasks, takes years of constant attention and instruction.)
[1]
SHRDLU combines both semantic definitions and syntactic features to represent word meanings. Since there may be more than one definition per word, a network uses the syntactic features to perform a disambiguation task. Additional semantic analysis of the word is performed by using information gained from the discourse or the world to choose the most appropriate meaning.
This is an example of Winograd's vertical structure, where data from one program can influence the output of another. When given a sentence, Winograd's system will break it down into its grammatical constituents. These parts are used by an inference program to determine if the current assignments are logically plausible (given a knowledge of the previous dialogue and the subject matter).
[2]
One of the primary strengths of Winograd's representation of word meaning is the vertical manner in which data is processed. A naïve program could simply take the word's dictionary definition to be its meaning. However, as is often the case, the word's meaning is grammatically context dependent; it can also be situationally context dependent. In this regard the system uses knowledge of the dialogue to further narrow down word meaning. A good example of this deduction is the analysis of pronouns; when asked "what color is it" the program will look back through the previous dialogue to determine the object "it" most likely refers to. If the last command was "move the green pyramid", the program uses its knowledge of human reasoning to infer that you already know the pyramid is green and that you would not logically ask for the color of an object whose color is known.
On the other hand, Winograd's program is limited to a relatively small vocabulary. As the size of the vocabulary grows, the task of disambiguation and assigning meaning to words and sentences grows dramatically. With a larger vocabulary the robot would require a much more complete understanding of its own world. However, in the real world the robot's physical domain would also be significantly larger. This means that the robot will need access to much more information just in order to attain the same complete understanding of its environment as before.
[3]
In recreating the linguistic behavior that appears in Section 1.3, several different problems must be tackled. The first would be a way to define physical objects. For example, the robot should know how to identify a block, and be able to tell the difference between a block and a pyramid. In addition, the robot needs to have knowledge of what these physical objects are capable of. A pyramid can be stacked on top of a block, but not the other way around. Finally, the ability to reason between physical attributes and properties is essential. Perhaps a basic version of this reasoning would be a mapping between attributes and properties. This would allow the robot to realize that a certain set of attributes (i.e. a flat surface) lend the object certain properties (i.e. stackable). Hopefully it would be possible to use this mapping to infer attributes from previously unknown objects, allowing the robot to learn from its environment.
1) I found this section a fair bit confusing, but here's my interpretation of how SHRDLU works. It represents the meaning of words in a dictionary of root words paired with meaning and syntactic features. To create and understand sentences, it uses a grammar and semantic features to create possible output sentences. Using this, it then weeds out nonsensical meanings of sentences (generally using word adjacencies to determine sentences that wouldn't work), and uses past information and requests to replace pronouns such as 'it'. By using these replacements and weeding out nonsense sentences, it can then either parse a sentence down to a command upon which it can act or determine that it needs to ask further questions to fully carry out an action.
2) The major weakness of this system is that it fails to connect to new actions. New things can only be defined by concepts that are already defined by the dictionary. Anything new that is added must have some basis in the dictionary-action relation already created. This weeds out many possible actions and relations that are not predefined. For example, a robot in the real world encountering an object that was not a pyramid or a block would have no idea how to interact with or define it, preventing real action, with a chair for example, unless it was already coded into the robot's understanding. This severely limits real world application of this system.
The major strength of Winograd's system is that it gives concrete definition-action relations between sentences that the robot can understand in its world, allowing for new definitions that fit into the world defined. By creating such a concrete world, interpretation of sentences is much easier, and avoids such ambiguities as found with the crate lifting robot discussed in lecture on Thursday. This allows the robot to interact with its environment and respond to commands in a meaningful and accurate way, creating much more natural actions and responses.
3) Techniques/Algorithms necessary:
Natural language recognition - sentence pattern recognition, word meanings, correlation between meaning and actions, recognition of nonsensical sentences (i.e. when certain meanings of words would make no sense when adjacent to each other).
A way to bind past nouns to new pronouns (infer meaning through relative nearness of reference and other descriptions).
Object recognition (size, color, shape)
Ability to expand object definitions and understanding (i.e. adding steeple to known objects)
1. What I understand of this is the system described in the paper attempts (ambitiously) to move beyond an algebraic understanding of words/language (noting that previous attempts to parse natural language with the intent of producing lucid responses often focused solely on the syntactic/grammatical structure of statements, and the manipulation thereof). SHRDLU tries to understand words programmatically in (broadly defined) context by mixing dictionary definition with syntactic as well as semantic clues from not only the immediate grammatical context, but with regards to what *meaning* maximizes logical sense based on facts known about the environment (including the inferred knowledge of the partner in conversation). Sentences are understood to be made up of nested structural units (groups, clauses, etc..), each of which contributes meaning to the sentence as a whole. Less importance is given to the strict grammatical structure of the sentence, and more to the sentence as an organizational tool to convey meaning.
2. I was honestly amazed by the results of this system in the small scale example -- the robotic participant was able to lucidly navigate a conversation involving newly defined objects, ambiguous references, and an understanding of goals/planning/intent (I'm particularly impressed by the answers supplied to "Why did you..." questions). The system seems to be robust in the face of such edge cases. A downside seems to be the sheer amount of information required about the specific subject of discourse. If the robot hadn't been pre-programmed with a thorough understanding of its (highly limited) environment, it seems like that conversation would have consisted mostly of clarifying questions and definitions supplied by the human participant. I would be worried about how well this approach could scale to novel tasks, and planning in unfamiliar situations.
3. Some important tasks which would need to be accomplished algorithmically are A) A way to bind a word to a meaning in context, either in training data, or on the fly. B) A way to formulate clarifying questions, and to sort through nonsensical or illogical data. C) A way to plan to reach a goal based on known predicates, possible actions which can be taken from the current context, and the logical outcomes of these actions. Among other tasks.
(1) The system uses several related components to represent words and their meanings. The DATA and BLOCKS components contain a pre-defined set of objects with sizes, shapes, colors, and locations, along with a representation of the robot's understanding of the relationships between them, so that references to these objects may be understood. The DICTIONARY component stores words along with the parse labels associated with each word, as well as a "semantic definition", a program to be executed by the SEMANTIC FEATURES component, which is made up of a network of categories used to disambiguate words in context.
Thus as it parses the sentence it makes sense of it using the SEMANTIC FEATURES in order to give it clear meaning and then relates it to the DATA and DICTIONARY and other modules in order to connect the words with their meanings and with the state of the world.
(2) Its strengths lie in the sheer flexibility and number of features available when making references to the world contained in the DATA. This flexibility is demonstrated in its ability to learn new facts about this world. This feature shows that despite being built on deterministic procedures, it has a great deal of flexibility compared to the kinds of rules-based approaches that would have been common at the time.
Its weakness lies in being able to scale this methodology up to anything other than this block world. All this flexibility comes at the cost of rigidity in what and how much it can know about the world, because most of this is apparently provided in hand-written theorems about the block world. We can't rely on a hand-written DATA module for a sufficiently complex world and it would be very difficult to generate a general-purpose set of this data. Additionally, the number of ambiguities grows immensely in the real-world case, which might cause this methodology to fail or take an unreasonable amount of time while it searches through the network of semantic features.
(3) In order to implement such a system we need a model for storing relationships between objects in the world. Then a given world's relationships could be human-defined or some sort of training could be done on human-labeled worlds in order to learn how to label the relationships between objects. In the case of this block world the relationships would generally be about positioning. Our parser would be able to turn the sentence into a structured format, but it would then need to be mapped onto the world by some kind of model that attempted to draw relationships between certain sentence structures and certain relationships in the world. This could be a probabilistic model based on data where humans would be asked to describe a variety of scenes in the block world. The parsing could be biased towards sentences that correspond to what the program knows about the current world.
The process of trying to map the sentence onto the model of the world would have to recognize that it has low confidence or no single mapping from phrase to relationship to the actual world and would have to be able to generate questions about the particular part of the phrase that failed to map.
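The low-confidence fallback described above might be sketched as follows; the scoring scheme, threshold, and question templates are all invented for illustration.

```python
# Hypothetical sketch: map a parsed phrase onto candidate objects in the
# world model; with no single high-confidence match, generate a
# clarifying question about the part of the phrase that failed to map.

def map_phrase(phrase, matches, threshold=0.8):
    """matches: {object_name: confidence score} for this phrase."""
    scored = sorted(matches.items(), key=lambda kv: kv[1], reverse=True)
    if not scored or scored[0][1] < threshold:
        return None, f"What do you mean by '{phrase}'?"
    if len(scored) > 1 and scored[1][1] >= threshold:
        return None, f"Which '{phrase}' do you mean?"
    return scored[0][0], None  # a unique confident mapping

# Two cubes both match well, so the mapping fails and a question results.
obj, question = map_phrase("the cube", {"cube-1": 0.95, "cube-2": 0.9})
```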
1. SHRDLU represents word meanings and whole sentences by using a language understanding system which combines syntactic, semantic, and inference parts together. For a single word, the system uses syntactic structure and a semantic definition to represent it. In the syntactic parsing part, SHRDLU first performs morphemic analysis to convert each word in the sentence into its root form. Then it parses the sentence into some basic groups using the syntactic features of words. The semantic part of SHRDLU works in coordination with the syntactic part in different phases. For disambiguation, the reasoning part is also called on to make deductions during interpretation. Furthermore, there is a network of semantic features in the system, used to categorize objects and actions so that the semantic part arrives at the correct meaning.
2. The first strength of this system is that it integrates three parts together. It can use different kinds of information, such as various information about a sentence, the topic of discourse, and common-sense knowledge, to interpret natural language. It is indeed a great idea to consider context and general information to help interpret natural language. A good example of this is that when a person says "pick up a big red block" and "find a block which is taller than the one you are holding and put it into the box", this system is able to figure out what "one" and "it" represent in the second command.
The second strength is that this system can make deduction based on its knowledge. In the 1.3 sample dialog, the system is able to deduce negation, spatial relationships, size relationships and quantity relation.
The weakness of this system is that although it performs quite well in its own defined blocks world model, it can only process a small number of words (~200). And it needs a large amount of knowledge handcrafted into the system in advance, such as the syntactic structures and semantic definitions of words, the robot's mental world, and the blocks world model. When the scale expands, the handcrafted knowledge will increase enormously, which makes it impossible to write it all into the system. Even if it were possible to handcraft all the knowledge, the efficiency of the system would drop dramatically due to the structure of the system (the integration of three parts).
3. To implement the linguistic behavior of Section 1.3 on a real robot, we need:
(1) perception
The robot should be able to perceive the real world by itself. So it should have a good object recognition system which is able to recognize the shape and color of an object and the spatial relationships among objects, like "the red block is in front of the blue block". Especially, it should figure out the support order of the blocks (e.g. two red blocks stacked together), which is very difficult.
(2) linguistic model
parsing sentences into grammar trees, word definitions, semantic parsing of sentences, inference ability, context-aware analysis
(3) manipulation
For picking up and placing blocks, it needs very precise manipulation, which puts constraints on the robot arm motion planner to restrict the pose of the gripper so it can grasp blocks of different shapes.
[1]
SHRDLU represents word meanings with a combination of a dictionary and a program form. The dictionary contains syntactic features and semantic definitions of each word (e.g., "block" is a physical object, "two" is the integer 2) that can be used to construct semantic networks. Moreover, when the dictionary is not enough to resolve ambiguities or determine references, SHRDLU will use a form of a program (which differentiates it from other language understanding attempts) to deduce the answer from previous discourses or the knowledge model.
To understand entire sentences, SHRDLU parses the sentence with its grammar program, using the word meanings to guide the direction of parsing procedure and to reject nonsensical parsing attempts.
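A dictionary whose semantic definitions are small programs might look roughly like this. The entries and the world model are invented for the sketch; SHRDLU's actual definitions were PROGRAMMAR/PLANNER procedures, not Python lambdas.

```python
# Hypothetical sketch: dictionary entries carry syntactic features plus a
# "semantic definition" that is executable code, run against the world
# model during parsing to reject nonsensical readings.

world = {"block-1": {"type": "block", "color": "red", "physical": True}}

dictionary = {
    "block": {"syntax": ["noun"],
              "semantics": lambda obj: obj["type"] == "block"},
    "two":   {"syntax": ["number"],
              "semantics": lambda objs: len(objs) == 2},
}

def denotes(word, candidate):
    """Run a word's semantic program on a candidate object or set."""
    return dictionary[word]["semantics"](candidate)
```

A parser could call `denotes` on each candidate reading and discard parses whose semantic programs return false, which is the "guide the direction of parsing" behavior described above.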
[2]
A major strength of Winograd's method for representing word meaning is its flexibility. The language that humans can use to communicate with the robot is not constrained to a limited form but is more like plain natural language. The language need not be precise the first time, since the robot is able to ask questions in order to resolve ambiguity when an order is not clear enough, and humans are free to add new concepts, etc.
A weakness I can think of for now is that in SHRDLU, the DATA module is intrinsic to the simulation. In the real world, the robot needs to perceive the external world using sensors and construct the corresponding model from the information it perceives. In this process, it could encounter concepts that neither exist in its knowledge model nor can be explained using its previous knowledge, which might cause a problem.
[3]
Some of the techniques are already mentioned in comments on the dialog: coreference resolution, example-based reasoning, logical connectives, morphemic analysis, etc.
In addition, since the robot is able to break down a high-level task into several low-level tasks and carry out each one sequentially (e.g., in order to stack up two blocks and a cube, it first picked the cube that met the requirements, then cleared off the block, and finally stacked up all the objects), it probably uses a classical planning algorithm.
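That "clear the target, then move" decomposition can be sketched in a few lines. This is an assumed toy world (a dict mapping each object to what it sits on), not SHRDLU's PLANNER; `plan_stack` and `on_top_of` are invented names.

```python
# Rough sketch of classical-planning-style goal decomposition for stacking:
# to put a on b, first clear b (move its occupant to the table), then move a.
# The world is a dict mapping each object to the thing it rests on.

def on_top_of(world, x):
    """Return the object resting directly on x, if any."""
    return next((o for o, base in world.items() if base == x), None)

def plan_stack(world, a, b):
    """Emit primitive (move, obj, dest) steps that put a on b."""
    plan = []
    blocker = on_top_of(world, b)
    if blocker is not None and blocker != a:
        plan.append(("move", blocker, "table"))   # clear the target first
        world[blocker] = "table"
    plan.append(("move", a, b))
    world[a] = b
    return plan

world = {"cube": "table", "block1": "table", "block2": "block1"}
steps = plan_stack(world, "cube", "block1")
```

A real planner would search over operators with preconditions and effects; here the single precondition ("target is clear") is handled directly to keep the sketch short.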
(1) It sounds like it uses several phases to get at the meaning. It's got a dictionary with morphology for the basics, and then passes stripped-down words to PROGRAMMAR. The most interesting part to me is using procedures as "definitions" for words, allowing for flexibility and the power to resolve the meaning of words in context. It makes a lot of sense to build the algorithms to, e.g., resolve antecedents into the part which "understands" that word.
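To make "procedures as definitions" concrete: here is a toy illustration of my own (nothing like actual PROGRAMMAR code) in which the word "one" is defined by a procedure that searches the discourse history for the most recent noun it could stand in for.

```python
# Toy sketch: a word whose "definition" is a procedure. The meaning of "one"
# is computed by scanning the discourse history backwards for the most
# recently mentioned concrete noun.

def define_one(history):
    """Resolve 'one' to the most recently mentioned concrete noun."""
    nouns = {"block", "pyramid", "cube", "box"}
    for sentence in reversed(history):
        for word in reversed(sentence.split()):
            if word in nouns:
                return word
    return None

history = ["pick up the red block", "now put it in the box"]
```

The point is exactly the one made above: the algorithm for resolving the antecedent lives inside the part of the system that "understands" the word.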
The phrases are put together by trying to run a PROGRAMMAR parse (of various kinds) on a sentence. Success is restricted by the requirements of different sentence systems and is defined at the word, group, and sentence levels. At the word and group levels it can also actively disambiguate the meaning of words and phrases with questions.
It's also significant, I think, although I'm not sure I understand how it works, that they "[use] semantic guidance at all points... rather than trying all of the syntactic possibilities." This, I would guess, comes out of the "vertical" structure and seems like a big win from an efficiency standpoint.
(2) I think the strength is PROGRAMMAR -- it handles context-sensitive parsing
with something that is actually Turing-complete.
It's certainly true that SHRDLU has an easier time of understanding without the additional problem of perception and physical interaction. It would seem, though, that an appropriate abstraction of these, such as the one provided by the graphics in SHRDLU, would address the problem neatly -- at least as far as things hard-coded in its vocabulary.
It does present a problem for learning new objects: these have to be grounded in existing terms (maybe not such a problem with an appropriate set of built-in vocabulary), and more importantly, algorithms of perception and interaction have to be "learned" simultaneously. I don't think this is a problem with his approach to word meaning, however, just a hard problem it brings up.
I suspect, and Wikipedia hints, that the systems for knowledge and disambiguation don't scale to larger domains. Computational complexity aside, the cost of implementing all the ways many words in English can be used would be an enormous amount of work. I like the comparison Tom made earlier: a toddler needs several years of devoted attention, and that in a relatively high-bandwidth format which comes naturally to us.
(3) The first thing that occurs to me is: a lot of hard-coded knowledge. I don't say that in a derisive way, though -- clearly the system has to start somewhere, and the project is starting from scratch in its universe. A lot of these are probably reusable to some extent, but domain-dependent.
One big item is a planner and deduction system analogous to PLANNER. Knowledge isn't much use if you can't combine and act on it. My limited understanding of planning is that performant systems exist and work well, *if* you know the facts to give them and the goal you want from them.
1. There are several facets to word meanings in the SHRDLU system. It uses a very simple vocabulary, hardcoded with an initial set of 200 words. While this is a huge limitation, the system is remarkably robust with that set of 200 words.
First off is a set of symbolic properties. Simplest is the part of speech, but more complex properties are used as well: 'pyramids can't support objects', or classifications according to color or shape. Then there is also a programmatic definition. Winograd uses the example of the word 'one': it is defined in part by a program which reviews previous sentences to determine the referenced object in a given utterance. These word meanings then assign validity to a given parsing and grounding, which can then be revised based on that validity.
2. The strengths of the system are self-evident in the sample dialogue given in the thesis. It is some of the most sophisticated (simulation-)world-based reasoning I've ever seen, and it's from 1971. Most impressive is the conversation- and situation-aware disambiguation in symbol grounding. *Given the command 'Grasp the pyramid', the person specifies which pyramid, and the system then knows henceforth what 'the pyramid' refers to in the context of that situation.*
That being said, all of this sophisticated reasoning seems hard-coded as opposed to learned, and therein lies its major weakness. It can expand its vocabulary with simple composition (a "steeple" is a pyramid-and-cube stack), but we haven't seen it reason about more complex things (say, 'a cup is an upside-down pyramid cut from the top of the cube'). I'm also trying to think of some way to define even a sphere within the limited vocabulary demonstrated in the dialogue and can't think of one. So the world is really, really limited. Its use of symbolic manipulation in place of natural language processing is also evident in dialogue line 3: "By 'it', I assume you mean the block which is taller than the one I am holding." A human would likely shorten this to a single adjectival descriptor ("You mean the *taller* block?", leaving the rest implied). Frankly, though, a human would have found no ambiguity in the command: there would be no motivation to find the taller block other than to put *it* in the box. It shows some of the boundaries of the reasoning capability of the system, and presumably many other things break down outside the narrow focus of the dialogue in the thesis.
One last major weakness is that, in contrast with the real world, this simulator gives the system omniscience. It is not performing computer vision, segmenting objects, and then reasoning about them.
3. One type of reasoning is symbol grounding: for each parsed symbol, it can answer the question, "Does this symbol correctly describe this object?" This actually seems fairly similar to automated proof-checking programs in that it uses its reasoning tools as best it can; if it runs into any ambiguity or unknown concept, though, it fails. For example, a human, based on real-world experience and geometric reasoning, would hypothesize that a pyramid cannot support a second pyramid (line 11), but the robot has no idea. The robot also has a system for translating natural-language definitions into symbolic ones (learning the meaning of a steeple), and thereafter remembering that and other observations given by a human (*I like the pyramid.*). It also performs parsing and natural language processing to come up with hypothesis symbols that the grounding program attempts to ground (and then revises the grounding if symbol grounding fails).
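The grounding question above ("does this symbol describe this object?") could be sketched like this. The representation is entirely my own assumption: symbols are predicates over feature dicts, and a learned symbol like 'steeple' is composed from existing ones, in the spirit of the dialogue's steeple definition.

```python
# Hypothetical sketch of symbol grounding: each symbol is a predicate over an
# object's feature dict, and fails closed on anything unknown or ambiguous.

BASIC = {
    "pyramid": lambda o: o["shape"] == "pyramid",
    "cube":    lambda o: o["shape"] == "cube",
    "red":     lambda o: o["color"] == "red",
}
LEARNED = {}

def learn(name, test):
    """Store a new symbol definition given in terms of existing ones."""
    LEARNED[name] = test

def grounds(symbol, obj):
    """True if `symbol` correctly describes `obj`; unknown symbols fail."""
    test = BASIC.get(symbol) or LEARNED.get(symbol)
    return bool(test and test(obj))

# 'steeple' defined compositionally: a pyramid sitting on a cube.
learn("steeple", lambda o: o["shape"] == "stack"
      and grounds("pyramid", o["top"]) and grounds("cube", o["bottom"]))

tower = {"shape": "stack",
         "top": {"shape": "pyramid", "color": "green"},
         "bottom": {"shape": "cube", "color": "red"}}
```

Note how an undefined symbol like "sphere" simply fails, matching the point above that the system fails rather than guesses when it hits an unknown concept.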
1. SHRDLU integrates syntax, semantics, and inference in a closely coupled way to understand words. It parses the input sentences for grammatical analysis, gets their detailed structure, and abstracts the features of the linguistic components. For each single word, SHRDLU gets the word's syntactic and semantic definition from a dictionary. Its semantic system sets up simple types of semantic networks and uses deduction for semantic analysis (e.g., it knows 'block' is a physical object while 'bloc' is a political object).
SHRDLU represents knowledge in the form of procedures. The parser is able to call semantic routines to see whether the line of parsing it is following makes any sense, and the semantic routines can call deductive programs to see whether a particular phrase makes sense in the current context. It examines previous dialog to get a thorough knowledge of the entire discussion, in order to see which reference assignments are logically plausible, and asks questions to resolve ambiguities.
2. One of the strengths is that it asks questions when there are several interpretations, to narrow down the ambiguities. It also keeps track of earlier sentences to understand questions (e.g., it deduces from memory to figure out what 'it' refers to) and absorbs new information as part of its own knowledge (e.g., it remembers that the user said the blue pyramid is nice, records its previous actions, understands what 'steeple' means, etc.). The self-learning and deductive system makes for better understanding of the words.
However, it relies on a dictionary of limited size and limited actual examples, and it's difficult to scale. If there's no actual example, there is no easy way to tell without being able to examine the robot's programs. Also, the author didn't specify how information from previous dialogs is recorded; I assume it could hit a memory bottleneck as the number of dialogs grows.
3. It should have object recognition technology to identify objects' features, such as shape and color, in order to execute commands correctly, as well as accurate manipulation to 'put a small block onto the green cube'. It needs to identify what a 'block' is and how 'stack up' acts in the real world.
Meanwhile, it needs to know how to record the earlier dialogs and how to associate them with future dialogs -- not just storing them, but creating connections among all the dialogs.
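A minimal sketch of such a discourse memory (an assumed structure of my own, not SHRDLU's): each event is recorded with a sequence number, so a later question like "what did you do before you picked up the pyramid?" becomes a query over the history.

```python
# Toy discourse memory: events are stored with sequence numbers so that
# later questions can be answered by querying over the recorded history.

class Discourse:
    def __init__(self):
        self.events = []          # list of (index, action, obj)

    def record(self, action, obj):
        self.events.append((len(self.events), action, obj))

    def before(self, action, obj):
        """All events that happened before the first (action, obj) event."""
        for i, a, o in self.events:
            if (a, o) == (action, obj):
                return [(a2, o2) for j, a2, o2 in self.events if j < i]
        return []

d = Discourse()
d.record("pickup", "red block")
d.record("stack", "red block on cube")
d.record("pickup", "pyramid")
```

This also illustrates the memory-bottleneck worry above: the history only ever grows, and every temporal query scans it.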
[1] SHRDLU’s semantic system represents words by types in semantic networks and definitions. Given a word, the semantic system can retrieve the word’s types and definitions from SHRDLU’s memory.
SHRDLU has a program that extracts syntactic information from sentences. Using syntactic and semantic information, SHRDLU runs an inference program to understand sentences.
[2] SHRDLU essentially works by following instructions consisting of a finite set of rules. SHRDLU works well with examples that it has already memorized. If SHRDLU had access to an infinite amount of data (all kinds of English words, phrases, and sentences), it would be able to deal with any kind of command. In practice, with a finite amount of data, SHRDLU would not be so useful.
SHRDLU would not generalize. It just would not know how to handle unseen commands. For example, if it encounters a word that is not registered in its system, it would not be able to continue its instructions.
[3] Two complex systems are required: one that understands sentences and another that generates sentences.
Building such complex systems requires many small systems:
language models
parsers
word-representation models
systems that understand their environments
systems that validate input/output sentences
and many more
(1) SHRDLU represents word meanings in 'programs' especially designed for syntax, semantics, and reasoning, instead of using rules, patterns, and formulas. This makes it easy for the system to relate syntax, semantics, and reasoning (deductive logic) to each other. INPUT handles typos and other errors by looking words up in the dictionary, and GRAMMAR and SEMANTICS work together to interpret the sentence. ANSWER creates responses heuristically, also using the past discourse. PROGRAMMAR takes 'programs' into account in parsing the sentence. BLOCKS, SEMANTIC FEATURES, and PLANNER help SEMANTICS make better inferences. The benefit of using 'programs' in processing natural language is that you can take semantics and inference into account: if a word has two meanings, or meanings related to the discourse, the system can consult the other components that deal with semantics and inference over memory.
(2) One of its strengths is that it can learn new relationships about word meanings during the discourse. In the sample discourse, it was able to learn the relationship between the verb 'like' and the object 'blocks'. When it could not understand what the word 'on' meant, it knew how to disambiguate by asking whether it meant 'directly on the surface of' or 'anywhere on top of'. It also remembers the past discourse and tries to understand what pronouns (it, they) in a given sentence mean. This wouldn't be possible if we just ran a parser or syntactic analysis; semantics and reasoning need to come into play in order to ask relevant questions to disambiguate the meanings.
One weakness of this approach is that it assumes the current semantic map of the world is already pre-programmed. The SHRDLU system assumes there is a magical 'eye' that can translate the real world into symbols the computer can understand. In the real world, you need to build a semantic map from a camera feed, which can be either 2D or 3D. Not only is object detection necessary, but each object's physical relationships with other objects have to be resolved as well.
The next weakness is within the system itself. The preprogrammed meanings in the system, the 'programs', have to be produced by humans; they are not self-taught. As the system gets more sophisticated, it is not clear how we can deal with this complexity within the 'programs'. It may be almost impossible to come up with all the possible disambiguation cases and resolve the conflicts within the 'programs'. To advance this system, we need to know how to collect all the word meanings while dealing with all the disambiguation.
(3)
an algorithm which captures pronouns and either retrieves information from the past discourse or asks questions to disambiguate them.
an algorithm that can plan to achieve the user's goal given the current state of the world model.
It needs to know where to grab the necessary blocks and place them in the right spots. The system also needs to remember what actions have been taken to accomplish the user's goal.
a deduction algorithm that can infer complex phrases like 'the one which I told you to pick up' or 'find a block larger than the one that you are holding'.
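The last kind of deduction can be sketched with a hypothetical data model: resolving "a block larger than the one you are holding" amounts to comparing each candidate block against the robot's current state. The world representation and `larger_than_held` are invented for illustration.

```python
# Toy sketch: resolving "a block larger than the one you are holding".
# The world maps each block name to a size; `holding` names the held block.

def larger_than_held(world, holding):
    """Blocks strictly larger than the block currently in the gripper."""
    held_size = world[holding]
    return [name for name, size in world.items()
            if name != holding and size > held_size]

world = {"block-a": 2, "block-b": 5, "block-c": 3}
candidates = larger_than_held(world, "block-a")
```

The phrase only denotes something when candidates exist, which is exactly the kind of check the grounding and deduction machinery has to perform before the command can be executed.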