
An Introduction to Natural Language Theory

This tutorial provides a brief introduction to the content and method of natural language processing (NLP). More detailed expositions on NLP will follow soon. A natural language processing program may consist of the following subprocesses:

Syntactic Understanding

  • Acquiring knowledge about the grammar and structure of words and sentences.
  • Representing and implementing this knowledge effectively allows language to be manipulated with respect to grammar.
  • This is usually implemented through a parser.

    The Parser

    A parser breaks a sentence down into grammatical objects by assigning phrase markers to its words and word groups, such as verb, adverb, noun, etc. The IQATS parser, for example, is context-free: its rules may be applied recursively, so words can be grouped under their respective phrase markers at multiple levels of embedding. This makes much more intricate parsing possible.
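    To make the idea of recursive grouping concrete, here is a minimal sketch of a recursive-descent parser for a toy grammar. It is not the IQATS parser: the lexicon, the three rules, the function names, and the `rel` marker for an embedded clause are all assumptions made for illustration.

```python
# Hypothetical toy lexicon (an assumption of this sketch, not taken from IQATS).
LEXICON = {
    "jack": "name",
    "a": "det", "the": "det",
    "frog": "noun", "fly": "noun",
    "ate": "verb", "saw": "verb",
}


def parse_sentence(words):
    """S -> NP VP. Returns a nested list or raises ValueError."""
    np, rest = parse_np(words)
    vp, rest = parse_vp(rest)
    if rest:
        raise ValueError(f"unparsed words: {rest}")
    return ["s", np, vp]


def parse_np(words):
    """NP -> name | det noun | det noun 'that' VP."""
    if words and LEXICON.get(words[0]) == "name":
        return ["np", [words[0]]], words[1:]
    if (len(words) >= 2 and LEXICON.get(words[0]) == "det"
            and LEXICON.get(words[1]) == "noun"):
        np, rest = ["np", [words[0], words[1]]], words[2:]
        if rest and rest[0] == "that":
            # Recursion: a verb phrase embedded inside a noun phrase.
            clause, rest = parse_vp(rest[1:])
            np.append(["rel", clause])
        return np, rest
    raise ValueError(f"expected a noun phrase at: {words}")


def parse_vp(words):
    """VP -> verb NP."""
    if words and LEXICON.get(words[0]) == "verb":
        np, rest = parse_np(words[1:])
        return ["vp", ["verb", [words[0]]], np], rest
    raise ValueError(f"expected a verb phrase at: {words}")


if __name__ == "__main__":
    print(parse_sentence("jack ate a frog".split()))
    print(parse_sentence("jack saw a frog that ate a fly".split()))
```

    Because `parse_np` can call `parse_vp`, which in turn calls `parse_np`, a sentence such as "jack saw a frog that ate a fly" is grouped at several levels of embedding without any extra rules.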

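    A parsed sentence can easily be represented in tree form, or equivalently as a nested list:

    (subject (Jack) (predicate (verb (ate)) (direct-object (a frog))))

    (s (np (jack)) (vp (verb (ate)) (np (a frog))))

    The two lists above are alternative parses of the same sentence, "Jack ate a frog". The first uses traditional grammatical labels (subject, predicate, direct object); the second uses phrase-structure labels (s for sentence, np for noun phrase, vp for verb phrase). This is how the data structures for parsed sentences might look in a list-processing language such as LISP.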
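    The same list notation can be read into nested lists in other languages too. The following is a minimal sketch in Python rather than LISP; the function names `tokenize`, `read_tree`, `parse` and `words` are assumptions of this example, not part of any parser described here.

```python
def tokenize(text):
    """Split a parenthesised parse string into '(', ')' and word tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def read_tree(tokens):
    """Consume the tokens of one parenthesised group and return a nested list."""
    if not tokens or tokens.pop(0) != "(":
        raise ValueError("expected '('")
    node = []
    while tokens and tokens[0] != ")":
        if tokens[0] == "(":
            node.append(read_tree(tokens))  # recursion handles embedding
        else:
            node.append(tokens.pop(0))
    if not tokens:
        raise ValueError("missing ')'")
    tokens.pop(0)  # discard the closing ')'
    return node


def parse(text):
    """Convert a whole parse string into a nested list."""
    tokens = tokenize(text)
    tree = read_tree(tokens)
    if tokens:
        raise ValueError("unexpected trailing text")
    return tree


def words(node):
    """Recover the sentence's words, skipping the phrase marker of each group."""
    if all(isinstance(child, str) for child in node):
        return list(node)  # innermost group: every token is a word
    out = []
    for child in node[1:]:  # node[0] is the phrase marker
        if isinstance(child, str):
            out.append(child)
        else:
            out.extend(words(child))
    return out


if __name__ == "__main__":
    tree = parse("(s (np (jack)) (vp (verb (ate)) (np (a frog))))")
    print(tree)
    print(words(tree))
```

    Each parenthesised group becomes one Python list whose first element is the phrase marker, so walking the structure recursively is enough to recover either the labels or the original words.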

Semantic Understanding

  • Covers the literal meaning of words in a language.
  • The inferences that can be made from them.
  • The conclusions that can be drawn from them.
  • This is generally the most difficult part of NLP to develop.

    Theory for Acquiring Semantic Knowledge

    Semantic Memory

  • Associates objects with, and defines them in terms of, other interconnected objects in a tree structure.

              animal
              /     \
             /       \
           bird    sealife
           /   \        \
          /     \        \
        parrot  sparrow  whale

  • For example, from this simple tree structure we can define a parrot as a bird and an animal, but not as a sparrow or as sea life such as a whale. Most natural language researchers nowadays do not consider this model of semantic memory an adequate way to emulate human understanding. However, it is certainly very convenient to develop.

  • The IQATS approach to representing semantics is closest to this model.
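    A tree of this kind can be sketched as a table of child-to-parent links that is followed upward to answer "is a" questions. This is only an illustration and does not show how IQATS stores its semantics; the root label `animal` and the names `PARENT`, `ancestors` and `is_a` are assumptions of this sketch.

```python
# Child -> parent ("is a") links. The root label "animal" is an assumption.
PARENT = {
    "bird": "animal",
    "sealife": "animal",
    "parrot": "bird",
    "sparrow": "bird",
    "whale": "sealife",
}


def ancestors(concept):
    """Return every category above `concept`, nearest first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def is_a(concept, category):
    """True if `concept` is `category` itself or lies somewhere below it."""
    return concept == category or category in ancestors(concept)


if __name__ == "__main__":
    print(ancestors("parrot"))
    print(is_a("parrot", "bird"))
    print(is_a("parrot", "sparrow"))
```

    The convenience mentioned above shows in how little code is needed; the limitation is that the only relation the structure can express is category membership.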

    Conceptual Dependency (CD)

  • CD relies on a more intricate network of primitives that represent actions, states, etc. CD depends on scripts for its 'real-world' knowledge base.
  • CD has been implemented in script-appliers (which can be thought of as story understanders) such as SAM (Script Applier Mechanism).
  • Programs implementing CD can infer, or "read between the lines" of, sentences (see example below) by drawing real-world information from scripts. The primitive actions that describe the real world are chained together in scripts. A restaurant script, for example, may state that a person should pay a tip before leaving, or that a person who is not satisfied with the service should not be expected to pay a tip. A script on reading books may stipulate that one must lay one's eyes on a book before actually drawing information from its words. Scripts play an essential role in enabling programs using CD to draw inferences.

    Here is a small example of SAM (Script Applier Mechanism) paraphrasing a story (notice the inferences):

    Input: John went to a restaurant. He sat down. He got mad. He left.
    Paraphrase: JOHN WAS HUNGRY. HE DECIDED TO GO TO A RESTAURANT. HE WENT TO ONE. HE SAT DOWN IN A CHAIR. A WAITER DID NOT GO TO THE TABLE. JOHN BECAME UPSET. HE DECIDED HE WAS GOING TO LEAVE THE RESTAURANT. HE LEFT IT.
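    The gap-filling part of this behaviour can be illustrated with a toy script applier. The sketch below is not how SAM works: it uses plain strings instead of CD primitives, and the `RESTAURANT_SCRIPT` list and the `apply_script` function are hypothetical names invented for the example.

```python
# A hypothetical restaurant script: the expected events, in order.
RESTAURANT_SCRIPT = ["enter", "sit down", "order", "eat", "pay", "leave"]


def apply_script(script, observed):
    """Match a story's events against a script.

    Returns (event, status) pairs, where status is "stated" for events the
    story mentions, "inferred" for script steps the story skipped over, and
    "unexpected" for events that do not fit the rest of the script.
    """
    result = []
    position = 0  # index of the next script step not yet accounted for
    for event in observed:
        if event not in script[position:]:
            result.append((event, "unexpected"))
            continue
        index = script.index(event, position)
        for skipped in script[position:index]:
            result.append((skipped, "inferred"))
        result.append((event, "stated"))
        position = index + 1
    return result


if __name__ == "__main__":
    for event, status in apply_script(RESTAURANT_SCRIPT, ["enter", "eat", "leave"]):
        print(f"{event:10} {status}")
```

    Given only "enter", "eat" and "leave", the sketch marks "sit down", "order" and "pay" as inferred. It fills gaps blindly, however: a story that departs from the script, like the one above in which John leaves without being served, needs the richer representation of actions and states that CD provides.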

    Submitted: 23/01/2000

    Article content copyright © Samuel Hsiung, 2000.
    All content copyright © 1998-2007, Generation5 unless otherwise noted.