An Introduction to Natural Language Processing

A natural language is a language that people use to communicate, such as Chinese, English or Bulgarian. Natural languages can be spoken or written. So, what is Natural Language Processing?

What is Natural Language Processing?

NLP (Natural Language Processing) is a field of AI concerned with any program that processes natural language. Extensive research in NLP over the past decade has brought us one of the most useful applications of AI: machine translation. If we could one day create a program that could translate (for example) English text to Japanese and vice versa without the need for polishing by a professional translator, then bridges of communication could be significantly widened. Our current translation programs have not yet reached this level, but they may do so very soon.

NLP research also deals with speech recognition. Programs that convert speech into text are now widely used and fairly dependable.

Other areas of study in NLP include story understanders and chatterbots. During the 1960s Joseph Weizenbaum created ELIZA, a program that played the role of a psychotherapist and responded to nearly every answer the user gave with a further probing question. ELIZA created a storm of public interest in AI, and many users felt that it helped them with their personal problems. Though its questions may sometimes have seemed ambiguous, people actually felt ELIZA could take care of their needs just as well as any other therapist. They became emotionally involved with ELIZA; even Weizenbaum's secretary demanded to be left alone with the program. The following is a transcript of ELIZA chatting with another program, PARRY:
Parry: I don't understand your motives.
Eliza: Why don't you understand my motives?
Parry: Shouldn't I be?
Eliza: What did you say just now?
Parry: That's just the way it is.
Eliza: Suppose that's just the way it were not.
Parry: So?
Eliza: I am not sure I understand you fully.
Parry: You harp on the same questions too much.
Eliza: What does this similarity suggest to you?
Parry: I don't get you.
Eliza: Do you wish that you did get I?
Parry: It's about time you showed some interest in my feelings.
Eliza: Oh? I showed some interest in your feelings?

When people started calling ELIZA intelligent, Joseph Weizenbaum objected strongly. ELIZA could not understand people's personal problems in anything like the depth that another human being could. It could only manipulate syntax (grammar) and check for certain key words. Someone who did not know that ELIZA was a program could easily conclude that it conversed like a human, even though it never really understood what was said in the way that humans do.
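To make "manipulating syntax and checking for key words" concrete, here is a minimal sketch in Python. It is not Weizenbaum's program: the rule table, the pronoun swaps and the function names are all assumptions made for illustration, and the real ELIZA script was much larger. It only shows how a reply can be assembled from the user's own words without any understanding of them.

```python
import re

# Hypothetical pronoun swaps (assumption: a tiny subset chosen for this example).
REFLECTIONS = {"i": "you", "my": "your", "me": "you",
               "you": "I", "your": "my", "am": "are"}

# Hypothetical (pattern, reply template) rules, tried in order.
RULES = [
    (re.compile(r"i don't (.*)", re.IGNORECASE), "Why don't you {0}?"),
    (re.compile(r"i feel (.*)", re.IGNORECASE), "Why do you feel {0}?"),
]

# Fallback used when no key phrase is found.
DEFAULT_REPLY = "I am not sure I understand you fully."


def reflect(fragment):
    """Swap first- and second-person words so the fragment can be echoed back."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(sentence):
    """Build a reply by pattern matching alone; nothing is 'understood'."""
    text = sentence.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.match(text)
        if match:
            return template.format(reflect(match.group(1)))
    return DEFAULT_REPLY


print(respond("I don't understand your motives."))
print(respond("So?"))
```

The first call echoes the user's phrase back with the pronouns swapped, much like the opening exchange of the transcript; the second falls through to the stock reply because no rule matches. Mechanical swaps of this kind also explain odd replies such as "get I" in the transcript.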

Coincidentally, while ELIZA generates questions about people's personal problems, IQATS (Intelligent Question and Answer Test Summarizer), a program written by Sam Hsiung, generates questions for test-making purposes. Unlike ELIZA, IQATS can learn how to ask new questions if it is given a sample question and answer. Yet, like ELIZA, it knows and will learn only how to manipulate syntax. It can ask what the capital of Saudi Arabia is, but given something more complex, such as Martin Luther King's 'I have a dream...' speech, it cannot come up with questions that force people to draw inferences (e.g., In what context was this speech given?); nor does it really understand what it is asking.

Many researchers recognized this limitation, and conceptual dependency (CD) theory was created as a result. CD systems such as SAM (Script Applier Mechanism) are story understanders. When SAM is given a story and later asked questions about it, it answers many of those questions accurately (thus showing that it "understands"). It can even draw inferences. It accomplishes this through the use of scripts. A script designates a sequence of actions that are performed in chronological order in a certain situation. A restaurant script, for example, would say that you need to sit down at a table before you are served dinner.

The following is a small example of SAM (Script Applier Mechanism) paraphrasing a story (notice the inferences):

Input: John went to a restaurant. He sat down. He got mad. He left.
Paraphrase: JOHN WAS HUNGRY. HE DECIDED TO GO TO A RESTAURANT. HE WENT TO ONE. HE SAT DOWN IN A CHAIR. A WAITER DID NOT GO TO THE TABLE. JOHN BECAME UPSET. HE DECIDED HE WAS GOING TO LEAVE THE RESTAURANT. HE LEFT IT.
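
The gap-filling visible in the paraphrase above can be sketched roughly as follows. This is a toy illustration, not SAM's actual implementation: the script steps, their wording and the function name are assumptions. It treats a script as an ordered list of events and marks every step that the story skipped, but that must have happened before the last stated event, as inferred.

```python
# Hypothetical restaurant script: events in the order they normally occur.
RESTAURANT_SCRIPT = [
    "enter restaurant",
    "sit down",
    "order food",
    "be served",
    "eat",
    "pay",
    "leave",
]


def fill_in(script, observed):
    """Return (event, status) pairs for each script step up to the last
    observed one; steps the story did not mention are marked 'inferred'."""
    positions = [script.index(event) for event in observed if event in script]
    if not positions:
        return []  # the story does not match this script at all
    last = max(positions)
    seen = set(observed)
    return [(step, "stated" if step in seen else "inferred")
            for step in script[: last + 1]]


for event, status in fill_in(RESTAURANT_SCRIPT, ["enter restaurant", "be served"]):
    print(f"{event}: {status}")
```

A story that mentions only entering and being served yields "sit down" and "order food" as inferred steps. Handling a deviation from the script, such as the waiter who never came to John's table, would need more machinery than this sketch has.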

Scripts allow CD systems to draw links and inferences between things. CD systems are also able to classify and distinguish primitive actions. Kicking someone, for example, could be a physical action that implies 'hurt', while loving could be an emotional expression that implies 'affection'.
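
As a rough illustration of classifying actions and attaching implications to them, consider the sketch below. The two lexicon entries come from the examples in the preceding paragraph; the class, the field names and the output format are assumptions, and CD theory's own representation is considerably richer than a lookup table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionConcept:
    kind: str     # "physical" or "emotional"
    implies: str  # the state that the action suggests


# Hypothetical lexicon holding only the two examples from the text.
LEXICON = {
    "kick": ActionConcept(kind="physical", implies="hurt"),
    "love": ActionConcept(kind="emotional", implies="affection"),
}


def infer(actor, verb, target):
    """Describe what a known verb implies, or return None for unknown verbs."""
    concept = LEXICON.get(verb)
    if concept is None:
        return None
    return f"{actor} -> {target}: {concept.kind} action implying {concept.implies}"


print(infer("John", "kick", "Bill"))
```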

The Legendary Turing Test and Its Weaknesses

Let's move on to a controversial subject involving understanding in natural language systems. How can we be sure whether or not a machine actually understands something? In 1950, Dr. Alan Turing, a British mathematician who is now considered the father of AI, proposed the Turing test for intelligence. Put simply, the Turing test boils down to the question: "Can this machine trick a human into thinking that it is human?" Specifically, the machine is a natural language system that converses with human subjects. In the Turing test, a human (the judge) is placed in one room, and either the machine or another human is placed in another. The judge may ask questions of, or answer questions posed by, the subject in the other room. All communication takes place through a terminal, with input typed at a keyboard. Before the conversation begins, the judge does not know whether the subject is a human or a computer. If the judge is conversing with a computer, the machine passes the Turing test only if, during and after the conversation, the judge is "fooled" into thinking that it is a human. The Turing test has many pitfalls, and it is in fact not widely accepted as a test for true intelligence.

Today, the Loebner Prize is a modern version of the Turing test. The criticisms surrounding the Loebner Prize concern how the Turing test is carried out. The goal of each contestant is to fool or trick the judge into thinking that his program is a human, and such a goal does not encourage the advancement of AI. For example, messages are transmitted as text, and as the subject (human or computer) types, the judge sees the text live. Thus, many contestants have been forced to emulate the typing habits of humans: text is output at varied speeds, words are sometimes misspelled and then corrected, incorrect punctuation is often used, and so on. Even then, the programs in the contest usually talk about only one subject (to talk about everything present in our culture is simply impossible, at least for a natural language system that understands only words, syntax and semantics, and not what objects really look like or do; this will be discussed in later essays). If the judge picks another subject to discuss, the programs usually try to divert the judge's attention. Programs have even tried to use vulgarity or an element of surprise to get the judge excited (surely no computer could be vulgar or unpredictable, could it?). For example, you may want to see a transcript of Jason Hutchens' program, which competed for the Loebner Prize. You can also read an interview with Robby Glen Garner, winner of the 98/99 Loebner Prize.

In summary, the outcome of the test depends too heavily on human judgment, and so does the question of whether a certain system is really intelligent. Such a question is actually quite trivial and shallow. As Tanimoto puts it, we should be asking about the kinds, quality and quantity of knowledge in a system, the kinds of inference that it can make with this knowledge, how well-directed its search procedure is, and what means of automatic knowledge acquisition are provided. There are many dimensions of intelligence, and these interact with one another.

Submitted: 19/12/1999

Article content copyright © Samuel Hsiung, 1999.

All content copyright © 1998-2007, Generation5 unless otherwise noted.