At the forefront of Artificial Intelligence
  Home Articles Reviews Interviews JDK Glossary Features Discussion Search

An Introduction to Markov Models

Markov models are excellent ways of abstracting simple concepts into a relatively easily computable form. Markov models are used in everything from data compression to sound recognition. In artificial intelligence, they are used in natural language processing and sound recognition (which becomes an NLP problem once the sound has been converted into text-based data). This essay will introduce you to simple Markov Models and their uses.

Basic Theory

In natural language, most sentences have some form of pattern to them. They are simply a sequence of patterns, albeit it a relatively complex sequence at times. Look at the directed graph below, with 3 nodes and a start and an end.

From this graph we can create sequences such as:
N1 N2 N3
N1 N2 N2 N2 N3 N3 N3 N3 N3
N1 N1 N2 N2 N3
The thing is, in most sequences (especially complex ones like natural language), different paths have different probabilities of occurring. To accommodate for this, we can assign probabilities to the different paths (or arcs):

Now we can analyze the probability of certain sequence happening, using our previous examples:
N1 N2 N3                   = 0.4 X 0.8 X 0.5 = 0.16
N1 N2 N2 N2 N3 N3 N3 N3 N3 = 0.4 x 0.2 x 0.2 x 0.8 x
                             0.5 x 0.5 x 0.5 x 0.5 = 0.0008
N1 N1 N2 N2 N3             = 0.6 x 0.4 x 0.2 x 0.8 x 0.5 = 0.192
The directed graph augmented with probability scores is called a Markov Model. Surprisingly simple as this may seem, we can see how powerful this can be when applied to something like sound recognition.

Applications to Sound Recognition

In sound recognition, a computer has to be able to deal with tens of thousands of words all of which can be pronounced differently. A brute force search is simply too inefficient and time-consuming to be of any use. In comes the Markov Model! Imagine that those three nodes shown above (N1, N2, and N3) were actually small chunks of data that which put together in various ways produced phonemes (small parts of words, such as iy (heat)) and that Markov Model represented different ways that those three data chunks could produce valid phonemes (along with their likelihood of occurring). Could we take that paradigm up a level and see how phonemes could be put together to form words? Of course, look at this example of the word tomato:

This accommodates for pronunciations such as:
t ow m aa t ow   - British English
t ah m ey t ow   - American English
t ah mey t a     - Possibly pronunciation when speaking quickly
What about increasing the paradigm even further? How could you link words together to produce a valid sentence? This is also just as plausible:

With sentences such as:
I like apple juice        - Very probable
I like tomato juice       - Very improbable!
I hate apple juice        - Relatively improbable
I hate tomato juice       - Relatively probable
You can see how using Markov Models, speech recognition systems can ascertain "intelligently" what was said. For example, look at these two phrases:
James's school...
James is cool
When these are broken down into phonemes they are very similar, especially when spoken quickly. When the computer checks it's Markov Model is would find that the first phrases is (unfortunately) much more likely to appear than the second, so it uses as a base for the rest of the sentence. Yet, if the phrases where these sentences:
James's school drives monkey feet
James is cool because he plays the guitar
It would switch over to the second sentence, because the first sentence quickly turned meaningless, while the second was both semantically and syntactically correct. This sort of "understanding" is imperative for a speech recognition engine, and especially one that has to deal with continuous speech.

Does the usefulness of Markov Models stop there? No! Markov Models can be arranged into a hierarchy that enables that computer to relatively quickly turn the speech data into meaningful sentences (probabilities omitted):

The computer would use this tiered Markov model firstly to extract phonemes from speech data, words from phonemes and sentences from words!

Conclusion

The remaining question that stands is where do all the probabilities come from? Normally, this is a brute force exercise of being fed a bunch of data (encyclopaedia text, Shakespeare etc) and generating a list of probabilities. In fact, some Markov Models are not directed like shown, but simply connect all words to each other and allow the probabilities to govern the connections (normally called a bigram model).

You can see how useful Markov Models are, both in speech recognition and natural language processing. While relatively simple in nature, they can prove incredibly productive when applied properly.

Submitted: 10/01/2001

Article content copyright © James Matthews, 2001.
 Article Toolbar
Print
BibTeX entry

Search

Latest News
- Generation5 10-year Anniversary (03/09/2008)
- New Generation5 Design! (09/04/2007)
- Happy New Year 2007 (02/01/2007)
- Where has Generation5 Gone?! (04/11/2005)
- NeuroEvolving Robotic Operatives (NERO) (25/06/2005)

What's New?
- Back-propagation using the Generation5 JDK (07/04/2008)
- Hough Transforms (02/01/2008)
- Kohonen-based Image Analysis using the Generation5 JDK (11/12/2007)
- Modelling Bacterium using the JDK (19/03/2007)
- Modelling Bacterium using the JDK (19/03/2007)


All content copyright © 1998-2007, Generation5 unless otherwise noted.
- Privacy Policy - Legal - Terms of Use -