
The Shape of Inference: Using LPA's AI Toolkits

By Clive Spenser & Charles Langley

Inference is at the core of Knowledge-Based Artificial Intelligence. Different inference techniques require slightly different rules and give somewhat different behaviour.

In this document we contrast brittle production rule inference with more sophisticated methods by setting up four separate knowledge-based systems (KBSs), constructed using snippets of code from LPA’s AI toolkits. To keep it simple, the examples all have two inputs and one output, which lets us show the results of each kind of inference as simple graphs.

Production Rule Inference

In the early days of knowledge-based systems, inference methods were brittle: a rule representing an item of knowledge either fired or did not fire. Yes or no.

Example One - Production Rule Inference

rule d_1 if i1 <   25              and i2 <   25             then output becomes   0.
rule d_2 if i1 <   25              and i2 >= 25 and i2 =< 75 then output becomes   0.
rule d_3 if i1 <   25              and i2 >   75             then output becomes   0.
rule d_4 if i1 >= 25  and i1 =< 75 and i2 <   25             then output becomes   0. 
rule d_5 if i1 >= 25  and i1 =< 75 and i2 >= 25 and i2 =< 75 then output becomes  50.
rule d_6 if i1 >= 25  and i1 =< 75 and i2 >   75             then output becomes 100.
rule d_7 if i1 >   75              and i2 <   25             then output becomes   0.
rule d_8 if i1 >   75              and i2 >= 25 and i2 =< 75 then output becomes  50.
rule d_9 if i1 >   75              and i2 >   75             then output becomes 100.
To show the characteristic shapes of the various inference techniques, we hold one of the two inputs, I1, constant at 0, 33, 66 and 100, and vary the other, I2, from 0 to 100.

Given the magnitude of the two inputs, the magnitude of the output is shown on the vertical axis.


So far we have only seen two-dimensional slices of the output data. Here is the full 3D graph:

In the 3D graph, 100 data points are plotted along each axis rather than 10, to counteract the sloping seen in the 2D graphs. This sloping is an artefact of the graphing itself: the 3D graph shows that the output really changes in steps rather than slopes. The number of such steps can be increased simply by using more production rules.
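The crisp behaviour of Example One is easy to reproduce. The Python sketch below encodes rules d_1 to d_9 as a lookup on which band each input falls into (below 25, 25 to 75 inclusive, above 75) and sweeps I2 in steps of 10 for each fixed I1. The band names and helper functions are our own illustration, not LPA syntax.

```python
from __future__ import annotations


def band(x: float) -> str:
    """Classify an input into the crisp bands tested by rules d_1..d_9."""
    if x < 25:
        return "small"
    if x <= 75:  # 25 =< x =< 75
        return "medium"
    return "large"


# (band of i1, band of i2) -> output; one entry per rule d_1..d_9
RULES = {
    ("small", "small"): 0,   ("small", "medium"): 0,   ("small", "large"): 0,
    ("medium", "small"): 0,  ("medium", "medium"): 50, ("medium", "large"): 100,
    ("large", "small"): 0,   ("large", "medium"): 50,  ("large", "large"): 100,
}


def production_output(i1: float, i2: float) -> int:
    """Exactly one rule fires for any pair of inputs; return its output."""
    return RULES[(band(i1), band(i2))]


def slice_for(i1: float, step: int = 10) -> list[int]:
    """Hold I1 fixed and sweep I2 from 0 to 100, as in the 2D graphs."""
    return [production_output(i1, i2) for i2 in range(0, 101, step)]


if __name__ == "__main__":
    for i1 in (0, 33, 66, 100):
        print(i1, slice_for(i1))
```

Both slice_for(33) and slice_for(66) return [0, 0, 0, 50, 50, 50, 50, 50, 100, 100, 100]: the same stepped slice, because only the band boundaries at 25 and 75 affect the output.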

Reasoning under Uncertainty

In the next three examples we shall be dealing with inference under uncertainty. There are two main sources of uncertainty: evidential uncertainty and semantic ambiguity (imprecision of language).

Fuzzy Logic

In the example below we use fuzzy logic, which deals with the ambiguity of terms such as small, medium and large.

Example Two - Fuzzy Logic Inference

fuzzy_variable input1 ;
   ranges from 0 to  100 ; 
   fuzzy_set  small is  \ shaped and linear at      0,  50 ; 
   fuzzy_set medium is /\ shaped and linear at 25, 50,  75 ; 
   fuzzy_set  large is /  shaped and linear at     50, 100 .
fuzzy_variable input2 ;
   ranges from 0 to  100 ; 
   fuzzy_set  small is  \ shaped and linear at      0,  50 ; 
   fuzzy_set medium is /\ shaped and linear at 25, 50,  75 ; 
   fuzzy_set  large is /  shaped and linear at     50, 100 .
fuzzy_variable output ;
   ranges from 0 to  100 ; 
   fuzzy_set  small is  \ shaped and linear at      0,  50 ; 
   fuzzy_set medium is /\ shaped and linear at 25, 50,  75 ; 
   fuzzy_set  large is /  shaped and linear at     50, 100 ; 
   defuzzify using all memberships and mirror rule and shrinking .

fuzzy_rule d_1 if input1 is small   and input2 is small   then output is small .
fuzzy_rule d_2 if input1 is small   and input2 is medium  then output is small .
fuzzy_rule d_3 if input1 is small   and input2 is large   then output is small .
fuzzy_rule d_4 if input1 is medium  and input2 is small   then output is small .
fuzzy_rule d_5 if input1 is medium  and input2 is medium  then output is medium .
fuzzy_rule d_6 if input1 is medium  and input2 is large   then output is large .
fuzzy_rule d_7 if input1 is large   and input2 is small   then output is small .
fuzzy_rule d_8 if input1 is large   and input2 is medium  then output is medium .
fuzzy_rule d_9 if input1 is large   and input2 is large   then output is large .

Note how the fuzzy logic program contrasts with the production rule system. Rather than sorting each input into a crisp band (below 25, 25 to 75, above 75) and assigning a fixed output value (0 for small, 50 for medium, 100 for large), the fuzzy logic program defines three overlapping fuzzy sets, small, medium and large, over the value ranges 0-50, 25-75 and 50-100. Fuzzy inference then determines the degree to which each input belongs to each of these sets, and the corresponding degree of membership of the output. As the graphs show, this results in a smoother distribution of output values.
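To make the contrast concrete, here is a minimal Python sketch of the fuzzy system above. It reuses the set breakpoints from Example Two, but the inference choices are assumptions: "and" is taken as the minimum, rules sharing a conclusion are combined with the maximum, and defuzzification is a weighted average of each output set's peak (0, 50, 100). That last step is a deliberately simple stand-in, not LPA's "mirror rule and shrinking" method, so the numbers will not match the article's graphs exactly.

```python
def falling(x: float, a: float, b: float) -> float:
    """Left-shoulder set: membership 1 at or below a, falling linearly to 0 at b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def rising(x: float, a: float, b: float) -> float:
    """Right-shoulder set: membership 0 at or below a, rising linearly to 1 at b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def triangle(x: float, a: float, b: float, c: float) -> float:
    """Triangular set: 0 outside (a, c), peaking at 1 when x == b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# The same three sets are used for input1, input2 and output in Example Two.
SETS = {
    "small": lambda x: falling(x, 0, 50),
    "medium": lambda x: triangle(x, 25, 50, 75),
    "large": lambda x: rising(x, 50, 100),
}

# ASSUMPTION: each output set is represented by its peak when defuzzifying.
PEAKS = {"small": 0.0, "medium": 50.0, "large": 100.0}

# (input1 set, input2 set) -> output set, mirroring fuzzy rules d_1..d_9
FUZZY_RULES = {
    ("small", "small"): "small",  ("small", "medium"): "small",   ("small", "large"): "small",
    ("medium", "small"): "small", ("medium", "medium"): "medium", ("medium", "large"): "large",
    ("large", "small"): "small",  ("large", "medium"): "medium",  ("large", "large"): "large",
}


def fuzzy_output(i1: float, i2: float) -> float:
    """Evaluate all nine rules and defuzzify to a single crisp output."""
    m1 = {name: f(i1) for name, f in SETS.items()}
    m2 = {name: f(i2) for name, f in SETS.items()}
    strength = {name: 0.0 for name in PEAKS}
    for (s1, s2), out in FUZZY_RULES.items():
        # "and" as min; rules concluding the same output set combine by max
        strength[out] = max(strength[out], min(m1[s1], m2[s2]))
    total = sum(strength.values())
    if total == 0:
        raise ValueError("no rule fired")
    # weighted average of output-set peaks
    return sum(strength[s] * PEAKS[s] for s in PEAKS) / total
```

With these choices, fuzzy_output(33, 50) comes out at about 24.2 while fuzzy_output(66, 50) is 50, whereas the production rules give 50 for both.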


An example fuzzy set


What is interesting here is the contrast with the graphs from the Production Rule expert system.

With Production Rules the graphs for I1=33 and I1=66 are the same, since each of these values is greater than 25 yet less than 75, hence medium in both cases.

The shapes of the fuzzy system's graphs differ for I1=33 and I1=66 because the fuzzy system interprets these values as different degrees of being medium, whereas for the production rule expert system, medium is just plain medium.
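The degrees of membership can be read straight off the set definitions in Example Two. The worked values below assume linear interpolation between the listed breakpoints, which is what "linear at" appears to mean: medium rises from 25 to a peak at 50 and falls to 75, small falls from 1 at 0 to 0 at 50, and large rises from 0 at 50 to 1 at 100.

```latex
\mu_{\text{medium}}(33) = \frac{33 - 25}{50 - 25} = 0.32, \qquad
\mu_{\text{medium}}(66) = \frac{75 - 66}{75 - 50} = 0.36

\mu_{\text{small}}(33) = \frac{50 - 33}{50 - 0} = 0.34, \qquad
\mu_{\text{large}}(66) = \frac{66 - 50}{100 - 50} = 0.32
```

So under this reading 33 is almost as much small as it is medium, and 66 is almost as much large as it is medium, while the production rules see only "medium" in both cases.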

Here is the 3D graph showing all slices from the fuzzy system:

Bayesian inference

Our next example, Bayesian inference, deals with evidential uncertainty. Thus we interpret the two inputs as being evidence for the output. To implement a Bayesian knowledge base all we need to do is to attach an affirms and a denies weight to each of the conditions of the rules.

The affirms weight is calculated as:

$$A = \frac{P(E \mid H)}{P(E \mid {\sim}H)}$$

and the denies weight is calculated as:

$$D = \frac{P({\sim}E \mid H)}{P({\sim}E \mid {\sim}H)}$$
where P(E | H) is the conditional probability that E is true given that H is true, P(E | ~H) is the conditional probability that E is true given that H is false, P(~E | H) is the conditional probability that E is false given that H is true, and P(~E | ~H) is the conditional probability that E is false given that H is false.
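The reason these two ratios are useful follows from dividing Bayes' rule for H by Bayes' rule for ~H. Writing the odds of H as O(H) = P(H) / (1 - P(H)), observing E multiplies the odds of H by A, and observing ~E multiplies them by D:

```latex
O(H \mid E) = \frac{P(E \mid H)}{P(E \mid {\sim}H)} \, O(H) = A \cdot O(H),
\qquad
O(H \mid {\sim}E) = \frac{P({\sim}E \mid H)}{P({\sim}E \mid {\sim}H)} \, O(H) = D \cdot O(H)
```

A weight greater than 1 therefore raises the odds of the hypothesis, and a weight less than 1 lowers them.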

The advantage of using conditional probabilities to represent uncertainty is that their values can all be obtained from databases of previous cases. Note that when the affirms weight is less than the denies weight, this means that the evidence disconfirms the hypothesis.
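Obtaining the weights from previous cases can be illustrated with a 2×2 table of counts. The sketch below estimates P(E | H) and P(E | ~H) as plain relative frequencies (no smoothing, which is an assumption) and returns A and D. The function name and argument order are hypothetical and not part of LPA's toolkits.

```python
def weights_from_counts(e_h: int, not_e_h: int, e_not_h: int, not_e_not_h: int) -> tuple[float, float]:
    """Estimate the affirms (A) and denies (D) weights from counts of past cases.

    e_h:          cases where E was present and H was true
    not_e_h:      cases where E was absent and H was true
    e_not_h:      cases where E was present and H was false
    not_e_not_h:  cases where E was absent and H was false
    """
    p_e_given_h = e_h / (e_h + not_e_h)
    p_e_given_not_h = e_not_h / (e_not_h + not_e_not_h)
    if p_e_given_not_h in (0.0, 1.0):
        raise ValueError("P(E | ~H) must lie strictly between 0 and 1")
    affirms = p_e_given_h / p_e_given_not_h                 # A = P(E|H) / P(E|~H)
    denies = (1 - p_e_given_h) / (1 - p_e_given_not_h)      # D = P(~E|H) / P(~E|~H)
    return affirms, denies
```

With hypothetical counts of 80 cases with E and 20 without when H is true, and 20 with E and 80 without when H is false, weights_from_counts(80, 20, 20, 80) gives A of about 4.0 and D of about 0.25. Swapping the counts to (20, 80, 80, 20) gives A below D: evidence whose presence disconfirms the hypothesis.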

Example Three - Bayesian Inference

uncertainty_rule d_1
  if      i1 is not high ( affirms 0.895; denies 3.20 ) 	
     and  i2 is not high ( affirms 0.895; denies 9.00 ) then output is high .
uncertainty_rule d_2
  if      i1 is not high ( affirms 0.895; denies 3.20 )
     and  i2 is     high ( affirms 9.00; denies 0.895 ) then output is high .
uncertainty_rule d_3
  if      i1 is     high ( affirms 3.20; denies 0.895 )
     and  i2 is not high ( affirms 0.895; denies 9.00 ) then output is high .
uncertainty_rule d_4
  if      i1 is     high ( affirms 3.20; denies 0.895 ) 
     and  i2 is     high ( affirms 9.00; denies 0.895 ) then output is high .
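The rules above state the weights but not how the toolkit combines them. A standard scheme for likelihood-ratio weights like these is odds updating: convert the prior probability of the hypothesis to odds, multiply by the affirms weight of each condition known to be true and by the denies weight of each condition known to be false, then convert back to a probability. The Python sketch below assumes that scheme, conditionally independent evidence, definite (true/false) conditions and a hypothetical prior of 0.5. How LPA's toolkit treats partially true conditions, which the continuous graphs require, is not covered here.

```python
def odds(p: float) -> float:
    """Convert a probability to odds; p must lie strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return p / (1.0 - p)


def probability(o: float) -> float:
    """Convert odds back to a probability."""
    return o / (1.0 + o)


def update(prior: float, conditions: list[tuple[float, float, bool]]) -> float:
    """Update P(H) from (affirms, denies, observed_true) triples.

    ASSUMPTION: the conditions are conditionally independent given H and ~H,
    so each one simply multiplies the current odds of H.
    """
    o = odds(prior)
    for affirms, denies, observed_true in conditions:
        o *= affirms if observed_true else denies
    return probability(o)


# Rule d_4 with a hypothetical prior of 0.5 and both conditions true
print(update(0.5, [(3.20, 0.895, True), (9.00, 0.895, True)]))
```

For rule d_4 with both conditions true, the odds go from 1 to 3.20 × 9.00 = 28.8, a probability of about 0.966. With both conditions false they shrink to 0.895 × 0.895 ≈ 0.80, a probability of about 0.445.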


And here is the full 3D graph:

Notice that Bayesian inference is highly non-linear. This is useful in modelling dynamic systems with sudden transitions, such as water being heated to boiling point.

Certainty Theory

The final inference method we shall look at, Certainty Theory, is a simplified adaptation of Bayesian inference which was incorporated into the famous early expert system shell EMYCIN.

In Certainty Theory, rules have the following structure:

if E then H with certainty factor CF

where E is the evidence, H is the hypothesis and CF ranges from -1 to +1. The certainty of a hypothesis, C(H), lies on the same scale, such that:

C(H)=1 corresponds to P(H)=1 (true)
C(H)=0 corresponds to P(H) being at its prior value (unknown)
C(H)=-1 corresponds to P(H)=0 (false)

Example Four - Certainty Theory Inference

uncertainty_rule d_1
  if input1 is not high and input2 is not high then 
    output is high with certainty factor -0.9 .
uncertainty_rule d_2
  if input1 is not high and input2 is high then 
    output is high with certainty factor  0.1 .
uncertainty_rule d_3
  if input1 is high and input2 is not high then 
    output is high with certainty factor  0.1 .
uncertainty_rule d_4
  if input1 is high and input2 is high then
    output is high with certainty factor  0.9 .
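Certainty Theory's arithmetic is simple enough to sketch directly. The Python below uses the EMYCIN combination function for merging certainty factors that bear on the same hypothesis, takes "and" as the minimum of the premise certainties, and scales a rule's CF by its premise certainty. Two further choices are hypothetical and made only for illustration, because the article does not say how LPA's toolkit handles them: the mapping of a 0 to 100 input onto the certainty that it "is high" (0 maps to -1, 50 to 0, 100 to +1), and firing a rule only when its premise certainty is positive.

```python
def combine(x: float, y: float) -> float:
    """EMYCIN-style combination of two certainty factors for the same hypothesis."""
    if x >= 0 and y >= 0:
        return x + y * (1 - x)
    if x < 0 and y < 0:
        return x + y * (1 + x)
    smaller = min(abs(x), abs(y))
    if smaller == 1:
        raise ValueError("cannot combine definite +1 and -1 evidence")
    return (x + y) / (1 - smaller)


def certainty_high(value: float) -> float:
    """HYPOTHETICAL mapping of a 0..100 input to C(input is high)."""
    return (value - 50.0) / 50.0


# (input1 is high?, input2 is high?, rule CF) for rules d_1..d_4
CF_RULES = [
    (False, False, -0.9),
    (False, True, 0.1),
    (True, False, 0.1),
    (True, True, 0.9),
]


def cf_output(v1: float, v2: float) -> float:
    """Certainty that "output is high", starting from 0 (unknown)."""
    c1, c2 = certainty_high(v1), certainty_high(v2)
    result = 0.0
    for high1, high2, rule_cf in CF_RULES:
        # "not high" is the negated certainty; "and" is the minimum
        premise = min(c1 if high1 else -c1, c2 if high2 else -c2)
        if premise > 0:  # ASSUMPTION: fire only on positive premise certainty
            result = combine(result, rule_cf * premise)
    return result
```

Under these assumptions cf_output(100, 100) is 0.9, cf_output(0, 0) is -0.9 and cf_output(50, 50) is 0 (unknown). Because the premises of d_1 to d_4 are mutually exclusive here, at most one rule fires for any pair of inputs, so combine only matters in rule sets with overlapping premises. Every step is a minimum or a scaling of linear functions, so the resulting surface is made of flat facets, in line with the straight slopes reported for Certainty Theory.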


Here is the full 3D graph:

Conclusion

We have seen how rule-based systems dealing with uncertainty result in a different inferential topology to standard production rule systems. Instead of step-wise inference we get either smooth slopes or curves.

There are also contrasts between the different uncertainty-handling paradigms. Both Fuzzy and Bayesian systems produce curves, whereas systems based on Certainty Theory produce straight slopes.

As for which technique is appropriate in which case: if uncertainty is involved, then Fuzzy, Bayesian or Certainty Theory methods will model it more accurately than brittle production rules. Fuzzy techniques suit cases where the uncertainty is semantic, while Bayesian or Certainty Theory approaches suit cases involving evidential uncertainty.

In the latter cases, Bayesian techniques have the advantage of mathematical rigour, and the corresponding affirms/denies weights can be derived from databases of previous cases if these are available. Bayesian techniques are also more appropriate in dynamic domains that contain sudden transitions. On the other hand, Certainty Theory has the advantage that certainty factors are more easily estimated. In the end, pragmatic considerations such as these are the only guide to what works best in a particular domain.

Submitted: 09/11/2003

Article content copyright © Clive Spenser & Charles Langley, 2003.
All content copyright © 1998-2007, Generation5 unless otherwise noted.