
SAPI 5.0 Tutorial IV: Inline Dictation and Advanced Grammar Modifiers

This is the fourth installment of the SAPI5 Tutorial set. It covers more on dynamic grammar rules, as well as a few of the more advanced grammar modifiers: ?, ..., *, +, and -. Unlike the last tutorial, this one has some downloadable code!

More on Dynamic Grammar...

In the last tutorial we looked at creating rules from scratch using the AddWordTransition and CreateNewState functions. While this is good for some situations, it can be awkward for others. For example, if we knew that most of a rule was constant apart from one small segment, it would be handy to be able to create just that one part dynamically. In our example program, we use this to "name" the computer. For example, we have an XML rule that looks like this:
    <RULE ID="VID_GetAttention" TOPLEVEL="ACTIVE">
        <P><L>
            <p>hey ?there</p>
            <p>...</p>
        </L></P>

        <RULEREF NAME="VID_ComputerName"/>

        <O><L>
            <p>listen ?up</p>
        </L></O>
    </RULE>

    <RULE NAME="VID_ComputerName" DYNAMIC="TRUE">
        <P>placeholder</P>
    </RULE>
Never mind the question mark and the ellipsis for now; we'll cover those soon. Notice the VID_ComputerName rule: it is just one word, "placeholder". Notice also that VID_GetAttention references VID_ComputerName, so if we change VID_ComputerName, VID_GetAttention updates automatically. The code required to change the name is very similar to the code in Tutorial 3:
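void CSapiTutorial03Dlg::NameToGrammar(CString name)
{
    SPSTATEHANDLE hDynamicRuleHandle;

    // Look up the dynamic rule by name (no rule ID, don't create it).
    g_cpCmdGrammar->GetRule(L"VID_ComputerName", NULL, SPRAF_Dynamic,
                            FALSE, &hDynamicRuleHandle);

    // Remove the current contents ("placeholder" the first time) and commit.
    g_cpCmdGrammar->ClearRule(hDynamicRuleHandle);
    g_cpCmdGrammar->Commit(0);

    // Add the new name as one transition from the rule's initial state to
    // its end (hToState == NULL); words in the name are separated by
    // space, '-' or '.'.
    g_cpCmdGrammar->AddWordTransition(hDynamicRuleHandle, NULL, CSpDynamicString(name),
                                      L" -.", SPWT_LEXICAL, 1.0f, NULL);
    g_cpCmdGrammar->Commit(0);
    g_cpCmdGrammar->SetRuleIdState(VID_ComputerName, SPRS_ACTIVE);

    // _T() keeps the literal the same character type as CString.
    m_pszInformation = _T("Computer renamed to ") + name;

    UpdateData(FALSE);
}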
The rule is looked up, cleared (removing "placeholder" as the name), given the new name, committed, and then reactivated. So we are in effect creating a rule from scratch, but since the rule is not "top-level active", what we are really doing is modifying VID_GetAttention, the rule that references it.
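Why does changing VID_ComputerName also change VID_GetAttention? A toy model in standard C++ (no SAPI involved) shows the idea: the top-level rule stores the *name* of the rule it references and looks it up every time it matches, instead of copying its words. ToyGrammar and its methods are hypothetical, and the model ignores everything else in the real rule (the optional words, the ellipsis, "listen up").

```cpp
#include <map>
#include <string>

// Hypothetical toy model (not SAPI): each named rule holds a single phrase.
struct ToyGrammar {
    std::map<std::string, std::string> rules;

    // Plays the role of NameToGrammar: replace the contents of a rule.
    void setRule(const std::string& name, const std::string& phrase) {
        rules[name] = phrase;
    }

    // Toy "top-level" rule: the word "hey" followed by a reference to
    // VID_ComputerName. The reference is resolved by name on every call,
    // so it always sees the referenced rule's current contents.
    bool matchesGetAttention(const std::string& utterance) const {
        auto it = rules.find("VID_ComputerName");
        return it != rules.end() && utterance == "hey " + it->second;
    }
};
```

After setRule("VID_ComputerName", "my computer"), "hey my computer" matches and "hey placeholder" no longer does, even though the top-level rule itself was never touched.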

Now, here is the interesting question: how does the user rename the computer? We could simply pass a typed name to NameToGrammar, but that defeats the point of our speech interface!

Inline Dictation

You can specify that a certain part of a rule will be dictated to the engine. I call this inline dictation, since it isn't dictation in the sense of reading out a passage of text or a letter. Instead, inline dictation normally consists of only a few words. Inline dictation is easy to specify in XML:
    <RULE ID="VID_RenameComputer" TOPLEVEL="ACTIVE">
        <P PROPNAME="Rename Computer to">Rename computer to *+</P>
    </RULE>
The * denotes that a word will be dictated. The *+ denotes that one or more words will be dictated. I chose multi-word dictation since you might want to call your computer something like "My Computer". Note that the phrase has a PROPNAME. This causes the SR engine to create a property structure in the recognition result that holds what was spoken (which we will need to read).

When this rule is triggered, we will want to read the text and extract the name. Here is something similar to the code used in the example program:

void CSapiTutorial03Dlg::SetComputerName(ISpPhrase *pPhrase)
{
    SPPHRASE *pElements = NULL;
    WCHAR *wszCoMemNameText = NULL;

    // GetPhrase allocates the SPPHRASE with CoTaskMemAlloc; we free it below.
    if (FAILED(pPhrase->GetPhrase(&pElements)))
        return;

    // This is a little tacky, since we use prior knowledge of the grammar rule
    // to ascertain where the name should be. Note the 3 below? We know the spoken
    // phrase must start with "Rename computer to" (elements 0-2) and nothing else.
    // Therefore, the name is the fourth word spoken (and beyond), so we need
    // more than 3 elements.
    ULONG ulCount = pElements->Rule.ulCountOfElements;

    if (ulCount > 3 &&
        SUCCEEDED(pPhrase->GetText(3, ulCount - 3, FALSE,
                                   &wszCoMemNameText, NULL)) &&
        wszCoMemNameText)
    {
        m_pszComputerName = wszCoMemNameText;   // copy into our CString
        NameToGrammar(m_pszComputerName);       // put the name into the grammar

        CoTaskMemFree(wszCoMemNameText);        // GetText allocated this
    }

    CoTaskMemFree(pElements);
}
As the comment notes, this code relies on prior knowledge of the grammar rule. If the rule used any optional words, the code would have to detect whether they were spoken before it could locate the name. Of course, an easier way around this is to make sure that the word just before the dictation is constant.
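The "constant word before the dictation" approach can be sketched without SAPI: instead of assuming the name starts at a fixed index, find the fixed anchor word ("to" here) and take everything after it, so optional words earlier in the phrase do not shift the result. This is a minimal sketch on a plain list of recognized words, not the example program's code; wordsAfterAnchor is a hypothetical helper. It uses the *first* occurrence of the anchor, so a dictated name that itself contains "to" is kept intact.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: return the words after the first occurrence of a
// fixed anchor word. Returns an empty vector if the anchor is missing or
// nothing was said after it.
std::vector<std::string> wordsAfterAnchor(const std::vector<std::string>& words,
                                          const std::string& anchor) {
    auto it = std::find(words.begin(), words.end(), anchor);
    if (it == words.end()) return {};
    return std::vector<std::string>(it + 1, words.end());
}
```

For "rename computer please to my computer" (an imaginary rule with an optional "please"), this still returns "my computer", where a fixed index of 3 would have returned "to my computer".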

After the number of elements has been checked, the words are retrieved. GetText takes the index of the first element, the number of elements to read, a flag saying whether to apply text replacements, the address of a pointer that receives the newly allocated text (a wide-character string), and an optional pointer for display attributes. After checking that the returned string is valid, we copy it to our CString and call NameToGrammar, which inserts the name into our grammar rule. We must remember to free the memory GetText allocated by calling CoTaskMemFree; the SPPHRASE returned by GetPhrase is allocated the same way and must be freed too.
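The start/count arithmetic behind GetText(3, count - 3, ...) is easy to get wrong by one, so here is a SAPI-free sketch of the same idea on a plain vector of words. dictatedTail is a hypothetical helper, and joining with single spaces is a simplification of the text GetText actually returns.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: the first prefixLen elements are the fixed phrase
// ("rename computer to"); everything from index prefixLen onward is the
// dictated name. Returns an empty string if nothing follows the prefix.
std::string dictatedTail(const std::vector<std::string>& elements,
                         std::size_t prefixLen) {
    if (elements.size() <= prefixLen) return "";  // nothing was dictated
    std::string result;
    for (std::size_t i = prefixLen; i < elements.size(); ++i) {
        if (i > prefixLen) result += ' ';
        result += elements[i];
    }
    return result;
}
```

With five elements and a prefix of 3, the loop reads elements 3 and 4, which is exactly "start at 3, read 5 - 3 = 2 elements".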

A lot easier than you might imagine, isn't it?! The dictation generally provides good results, although I tried "Arnold Schwarzenegger" and managed to rename my computer to "unload squash and anger"!

Advanced Grammar Rules

* and *+ are just two of the more advanced grammar modifiers that can be used. Here are some more:

... - The Wildcard

The ellipsis (...) signifies that the words that are muttered don't really matter. For example, in our VID_GetAttention rule the user can say "Hey there computer", or perhaps "Hey, what's up computer". The ellipsis helps expand the variations on a particular phrase. Unlike dictated words, the words matched by the ellipsis are not stored in any SAPI structure returned by the SR engine.
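To make the contrast between dictation and the wildcard concrete, here is a small toy matcher in standard C++. It is a sketch under stated assumptions, not how the SAPI engine works internally: patterns are plain word lists, * matches exactly one word, *+ one or more, and ... is assumed to need at least one word. The names Words, matchFrom and toyMatch are hypothetical. Dictated words are collected in captured; wildcard words are consumed but thrown away.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using Words = std::vector<std::string>;

// Toy matcher, not SAPI. Pattern tokens:
//   "*"    exactly one word, captured as dictation
//   "*+"   one or more words, captured as dictation
//   "..."  one or more words, consumed but NOT captured (the "at least
//          one word" part is an assumption of this sketch)
//   other  a literal word that must match exactly
static bool matchFrom(const Words& pat, std::size_t p,
                      const Words& in, std::size_t i, Words& captured) {
    if (p == pat.size()) return i == in.size();  // pattern done: input must be too
    const std::string& tok = pat[p];
    if (tok == "*" || tok == "*+" || tok == "...") {
        std::size_t maxTake = (tok == "*") ? 1 : in.size() - i;
        // Try the shortest span first, backtracking to longer ones.
        for (std::size_t n = 1; n <= maxTake && i + n <= in.size(); ++n) {
            Words attempt = captured;
            if (tok != "...")  // only dictation keeps the words it matched
                attempt.insert(attempt.end(), in.begin() + i, in.begin() + i + n);
            if (matchFrom(pat, p + 1, in, i + n, attempt)) {
                captured = attempt;
                return true;
            }
        }
        return false;
    }
    return i < in.size() && in[i] == tok &&
           matchFrom(pat, p + 1, in, i + 1, captured);
}

bool toyMatch(const Words& pattern, const Words& utterance, Words& captured) {
    captured.clear();
    return matchFrom(pattern, 0, utterance, 0, captured);
}
```

Matching "rename computer to *+" against "rename computer to my computer" succeeds and captures "my computer", while the single-word * version rejects it. Matching "... placeholder" against "hey what's up placeholder" succeeds with nothing captured.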

? - Optional

The optional modifier is an excellent way to mark a single word as optional. For example, instead of:
<P>Hey <O>there</O> computer</P>
You can specify:
<P>Hey ?there computer</P>
Use the optional modifier (?) for single optional words, and the <O> tag for multiple optional words or other structures.
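One way to see what ? buys you is to expand a phrase into every word sequence it accepts: each optional word may be present or absent, so k optional words give 2^k variants. This is a hypothetical standard-C++ sketch (expandOptionals is not a SAPI function); a real engine does not enumerate phrases like this.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: expand tokens such as {"Hey", "?there", "computer"}
// into every phrase they accept. A token starting with '?' is optional.
std::vector<std::string> expandOptionals(const std::vector<std::string>& tokens) {
    std::vector<std::string> variants{""};
    for (const std::string& tok : tokens) {
        bool optional = tok.size() > 1 && tok[0] == '?';
        std::string word = optional ? tok.substr(1) : tok;
        std::vector<std::string> next;
        for (const std::string& v : variants) {
            if (optional) next.push_back(v);                     // word left out
            next.push_back(v.empty() ? word : v + " " + word);   // word included
        }
        variants = next;
    }
    return variants;
}
```

For "Hey ?there computer" this yields "Hey computer" and "Hey there computer", the same two phrases the <O> version accepts.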

+/- - Confidence Modifiers

The + and - modifiers respectively raise and lower the confidence the engine requires to recognize a word. For example, the quit command in the example program uses the following rule:
    <RULE ID="VID_Quit" TOPLEVEL="ACTIVE">
        <P><L><P>+quit</P><P>+exit</P></L></P>
        <O>-program</O>
    </RULE>
This makes sure that "quit" or "exit" is only accepted when recognized with high confidence, while "program" is both optional and requires only low confidence.
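The effect of the modifiers can be modelled as a per-word threshold. The sketch below is an illustration under loudly stated assumptions: the three-level Confidence scale and all names (RecognizedWord, requiredConfidence, bareWord, acceptWithConfidence) are hypothetical, not SAPI types, and it ignores the <O> around "program" by requiring exactly one heard word per token.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical three-level scale for illustration only.
enum class Confidence { Low = -1, Normal = 0, High = 1 };

struct RecognizedWord {
    std::string text;
    Confidence actual;  // how sure the imaginary engine was
};

// A leading '+' raises the required confidence; a leading '-' lowers it.
Confidence requiredConfidence(const std::string& token) {
    if (!token.empty() && token[0] == '+') return Confidence::High;
    if (!token.empty() && token[0] == '-') return Confidence::Low;
    return Confidence::Normal;
}

// Strip the modifier so the bare word can be compared with what was heard.
std::string bareWord(const std::string& token) {
    return requiredConfidence(token) == Confidence::Normal ? token : token.substr(1);
}

// Accept only if each word matches its token and reaches the required level.
bool acceptWithConfidence(const std::vector<std::string>& tokens,
                          const std::vector<RecognizedWord>& heard) {
    if (tokens.size() != heard.size()) return false;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (heard[i].text != bareWord(tokens[i])) return false;
        if (static_cast<int>(heard[i].actual) <
            static_cast<int>(requiredConfidence(tokens[i])))
            return false;
    }
    return true;
}
```

With the tokens "+quit -program", a "quit" heard with only normal confidence is rejected, while a low-confidence "program" is still fine.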

SapiTutorial03

The example program demonstrates the dynamic grammar covered here, as well as all of the grammar modifiers. It is also a nice little program in its own right: it lets you run programs by speaking a command. All the commands are specified in rules.dat. The format of the file is simple:
Line 1: Command to say
Line 2: Program to call
Line 3: Any parameters to pass to the program
You can specify up to 256 of these commands. The program comes with a few, but most of them are tailored to my computer (path names and programs); still, you get the idea. Here is a screenshot:
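For anyone who wants to handle rules.dat in their own code, here is a minimal loader sketch for the three-line format in standard C++. It is not the example program's actual parser: VoiceCommand, readLine, loadCommands and loadCommandsFromText are hypothetical names. A missing final parameter line is treated as empty, and trailing carriage returns are stripped in case the file has Windows line endings. The 256 limit comes from the text.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One entry from rules.dat (hypothetical struct; fields follow the format).
struct VoiceCommand {
    std::string phrase;      // Line 1: command to say
    std::string program;     // Line 2: program to call
    std::string parameters;  // Line 3: parameters to pass (may be empty)
};

// Read one line, dropping a trailing '\r' left by CRLF line endings.
static bool readLine(std::istream& in, std::string& line) {
    if (!std::getline(in, line)) return false;
    if (!line.empty() && line.back() == '\r') line.pop_back();
    return true;
}

// Read three-line records until the input ends or maxCommands is reached.
std::vector<VoiceCommand> loadCommands(std::istream& in, std::size_t maxCommands = 256) {
    std::vector<VoiceCommand> commands;
    VoiceCommand cmd;
    while (commands.size() < maxCommands &&
           readLine(in, cmd.phrase) && readLine(in, cmd.program)) {
        if (!readLine(in, cmd.parameters)) cmd.parameters.clear();  // missing last line
        commands.push_back(cmd);
    }
    return commands;
}

// Convenience wrapper for file contents already in memory.
std::vector<VoiceCommand> loadCommandsFromText(const std::string& text) {
    std::istringstream in(text);
    return loadCommands(in);
}
```

To read the real file you would pass a std::ifstream opened on rules.dat to loadCommands; an incomplete trailing record (a phrase with no program line) is simply dropped.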

For people interested in taking this further, it would be nice to support multiple actions per command: say "Play DVD" and the computer would run the DVD player and turn off the lights.

Submitted: 29/04/2001

Article content copyright © James Matthews, 2001.
All content copyright © 1998-2007, Generation5 unless otherwise noted.