SAPI 5.0 Tutorial III: Dynamic Grammar

This is the third installment of our SAPI5 tutorial series. It is another relatively short tutorial, looking at how to add grammar rules to your programs dynamically. The example program I tested it with went through my MP3 directories and created a list of all artist and album names. All of my music is organized by artist and then by album, so it was a case of recursively going through the directories.
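The recursive directory walk described above might be sketched as follows. This is not the AddMusicDirectories function from the example program; it is a portable stand-in that uses C++17 std::filesystem, and the names MusicDirInfo and CollectMusicDirs are hypothetical. It records each directory's own name and its full path, which are the two pieces of information the tutorial relies on later.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical stand-in for one entry of the directory list.
struct MusicDirInfo {
    std::string name;     // artist or album name (the directory's own name)
    std::string fullDir;  // full path to that directory
};

// Recursively collect every directory below `root`, at most `maxDepth`
// levels down. In an artist/album layout, depth 1 holds the artists and
// depth 2 holds the albums.
void CollectMusicDirs(const fs::path& root, int maxDepth,
                      std::vector<MusicDirInfo>& out)
{
    assert(maxDepth >= 0);
    if (maxDepth == 0 || !fs::is_directory(root)) return;

    for (const fs::directory_entry& entry : fs::directory_iterator(root)) {
        if (!entry.is_directory()) continue;  // skip the MP3 files themselves
        out.push_back({entry.path().filename().string(),
                       entry.path().string()});
        CollectMusicDirs(entry.path(), maxDepth - 1, out);
    }
}
```

Calling CollectMusicDirs("C:\\My Music", 2, dirs) would fill dirs with one entry per artist and one per album. The order of entries within a directory is whatever the file system returns.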

I then wanted to be able to say "play alien love secrets" and have Winamp load up and play "Alien Love Secrets". Equally, I wanted to be able to say "play steve vai" and have Winamp load all my Steve Vai albums and play them.

So how to do this?

Dynamically Adding Rules

The function below is the only new code you need in order to add rules!
void CIntelliEnvironmentDlg::AddDynamicRules() 
{
    m_iNumDirs = 0;
    AddMusicDirectories("C:\\My Music");
This is obviously specific to the example, but for your reference, AddMusicDirectories builds the list of directories in an array of structures called m_sMDInfo. Each entry contains two fields: a pszName field (the name of the artist or album) and a pszFullDir field (the full directory path).
    SPSTATEHANDLE	hDynamicRuleHandle;
	
    g_cpCmdGrammar->GetRule(NULL, DYN_TTSVOICERULE, 
                            SPRAF_TopLevel | SPRAF_Active | SPRAF_Dynamic, 
                            TRUE, &hDynamicRuleHandle);

    g_cpCmdGrammar->ClearRule(hDynamicRuleHandle);

    // Commit the changes
    g_cpCmdGrammar->Commit(0);
The call to GetRule is a little misleading, since we're not actually getting a rule: the TRUE in parameter 4 tells GetRule to create a new rule if DYN_TTSVOICERULE is not found. DYN_TTSVOICERULE is a constant I defined myself (assigned the arbitrary value 1001). The call to ClearRule is just precautionary, in case DYN_TTSVOICERULE had already been set up before AddDynamicRules was called. The Commit function commits the changes to the SR engine, and we can now start building the rule.
    SPSTATEHANDLE hPlayState;
    g_cpCmdGrammar->CreateNewState(hDynamicRuleHandle, &hPlayState);
    g_cpCmdGrammar->AddWordTransition(hDynamicRuleHandle, hPlayState, L"play", 
                                      L" ", SPWT_LEXICAL, 1, NULL);
Instead of adding a rule that said "play ", I created a new state and added a transition. This meant that I could add additional words such as "load" that the SR engine would understand (and I have done so; they're just omitted here for brevity).

CreateNewState simply takes a handle to any state in the same grammar rule, so we pass it hDynamicRuleHandle. AddWordTransition is the function we're most interested in, since it adds new words to the grammar rule. Its parameters are as follows: the 'from' state, the 'to' state, the word(s), the separators, the word type, the weight and a property information structure.

Now, you can see that we add the word "play" as a transition from the beginning of the grammar rule (hDynamicRuleHandle) to the next state (hPlayState). The rest of the arguments are trivial for the moment. Now we must add the music directories:

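    for (int i = 0; i < m_iNumDirs; i++ ) {
        // Convert the char* name to the WCHAR* form that SAPI expects
        CSpDynamicString ds(m_sMDInfo[i].pszName);

        // Zero the structure first so that ulId and the unused parts of
        // the VARIANT do not hold stack garbage
        SPPROPERTYINFO prop;
        ZeroMemory(&prop, sizeof(prop));
        prop.pszName = L"Id";
        prop.pszValue = L"Property";
        prop.vValue.vt = VT_I4;
        prop.vValue.ulVal = i;   // index into m_sMDInfo

        // From hPlayState to the end of the rule (NULL)
        g_cpCmdGrammar->AddWordTransition(hPlayState, NULL, ds, L" -.",
                                          SPWT_LEXICAL, 1.0, &prop);
    }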
Here we loop through the directories and add them one by one to the grammar. We use CSpDynamicString (a SAPI helper class) since SAPI requires the text as WCHAR*, and the data is stored as char*. We also create a property structure for each item, so that we can identify it in our recognition function. The "Id" and "Property" strings are meaningless for us; it is vValue.ulVal that is important, since it specifies the index into the m_sMDInfo array. Again, how you choose to do this in your application is up to you.

The important thing is the call to AddWordTransition. You can see that we start at hPlayState and end the grammar rule (with a NULL). The separators here are important, since the directory names may include characters other than spaces between words ("Static-X", for example).

    g_cpCmdGrammar->Commit(0);
	
    g_cpCmdGrammar->SetRuleIdState( DYN_TTSVOICERULE, SPRS_ACTIVE ); 
}
Finally, we commit the changes again and activate the rule!
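To make the shape of the finished rule easier to picture, here is a toy model of it in portable C++. It is not SAPI and does no speech recognition: it matches typed text against a small state graph built the same way as above, with a "play" transition from the initial state to a second state, then one transition per name leading to the end of the rule and carrying an index. ToyRule, SplitWords and kEndOfRule are hypothetical names invented for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Split text into lowercase words on any of the separator characters,
// so that "Static-X" with separators " -." becomes {"static", "x"}.
std::vector<std::string> SplitWords(const std::string& text,
                                    const std::string& separators)
{
    std::vector<std::string> words;
    std::string current;
    for (char ch : text) {
        if (separators.find(ch) != std::string::npos) {
            if (!current.empty()) { words.push_back(current); current.clear(); }
        } else {
            current += static_cast<char>(
                std::tolower(static_cast<unsigned char>(ch)));
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

constexpr int kEndOfRule = -1;  // plays the role of the NULL 'to' state

struct Transition {
    int from;
    int to;
    std::vector<std::string> words;
    int propertyValue;  // -1 when the transition carries no property
};

class ToyRule {
public:
    // State 0 is the rule's initial state; new states are numbered from 1.
    int CreateNewState() { return m_stateCount++; }

    void AddWordTransition(int from, int to, const std::string& text,
                           const std::string& separators,
                           int propertyValue = -1)
    {
        assert(from >= 0 && from < m_stateCount);
        std::vector<std::string> words = SplitWords(text, separators);
        assert(!words.empty());  // this toy model has no empty transitions
        m_transitions.push_back({from, to, words, propertyValue});
    }

    // Returns the property value of the matched path, or nothing if the
    // phrase does not lead from the initial state to the end of the rule.
    std::optional<int> Recognize(const std::string& phrase) const
    {
        return Walk(0, SplitWords(phrase, " "), 0, -1);
    }

private:
    std::optional<int> Walk(int state, const std::vector<std::string>& spoken,
                            std::size_t pos, int property) const
    {
        if (state == kEndOfRule) {
            if (pos == spoken.size()) return property;
            return std::nullopt;
        }
        for (const Transition& t : m_transitions) {
            if (t.from != state) continue;
            if (pos + t.words.size() > spoken.size()) continue;
            if (!std::equal(t.words.begin(), t.words.end(),
                            spoken.begin() + static_cast<std::ptrdiff_t>(pos)))
                continue;
            const int next = t.propertyValue >= 0 ? t.propertyValue : property;
            if (auto result = Walk(t.to, spoken, pos + t.words.size(), next))
                return result;
        }
        return std::nullopt;
    }

    int m_stateCount = 1;
    std::vector<Transition> m_transitions;
};
```

Building the rule mirrors the function above: create one extra state, add "play" (and, say, "load") from state 0 to it, then add each name from that state to kEndOfRule with its index. Recognize("play steve vai") then yields the index stored with "Steve Vai", while a name on its own without "play" yields nothing, because no path from the initial state accepts it.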

Wrapping Up

For my recognition function, all I then had to do was the following:
case DYN_TTSVOICERULE:
{
    // Room for the path, the two quotation marks and the terminator
    char temp[_MAX_PATH + 3];
    strcpy(temp, "\"");
    strcat(temp, m_sMDInfo[pElements->pProperties->vValue.ulVal].pszFullDir);
    strcat(temp, "\"");

    // Launch Winamp with the quoted directory as its argument
    ::ShellExecute(this->m_hWnd, "open", "c:\\program files\\winamp\\winamp.exe",
                   temp, NULL, SW_SHOW);
} break;
As you can see, most of the code involved putting double quotation marks around the directory name. Below are two screenshots: on the left is me saying "play alien love secrets", on the right "play steve vai".
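If you would rather not manage a fixed-size buffer for the quoting step, the same thing can be done with std::string. This is only a sketch of an alternative, not the code from the example program, and QuotePath is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>

// Wrap a path in double quotes so that a directory name containing spaces
// reaches the launched program as a single command-line argument.
std::string QuotePath(const std::string& path)
{
    // A Windows path cannot itself contain a double quote, so there is
    // nothing inside the path that needs escaping.
    assert(path.find('"') == std::string::npos);
    return "\"" + path + "\"";
}
```

The result's c_str() can be passed as the parameters argument of ShellExecute in place of temp. The string grows as needed, so the length of the directory path no longer matters.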

I found that the SR engine sometimes had difficulty with strange pronunciations of band names. For example, Korn (corn), Linkin Park (Lincoln Park), Californication, and I gave up with "Al DiMeola, John McLaughlin and Paco De Lucia"!

I've yet to figure out how to successfully get the engine to recognize numbers well. For example, "Matchbox 20" is said "Matchbox twenty" but "Blink 182" is said "Blink one eight two". Overall, though, the SR engine did a good job, recognizing things like "play Play" (as in the Moby album).
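One idea I have not tested against the SR engine is to spell the numbers out yourself and add each spoken form as its own word transition, all carrying the same index. The sketch below only generates the candidate text; SpellDigits, SpellCardinal and SpokenVariants are hypothetical helpers, and the cardinal form is limited to numbers of one or two digits.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// "182" -> "one eight two"
std::string SpellDigits(const std::string& digits)
{
    static const char* const kDigit[] = {"zero", "one", "two", "three", "four",
                                         "five", "six", "seven", "eight", "nine"};
    std::string out;
    for (char ch : digits) {
        assert(std::isdigit(static_cast<unsigned char>(ch)));
        if (!out.empty()) out += ' ';
        out += kDigit[ch - '0'];
    }
    return out;
}

// 20 -> "twenty", 42 -> "forty two" (0 to 99 only)
std::string SpellCardinal(int n)
{
    assert(n >= 0 && n <= 99);
    static const char* const kSmall[] = {
        "zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"};
    static const char* const kTens[] = {"", "", "twenty", "thirty", "forty",
                                        "fifty", "sixty", "seventy", "eighty",
                                        "ninety"};
    if (n < 20) return kSmall[n];
    std::string out = kTens[n / 10];
    if (n % 10 != 0) { out += ' '; out += kSmall[n % 10]; }
    return out;
}

// Returns the name with every run of digits spelled digit by digit and,
// when every run is short enough, a second form using cardinal numbers.
std::vector<std::string> SpokenVariants(const std::string& name)
{
    std::string digitForm, cardinalForm;
    bool sawNumber = false, cardinalOk = true;
    std::size_t i = 0;
    while (i < name.size()) {
        if (!std::isdigit(static_cast<unsigned char>(name[i]))) {
            digitForm += name[i];
            cardinalForm += name[i];
            ++i;
            continue;
        }
        std::size_t end = i;
        while (end < name.size() &&
               std::isdigit(static_cast<unsigned char>(name[end]))) ++end;
        const std::string run = name.substr(i, end - i);
        sawNumber = true;
        // Keep the spelled number separate from adjacent letters ("U2").
        if (!digitForm.empty() && digitForm.back() != ' ') {
            digitForm += ' ';
            cardinalForm += ' ';
        }
        digitForm += SpellDigits(run);
        if (run.size() <= 2) cardinalForm += SpellCardinal(std::stoi(run));
        else cardinalOk = false;
        if (end < name.size() && name[end] != ' ') {
            digitForm += ' ';
            cardinalForm += ' ';
        }
        i = end;
    }
    if (!sawNumber) return {name};
    std::vector<std::string> variants{digitForm};
    if (cardinalOk && cardinalForm != digitForm) variants.push_back(cardinalForm);
    return variants;
}
```

For "Matchbox 20" this gives both "Matchbox two zero" and "Matchbox twenty", and for "Blink 182" only "Blink one eight two". Whether adding these forms actually improves recognition is something you would need to try with your own engine.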

IntelliMusic

12/01/02: Released some scrap code I'd written based on this article. Check out IntelliMusic here.

Submitted: 16/04/2001

Article content copyright © James Matthews, 2001.