SAPI 5.0 Tutorial V: Voice Control

This fifth (and final) installment of the SAPI5 tutorial set looks at controlling SAPI5 voices. This includes retrieving a list of voices, setting their rate and volume, and displaying the SAPI user interfaces (microphone setup and training in most cases).

Retrieving Voices

Here is the function used to collect the voices installed on the system. It looks rather complicated because, as well as enumerating the voices, it determines which voice is the current one and caches the object token ID (a unique identifier for each SAPI object) and the name of every voice:

void CSapiTutorial05Dlg::PopulateVoices()
{
    USES_CONVERSION;
    ASSERT(m_cpVoice);

    m_cVoiceList.DeleteAllItems();

    // Code based on the coffee6 example.

    CComPtr<IEnumSpObjectTokens>    cpEnum;
    CSpDynamicString                *szDescription = NULL;  // stays NULL if enumeration fails, so delete [] is safe
    ISpObjectToken                  *pToken = NULL;

    WCHAR *pszCurTokenId = NULL;
    ULONG ulIndex = 0, ulNumTokens = 0, ulCurToken = -1;

    // Get the current voice token and copy out its ID.
    m_cpVoice->GetVoice(&pToken);
    pToken->GetId(&pszCurTokenId);
    pToken->Release();   // the enumeration loop reuses pToken
    pToken = NULL;
USES_CONVERSION is an ATL macro that sets up the string-conversion macros, such as W2T, which is used at the bottom of the function. After the variable declarations, we retrieve the token of the voice currently in use, along with its ID.
    HRESULT hResult = SpEnumTokens(SPCAT_VOICES, NULL, NULL, &cpEnum);

    if (hResult == S_OK) {
        hResult = cpEnum->GetCount( &ulNumTokens );
        
        if (SUCCEEDED(hResult) && ulNumTokens != 0) {
            szDescription = new CSpDynamicString [ulNumTokens];
            m_ppszTokenIds = new WCHAR* [ulNumTokens];

            ZeroMemory(m_ppszTokenIds, ulNumTokens * sizeof(WCHAR *));

            while (cpEnum->Next(1, &pToken, NULL) == S_OK) {
                hResult = SpGetDescription(pToken, &szDescription[ulIndex]);
                _ASSERTE(SUCCEEDED(hResult));

                hResult = pToken->GetId(&m_ppszTokenIds[ulIndex]);
                _ASSERTE(SUCCEEDED(hResult));

                if (_wcsicmp(pszCurTokenId, m_ppszTokenIds[ulIndex]) == 0) {
                    ulCurToken = ulIndex;
                }

                ulIndex++;                
                pToken->Release();
                pToken = NULL;
            }                   
        }
    }
This does all the main work: it allocates the two arrays and retrieves both the token ID and the name of each voice. SpEnumTokens is a helper function in <sphelper.h> that returns an enumerator over all the tokens in a category, here SPCAT_VOICES. The _wcsicmp call compares the current voice's token ID with the token ID just stored in the array, ignoring case. If they match (a return value of 0), we have found our current voice.
    UINT iIndex;
    for (UINT i=0; i<ulNumTokens; i++) {
        iIndex = m_cVoiceList.InsertItem(i, CString(" ") +
                     W2T(szDescription[i]), (i == ulCurToken) ? 0 : 1);

        m_cVoiceList.SetItemData(iIndex, i);
    }

    delete [] szDescription;
    ::CoTaskMemFree(pszCurTokenId);   // GetId allocates the ID string with CoTaskMemAlloc
}
The final section of the function adds the voices to the main list. It uses a different icon for the currently active voice, and the item data is set to the voice's index in the token array.
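
The matching step can be tried out without SAPI installed. The following is only a portable sketch, not code from the sample program: EqualsIgnoreCase and FindCurrentToken are hypothetical helpers, std::wstring stands in for the WCHAR* token IDs, and the case-insensitive loop is a stand-in for _wcsicmp. It returns -1 when nothing matches, mirroring the way ulCurToken starts out at -1.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>
#include <vector>

// Case-insensitive equality of two wide strings (portable stand-in for
// the _wcsicmp(...) == 0 test used in PopulateVoices).
bool EqualsIgnoreCase(const std::wstring& a, const std::wstring& b)
{
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::towlower(a[i]) != std::towlower(b[i])) return false;
    }
    return true;
}

// Hypothetical helper: returns the index of the token ID that matches
// currentId, or -1 if none does. The tutorial performs this check inline
// while enumerating the tokens.
long FindCurrentToken(const std::vector<std::wstring>& tokenIds,
                      const std::wstring& currentId)
{
    for (std::size_t i = 0; i < tokenIds.size(); ++i) {
        if (EqualsIgnoreCase(tokenIds[i], currentId)) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

Returning a signed -1 rather than storing -1 in an unsigned ULONG makes the "no current voice found" case explicit to the caller.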

SetVoice

In our example program, we cache the voices by their token IDs, since a voice can be looked up quickly and efficiently from its ID without storing the object itself. <sphelper.h> also contains a very useful function to do just that, SpGetTokenFromId. In the snippet, vi indexes the cached token array:
  ISpObjectToken *pToken = NULL;

  HRESULT hResult = SpGetTokenFromId(m_ppszTokenIds[vi], &pToken, FALSE);
  if (SUCCEEDED(hResult)) {
      m_cpVoice->SetVoice(pToken);
      pToken->Release();   // release our reference to the token
  }

SetRate, SetVolume

These two functions do exactly what they say. SetRate takes a value between -10 and 10, whereas SetVolume takes a value between 0 and 100.
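
Since both calls expect values inside a fixed range, it can be worth clamping whatever a slider or edit box supplies before passing it on. The helpers below are a minimal sketch and not part of the sample program: ClampRate, ClampVolume and the constant names are hypothetical, and the limits are simply the ranges quoted above.

```cpp
#include <algorithm>

// Ranges quoted in the text: rate is -10..10, volume is 0..100.
const long kMinRate   = -10;
const long kMaxRate   = 10;
const int  kMaxVolume = 100;

// Force a requested rate into the -10..10 range.
long ClampRate(long rate)
{
    return std::max(kMinRate, std::min(kMaxRate, rate));
}

// Force a requested volume into the 0..100 range.
unsigned short ClampVolume(int volume)
{
    return static_cast<unsigned short>(std::max(0, std::min(kMaxVolume, volume)));
}

// Intended use inside the dialog class (needs the SAPI SDK; sliderRate and
// sliderVolume are hypothetical values read from the UI):
//   m_cpVoice->SetRate(ClampRate(sliderRate));
//   m_cpVoice->SetVolume(ClampVolume(sliderVolume));
```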

Calling SAPI UIs

Calling the SAPI user interfaces (UIs) is equally simple: DisplayUI is called to display a given SAPI UI. Different SAPI objects support different UIs, and different SAPI implementations may or may not support the various SAPI UIs. It is therefore considered good coding practice to call IsUISupported first. Here is a snippet from our sample program that displays the training dialog:
void CSapiTutorial05Dlg::OnTraining()
{
    BOOL bSupported = FALSE;   // stays FALSE if the IsUISupported call fails
    USES_CONVERSION;

    m_cpEngine->IsUISupported(SPDUI_UserTraining, NULL, 0, &bSupported);

    if (bSupported) {
        m_cpEngine->DisplayUI(this->m_hWnd,
                              T2W("Generation5"),
                              SPDUI_UserTraining, NULL, NULL);
    }
}
DisplayUI takes five parameters: a handle to the parent window, the title (this is often ignored), the type of UI (the most important), and two extra-data parameters which, for our purposes, are usually NULL. The method works the same way with the other types of UI; simply substitute the appropriate SPDUI_* constant:

SPDUI_AddRemoveWord           Add/remove a word dialog.
SPDUI_UserTraining            Display the user-training dialog (used above).
SPDUI_MicTraining             Set the microphone up.
SPDUI_RecoProfileProperties   Display the user profile properties dialog.
SPDUI_AudioProperties         Audio properties.
SPDUI_AudioVolume             Audio volume.
SPDUI_EngineProperties        The engine properties dialog.
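
Because every UI type follows the same check-then-display pattern as OnTraining, the two calls can be wrapped in one helper. What follows is a rough sketch under stated assumptions, not code from the sample program: ShowUiIfSupported is a hypothetical name, it is written as a template so that it compiles without the SAPI SDK, and FakeEngine is a made-up stand-in used only to exercise it. The UI type strings passed to the fake are placeholders, not the real SPDUI_* values.

```cpp
#include <string>

// Calls IsUISupported first and DisplayUI only if the UI is available.
// Engine is any type exposing both methods with the argument order used
// in OnTraining. Returns true only if the UI was actually displayed.
template <typename Engine, typename Window>
bool ShowUiIfSupported(Engine& engine, Window parent,
                       const wchar_t* title, const wchar_t* uiType)
{
    int supported = 0;  // BOOL is a typedef for int on Windows
    if (engine.IsUISupported(uiType, nullptr, 0, &supported) < 0 || !supported) {
        return false;   // call failed (negative HRESULT) or UI not supported
    }
    return engine.DisplayUI(parent, title, uiType, nullptr, 0) >= 0;
}

// Hypothetical stand-in for the recognizer: supports exactly one UI type
// and counts how many times a dialog would have been shown.
struct FakeEngine {
    std::wstring supportedType;
    int shown = 0;

    long IsUISupported(const wchar_t* type, void*, unsigned long, int* supported)
    {
        *supported = (supportedType == type) ? 1 : 0;
        return 0;  // success
    }

    long DisplayUI(void*, const wchar_t*, const wchar_t*, void*, unsigned long)
    {
        ++shown;
        return 0;  // success
    }
};
```

With the real recognizer, the call would presumably look like ShowUiIfSupported(*m_cpEngine, m_hWnd, L"Generation5", SPDUI_MicTraining), assuming m_cpEngine is the smart pointer used in OnTraining.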

Download

Conclusion

Well, this ends the SAPI5 tutorial series. Hopefully, later articles will look at more advanced aspects of SAPI, automation (SAPI5.1), and perhaps tying together SQL and SAPI. Remember to try the SapiWizard, an AppWizard for Visual C++ that sets up a SAPI MFC application as described in these articles.

Good luck speech developing!

Submitted: 17/12/2002

Article content copyright © James Matthews, 2002.