SAPI provides the programmer with a very powerful feature - viseme notification. A viseme refers to the mouth position currently being "used" by the speaker. SAPI 5 bases its viseme set on the Disney 13 visemes.
Every time a viseme is used, the SAPI 5 engine can send your application a notification, which the application can use to draw the mouth position. Microsoft provides an excellent example of this with its SAPI 5 SDK, called TTSApp. TTSApp is written using the standard Win32 SDK and has additional features that bog down the code. Therefore, I created my own version using MFC that is hopefully a little easier to understand.
The Code
I'll cover the viseme code first and only touch on the graphical side. Firstly, let's look at the initialization code, since it is slightly different from the initialization code normally used:
HRESULT hRes;
// Create the TTS voice object.
hRes = g_cpVoice.CoCreateInstance(CLSID_SpVoice);
if (FAILED(hRes)) {
    TRACE0("Error creating voice.\n");
    return FALSE;
}
// Ask for viseme events, and have them queued for retrieval.
hRes = g_cpVoice->SetInterest(SPFEI(SPEI_VISEME), SPFEI(SPEI_VISEME));
if (FAILED(hRes)) {
    TRACE0("Error creating interest...seriously.\n");
    return FALSE;
}
// Have SAPI post WM_RECOEVENT to this window when an event is waiting.
hRes = g_cpVoice->SetNotifyWindowMessage(m_hWnd, WM_RECOEVENT, 0, 0);
if (FAILED(hRes)) {
    TRACE0("Error setting notification window.\n");
    return FALSE;
}
Notice that neither a grammar nor a recognition context is created here: SetInterest and SetNotifyWindowMessage can be called directly on the voice. Also note that we pass SPEI_VISEME to SetInterest, as opposed to SPEI_RECOGNITION. The creation of the voice is worth a closer look - notice how g_cpVoice is treated as an object, not as a pointer, when calling CoCreateInstance (g_cpVoice.CoCreateInstance rather than g_cpVoice->CoCreateInstance). CoCreateInstance here is a method of the smart pointer wrapper itself rather than of the voice interface, so using -> will generate a compile error.
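The difference between the two call styles is easier to see in isolation. The article never shows the declaration of g_cpVoice, so the following is only a sketch under the assumption that it is an ATL-style smart pointer such as CComPtr<ISpVoice>. MiniPtr, Voice, Create and Speak are hypothetical stand-ins invented for illustration; they are not part of ATL or SAPI.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a COM interface such as ISpVoice.
struct Voice {
    std::string Speak() const { return "speaking"; }
};

// Hypothetical, heavily simplified stand-in for an ATL-style smart pointer.
// It owns a T and forwards interface calls to it.
template <typename T>
class MiniPtr {
public:
    MiniPtr() = default;
    MiniPtr(const MiniPtr&) = delete;
    MiniPtr& operator=(const MiniPtr&) = delete;
    ~MiniPtr() { delete m_p; }

    // A member of the wrapper itself, so it is called with '.'.
    // (The real CComPtr::CoCreateInstance takes a CLSID and returns an HRESULT.)
    bool Create() {
        if (m_p == nullptr) {
            m_p = new T();
        }
        return m_p != nullptr;
    }

    // Forwards to the wrapped object, so interface methods are called with '->'.
    T* operator->() const {
        assert(m_p != nullptr);
        return m_p;
    }

private:
    T* m_p = nullptr;
};
```

With this sketch, voice.Create() compiles because Create belongs to MiniPtr, while voice->Create() would not compile because Voice has no such member. Conversely, voice->Speak() reaches the wrapped object through operator->.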
Beyond this, the only thing left to do is handle the SPEI_VISEME event. I did this in the OnRecoEvent handler:
LRESULT CSapiSpeakDlg::OnRecoEvent(WPARAM wParam, LPARAM lParam)
{
    CSpEvent event;
    while (event.GetFrom(g_cpVoice) == S_OK) {
        switch (event.eEventId) {
        case SPEI_VISEME:
            m_iVisemeBmp = g_iMapVisemeToImage[event.Viseme()];
            InvalidateRect(m_cCharRect, false);
            break;
        }
    }
    return 0;
}
Notice that all that is required to get the viseme is a call to event.Viseme(). In SapiSpeak (the example program), the viseme bitmaps are selected through a look-up array, since some bitmaps represent more than one viseme (13 bitmaps mapped onto 22 visemes).
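The look-up idea can be sketched in portable C++ as follows. The contents of g_iMapVisemeToImage are not listed in this article, so the table values below are purely illustrative. kMapVisemeToImage, ImageForViseme and DrainVisemes are hypothetical names, and a std::queue stands in for the SAPI event queue that OnRecoEvent drains.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <queue>

constexpr std::size_t kVisemeCount = 22;  // viseme ids 0..21
constexpr int kBitmapCount = 13;          // distinct mouth bitmaps, 0..12

// Hypothetical table: viseme id -> mouth bitmap index.
// The values are illustrative only, not the ones SapiSpeak uses.
constexpr std::array<int, kVisemeCount> kMapVisemeToImage = {
    0, 1, 1, 2, 3, 1, 4, 5, 6, 4, 6, 1, 4, 7, 8, 9, 9, 10, 11, 9, 4, 12};

// Returns the bitmap index for a viseme id. Out-of-range ids fall back
// to bitmap 0, which this sketch treats as the neutral mouth.
int ImageForViseme(int visemeId) {
    if (visemeId < 0 || static_cast<std::size_t>(visemeId) >= kVisemeCount) {
        return 0;
    }
    const int image = kMapVisemeToImage[static_cast<std::size_t>(visemeId)];
    assert(image >= 0 && image < kBitmapCount);
    return image;
}

// Drains every pending viseme id, in the spirit of the while loop in
// OnRecoEvent, and returns the bitmap selected by the last one.
// If nothing is pending, the current bitmap is kept.
int DrainVisemes(std::queue<int>& pending, int currentBitmap) {
    while (!pending.empty()) {
        currentBitmap = ImageForViseme(pending.front());
        pending.pop();
    }
    return currentBitmap;
}
```

A many-to-one table like this keeps the event handler down to a single array index, and the bounds check means an unexpected viseme id cannot read outside the table.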
Graphical Basics
As mentioned, I will just touch on the graphical side of it. Microsoft achieves the talking character by using a 128x128 image list and using overlay images to change the mouth and eye states. There are 15 images in total, loaded into the image list as follows:
CBitmap bmp;
m_cCharList.Create(CHARACTER_WIDTH, CHARACTER_HEIGHT, ILC_COLOR32 | ILC_MASK, 1, 0);
// Magenta (255,0,255) is the transparent mask colour.
bmp.LoadBitmap(MAKEINTRESOURCE(IDB_MICFULL));
m_cCharList.Add(&bmp, RGB(255,0,255));
// The image list keeps its own copy, so free the bitmap before reusing bmp.
bmp.DeleteObject();
bmp.LoadBitmap(MAKEINTRESOURCE(IDB_MICMOUTH2));
m_cCharList.Add(&bmp, RGB(255,0,255));
bmp.DeleteObject();
// Other 13 omitted...
m_cCharList.SetOverlayImage(1, 1);
m_cCharList.SetOverlayImage(2, 2);
// Other 13 omitted...
Remember that OnRecoEvent selects the viseme bitmap and invalidates the character rectangle. Here is the body of the OnPaint() handler:
CPaintDC dc(this);
CDialog::OnPaint();
// Draw the full character with the current viseme as an overlay
m_cCharList.Draw(&dc, 0, m_cCharRect.TopLeft(), INDEXTOOVERLAYMASK(m_iVisemeBmp));
// Some visemes also change the eyes
if (m_iVisemeBmp % 6 == 2) {
    m_cCharList.Draw(&dc, WEYESNAR, m_cCharRect.TopLeft(), 0);
} else if (m_iVisemeBmp % 6 == 5) {
    m_cCharList.Draw(&dc, WEYESCLO, m_cCharRect.TopLeft(), 0);
}
This is relatively simple. The complete mic character is drawn with the necessary viseme overlaid onto it. WEYESNAR and WEYESCLO are images of the eyes narrowed and closed. They are drawn for certain viseme bitmaps simply to make the character look more realistic by blinking or squinting while talking.
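The eye selection in the paint code boils down to a small pure function, which makes it easy to see which bitmaps trigger a squint or a blink. The sketch below mirrors the two modulo tests; EyeState and EyeStateFor are hypothetical names, and it assumes that bitmap indices are non-negative.

```cpp
#include <cassert>

// Hypothetical enum naming the three eye states the paint code distinguishes.
enum class EyeState { Open, Narrow, Closed };

// Mirrors the tests in the paint code: a remainder of 2 draws the
// narrowed eyes, a remainder of 5 draws the closed eyes, and every
// other bitmap leaves the eyes of the base image untouched.
EyeState EyeStateFor(int visemeBmp) {
    assert(visemeBmp >= 0);  // assumption: bitmap indices are never negative
    switch (visemeBmp % 6) {
    case 2:
        return EyeState::Narrow;
    case 5:
        return EyeState::Closed;
    default:
        return EyeState::Open;
    }
}
```

If the 13 mouth bitmaps are indexed 0 to 12, this narrows the eyes for bitmaps 2 and 8 and closes them for bitmaps 5 and 11, so the eyes change on only a handful of mouth shapes.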
Conclusion
Visemes are a powerful feature and have quite a few applications. The applications in gaming are obvious - once TTS becomes realistic enough to use in games, a game engine could texture 3D characters with the appropriate mouth position as they speak!
Submitted: 28/10/2001 Article content copyright © James Matthews, 2001.
All content copyright © 1998-2007, Generation5 unless otherwise noted.