
Visemes: Representing Mouth Positions

SAPI provides the programmer with a very powerful feature - viseme notification. A viseme is the mouth position a speaker forms while producing a particular sound. SAPI 5 defines 22 viseme values (SP_VISEME_0 through SP_VISEME_21), based on the Disney 13 visemes:

typedef enum SPVISEMES
{
                        // English examples
                        //------------------
    SP_VISEME_0 = 0,    // silence
    SP_VISEME_1,        // ae, ax, ah
    SP_VISEME_2,        // aa
    SP_VISEME_3,        // ao
    SP_VISEME_4,        // ey, eh, uh
    SP_VISEME_5,        // er
    SP_VISEME_6,        // y, iy, ih, ix
    SP_VISEME_7,        // w, uw
    SP_VISEME_8,        // ow
    SP_VISEME_9,        // aw
    SP_VISEME_10,       // oy
    SP_VISEME_11,       // ay
    SP_VISEME_12,       // h
    SP_VISEME_13,       // r
    SP_VISEME_14,       // l
    SP_VISEME_15,       // s, z
    SP_VISEME_16,       // sh, ch, jh, zh
    SP_VISEME_17,       // th, dh
    SP_VISEME_18,       // f, v
    SP_VISEME_19,       // d, t, n
    SP_VISEME_20,       // k, g, ng
    SP_VISEME_21,       // p, b, m
} SPVISEMES;
Every time a viseme is used, the SAPI5 engine can send your application a notification, which the application can use to draw the mouth position. Microsoft provides an excellent example of this with its SAPI5 SDK, called TTSApp. TTSApp is written using the standard Win32 SDK and has additional features that bog down the code. Therefore, I created my own version using MFC that is hopefully a little easier to understand.

Using visemes is relatively simple; the graphical side is the hard part. This is why, for demonstration purposes, I used the microphone character from TTSApp.

The Code

I'll cover the viseme code first and only touch on the graphical side. Firstly, let's look at the initialization code, since it is slightly different from the initialization code normally used for recognition:
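   HRESULT hRes;

   // Create the TTS voice; g_cpVoice is a smart pointer, hence the dot operator.
   hRes = g_cpVoice.CoCreateInstance(CLSID_SpVoice);

   if (FAILED(hRes)) {
      TRACE0("Error creating voice.\n");
      return FALSE;
   }

   // First argument: events that trigger a notification.
   // Second argument: events that are queued for retrieval.
   hRes = g_cpVoice->SetInterest(SPFEI(SPEI_VISEME), SPFEI(SPEI_VISEME));

   if (FAILED(hRes)) {
      TRACE0("Error creating interest...seriously.\n");
      return FALSE;
   }

   // Have SAPI post WM_RECOEVENT to this window whenever an event is waiting.
   hRes = g_cpVoice->SetNotifyWindowMessage(m_hWnd, WM_RECOEVENT, 0, 0);

   if (FAILED(hRes)) {
      TRACE0("Error setting notification window.\n");
      return FALSE;
   }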
Notice that neither a grammar nor a recognition context is created here; SetInterest and SetNotifyWindowMessage can be called directly on the voice. Also note that SetInterest is given SPEI_VISEME rather than SPEI_RECOGNITION, and that WM_RECOEVENT is an application-defined window message. The creation of the voice is worth a second look: g_cpVoice is a COM smart pointer (ATL's CComPtr), so CoCreateInstance is a member of the wrapper and is called with the dot operator, as on an object, while the -> operator forwards to the underlying voice interface. Writing g_cpVoice->CoCreateInstance(...) will generate a compile error.
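
The difference between the two operators is easier to see with a toy example. The following is only a sketch, assuming g_cpVoice is an ATL-style smart pointer; ToyComPtr, FakeVoice and SpeakOnce are hypothetical names invented for illustration and are not part of ATL or SAPI:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a COM interface such as the SAPI voice.
struct FakeVoice {
    std::string lastText;
    void Speak(const std::string& text) { lastText = text; }
};

// Toy smart pointer, loosely modelled on how a COM smart pointer behaves.
// It is NOT the ATL class; it only illustrates '.' versus '->'.
template <typename T>
class ToyComPtr {
public:
    ToyComPtr() = default;
    ToyComPtr(const ToyComPtr&) = delete;
    ToyComPtr& operator=(const ToyComPtr&) = delete;
    ~ToyComPtr() { delete p_; }

    // Member of the wrapper itself, so it is called with '.'.
    bool CreateInstance() {
        assert(p_ == nullptr && "instance already created");
        p_ = new T();
        return true;
    }

    // Forwards to the wrapped object, so its members are reached with '->'.
    T* operator->() const {
        assert(p_ != nullptr && "call CreateInstance() first");
        return p_;
    }

    bool IsNull() const { return p_ == nullptr; }

private:
    T* p_ = nullptr;
};

// Create the object with '.', then call the wrapped interface with '->'.
inline std::string SpeakOnce(const std::string& text) {
    ToyComPtr<FakeVoice> voice;
    voice.CreateInstance();   // wrapper method
    voice->Speak(text);       // interface method
    return voice->lastText;
}
```

In this sketch, voice->CreateInstance() would not compile because FakeVoice has no such member, which mirrors the compile error described above.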

Beyond this, the only thing left to do is handle the SPEI_VISEME event. I did this in the OnRecoEvent handler:

LRESULT CSapiSpeakDlg::OnRecoEvent(WPARAM wParam, LPARAM lParam) 
{
   CSpEvent event;  
	
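   // Drain every event currently queued on the voice.
   while (event.GetFrom(g_cpVoice) == S_OK) {
      switch (event.eEventId) {
         case SPEI_VISEME:
            // Map the viseme to a mouth bitmap and repaint only the character.
            m_iVisemeBmp = g_iMapVisemeToImage[event.Viseme()];
            InvalidateRect(m_cCharRect, FALSE);
            break;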
      }
   }

   return 0;
}
Notice that event.Viseme() is all that is required to get the viseme. In SapiSpeak (the example program), the bitmap index for each viseme is stored in a look-up array, g_iMapVisemeToImage, since some bitmaps represent more than one viseme (13 bitmaps mapped onto 22 visemes).
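
To make the look-up idea concrete, here is a minimal, portable sketch of such a table. The index values are purely illustrative assumptions and are not the ones SapiSpeak or TTSApp use; kVisemeToBitmap, BitmapForViseme and the constants are hypothetical names:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kVisemeCount = 22;  // SP_VISEME_0 .. SP_VISEME_21
constexpr int kBitmapCount = 13;          // mouth bitmaps available
constexpr int kSilenceBitmap = 0;         // assumption: bitmap 0 is the closed mouth

// Hypothetical many-to-one table: several visemes share one bitmap.
const std::array<int, kVisemeCount> kVisemeToBitmap = {{
    0,   // SP_VISEME_0   silence
    1,   // SP_VISEME_1   ae, ax, ah
    1,   // SP_VISEME_2   aa
    2,   // SP_VISEME_3   ao
    3,   // SP_VISEME_4   ey, eh, uh
    4,   // SP_VISEME_5   er
    5,   // SP_VISEME_6   y, iy, ih, ix
    6,   // SP_VISEME_7   w, uw
    2,   // SP_VISEME_8   ow
    1,   // SP_VISEME_9   aw
    2,   // SP_VISEME_10  oy
    1,   // SP_VISEME_11  ay
    3,   // SP_VISEME_12  h
    4,   // SP_VISEME_13  r
    7,   // SP_VISEME_14  l
    8,   // SP_VISEME_15  s, z
    9,   // SP_VISEME_16  sh, ch, jh, zh
    10,  // SP_VISEME_17  th, dh
    11,  // SP_VISEME_18  f, v
    8,   // SP_VISEME_19  d, t, n
    3,   // SP_VISEME_20  k, g, ng
    12,  // SP_VISEME_21  p, b, m
}};

// Bounds-checked look-up: an out-of-range viseme falls back to silence.
inline int BitmapForViseme(int viseme) {
    if (viseme < 0 || static_cast<std::size_t>(viseme) >= kVisemeToBitmap.size()) {
        return kSilenceBitmap;
    }
    const int bitmap = kVisemeToBitmap[static_cast<std::size_t>(viseme)];
    assert(bitmap >= 0 && bitmap < kBitmapCount);  // table must stay in range
    return bitmap;
}
```

The bounds check is a design choice of the sketch: indexing a raw array directly with the event value, as the handler above does, is fine as long as the engine only ever reports values from the SPVISEMES enumeration.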

Graphical Basics

As mentioned, I will just touch on the graphical side of it. Microsoft achieves the talking character with a 128x128 imagelist, using overlay images to change the mouth and eye states. Here are three of the images used (15 in total):

[Figure: three of the 15 character images]

TTSApp uses the SDK ImageList_* functions with HBITMAP and HIMAGELIST handles. In my MFC-based program I use CImageList and CBitmap to set up the imagelist:
   CBitmap bmp;

   m_cCharList.Create(CHARACTER_WIDTH, CHARACTER_HEIGHT, ILC_COLOR32 | ILC_MASK, 1, 0);

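   // The image list copies each bitmap, so free the GDI object after Add();
   // calling Detach() alone would leak the bitmap handle.
   bmp.LoadBitmap(MAKEINTRESOURCE(IDB_MICFULL));
   m_cCharList.Add(&bmp, RGB(255,0,255));   // magenta is the mask color
   bmp.DeleteObject();

   bmp.LoadBitmap(MAKEINTRESOURCE(IDB_MICMOUTH2));
   m_cCharList.Add(&bmp, RGB(255,0,255));
   bmp.DeleteObject();
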
	
   // Other 13 omitted...

   m_cCharList.SetOverlayImage(1, 1);
   m_cCharList.SetOverlayImage(2, 2);

   // Other 13 omitted...
Remember that OnRecoEvent handles the viseme bitmap selection and invalidates the character rectangle. Here is the OnPaint() handler:
   CPaintDC dc(this);
   CDialog::OnPaint();

   // Draw the full character (image 0) with the current mouth as an overlay
   m_cCharList.Draw(&dc, 0, m_cCharRect.TopLeft(), INDEXTOOVERLAYMASK(m_iVisemeBmp)); 
		
   if (m_iVisemeBmp % 6 == 2) {
      m_cCharList.Draw(&dc, WEYESNAR, m_cCharRect.TopLeft(), 0 ); 
   } else if (m_iVisemeBmp % 6 == 5) {
      m_cCharList.Draw(&dc, WEYESCLO, m_cCharRect.TopLeft(), 0 ); 
   }
This is relatively simple. The complete mic character is drawn with the necessary viseme image overlaid onto it. WEYESNAR and WEYESCLO are images of the eyes narrowed and closed. They are drawn for certain bitmap indices only, which makes the character look more realistic by blinking or squinting while talking.
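
The eye selection in the handler boils down to a small rule on the bitmap index. As a sketch, it could be pulled out into a helper that can be checked without any drawing code; EyeState and EyeStateForBitmap are hypothetical names, and only the modulo rule itself comes from the handler above:

```cpp
#include <cassert>

// Which eye image, if any, should be drawn on top of the character.
enum class EyeState { Open, Narrow, Closed };

// Mirrors the "% 6" test in OnPaint: indices 2, 8, 14, ... narrow the eyes,
// indices 5, 11, 17, ... close them, and everything else leaves them open.
inline EyeState EyeStateForBitmap(int visemeBitmap) {
    assert(visemeBitmap >= 0);  // bitmap indices are never negative
    switch (visemeBitmap % 6) {
        case 2:  return EyeState::Narrow;  // would draw WEYESNAR
        case 5:  return EyeState::Closed;  // would draw WEYESCLO
        default: return EyeState::Open;    // no extra image drawn
    }
}
```

With a helper like this, OnPaint would only need to translate the returned state into the matching Draw call.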

Conclusion

Visemes are a powerful feature and have quite a few applications. The applications in gaming are obvious - once TTS becomes realistic enough to use in games, a game engine could texture 3D characters with the appropriate mouth position as they speak!

Submitted: 28/10/2001

Article content copyright © James Matthews, 2001.