
Cepstral Text-To-Speech Voices

Cepstral (www.cepstral.com) have recently released their high-quality text-to-speech (TTS) voices. The voices have a fraction of the footprint of AT&T's NaturalVoices (roughly 30MB for Cepstral's voices compared to around 500MB for AT&T's), and can run on a variety of platforms.

Quality

The quality of the voices is good, but not quite as good as NaturalVoices. The voices sometimes sound choppy, and occasionally get pronunciation wrong. The system also doesn't seem to account for exclamation marks or question marks, which sometimes makes long streams of text sound unnatural.

This isn't to say that Cepstral's voices aren't excellent - voice 'character' comes across well, and there are definite distinctions between the US and UK English voices, as well as between sexes and age groups. Here are four examples of the voices I reviewed:

There seems to be quite a difference between voices at times, especially on certain words. For example, the word 'wrong' is pronounced perfectly by Emily and Lawrence and slightly strangely by Millie, while Frank sounds like he's choking on something.

It is easy to point out problems or mispronunciations when looking for them, as one does when reviewing a speech product. Having said this though, if you load a text file and have your favourite Cepstral voice (Millie, in my case) read it out loud, everything seems to work very nicely. Mispronunciations are lost in the flow, choppy speech seems to smooth out over sentences, and generally large portions of text can be listened to without straining your ears!

SwiftTalker

All Cepstral voices come with a little utility called 'SwiftTalker'. This is a simple plain text editor, augmented for Cepstral voices:

The media buttons at the bottom allow for movement between sentences, and sentences are highlighted as they are read out. The application also allows for voice configuration - including the rate, pitch and volume as well as six special effects that can be applied to the voices.

I was a little disappointed to find these settings were only applicable to SwiftTalker and could not be modified from Windows' Speech Control Panel applet. This meant that any Cepstral voice used system-wide would use the default settings only.

Conclusion

While in terms of raw quality Cepstral's voices do not quite match counterparts such as AT&T's NaturalVoices, their small footprint makes them easily downloadable and readily portable to mobile platforms. Furthermore, any imperfections are often diluted when large portions of text are read out by the variety of distinctive voices.

Overall, the impressive technology powering the variety of voices should make realistic text-to-speech available to anyone on a tight budget.

Rating: 8.7
Price: $29.99 per voice
Liked: Small footprint, range of voices, decent quality, good price
Disliked: Quality inconsistent across voices, lacking options under Windows
Website: http://www.cepstral.com/

Submitted: 23/06/2004

Article content copyright © James Matthews, 2004.

All content copyright © 1998-2007, Generation5 unless otherwise noted.