Technology Trends and Their Providers

Several technology trends are speeding the emergence of voice portals. The most significant is speech technology, which has advanced at a breakneck pace over the last few years. Most analysts project that it will keep growing at 31 percent a year from 1999 to 2004.
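
A 31 percent rate compounded over the five annual steps from 1999 to 2004 implies nearly a fourfold increase. Here S_y denotes the projected size in year y; this notation is introduced only for illustration.

```latex
\frac{S_{2004}}{S_{1999}} = (1 + 0.31)^{5} = 1.31^{5} \approx 3.86
```

So the projection is not 5 × 31 = 155 percent growth over the period. It is about 286 percent, because each year's growth builds on the previous year's larger base.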

Automatic Speech Recognition (ASR) Software
Automatic speech recognition (ASR) is rapidly entering the mainstream. Early speech applications recognized a vocabulary of only 20 to 30 words. Since then, the accuracy and vocabulary size of ASR engines have improved dramatically, driven by refined algorithms, large gains in processing power, and falling costs. Today's speech systems accept naturally spoken phrases and do not require speakers to train them in advance.
Major vendors of speech-recognition software include IBM*, Nuance*, Philips Electronics NV*, and SpeechWorks International*. In the United States, Nuance and SpeechWorks are popular choices for speech-recognition software with support for multiple languages.

Continuous Speech Processing
Continuous Speech Processing (CSP) technology from the Dialogic family of products eliminates the need for dedicated speech hardware. CSP optimizes the performance of host-based speech recognition engines by streaming preprocessed voice data between the telephony boards (analog, T1, or E1) and the host computer's central processing unit (CPU). Because CSP runs front-end data processing on Dialogic hardware, the host system is free for the work it does best: speech recognition.

Systems built with CSP achieve higher port density by offloading CPU-intensive functions to the digital signal processing (DSP) speech detection modules. These functions include high-quality echo cancellation, voice activity detection (VAD), and pre-speech buffering. This frees the host processor from continuously and wastefully processing irrelevant data such as silence.
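
The following minimal Python sketch makes VAD and pre-speech buffering concrete. It makes several assumptions. Frames of 16-bit PCM audio arrive as lists of integer samples. A simple energy threshold stands in for the DSP's detector. The function names, threshold, and frame counts are all hypothetical. This is not the Dialogic CSP API, and the sketch omits echo cancellation entirely. Only frames that belong to an utterance are kept, which is the data a host recognizer would actually need to see.

```python
from collections import deque
from typing import Iterable, List, Sequence

Frame = Sequence[int]  # one frame of 16-bit PCM samples (assumed format)


def frame_energy(frame: Frame) -> float:
    """Mean squared amplitude of a frame; 0.0 for an empty frame."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def extract_speech(frames: Iterable[Frame], threshold: float,
                   pre_speech_frames: int = 3,
                   hangover_frames: int = 2) -> List[List[Frame]]:
    """Hypothetical energy-threshold VAD with pre-speech buffering.

    Returns the speech segments. Silence outside them is discarded
    instead of being forwarded to the host recognizer.
    """
    pre_buffer: deque = deque(maxlen=pre_speech_frames)  # most recent silent frames
    segments: List[List[Frame]] = []
    current: List[Frame] = []
    in_speech = False
    silent_run = 0

    for frame in frames:
        is_speech = frame_energy(frame) >= threshold
        if not in_speech:
            if is_speech:
                # Speech onset: prepend buffered frames so the word's start is not clipped.
                current = list(pre_buffer) + [frame]
                pre_buffer.clear()
                in_speech, silent_run = True, 0
            else:
                pre_buffer.append(frame)  # the oldest frame drops out automatically
        else:
            current.append(frame)
            silent_run = 0 if is_speech else silent_run + 1
            if silent_run >= hangover_frames:
                # Enough consecutive silence: close the utterance (hangover frames included).
                segments.append(current)
                current, in_speech, silent_run = [], False, 0

    if in_speech:  # the stream ended mid-utterance
        segments.append(current)
    return segments
```

The pre-speech buffer preserves a quiet word onset. Frames heard just before the energy crossed the threshold are forwarded along with the speech. The hangover count plays the opposite role at the end of an utterance: pauses shorter than `hangover_frames` do not split it.
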
Supporting up to 120 ports per board, CSP software provides a unified application programming interface (API) that makes systems easier to scale. Developers can add hundreds of speech-enabled ports and still deliver high-quality speech recognition, while saving substantially on infrastructure and deployment costs.
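
At the stated maximum of 120 ports per board, the minimum number of boards for a target port count follows from a ceiling division. This back-of-the-envelope sketch assumes every board is fully populated and ignores any other system limit. The 500-port target is an arbitrary example.

```latex
N_{\text{boards}} = \left\lceil \frac{N_{\text{ports}}}{120} \right\rceil,
\qquad
N_{\text{ports}} = 500 \;\Rightarrow\; N_{\text{boards}} = \left\lceil 4.17 \right\rceil = 5
```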

Text-to-Speech
Once information has been retrieved, it must be communicated to the user. One way to do this is with text-to-speech (TTS). TTS is already increasingly used to read e-mail and Web-based text to callers, and it will play an even wider role in the future. Real-world text such as e-mail can be read over the phone with the help of preprocessors. These preprocessors handle so-called "dirty" data such as acronyms and contractions, and they account for differences in intonation. Lernout & Hauspie* is a principal TTS vendor with support for multiple languages.
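
The preprocessing step described above is a form of text normalization. The following Python sketch assumes tiny hand-written lookup tables. The table contents and the `normalize` name are hypothetical and do not describe Lernout & Hauspie's product. The sketch expands abbreviations and contractions and spells out unfamiliar acronyms so that a synthesizer reads them sensibly. It does not attempt the intonation handling mentioned above.

```python
# Hypothetical lookup tables for illustration; a real TTS front end uses far larger lexicons.
ABBREVIATIONS = {"Dr.": "Doctor", "Mr.": "Mister", "approx.": "approximately"}
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "I'm": "I am"}
WORD_ACRONYMS = {"NASA", "RAM"}  # acronyms read as words, not letter by letter


def normalize(text: str) -> str:
    """Expand abbreviations, contractions, and spelled-out acronyms before TTS."""
    out = []
    for token in text.split():
        # Split off trailing punctuation, but keep a period, which abbreviations need.
        core = token.rstrip(",;:!?")
        tail = token[len(core):]
        if core in ABBREVIATIONS:
            out.append(ABBREVIATIONS[core] + tail)
            continue
        if core.endswith("."):  # sentence-final period, not part of an abbreviation
            core, tail = core[:-1], "." + tail
        if core in CONTRACTIONS:
            out.append(CONTRACTIONS[core] + tail)
        elif core.isalpha() and core.isupper() and len(core) > 1 and core not in WORD_ACRONYMS:
            out.append(" ".join(core) + tail)  # "IBM" -> "I B M"
        else:
            out.append(token)
    return " ".join(out)
```

For example, `normalize("Dr. Smith can't attend the IBM meeting.")` returns `"Doctor Smith cannot attend the I B M meeting."`. The spaced letters are intended to make a synthesizer pronounce each letter separately.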

VoiceXML
Just as the growth of the Web was catalyzed by the HTML markup standard, the acceptance of a universal standard for voice applications is propelling the growth of voice-based services.
Voice eXtensible Markup Language (VoiceXML) is the major standard for voice-based services. It will allow providers to open up Web services to customers through voice interfaces. It will handle synthesized speech (TTS), recognition of spoken input, recognition of dual-tone multifrequency (DTMF) keypad input, recording of spoken input, and telephony call control. Enterprises can build automated voice services with the same technology they use to create visual Web sites, significantly reducing the cost of building and delivering new capabilities to telephone customers. Because established Web technologies are used, the integration with back-end databases can be shared with the HTML application.
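
The following Python sketch illustrates the idea that voice pages are built with ordinary Web technology. A server generates a small VoiceXML document much as it would generate an HTML form. The sketch uses VoiceXML 2.0 element names, a later W3C version than the forum's original 1.0 specification. The form id, field name, and submit URL are hypothetical.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def build_account_form(submit_url: str) -> str:
    """Return a minimal VoiceXML page that collects an account number.

    On platforms that support VoiceXML's built-in grammars, the "digits"
    field type accepts both spoken digits and DTMF key presses.
    """
    # The default namespace is written as a plain xmlns attribute on the root element.
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "account"})
    field = ET.SubElement(form, "field", {"name": "acct", "type": "digits"})
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Please say or key in your account number."  # spoken via TTS
    filled = ET.SubElement(field, "filled")
    # Once the field is filled, send the value to a Web server, as an HTML form would.
    ET.SubElement(filled, "submit", {"next": submit_url, "namelist": "acct"})
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_account_form("http://example.com/account-lookup"))
```

The page is ordinary server-generated markup. The same back-end code that fills in an HTML page could therefore fill in this one. This is the shared database integration the paragraph describes.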

VoiceXML, which began at AT&T Bell Laboratories*, brings together the markup languages of Lucent* and AT&T* with IBM's SpeechML* and Motorola's VoxML™. Most major developers of speech-based products are members of the VoiceXML Forum.

New Testing Tools
The success of a speech-based application depends on factors such as the phrasing of voice prompts, as well as on other behavioral factors. That makes it important to capture lessons learned and reuse them easily in new applications.

Speech technology providers have created powerful tools to speed deployment. One high-level applet, for example, captures much of the knowledge gained from designing application dialogs and implementing frequently used caller interactions. Tools like this can cut the time needed to build a new application from 30 person-years to months or even weeks.
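
A reusable component of this kind can wrap one frequently used interaction so that each new application inherits tuned behavior instead of rebuilding it. The interaction is: ask a question, validate the answer, reprompt with more detailed help, and finally escalate. This Python sketch assumes a hypothetical `recognize` callback standing in for the speech platform. The names, retry limit, and escalation policy are illustrative and do not describe any vendor's applet.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CollectResult:
    value: Optional[str]  # validated answer, or None if every attempt failed
    attempts: int         # how many times the caller was prompted
    escalate: bool        # True if the caller should be handed to a live agent


def collect(recognize: Callable[[str], Optional[str]],
            prompt: str,
            help_prompt: str,
            is_valid: Callable[[str], bool],
            max_attempts: int = 3) -> CollectResult:
    """Reusable 'ask, validate, reprompt' interaction (hypothetical component).

    `recognize` plays a prompt and returns the recognized text, or None on a
    no-input or no-match event. It stands in for a real speech platform.
    """
    current = prompt
    for attempt in range(1, max_attempts + 1):
        heard = recognize(current)
        if heard is not None and is_valid(heard):
            return CollectResult(heard, attempt, escalate=False)
        # After a failure, switch to the more detailed help prompt.
        current = help_prompt
    return CollectResult(None, max_attempts, escalate=True)
```

An application would call `collect` with its own prompts and validator, for example a five-digit ZIP code check. The reprompt and escalation rules then stay consistent across every application that reuses the component.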

* Other names and brands may be claimed as the property of others.
