Technology Trends and Their Providers
Several technology trends are speeding the emergence of voice portals. The most
significant is speech technology, which has grown at a breakneck pace over the
last few years. Most analysts project sustained growth of 31 percent per year
from 1999 to 2004.
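The projection implies a large cumulative increase. Suppose the 31 percent rate compounds once a year over the five annual steps from 1999 to 2004. In that case the market would end the period at roughly 3.86 times its 1999 size. The sketch below only works through that arithmetic. The annual compounding and the normalized base size of 1.0 are assumptions, because the source gives no base market figure.

```python
def compound_growth(base: float, annual_rate: float, years: int) -> float:
    """Value after `years` of growth at `annual_rate`, compounded once per year."""
    return base * (1.0 + annual_rate) ** years

# Assumption: the 31 percent rate compounds annually over the five years 1999 -> 2004.
# The base of 1.0 is a hypothetical normalized market size, not a figure from the source.
multiplier = compound_growth(1.0, 0.31, 2004 - 1999)
print(f"Growth multiple, 1999-2004: {multiplier:.2f}x")  # Growth multiple, 1999-2004: 3.86x
```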
Automatic Speech Recognition (ASR) Software
Automatic speech recognition (ASR) is rapidly entering the mainstream. Early
speech applications recognized a vocabulary of only 20 to 30 words, but the
accuracy and vocabulary size of ASR engines have improved dramatically, fueled by
refined algorithms, large increases in processing power, and lower costs. Today's
speech systems accept naturally spoken phrases and do not require users to train
them in advance.
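Recognition accuracy is commonly reported as word error rate (WER). WER is the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer's output into a reference transcript, divided by the number of words in the reference. The source does not say how the improvements above were measured. The sketch below only illustrates this common metric, using a standard word-level edit distance, and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dist[i][j] = edit distance between the first i reference words and first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match when cost is 0)
            )
    # max(..., 1) avoids division by zero for an empty reference
    return dist[len(ref)][len(hyp)] / max(len(ref), 1)

# One substituted word out of four reference words gives a WER of 0.25.
print(word_error_rate("check my account balance", "check my count balance"))  # 0.25
```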
Major vendors of speech-recognition software include IBM*, Nuance*, Philips
Electronics NV*, and SpeechWorks International*. In the United States, Nuance
and SpeechWorks are popular choices, and both support multiple languages.
Continuous Speech Processing
Continuous Speech Processing (CSP) technology, part of the Dialogic family of
products, eliminates the need for dedicated speech hardware. CSP optimizes the
performance of host-based speech recognition engines by streaming preprocessed
voice data between the telephony boards (analog, T1, E1) and the host computer's
central processing unit (CPU). Because CSP works with hardware from the Dialogic
family of products, the boards handle front-end data processing, leaving the host
system free for speech recognition itself.
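This division of labor is essentially a producer-consumer pipeline: the telephony board does the front-end processing, and the host CPU does the recognition. The sketch below is a generic illustration built on Python's standard `queue` and `threading` modules. It does not use the Dialogic CSP API. The function names and frame contents are hypothetical stand-ins.

```python
import queue
import threading

FRAME_COUNT = 5
END_OF_STREAM = None  # sentinel marking the end of the audio stream

def board_front_end(frames_out: queue.Queue) -> None:
    """Hypothetical stand-in for the telephony board: emits preprocessed voice frames."""
    for n in range(FRAME_COUNT):
        frames_out.put(f"frame-{n}")  # placeholder for preprocessed audio data
    frames_out.put(END_OF_STREAM)

def host_recognizer(frames_in: queue.Queue, received: list) -> None:
    """Hypothetical stand-in for the host-based recognizer consuming the stream."""
    while True:
        frame = frames_in.get()
        if frame is END_OF_STREAM:
            break
        received.append(frame)  # a real engine would decode the audio here

# Bounded queue: put() blocks when full, so the producer waits if the host falls behind.
stream: queue.Queue = queue.Queue(maxsize=2)
received_frames: list = []
producer = threading.Thread(target=board_front_end, args=(stream,))
consumer = threading.Thread(target=host_recognizer, args=(stream, received_frames))
producer.start()
consumer.start()
producer.join()
consumer.join()
print(received_frames)  # ['frame-0', 'frame-1', 'frame-2', 'frame-3', 'frame-4']
```

With a single producer and a single consumer sharing a FIFO queue, frames reach the recognizer in the order the board sent them.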
Systems built with CSP achieve higher port density by offloading CPU-intensive
functions to the digital signal processing (DSP) speech detection modules,
including high-quality echo cancellation, voice activity detection (VAD), and
pre-speech buffering. This frees the host processor from wasteful continuous
processing of irrelevant data such as silence.
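Voice activity detection and pre-speech buffering work together. The detector discards silent frames, so the host never processes them. A short buffer keeps the last few frames before speech is detected, so the start of an utterance is not clipped. The sketch below is a simple energy-threshold detector. The threshold, buffer length, and sample values are illustrative assumptions, and the DSP modules described above are not claimed to use this method.

```python
from collections import deque

def frame_energy(frame: list[int]) -> float:
    """Mean squared amplitude of one frame of audio samples."""
    return sum(s * s for s in frame) / len(frame)

def speech_frames(frames, threshold: float = 1000.0, pre_speech: int = 2):
    """Yield only the frames worth forwarding to the host recognizer.

    Silent frames are held in a short pre-speech buffer. When a speech frame arrives,
    the buffered frames are released first so the utterance onset is preserved.
    Silent frames that fall out of the buffer, and trailing silence, are never forwarded.
    """
    buffer = deque(maxlen=pre_speech)  # most recent silent frames; oldest drop off automatically
    for frame in frames:
        if frame_energy(frame) >= threshold:
            while buffer:
                yield buffer.popleft()  # lead-in frames just before speech onset
            yield frame
        else:
            buffer.append(frame)

# Hypothetical two-sample frames: silence, speech, then silence again.
frames = [[0, 0], [10, -10], [5, 5], [3, 3], [100, -100], [90, 90], [0, 0], [1, 1], [0, 0]]
kept = list(speech_frames(frames))
print(len(kept), "of", len(frames), "frames forwarded")  # 4 of 9 frames forwarded
```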
Supporting up to 120 ports per board, CSP software features a unified
application programming interface (API) for enhanced system scalability.
Developers can add hundreds of speech-enabled ports and still deliver
high-quality speech recognition, while realizing substantial savings in
infrastructure and deployment costs.
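At up to 120 ports per board, the number of boards a deployment needs follows from ceiling division. The sketch below assumes every board is populated to the full 120 ports. A given board and line interface may support fewer, and the deployment sizes shown are hypothetical.

```python
PORTS_PER_BOARD = 120  # upper bound quoted in the text; actual capacity depends on the board

def boards_needed(ports: int, ports_per_board: int = PORTS_PER_BOARD) -> int:
    """Smallest number of boards that provides at least `ports` speech-enabled ports."""
    if ports < 0:
        raise ValueError("port count cannot be negative")
    # Integer ceiling division, avoiding floating-point rounding.
    return (ports + ports_per_board - 1) // ports_per_board

for ports in (96, 120, 480, 500):  # hypothetical deployment sizes
    print(ports, "ports ->", boards_needed(ports), "board(s)")
```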
Text-to-Speech
Once information is accessed, it needs to be communicated to the user. One way
to do this is with text-to-speech (TTS) synthesis. Increasingly used to speak
e-mail and Web-based text to callers, TTS will play an even wider role in the
future. Text from real-world applications such as e-mail can be read over the
phone once preprocessors have handled so-called "dirty" data such as acronyms,
contractions, and differences in intonation. Lernout & Hauspie* is a principal
TTS vendor with support for multiple languages.
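A TTS preprocessor of this kind typically normalizes text before synthesis. For example, it may expand known acronyms and contractions and spell out unknown acronyms letter by letter. The sketch below is a minimal rule-based illustration. The lookup tables are small hypothetical examples. The sketch does not attempt intonation handling or reflect any vendor's product.

```python
import re

# Hypothetical lookup tables; a production preprocessor would use far larger ones.
ACRONYMS = {"FYI": "for your information", "ASAP": "as soon as possible"}
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is", "i'm": "I am"}

def normalize_token(token: str) -> str:
    """Rewrite a single word so a synthesizer can speak it naturally."""
    if token in ACRONYMS:
        return ACRONYMS[token]
    if token.lower() in CONTRACTIONS:
        expanded = CONTRACTIONS[token.lower()]
        # Preserve an initial capital, e.g. "It's" -> "It is".
        return expanded[0].upper() + expanded[1:] if token[0].isupper() else expanded
    if len(token) > 1 and token.isupper() and token.isalpha():
        return " ".join(token)  # unknown acronym: spell it out, e.g. "URL" -> "U R L"
    return token

def normalize_for_tts(text: str) -> str:
    """Split text into word runs (keeping apostrophes) and other runs; normalize words only."""
    parts = re.findall(r"[A-Za-z']+|[^A-Za-z']+", text)
    return "".join(normalize_token(p) if p[0].isalpha() else p for p in parts)

print(normalize_for_tts("FYI, I can't open the URL."))
```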
VoiceXML
Just as the growth of the Web was catalyzed by the development of the HTML markup
standard, the acceptance of a universal standard for voice applications is
propelling the growth of voice-based services.
Voice eXtensible Markup Language (VoiceXML) is the major standard for
voice-based services. It will allow providers to open up Web services to
customers through voice interfaces. It will handle synthesized speech (TTS),
recognition of spoken input, recognition of dual-tone multifrequency (DTMF) key
presses, recording of spoken input, and telephony call control. Enterprises can
build automated voice services using the same technology they use to create
visual Web sites, significantly reducing the cost of building and delivering new
capabilities to telephone customers. Because established Web technologies are
used, the back-end database integration can be shared with the HTML application.
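To illustrate the shared back end, the sketch below has one lookup function feed both a visual HTML fragment and a VoiceXML dialog that speaks the result. The account data and function names are hypothetical. The VoiceXML uses the `vxml`, `form`, `block`, and `prompt` elements and the namespace from the VoiceXML 2.0 specification later published by the W3C. The output is only checked for well-formedness, not validated against the schema.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical back-end data and lookup shared by the Web and voice front ends.
ACCOUNTS = {"1001": 245.50}

def get_balance(account_id: str) -> float:
    return ACCOUNTS[account_id]

def render_html(account_id: str) -> str:
    """Visual front end: the balance as an HTML fragment."""
    balance = get_balance(account_id)
    return f"<p>Balance for account {html.escape(account_id)}: ${balance:.2f}</p>"

def render_vxml(account_id: str) -> str:
    """Voice front end: the same balance as a VoiceXML document read out by TTS."""
    balance = get_balance(account_id)
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": "http://www.w3.org/2001/vxml"})
    form = ET.SubElement(vxml, "form", {"id": "balance"})
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    prompt.text = f"Your balance is {balance:.2f} dollars."
    return ET.tostring(vxml, encoding="unicode")

print(render_html("1001"))
print(render_vxml("1001"))
```

Only the rendering functions differ. The data access in `get_balance` is written once and serves both channels.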
VoiceXML, which began at AT&T Bell Laboratories*, brings together the Lucent*
and AT&T* markup languages with IBM's SpeechML* and Motorola's VoxML*. Most
major developers of speech-based products are members of the VoiceXML Forum.
New Testing Tools
The success of speech-based applications depends on factors such as the phrasing
of voice prompts, as well as on other behavioral factors. That makes it important
to be able to capture lessons learned and easily reuse them in new applications.
Speech technology providers have created powerful tools to speed up deployment.
One high-level applet, for example, captures much of the knowledge gained from
designing application dialogs and implementing frequently used caller
interactions. This can reduce the time it takes to build a new application from
30 person-years to months or even weeks.
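Reusable components of this kind package proven dialog behavior so that new applications do not re-implement it. A typical example is reprompting after silence or an unrecognized answer. The sketch below is a hypothetical yes/no confirmation component. The retry limit, prompt wording, assumed DTMF key mapping, and scripted caller are illustrative assumptions, not any vendor's tool.

```python
from typing import Callable, Optional

YES_WORDS = {"yes", "yeah", "correct", "1"}  # "1": assumed DTMF key for yes
NO_WORDS = {"no", "nope", "wrong", "2"}      # "2": assumed DTMF key for no

def confirm(question: str,
            listen: Callable[[str], Optional[str]],
            max_attempts: int = 3) -> Optional[bool]:
    """Ask a yes/no question, reprompting on silence (None) or unrecognized input.

    Returns True or False, or None if the caller never gives a usable answer
    (an application might then transfer the call to an operator).
    """
    prompt = question
    for _ in range(max_attempts):
        answer = listen(prompt)
        if answer is None:
            prompt = "Sorry, I didn't hear you. " + question
            continue
        word = answer.strip().lower()
        if word in YES_WORDS:
            return True
        if word in NO_WORDS:
            return False
        prompt = "Sorry, I didn't understand. Please say yes or no, or press 1 or 2."
    return None

def scripted_caller(replies):
    """Hypothetical test double: returns canned replies and records every prompt played."""
    heard = []
    remaining = iter(replies)
    def listen(prompt):
        heard.append(prompt)
        return next(remaining, None)  # None once replies run out, simulating silence
    return listen, heard

# Caller is silent, then says something unrecognized, then answers "Yes".
listen, heard = scripted_caller([None, "maybe", "Yes"])
result = confirm("Transfer 50 dollars to savings?", listen)
print(result, len(heard))  # True 3
```

Because the caller's input is passed in as the `listen` function, the same component can be exercised with scripted replies before it is connected to a live recognizer.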
* Other names and brands may be claimed as the property of others.