Async Professional

SAPI Overview

Async Professional provides speech synthesis, speech recognition, and voice telephony capabilities. These are explored further in the following sections.

Speech synthesis

Speech synthesis takes plain ASCII text and converts it into digital audio. The following table describes the three primary ways this is done.

Scheme	Description
Concatenated Word	This technique stitches together recordings of individual words as provided by the developer of the speech synthesis engine. This method can provide high quality output, but is limited to the words that have been provided with the engine.
Subword Concatenation	This technique concatenates short prerecorded phonemes to build up the words. The small pieces of audio are smoothed out to improve the quality of the spoken text.
Synthesis	The speech synthesis engine simulates the human vocal cords when it generates the digital audio. This technique has the advantage that the sound of the voice can be adjusted via a few simple parameters.
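
To make the limitation of the concatenated word scheme concrete, here is a minimal conceptual sketch in Python. It is not Async Professional or SAPI code; the `RECORDINGS` table, the sample values, and the `concatenate_words` function are all hypothetical and exist only for illustration. Each known word maps to a short list of audio samples, and a word without a recording cannot be spoken.

```python
# Conceptual sketch only: not the Async Professional or SAPI API.
# Each known word maps to a made-up list of audio sample values.
RECORDINGS = {
    "your": [1, 2],
    "call": [3, 4, 5],
    "is": [6],
    "important": [7, 8, 9],
}


def concatenate_words(text, recordings=RECORDINGS):
    """Stitch per-word recordings together; fail on words with no recording."""
    samples = []
    for word in text.lower().split():
        if word not in recordings:
            # The scheme is limited to the words shipped with the engine.
            raise KeyError(f"no recording for word: {word!r}")
        samples.extend(recordings[word])
    return samples
```

Calling `concatenate_words("Your call is important")` joins the four recordings in order, while a phrase containing any word outside the table raises `KeyError`. Subword concatenation avoids this restriction by storing phonemes rather than whole words.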

In Async Professional, generic speech synthesis is provided through the TApdSapiEngine component. The generated speech can either go directly to the sound card or be saved as a .WAV file for later use.

While great strides have been made in the quality of computer-synthesized speech, it is still easily recognizable as computer generated. As yet, a computer cannot duplicate the inflection, timing, and emotion of human speech.

The speech synthesis engine will do its best to pronounce the text given to it. However, it will have problems with some words and names. The best way to handle this is to provide the proper phonemes to the speech synthesis engine.

Some speech synthesis engines support input using the International Phonetic Alphabet (IPA). This is a standardized system for representing phonemes, encoded as Unicode characters. If your engine supports IPA, it can be used to precisely control the pronunciation of a word. You can check whether an engine supports IPA by testing whether tfIPAUnicode is in the engine's feature set. This can be found in the Features subproperty of the SSVoices property of the TApdSapiEngine component.

Speech recognition

Speech recognition converts a spoken audio data stream into ASCII text. In Async Professional, generic speech recognition is provided through the TApdSapiEngine component.

Speech recognition requires an object known as a grammar (a list of known words and possibly how they relate to each other). In Async Professional, the grammar is handled automatically. Either a list of expected words can be provided in the WordList property, or the Dictation property can be set to True. In the latter case, the speech recognition engine will be provided with a dictation grammar, and most speech recognition engines will then use a much larger list of known words. Additional words can be specified via the WordList property.

Using a specific word list is generally faster and more accurate than using the full dictation grammar.
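
The effect of a restricted word list can be illustrated with a toy model in Python. This is a conceptual sketch, not how a SAPI engine works internally and not Async Professional code: it uses string similarity from the standard `difflib` module as a stand-in for acoustic matching, and the `recognize` function and its `cutoff` parameter are hypothetical. The point is only that with few candidates, a noisy input either maps to one of the expected words or is rejected.

```python
import difflib


def recognize(heard, word_list, cutoff=0.6):
    """Return the word-list entry closest to what was heard, or None.

    Toy stand-in for a recognizer: string similarity replaces acoustic
    matching, and `cutoff` is an assumed rejection threshold.
    """
    candidates = [word.lower() for word in word_list]
    matches = difflib.get_close_matches(heard.lower(), candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With the word list `["yes", "no"]`, a slightly garbled input such as `"yess"` still resolves to `"yes"`, and an unrelated input such as `"banana"` is rejected with `None` rather than forced onto some entry of a large vocabulary.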

As words and phrases are recognized, the OnPhraseHypothesis and OnPhraseFinish events fire to let the application know what words were spoken.

The Microsoft SAPI SDK documentation provides a full list of supported phonemes.

Voice telephony

Voice telephony support is provided via the TApdSapiPhone component. This component allows speech synthesis and recognition to take place over a voice call. The component is a descendant of TApdCustomTAPIDevice and combines all the functionality of a TAPI-based voice call with the capabilities of TApdCustomSapiEngine.

The voice telephony component has several methods for asking the user for information and then returning that information in a usable format. For example, the AskForDate method prompts the user for a date and then returns that date as a TDateTime parameter. Other methods exist for asking for phone numbers, times, items from a list, extensions, and yes-or-no replies.
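
The prompt-then-parse pattern behind methods such as AskForDate can be sketched conceptually in Python. This is not the TApdSapiPhone API: the `ask_for_date` and `parse_spoken_date` functions, the `speak` and `listen` callables, the accepted date formats, and the retry count are all assumptions made for illustration. The sketch prompts the caller, tries to turn the recognized text into a date value, and prompts again if the reply cannot be parsed.

```python
from datetime import datetime

# Assumed set of reply formats; a real engine would use a date grammar.
DATE_FORMATS = ("%B %d %Y", "%m/%d/%Y", "%Y-%m-%d")


def parse_spoken_date(text):
    """Try each assumed format in turn; return a datetime or None."""
    cleaned = " ".join(text.replace(",", " ").split())
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    return None


def ask_for_date(prompt, speak, listen, max_attempts=3):
    """Prompt via `speak`, read a reply via `listen`, retry on bad input."""
    for _ in range(max_attempts):
        speak(prompt)
        result = parse_spoken_date(listen())
        if result is not None:
            return result
    return None  # the caller never gave a usable date
```

In a test harness, `speak` can append prompts to a list and `listen` can return canned replies, which makes the retry behavior easy to check without any audio hardware.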

This document is maintained by the Async Professional Project.