Async Professional provides speech synthesis, speech recognition, and voice telephony capabilities. These are explored further in the following sections.
Speech synthesis takes plain ASCII text and converts it into digital audio. There are three primary ways this is done, as described in the following table.
In Async Professional, generic speech synthesis is provided through the TApdSapiEngine component. The generated speech can either go directly to the sound card or be saved as a .WAV file for future usage.
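A minimal Delphi sketch of generating speech with TApdSapiEngine follows. The component name comes from the text above, but the Speak method name and its parameter are assumptions; consult the component reference for the exact synthesis calls in your version of Async Professional.

```pascal
// Hypothetical sketch: speaking a string through a TApdSapiEngine
// dropped on a form. The Speak call is assumed, not confirmed.
procedure TForm1.SpeakGreeting;
begin
  ApdSapiEngine1.Speak('Welcome to Async Professional.');
end;
```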
While great strides have been made in the quality of computer-synthesized speech, it is still easily recognizable as computer generated. As of yet, a computer cannot duplicate the inflection, timing and emotion of human speech.
The speech synthesis engine will do its best to pronounce the text given to it. However, it will have trouble with some words and names. The best way to handle these is to supply the proper phonemes to the speech synthesis engine.
Some speech synthesis engines support input using the International Phonetic Alphabet (IPA). This is a standardized system of representing phonemes in a Unicode character set. If your engine supports IPA, this can be used to precisely control the pronunciation of a word. You can verify if an engine supports IPA by checking if tfIPAUnicode is in the features set of the engine. This can be found in the Features subproperty of the SSVoices property of the TApdSapiEngine component.
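The IPA capability check described above might look like the following sketch. The property names (SSVoices, Features, tfIPAUnicode) come from the text; how SSVoices selects the current voice is an assumption, so verify the exact expression against the component reference.

```pascal
// Check whether the engine behind the current voice accepts IPA input.
// tfIPAUnicode and SSVoices.Features are from the documentation text;
// the rest of the expression is illustrative.
if tfIPAUnicode in ApdSapiEngine1.SSVoices.Features then
  ShowMessage('This engine supports IPA phonetic input')
else
  ShowMessage('This engine does not support IPA phonetic input');
```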
Speech recognition converts a spoken audio data stream into ASCII text. In Async Professional, generic speech recognition is provided through the TApdSapiEngine component.
Speech recognition requires an object known as a grammar (a list of known words and, possibly, how they relate to each other). In Async Professional, the grammar is handled automatically. Either a list of expected words can be provided in the WordList property, or the Dictation property can be set to True. In the latter case, the speech recognition engine is given a dictation grammar; most engines will then use a much larger list of known words. Additional words can still be specified via the WordList property.
Using a specific word list is generally faster and more accurate than using the full dictation grammar.
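The two grammar modes described above might be configured as in the following sketch. The WordList and Dictation property names come from the text; treating WordList as a TStrings-style string list is an assumption.

```pascal
// Mode 1: a small command grammar built from an explicit word list.
// This is generally faster and more accurate than full dictation.
ApdSapiEngine1.WordList.Clear;
ApdSapiEngine1.WordList.Add('open');
ApdSapiEngine1.WordList.Add('close');
ApdSapiEngine1.WordList.Add('cancel');

// Mode 2: full dictation grammar, optionally extended with
// application-specific terms the engine may not know.
ApdSapiEngine1.Dictation := True;
ApdSapiEngine1.WordList.Add('TurboPower');
```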
As words and phrases are recognized, the OnPhraseHypothesis and OnPhraseFinish events fire to let the application know what was spoken. OnPhraseHypothesis reports the engine's in-progress guesses; OnPhraseFinish reports the final result.
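A handler for the final-result event might look like the sketch below. Only the event name OnPhraseFinish comes from the text; the parameter list shown is an assumption, so check the event's declaration in the TApdSapiEngine source or help before wiring it up.

```pascal
// Hypothetical OnPhraseFinish handler: the signature is assumed,
// not taken from the actual component declaration.
procedure TForm1.ApdSapiEngine1PhraseFinish(Sender: TObject;
  const Phrase: string);
begin
  // Append the recognized phrase to a memo for display.
  Memo1.Lines.Add('Recognized: ' + Phrase);
end;
```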
The Microsoft SAPI SDK documentation provides a full list of supported phonemes.
Voice telephony support is provided via the TApdSapiPhone component, which allows speech synthesis and recognition to take place over a voice call. The component is a descendant of TApdCustomTAPIDevice and combines the full functionality of a TAPI-based voice call with the capabilities of TApdCustomSapiEngine.
The voice telephony component has several methods for asking the user for information and then returning that information in a usable format. For example, the AskForDate method will prompt the user for a date and then return that date as a TDateTime parameter. Other methods exist for asking for phone numbers, times, items from a list, extensions and yes or no replies.
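The AskForDate interaction described above might be used as in this sketch. The method name and its TDateTime result parameter come from the text; the prompt argument and the exact signature are assumptions, so confirm them against the TApdSapiPhone reference.

```pascal
// Hypothetical sketch: prompt the caller for a date during a voice
// call, then display it. The parameter order is assumed.
procedure TForm1.GetAppointmentDate;
var
  CallerDate: TDateTime;
begin
  ApdSapiPhone1.AskForDate('Please say the appointment date.', CallerDate);
  Label1.Caption := DateToStr(CallerDate);
end;
```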
This document is maintained by the Async Professional Project.