LanScape Support Forum: TTS and ASR code samples

LanScape VOIP Media Engine™ - Technical Support

Topic: TTS and ASR code samples

sabdullah
Intermediate

Joined: May 08 2007
Posts: 6

Posted: May 08 2007 at 11:00pm

Couple of questions...

1. We would be very interested in obtaining a TTS and ASR code sample. We want to implement this in a speech server that we plan to develop. Any sample apps implementing this would be great.

2. Would all of this be possible in the .NET managed code version that is waiting to be released?

3. Where can I find more in-depth information about how your audio data samples work? Is the sample data the same as the formats described in Microsoft's mmsystem.h? Basically, I would like a reference to a good programming book on working with sound in Windows.

support
Administrator

Joined: January 26 2005
Location: United States
Posts: 1666

Posted: May 09 2007 at 10:22am

Hello sabdullah,

Thanks for posting your questions to this support forum.

TTS and ASR software examples:
At the moment there are no specific TTS and ASR samples. A good start would be to use the “IVR server” sample application. It is coded in C++ and shows how to obtain sample block data from any phone line while a call is active. It also shows how to stream sample block data to any phone line while a call is active.

Taking this sample and adding ASR and TTS would be simple. The sample block data received from the phone lines can usually be sent directly to the ASR engine you select. Likewise, your TTS-generated sample block prompts can easily be sent to any of the phone lines.
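Whether blocks can be passed along "directly" comes down to both sides agreeing on the PCM format. A minimal sketch of that check in portable C++ follows. The PcmFormat struct and isDirectlyCompatible function are hypothetical names used only for illustration; they are not part of the VOIP Media Engine or of any ASR/TTS SDK.

```cpp
#include <cstdint>

// Hypothetical descriptor for uncompressed PCM audio. The fields are the
// usual sample-rate / bit-depth / channel-count triple; this is NOT a type
// from the VOIP Media Engine or from SAPI.
struct PcmFormat {
    std::uint32_t samplesPerSecond;
    std::uint16_t bitsPerSample;
    std::uint16_t channels;
};

// Returns true when audio in format `a` can be handed to a consumer that
// expects format `b` without resampling or sample-width conversion.
inline bool isDirectlyCompatible(const PcmFormat& a, const PcmFormat& b) {
    return a.samplesPerSecond == b.samplesPerSecond &&
           a.bitsPerSample == b.bitsPerSample &&
           a.channels == b.channels;
}
```

If the check fails, the audio would need to be converted before it is passed on. The formats the engine actually uses are not stated in this thread, so any concrete values fed to this check are assumptions.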

Note:
For any developer not familiar with TTS and ASR: TTS means “text to speech” processing. Generally you feed a TTS “engine” human-readable ASCII text and it generates speech audio data as output. Sample audio data blocks generated by a TTS engine are usually PCM data. TTS technology and VOIP applications are often used together to build automated IVR servers and automated call attendant applications.

ASR stands for automatic speech recognition. An ASR “engine” can get its speech input directly from the host computer’s multimedia hardware (usually a microphone input) or from the application in the form of sample block data (usually sample blocks of PCM voice data). Speech recognition is also a great fit with VOIP. ASR and VOIP are often used together to develop sophisticated automated call processing telephony applications.

Would this all be possible in the .net version?
Yes.

More Info about audio data samples:
The VOIP Media Engine uses “blocks” of sampled audio data in various sample rates and formats. Generally, each sample block contains 20 ms of sampled audio data. As for recommending a good book on working with sound on Windows, no good title comes to mind. We think the best place to go for information is the multimedia support documentation in the latest Microsoft Platform SDK. If you understand the basics of the multimedia sound API, working with our VOIP Media Engine will be a breeze.

If you plan on developing your application using our VOIP Media Engine, you can post your specific development questions to this support forum. We will answer to the best of our ability.
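To make the 20 ms figure concrete, here is a small sketch of the arithmetic for uncompressed PCM. The function name pcmBlockBytes is hypothetical, and the sample rates in the usage note are example values only, since the exact formats the engine uses are not listed in this thread.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes in one block of uncompressed PCM audio.
// Assumes bitsPerSample is a multiple of 8 and that the rate and duration
// give a whole number of samples (true for 20 ms at 8000 or 16000 Hz).
inline std::size_t pcmBlockBytes(std::uint32_t samplesPerSecond,
                                 std::uint16_t bitsPerSample,
                                 std::uint16_t channels,
                                 std::uint32_t blockMilliseconds) {
    // Samples per channel that fit in the block duration.
    const std::uint64_t samplesPerChannel =
        static_cast<std::uint64_t>(samplesPerSecond) * blockMilliseconds / 1000;
    // Total bytes = samples * channels * bytes per sample.
    return static_cast<std::size_t>(samplesPerChannel * channels *
                                    (bitsPerSample / 8u));
}
```

For example, pcmBlockBytes(8000, 16, 1, 20) gives 320: 160 samples of 2 bytes each. At 16000 Hz the same call gives 640.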

Support

support

Posted: May 09 2007 at 10:26am

One more thing,

We have made a note of your specific request. We will try to add a sample like this in a future release.

Thanks for your request!

Support

sabdullah

Posted: May 10 2007 at 1:37am

I was looking at the following code in the Microsoft SAPI example "ttsapp" provided with the Speech SDK. I am trying to figure out how to connect the wave buffer you allow us to write to with the GetOutputStream function provided by Microsoft's TTS engine. I'm a pretty sharp coder, but I'm not familiar enough with audio formats to know whether your stream is compatible with ttsapp. I don't know if my terminology is correct, but I want GetOutputStream to work with your following function.

Code:

TransmitInCallIvrData(pDlg->hIvrTransmit[PhoneLine],pWaveBuffer);

Basically, are the two buffers really providing the same "type" of data?

case IDC_SAVETOWAV:
    {
        USES_CONVERSION;

        TCHAR szFileName[256];
        _tcscpy(szFileName, _T("\0"));

        BOOL bFileOpened = CallSaveFileDialog( szFileName,
                _T("WAV (*.wav)\0*.wav\0All Files (*.*)\0*.*\0") );
        if (bFileOpened == FALSE) break;

        wcscpy( m_szWFileName, T2W(szFileName) );

        CSpStreamFormat OriginalFmt;
        hr = m_cpVoice->GetOutputStream( &cpOldStream );
        if (hr == S_OK)
        {
            hr = OriginalFmt.AssignFormat(cpOldStream);
        }
        else
        {
            hr = E_FAIL;
        }

        // User SAPI helper function in sphelper.h to create a wav file
        if (SUCCEEDED(hr))
        {
            hr = SPBindToFile( m_szWFileName, SPFM_CREATE_ALWAYS, &cpWavStream,
                    &OriginalFmt.FormatId(), OriginalFmt.WaveFormatExPtr() );
        }

        if ( SUCCEEDED( hr ) )
        {
            // Set the voice's output to the wav file instead of the speakers
            hr = m_cpVoice->SetOutput(cpWavStream, TRUE);
        }

        if ( SUCCEEDED( hr ) )
        {
            // Do the Speak
            HandleSpeak();
        }

        // Set output back to original stream
        // Wait until the speak is finished if saving to a wav file so that
        // the smart pointer cpWavStream doesn't get released before its
        // finished writing to the wav.
        m_cpVoice->WaitUntilDone( INFINITE );
        cpWavStream.Release();

        // Reset output
        m_cpVoice->SetOutput( cpOldStream, FALSE );
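One way to think about bridging the two: once the TTS audio is available as raw PCM bytes in the format the phone line expects, it can be cut into fixed-size blocks and handed over one block at a time. The sketch below is portable C++ and only illustrates the chunking step. BlockSink and sendInBlocks are hypothetical names, and the sink is a stand-in for a call such as TransmitInCallIvrData, whose actual buffer layout is not described in this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Stand-in for whatever call hands one block to a phone line.
// Hypothetical signature, chosen only for this illustration.
using BlockSink = std::function<void(const std::vector<std::uint8_t>&)>;

// Splits `pcm` into blocks of exactly `blockBytes` bytes and passes each one
// to `sink`. A final partial block is padded with zero bytes, which is
// silence for signed 16-bit PCM. Returns the number of blocks delivered.
inline std::size_t sendInBlocks(const std::vector<std::uint8_t>& pcm,
                                std::size_t blockBytes,
                                const BlockSink& sink) {
    if (blockBytes == 0) return 0;  // nothing sensible to do
    std::size_t sent = 0;
    for (std::size_t offset = 0; offset < pcm.size(); offset += blockBytes) {
        const std::size_t n = std::min(blockBytes, pcm.size() - offset);
        std::vector<std::uint8_t> block(blockBytes, 0);  // pre-filled with silence
        std::copy(pcm.data() + offset, pcm.data() + offset + n, block.begin());
        sink(block);
        ++sent;
    }
    return sent;
}
```

With 1000 bytes of audio and a 320-byte block size, the sink is called four times and the last block carries 40 bytes of audio followed by padding. How the TTS output is captured into memory in the first place (rather than into a wav file, as in the ttsapp code above) is outside what this sketch covers.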

support

Posted: May 10 2007 at 7:27pm

Hi sabdullah,

We have one of our team members looking into this. We may be able to send you a sample app in C++ that shows you how to do this.

Please wait a bit while we get this together.

Support

support

Posted: May 14 2007 at 9:01am

We have made available a SAPI TTS sample application that may benefit you. Please see the following post:

VOIP and SAPI TTS speech sample now available:
http://www.lanscapecorp.com/forum/forum_posts.asp?TID=327&KW=tts

Support