sabdullah Intermediate
Joined: May 08 2007 Posts: 6
Posted: May 08 2007 at 11:00pm
A couple of questions:
1. We would be very interested in obtaining a TTS and ASR code sample. We want to implement both in the speech server we are developing. Any sample apps that do this would be great.
2. Will all of this be possible in the .NET managed-code version that is waiting to be released?
3. Where can I find more in-depth information about how your audio data samples work? Is the sample data the same as that represented in Microsoft's mmsystem.h? Basically, I would like a reference to a good programming book on working with sound in Windows.
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
Posted: May 09 2007 at 10:22am
Hello sabdullah,
Thanks for posting your questions to this support forum.
TTS and ASR software examples:
At the moment there are no specific TTS and ASR samples. A good place to start is the “IVR server” sample application. It is coded in C++ and shows how to obtain sample block data from any phone line while a call is active. It also shows how to stream sample block data to any phone line while a call is active.
Adding ASR and TTS to this sample should be simple. The sample block data received from the phone lines can usually be sent directly to the ASR engine you select. Likewise, the sample block prompts your TTS engine generates can easily be sent to any of the phone lines.
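As a rough sketch of the TTS direction described above: a TTS engine typically returns one long PCM buffer, which has to be cut into fixed-size sample blocks before each block is handed to a phone line. The code below is an illustration under stated assumptions, not code from the IVR server sample. `splitIntoBlocks` and `BlockSink` are hypothetical names, the samples are assumed to be 16-bit linear PCM, and the callback stands in for whatever transmit call the media engine provides. The final short block is padded with silence.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical callback type: stands in for the engine call that
// transmits one block of 16-bit PCM samples to a phone line.
using BlockSink = std::function<void(const std::vector<std::int16_t>&)>;

// Cut a PCM buffer into blocks of exactly samplesPerBlock samples and
// pass each block to the sink. The last block is zero-padded (silence).
// Returns the number of blocks delivered.
std::size_t splitIntoBlocks(const std::vector<std::int16_t>& pcm,
                            std::size_t samplesPerBlock,
                            const BlockSink& sink)
{
    assert(samplesPerBlock > 0);
    std::size_t blocks = 0;
    for (std::size_t pos = 0; pos < pcm.size(); pos += samplesPerBlock) {
        // Number of real samples available for this block.
        const std::size_t n = std::min(samplesPerBlock, pcm.size() - pos);
        // Start from a block of silence, then copy the real samples in.
        std::vector<std::int16_t> block(samplesPerBlock, 0);
        const auto first = pcm.begin() + static_cast<std::ptrdiff_t>(pos);
        std::copy(first, first + static_cast<std::ptrdiff_t>(n), block.begin());
        sink(block);
        ++blocks;
    }
    return blocks;
}
```

In a real application the sink would wrap the engine's transmit function for the active line. The same loop structure works in reverse for ASR: each received block is forwarded to the recognizer as it arrives.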
Note:
For any developer not familiar with TTS and ASR: TTS stands for “text to speech”. Generally you feed a TTS “engine” human-readable ASCII text and it generates audio speech data as output. The sample audio data blocks generated by a TTS engine are usually PCM data. TTS technology and VOIP applications are generally used together to build automated IVR servers and automated call attendant applications.
ASR stands for “automatic speech recognition”. An ASR “engine” can get its speech input directly from the host computer’s multimedia hardware (usually a microphone input) or from the application in the form of sample block data (usually sample blocks of PCM voice data). Speech recognition is also a great fit with VOIP. ASR and VOIP are generally used together to develop sophisticated automated call processing telephony applications.
Would this all be possible in the .NET version?
Yes.
More Info about audio data samples:
The VOIP Media Engine uses “blocks” of sampled audio data in various sample rates and formats. Generally, each sample block contains 20 ms of sampled audio data. As for a good book on working with sound on Windows, no title comes to mind. We think the best source of information is the multimedia documentation in the latest Microsoft Platform SDK. If you understand the basics of the sound multimedia API, working with our VOIP Media Engine will be a breeze.
If you plan on developing your application using our VOIP Media Engine, you can post your specific development questions to this support forum. We will answer to the best of our ability.
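To make the 20 ms figure concrete, the size of one block follows directly from the sample rate and sample width. The sketch below shows the arithmetic; the function names are hypothetical, and the 8 kHz, 16-bit mono format used in the checks is only an assumed example, since the engine supports several rates and formats.

```cpp
#include <cassert>
#include <cstddef>

// Number of samples per channel in one block of the given duration.
constexpr std::size_t samplesPerBlock(std::size_t sampleRateHz, std::size_t blockMs)
{
    return sampleRateHz * blockMs / 1000;
}

// Size in bytes of one block of linear PCM.
constexpr std::size_t bytesPerBlock(std::size_t sampleRateHz, std::size_t blockMs,
                                    std::size_t bitsPerSample, std::size_t channels)
{
    return samplesPerBlock(sampleRateHz, blockMs) * (bitsPerSample / 8) * channels;
}

// Assumed example format: 8 kHz, 16-bit, mono, 20 ms blocks.
static_assert(samplesPerBlock(8000, 20) == 160, "160 samples per block");
static_assert(bytesPerBlock(8000, 20, 16, 1) == 320, "320 bytes per block");
```

Whatever format is actually in use, any buffer handed to or received from the engine should be a whole multiple of this block size.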
Support
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
Posted: May 09 2007 at 10:26am
One more thing,
We have made a note of your specific request. We will try to add a sample like this in a future release.
Thanks for your request!
Support
sabdullah Intermediate
Joined: May 08 2007 Posts: 6
Posted: May 10 2007 at 1:37am
I was looking at the following code in the Microsoft SAPI example "ttsapp" provided with the Speech SDK. I am trying to figure out how to connect the wave buffer you let us write to with the stream returned by the GetOutputStream function of Microsoft's TTS engine. I'm a pretty sharp coder, but I'm not familiar enough with audio formats to know whether your stream is compatible with ttsapp. I don't know if my terminology is correct, but I want GetOutputStream to work with the following function of yours.
Code:
TransmitInCallIvrData(pDlg->hIvrTransmit[PhoneLine],pWaveBuffer);
Basically, are the two buffers really providing the same "type" of data? Here is the ttsapp code:
case IDC_SAVETOWAV:
{
    USES_CONVERSION;
    TCHAR szFileName[256];
    _tcscpy(szFileName , _T("\0"));
    BOOL bFileOpened = CallSaveFileDialog( szFileName,
        _T("WAV (*.wav)\0*.wav\0All Files (*.*)\0*.*\0") );
    if (bFileOpened == FALSE) break;
    wcscpy( m_szWFileName, T2W(szFileName) );
    CSpStreamFormat OriginalFmt;
    hr = m_cpVoice->GetOutputStream( &cpOldStream );
    if (hr == S_OK)
    {
        hr = OriginalFmt.AssignFormat(cpOldStream);
    }
    else
    {
        hr = E_FAIL;
    }
    // Use SAPI helper function in sphelper.h to create a wav file
    if (SUCCEEDED(hr))
    {
        hr = SPBindToFile( m_szWFileName, SPFM_CREATE_ALWAYS, &cpWavStream, &OriginalFmt.FormatId(), OriginalFmt.WaveFormatExPtr() );
    }
    if ( SUCCEEDED( hr ) )
    {
        // Set the voice's output to the wav file instead of the speakers
        hr = m_cpVoice->SetOutput(cpWavStream, TRUE);
    }
    if ( SUCCEEDED( hr ) )
    {
        // Do the Speak
        HandleSpeak();
    }
    // Set output back to original stream
    // Wait until the speak is finished if saving to a wav file so that
    // the smart pointer cpWavStream doesn't get released before it's
    // finished writing to the wav.
    m_cpVoice->WaitUntilDone( INFINITE );
    cpWavStream.Release();
    // Reset output
    m_cpVoice->SetOutput( cpOldStream, FALSE );
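One way to approach the "same type of data" question is to compare the format descriptions of the two buffers rather than the buffers themselves. The sketch below is only an illustration: `PcmFormat` is a hypothetical, portable stand-in for the relevant fields of the Windows WAVEFORMATEX structure, and the 22050 Hz and 8000 Hz rates in the usage note are assumed example values, not the actual formats of SAPI or the media engine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the WAVEFORMATEX fields that matter here.
struct PcmFormat {
    std::uint16_t formatTag;      // 1 means linear PCM (WAVE_FORMAT_PCM)
    std::uint16_t channels;       // 1 = mono, 2 = stereo
    std::uint32_t samplesPerSec;  // sample rate in Hz
    std::uint16_t bitsPerSample;  // sample width
};

// Same value as WAVE_FORMAT_PCM in the Windows SDK headers.
constexpr std::uint16_t kFormatPcm = 1;

// Two buffers carry the same "type" of data only if every field matches.
bool sameSampleFormat(const PcmFormat& a, const PcmFormat& b)
{
    return a.formatTag == b.formatTag &&
           a.channels == b.channels &&
           a.samplesPerSec == b.samplesPerSec &&
           a.bitsPerSample == b.bitsPerSample;
}
```

For example, a TTS stream described as `{kFormatPcm, 1, 22050, 16}` would not match a phone line expecting `{kFormatPcm, 1, 8000, 16}`. In that case the TTS output would need to be requested in the line's format, or resampled, before it is transmitted. In the quoted code, the output format is whatever `OriginalFmt` holds when it is passed to `SPBindToFile`.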
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
Posted: May 10 2007 at 7:27pm
Hi sabdullah,
We have one of our team members looking into this. We may be able to send you a sample app in C++ that shows you how to do this.
Please wait a bit while we get this together.
Support
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
Posted: May 14 2007 at 9:01am
We have made available a SAPI TTS sample application that may benefit you. Please see the following post:
VOIP and SAPI TTS speech sample now available:
http://www.lanscapecorp.com/forum/forum_posts.asp?TID=327&KW=tts
Support