hermes Junior
Joined: October 27 2006 Posts: 64
Posted: June 16 2008 at 4:44pm
Most sound cards have different DC offsets. On an ordinary phone call this is not a problem, but if you are, for example, recognizing voice over VoIP, it is a handicap. There are many algorithms for compensating DC offset. We use a fast dynamic compensation algorithm based on the FFT. If you are interested in integrating this improvement, feel free to ask us any questions.
Code:
Mic --> Offset Compensation module --> Codec module --> RTP packet
hermes Junior
Joined: October 27 2006 Posts: 64
Posted: June 17 2008 at 4:17pm
Perhaps it would be better if there were a 'Microphone Sample Callback', like the 'RTP Callback', to modify wave samples.
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
Posted: June 18 2008 at 9:36am
Hi Hermes,
Good post.
You know, we have not paid much attention to DC offset in local multimedia recorded sample block data. If there is a DC offset component to the signal, it must be small, right?
If you can, elaborate a bit more on the DC offset characteristics you have experienced in your development career. Also elaborate on how this affects speech recognition engines you have used. We are curious about your findings.
In the meantime, you may want to fool around with the following undocumented API procedure:
Code:
// Data struct passed to a user-registered callback proc. It allows application
// software to access raw recorded audio buffers.
//
typedef struct
{
    AUDIO_BANDWIDTH AudioBandwidth;     // represents the format and rate of the sampled data.
    void *pSampleBuffer;                // address of the sample buffer.
    unsigned long BufferLengthInBytes;  // the number of bytes in the sample buffer.
    void *pUserData;                    // the user supplied callback data. this will be set to the same value
                                        // that was specified when SetAudioRecordCallback() proc was called
} AUDIO_RECORD_DATA;

typedef BOOL (VOIP_API *AUDIO_RECORD_CALLBACK_PROC)(AUDIO_RECORD_DATA *pAudioRecordData);

TELEPHONY_RETURN_VALUE VOIP_API SetAudioRecordCallback(
    SIPHANDLE hStateMachine,
    AUDIO_BANDWIDTH DesiredBandwidth,
    AUDIO_RECORD_CALLBACK_PROC pAudioRecordCallback,
    void *pUserData
);
It will allow your app to gain access to locally recorded multimedia sample blocks just after they have been recorded and just before they get used by the media engine. The “DesiredBandwidth” parameter can be set to anything because it is ignored.
The callback registered through this API will be called every 20 ms with a block of recorded PCM samples. The samples are recorded by the multimedia hardware at 22050 Hz, so each block holds 20 ms of audio. Samples have a data type of “short” (signed 16-bit samples).
The sample block passed to the app can be modified however the app wants. This is where you could place your DC offset compensation logic.
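As a rough illustration of what such a callback might look like, here is a minimal sketch. It is not the FFT-based algorithm hermes mentions; it swaps in a common one-pole DC-blocking high-pass filter instead. The typedefs for BOOL, AUDIO_BANDWIDTH and VOIP_API are hypothetical stand-ins so the block compiles on its own, and DC_BLOCKER_STATE, DC_BLOCKER_R, ToInt16 and DcBlockRecordCallback are names invented for this example. The meaning of the callback's return value is not stated in this thread, so returning nonzero is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins so this sketch compiles on its own. In a real
   build these would come from the VOIP Media Engine headers instead. */
typedef int BOOL;
typedef int AUDIO_BANDWIDTH;
#define VOIP_API

typedef struct
{
    AUDIO_BANDWIDTH AudioBandwidth;
    void *pSampleBuffer;
    unsigned long BufferLengthInBytes;
    void *pUserData;
} AUDIO_RECORD_DATA;

/* Filter memory carried from one 20 ms block to the next (hypothetical type). */
typedef struct
{
    double prev_in;   /* x[n-1] */
    double prev_out;  /* y[n-1] */
} DC_BLOCKER_STATE;

/* Pole radius: closer to 1.0 means a lower cutoff and slower settling. */
#define DC_BLOCKER_R 0.995

/* Round to nearest and clamp to the signed 16-bit range. */
static int16_t ToInt16(double v)
{
    if (v >= 32767.0) return 32767;
    if (v <= -32768.0) return -32768;
    return (int16_t)(v >= 0.0 ? v + 0.5 : v - 0.5);
}

/* Callback sketch: y[n] = x[n] - x[n-1] + R * y[n-1], applied in place. */
BOOL VOIP_API DcBlockRecordCallback(AUDIO_RECORD_DATA *pAudioRecordData)
{
    assert(pAudioRecordData != NULL && pAudioRecordData->pUserData != NULL);

    DC_BLOCKER_STATE *state = (DC_BLOCKER_STATE *)pAudioRecordData->pUserData;
    int16_t *samples = (int16_t *)pAudioRecordData->pSampleBuffer;
    size_t count = pAudioRecordData->BufferLengthInBytes / sizeof(int16_t);

    for (size_t i = 0; i < count; i++)
    {
        double x = (double)samples[i];
        double y = x - state->prev_in + DC_BLOCKER_R * state->prev_out;
        state->prev_in = x;
        state->prev_out = y;
        samples[i] = ToInt16(y);
    }
    return 1; /* assumption: a nonzero return tells the engine to carry on */
}
```

To try it, you would presumably pass DcBlockRecordCallback and the address of a zero-initialized DC_BLOCKER_STATE as the last two arguments of SetAudioRecordCallback(). The state has to outlive the call, because the filter needs its memory to carry across block boundaries. With R = 0.995 at 22050 Hz the cutoff sits somewhere around 20 Hz, well below the speech band, but that value is only a starting point.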
Support
Notes:
This post discusses VOIP Media Engine undocumented API procedures that are used for internal test purposes. Do not use these API procedures in your VOIP applications.
hermes Junior
Joined: October 27 2006 Posts: 64
Posted: June 18 2008 at 11:26am
Thanks a lot!
It's just what I was looking for. Undocumented API procedures are saving our lives.
We've tested a lot of sound cards. Most of them have a small DC component.
Voice recognition engines are first trained with many different voices. These voice files are recorded without DC offset. When you want to use one of these engines, it is better to run an adaptation process before you dictate.
When you use a sound card with DC offset, the voice formants differ from the 'base formants', and you can even saturate your voice signal. For this reason, compensating DC offset is good practice.
I don't know if I have explained it clearly.
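To make the saturation point concrete, here is a small sketch that estimates the DC component of a block as the mean of its samples, then reports how much symmetric swing is left in the signed 16-bit range around that offset. EstimateDcOffset and HeadroomAroundOffset are hypothetical helper names made up for this example, and a block mean is only a rough DC estimate on short blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Estimate the DC component of a block as the mean of its samples. */
double EstimateDcOffset(const int16_t *samples, size_t count)
{
    assert(samples != NULL);
    if (count == 0)
        return 0.0;

    int64_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += samples[i];
    return (double)sum / (double)count;
}

/* Largest peak amplitude a signal centered on dc_offset can reach
   before one side hits the signed 16-bit limits and clips. */
double HeadroomAroundOffset(double dc_offset)
{
    double up = 32767.0 - dc_offset;    /* room above the offset */
    double down = dc_offset + 32768.0;  /* room below the offset */
    double room = (up < down) ? up : down;
    return (room > 0.0) ? room : 0.0;
}
```

A positive offset eats into the room above the center line, so loud peaks clip on that side sooner than they would on a card with no offset.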
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
Posted: June 19 2008 at 8:00am
Good explanation.
We did not know that voice recognition accuracy and "training" are sensitive to DC offset in the voice signal. You would think the speech recognition engines would “filter” out a DC offset component themselves. Hmmm…
hermes Junior
Joined: October 27 2006 Posts: 64
Posted: June 19 2008 at 9:11am
Not all speech recognition engines have a DC compensation filter. DC isn't critical in this respect, but compensating it can help.
We have worked with all kinds of speech recognition engines and we prefer to compensate DC offset ourselves.
Thanks again.