Voice Recognition is leaking memory

I am working with the VoiceRecognition module in C#. First, unlike the previous version, it no longer works together with gesture: EnableVoiceRecognition() and EnableGesture() cannot be part of the same pipeline. So I have written a separate thread and put voice recognition in it.

Gesture and voice are both working in sync without problems. VoiceRecognition was blocking a few resources, so I reduced the polling frequency and now everything is fine. I was about to submit the app when I noticed a memory leak.

Once depth and voice streams are captured simultaneously, the module leaks memory badly and the app's memory keeps growing. Thanks to C#'s garbage collector, the garbage is cleared once usage reaches 700 MB, but that is still a huge amount of memory.

Is it permitted in the contest to write our own API for recognition? I have written a basic voice recognition module:

1. First, I enabled SpeechSynthesis and had it speak all my commands.

2. I captured the audio of these commands and took a 1024-point FFT. I applied silence removal and a Hamming window, then applied vector quantization and saved the coefficients.

3. I activated audio capture and perform the same steps on each incoming frame. Recognition accuracy with my technique is above 68%, which is better than the VoiceRecognition class, which fails to detect many important keywords such as "Escape" and "Stop".
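
The per-frame steps above could be sketched roughly as follows. This is not the code from the app, only a minimal illustration under stated assumptions: the class and method names are hypothetical, the window is applied before the FFT (the conventional order), silence detection is a plain energy threshold, and vector quantization is reduced to a nearest-codeword lookup against an already trained codebook.

```csharp
using System;
using System.Numerics;

// Hypothetical helper; none of these names come from the SDK.
static class SpectralFeatures
{
    // Hamming window: w[i] = 0.54 - 0.46 * cos(2*pi*i / (N-1)). Frame length must be > 1.
    public static double[] ApplyHamming(double[] frame)
    {
        int n = frame.Length;
        var result = new double[n];
        for (int i = 0; i < n; i++)
        {
            double w = 0.54 - 0.46 * Math.Cos(2.0 * Math.PI * i / (n - 1));
            result[i] = frame[i] * w;
        }
        return result;
    }

    // Recursive radix-2 FFT; the input length must be a power of two (e.g. 1024).
    public static Complex[] Fft(Complex[] x)
    {
        int n = x.Length;
        if (n == 1) return new[] { x[0] };
        var even = new Complex[n / 2];
        var odd = new Complex[n / 2];
        for (int i = 0; i < n / 2; i++)
        {
            even[i] = x[2 * i];
            odd[i] = x[2 * i + 1];
        }
        Complex[] e = Fft(even);
        Complex[] o = Fft(odd);
        var y = new Complex[n];
        for (int k = 0; k < n / 2; k++)
        {
            Complex t = Complex.FromPolarCoordinates(1.0, -2.0 * Math.PI * k / n) * o[k];
            y[k] = e[k] + t;
            y[k + n / 2] = e[k] - t;
        }
        return y;
    }

    // Magnitudes of the first N/2 bins of the windowed frame.
    public static double[] MagnitudeSpectrum(double[] frame)
    {
        double[] windowed = ApplyHamming(frame);
        var input = new Complex[windowed.Length];
        for (int i = 0; i < input.Length; i++) input[i] = new Complex(windowed[i], 0.0);
        Complex[] spectrum = Fft(input);
        var mags = new double[spectrum.Length / 2];
        for (int i = 0; i < mags.Length; i++) mags[i] = spectrum[i].Magnitude;
        return mags;
    }

    // Treats a frame as silence when its mean energy is below a threshold.
    // The threshold value is an assumption and would need tuning per microphone.
    public static bool IsSilence(double[] frame, double threshold)
    {
        double energy = 0.0;
        foreach (double s in frame) energy += s * s;
        return energy / frame.Length < threshold;
    }

    // Index of the codebook entry closest to the feature (squared Euclidean distance).
    public static int NearestCodeword(double[] feature, double[][] codebook)
    {
        int best = -1;
        double bestDist = double.MaxValue;
        for (int c = 0; c < codebook.Length; c++)
        {
            double d = 0.0;
            for (int i = 0; i < feature.Length; i++)
            {
                double diff = feature[i] - codebook[c][i];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = c; }
        }
        return best;
    }
}
```

In use, each captured frame that is not silence would go through MagnitudeSpectrum and then NearestCodeword, and the resulting sequence of codeword indices would be compared against the sequences saved for each command.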

I am just wondering whether we are allowed to use our own API for recognition, or whether we have to stick strictly to the SDK APIs.


There are other issues too. With audio recording, all frames are kept in memory until you save, so using it together with gesture is another tough task: we can't capture audio for long if gesture is enabled.

All in all, the app works the way it is expected to. But these small optimization problems are draining system resources such as battery and memory.

Please advise.

a) Stick to the PerC APIs and let demos be "demos" only

b) Write our own methods to turn these demos into small "working apps".

Eagerly waiting for a reply.


I have answered your question about running gesture and voice recognition on the same thread at http://software.intel.com/en-us/forums/topic/419369. As for the memory leak issue, we will verify it and keep you posted.

Hi David, thanks as usual for your quick response. I investigated the issues and here is a brief summary:

1. The memory leak actually occurs in the AudioCapture Init method, so the problem shows up whenever audio capture is used together with voice recognition. I have used semaphores to lock the resources appropriately, and the two work together without any flaws; the memory problem is the only issue. To be more precise: since audio and voice cannot be activated at the same time, I have to retry initializing audio, so I call Init() of audio capture in a loop until it returns true. That is the part where the junk is not being cleared.
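
For comparison, a retry loop of this kind can be bounded and paced instead of spinning until success. The sketch below is only an illustration: RetryHelper and the tryInit delegate are hypothetical, and tryInit stands in for whatever call initializes audio capture. Whether pacing the retries reduces the garbage produced by the SDK's Init is an assumption that would need to be measured.

```csharp
using System;
using System.Threading;

// Hypothetical helper; tryInit stands in for the audio capture Init call.
static class RetryHelper
{
    // Calls tryInit up to maxAttempts times, sleeping between failed attempts.
    // Returns true as soon as tryInit succeeds, false if every attempt fails.
    public static bool RetryUntilTrue(Func<bool> tryInit, int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            if (tryInit()) return true;
            // Pause before the next attempt so the loop does not spin at full speed.
            if (attempt < maxAttempts) Thread.Sleep(delay);
        }
        return false;
    }
}
```

Reusing one capture instance inside tryInit, rather than creating a new one per attempt, keeps each failed attempt from leaving another object behind for the garbage collector.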

In SpeechSynthesis, there seems to be no way to cancel the speech. Even after disposing of the synthesizer or setting the synthesizer event to null, it keeps speaking. You can reproduce the scenario by tweaking the Speech_synthesizer.cs example: inside the Speak event handler, add a System.Threading.Thread.Sleep() and then Dispose, set to null, or do whatever you like with the instances. For a long speech, the lack of cancellation is a big problem.

If we use SpeechSynthesis together with VoiceRecognition, the problem is far more severe: the synthesized speech is picked up as a voice command. One workaround is to turn VoiceRecognition off at the start of the speech and turn it back on at the end. But because vsynth renders the speech asynchronously, there is no "end of speech" notification. I have to use VoiceRecognition, audio recording and SpeechSynthesis together. I have got them working together, but the solution is clumsy, with too many timers and semaphores. I wonder if I am missing something here.

Logically, I want the following: speech tells me which command I should speak; when I speak a "record" command, recording starts and both speech and voice are deactivated; once recording is finished, speech announces that recording has ended, and voice is activated again.
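
The flow described above can be written down as a small state machine, which may be easier to reason about than a set of timers and semaphores. This is a minimal sketch with hypothetical names throughout. In particular, the PromptFinished and AnnouncementFinished events are assumptions: the SDK gives no end-of-speech notification, so they would have to be approximated, for example by a timer based on the estimated length of the speech.

```csharp
using System;

// Hypothetical states and events for the prompt / listen / record flow.
enum AppState { Prompting, Listening, Recording, Announcing }
enum AppEvent { PromptFinished, RecordCommandHeard, RecordingFinished, AnnouncementFinished }

static class VoiceFlow
{
    // Returns the next state; events that do not apply to the current state are ignored.
    public static AppState Next(AppState state, AppEvent evt)
    {
        if (state == AppState.Prompting && evt == AppEvent.PromptFinished) return AppState.Listening;
        if (state == AppState.Listening && evt == AppEvent.RecordCommandHeard) return AppState.Recording;
        if (state == AppState.Recording && evt == AppEvent.RecordingFinished) return AppState.Announcing;
        if (state == AppState.Announcing && evt == AppEvent.AnnouncementFinished) return AppState.Listening;
        return state;
    }

    // Voice recognition results are only acted on while listening,
    // so synthesized speech is never taken as a command.
    public static bool RecognitionEnabled(AppState state)
    {
        return state == AppState.Listening;
    }

    // Speech synthesis is only active while prompting or announcing.
    public static bool SpeechEnabled(AppState state)
    {
        return state == AppState.Prompting || state == AppState.Announcing;
    }
}
```

With this shape, a recognized command that arrives while RecognitionEnabled is false can simply be dropped, even if the SDK module itself cannot be stopped.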

One more update: VoiceRecognition makes the UI unresponsive. When voice is active, touch does not respond properly and feels sluggish. If we add a delay to the VoiceRecognition polling loop, it fails to recognize anything, because the module never knows when the user has started speaking, and polling may begin in the middle or at the end of an utterance.
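
One common way to keep polling continuous without the UI ever waiting on it is to poll on a dedicated worker and hand results over through a thread-safe queue. The sketch below is an assumption-laden illustration, not SDK code: CommandPump is hypothetical, and pollOnce stands in for a blocking recognizer call that returns the recognized command or null. Whether this removes the sluggishness depends on where the SDK actually spends its time.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper: polls continuously on a worker thread and queues results.
sealed class CommandPump : IDisposable
{
    private readonly ConcurrentQueue<string> _commands = new ConcurrentQueue<string>();
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private readonly Task _worker;

    // pollOnce is assumed to block until the recognizer has processed some audio,
    // returning a command string or null when nothing was recognized.
    public CommandPump(Func<string> pollOnce)
    {
        _worker = Task.Factory.StartNew(() =>
        {
            while (!_cts.IsCancellationRequested)
            {
                string command = pollOnce();
                if (command != null) _commands.Enqueue(command);
            }
        }, _cts.Token, TaskCreationOptions.LongRunning, TaskScheduler.Default);
    }

    // Non-blocking: intended to be called from the UI thread, e.g. on a UI timer tick.
    public bool TryGetCommand(out string command)
    {
        return _commands.TryDequeue(out command);
    }

    // Stops the worker after its current pollOnce call returns.
    public void Dispose()
    {
        _cts.Cancel();
        try { _worker.Wait(); } catch (AggregateException) { }
        _cts.Dispose();
    }
}
```

The UI side only ever calls TryGetCommand, which returns immediately, so no delay needs to be added to the polling loop itself.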

Neither voice nor speech can be disposed and reinitialized with the SDK. At best they can be deactivated, but that does not stop the polling.


So I am having to spend more time fixing performance bottlenecks than actually polishing the app.


After numerous iterations I have got them to work as well as they can, but it is still far from the best user experience. Therefore I want to ask whether it is OK to use alternative techniques/APIs for speech and voice.
