Voice Recognition Inconsistency

I'm trying to run recognition on an existing WAV file.

Even though I couldn't find any example showing how to do it with the PCSDK, I managed to write something that seems to work.

The problem is that, as far as I can tell, recognition isn't consistent when a WAV file has background noise or low volume.
When I run it on the same file over and over, the results change from one execution to the next.

To rule out my own code as the cause, I tried the following "trick":

  1. Downloaded and installed the VB-Audio virtual cable from http://vb-audio.pagesperso-orange.fr/Cable/index.htm
  2. Set the "VB-Audio Virtual Cable" as the default playback and default recording device.
  3. Ran the "voice_recognition.exe" that came with the PCSDK.
  4. Launched Windows Media Player and played a WAV file containing a noisy recording of the text "Climate Rear Defrost".
  5. Every time I played it, I received a different recognition result, as can be seen in the following JPG:

Download: Voice Recognition issues

So two things:

  1. Can anyone explain this behavior, and how to improve it so that it at least behaves deterministically when presented with the exact same input?
  2. Does anyone have an example of how to run voice recognition on an existing WAV file using UtilMCaptureFile?
    If the answer to "2" is no, and I post the code I've written, would anyone be able to advise on its correctness?

Thanks!!!
                  Tom

 

 

To transcribe a WAV file, please look at this C# console application.
The filename is passed to the MyUtilMPipeline constructor.
I'm not sure this is the right way, because of the following issue: every utterance starts with a capital letter.
The audio file should be 16 kHz PCM mono.


using System;
using System.IO;

namespace Transcriber
{
    class Program
    {

        public class MyUtilMPipeline : UtilMPipeline
        {
            protected uint pidx;

            public MyUtilMPipeline(PXCMSession session, string filename) : 
                base(session, filename)
            {

            }

            public void SetProfileIndex(uint pidx)
            {
                this.pidx = pidx;
            }

            public override void OnVoiceRecognitionSetup(ref PXCMVoiceRecognition.ProfileInfo pinfo)
            {
                QueryVoiceRecognition().QueryProfile(pidx, out pinfo);
            }

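            public override void OnRecognized(ref PXCMVoiceRecognition.Recognition data)
            {
                // As far as I can tell, a negative label means a dictation result
                // rather than a match against a command list.
                if (data.label < 0)
                {
                    Console.WriteLine(data.dictation);
                }
            }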

            public override void OnAlert(ref PXCMVoiceRecognition.Alert data)
            {
                Console.WriteLine(data.label);
            }
        }

        static void Main(string[] args)
        {
            if (args.Length != 1)
            {
                Console.WriteLine("Usage: Transcriber.exe <wav fullpath>");
                return;
            }

            if (!File.Exists(args[0]))
            {
                Console.WriteLine("The file {0} doesn't exist", args[0]);
                return;
            }
            
            PXCMSession session = null;
            pxcmStatus sts = PXCMSession.CreateInstance(out session);
            if (sts >= pxcmStatus.PXCM_STATUS_NO_ERROR)
            {
                MyUtilMPipeline pp = new MyUtilMPipeline(session, args[0]);
                                
                /* Set Module */
                pp.EnableVoiceRecognition("Voice Recognition (Nuance*)");

                /* Set Language */
                pp.SetProfileIndex((uint)0);

                /* Set Dictation */
                pp.SetVoiceDictation();               

                if (pp.Init())
                {
                    
                    /* Set audio volume to 0.2 */
                    pp.QueryCapture().device.SetProperty(PXCMCapture.Device.Property.PROPERTY_AUDIO_MIX_LEVEL, 0.2f);

                    /* Recognition Loop */
                    while (true)
                    {
                        if (!pp.AcquireFrame(true)) break;
                        if (!pp.ReleaseFrame())
                        {
                            break;
                        }
                    }
                }
                else
                {
                    Console.WriteLine("Init Failed");
                }

                pp.Close();
                pp.Dispose();
                session.Dispose();
            }
        }
    }
}
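
Since the file is expected to be 16 kHz PCM mono, it may be worth checking the WAV header before handing the file to the pipeline, so that a wrong format is not mistaken for a recognition problem. The following is only a rough sketch: the class and method names (WavFormatCheck, IsPcmMono16k) are made up for illustration and are not part of the PCSDK. It uses only System.IO, and it assumes a plain RIFF/WAVE file whose "fmt " chunk uses format tag 1 (uncompressed PCM); files written with the "extensible" format tag are reported as not matching even if they contain PCM data.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the PCSDK.
static class WavFormatCheck
{
    // Returns true when the file is a RIFF/WAVE file whose "fmt " chunk
    // declares uncompressed PCM (format tag 1), one channel, 16000 Hz.
    public static bool IsPcmMono16k(string path)
    {
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            Stream s = reader.BaseStream;
            if (s.Length < 12) return false;
            if (Encoding.ASCII.GetString(reader.ReadBytes(4)) != "RIFF") return false;
            reader.ReadUInt32(); // overall RIFF size, not needed here
            if (Encoding.ASCII.GetString(reader.ReadBytes(4)) != "WAVE") return false;

            // Walk the chunks until "fmt " is found.
            while (s.Length - s.Position >= 8)
            {
                string id = Encoding.ASCII.GetString(reader.ReadBytes(4));
                uint size = reader.ReadUInt32();
                if (id == "fmt ")
                {
                    if (size < 16 || s.Length - s.Position < 16) return false;
                    ushort formatTag = reader.ReadUInt16(); // 1 = PCM
                    ushort channels = reader.ReadUInt16();
                    uint sampleRate = reader.ReadUInt32();
                    return formatTag == 1 && channels == 1 && sampleRate == 16000;
                }

                // Chunks are padded to an even number of bytes.
                long skip = size + (size & 1);
                if (s.Position + skip > s.Length) return false;
                s.Seek(skip, SeekOrigin.Current);
            }
            return false; // no "fmt " chunk found
        }
    }
}
```

In the console application above, this could be called right after the File.Exists check, printing a message and returning when it yields false. It only checks the format; it does not say anything about why noisy input gives different results between runs.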

 
