
Keyword Spotting is not working with Microphone

Help
Charishma
2017-03-02
2017-03-02
  • Charishma

    Charishma - 2017-03-02

    Hi,

    I have implemented keyword spotting with a keyphrase "speech interface".
    
    It is working perfectly fine for synthesized audio file testSpeech.wav,
    
    pocketsphinx_continuous -infile testSpeech.wav -keyphrase "SPEECH INTERFACE" -kws_threshold 1e-30 -time yes -hmm ..\..\..\model\en-us\en-us -dict ..\..\..\model\en-us\cmudict-en-us.dict -logfn test.log
    
    But it gives the error below when I use a microphone. I am using the same acoustic model and dictionary, and I can't figure out what I am doing wrong.
    

    ERROR: "dict.c", line 195: Line 3: Phone '(2)' is mising in the acoustic model; word 'INTERFACE' ignored

    Please help me.

    Here is a link to my logs, dictionary, and acoustic model. Kindly let me know if anything else is required.

    https://drive.google.com/open?id=0Bx7zZtnMgeIlaG1EU1VHUm1yMU0

    Awaiting your response.

     
    • Arseniy Gorin

      Arseniy Gorin - 2017-03-02

      You should drop the space before (2) in your 5041.dic.

      It should be:

      SPEECH  S P IY CH
      INTERFACE       IH N T ER F EY S
      INTERFACE(2)    IH N ER F EY S
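
The fix above can also be applied automatically. This is only a minimal sketch: it assumes the broken entry looked like `INTERFACE (2) ...` (a space before the variant marker, which would explain why `(2)` was read as a phone), and the helper name `fix_variant_markers` is made up for illustration.

```python
import re

# A word, stray spaces or tabs, then a variant marker such as "(2)".
# Assumption: the broken entry looked like "INTERFACE (2)    IH N ER F EY S".
_STRAY_SPACE = re.compile(r"^(\S+)[ \t]+(\(\d+\))(?=\s)")


def fix_variant_markers(lines):
    """Join 'WORD (2) ...' into 'WORD(2) ...'; other lines are returned unchanged."""
    return [_STRAY_SPACE.sub(r"\1\2", line) for line in lines]


broken = [
    "SPEECH  S P IY CH",
    "INTERFACE       IH N T ER F EY S",
    "INTERFACE (2)    IH N ER F EY S",
]
fixed = fix_variant_markers(broken)
print("\n".join(fixed))
```

To use it on a real file, read the lines of the .dic file, pass them through the function, and write the result back (keep a backup of the original).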
      
       
  • Charishma

    Charishma - 2017-03-03

    I have changed it. Thank you. :)

    It is still not recognizing the phrase. There is no error in the log now, but nothing is detected. Should I change the threshold value? What would be the correct threshold value for this?

     
    • Arseniy Gorin

      Arseniy Gorin - 2017-03-03

      Reducing the threshold leads to better spotting but more false alarms.
      You should try values from 1e-1 to 1e-100.

      It also depends on the microphone, and can be affected by accent, noise, etc.

      If you are not a native English speaker, you can also check whether more alternative pronunciations should be added to the dictionary.
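
One way to try the range of thresholds is to generate one command per value and compare the logs. The sketch below only builds and prints the command lines, it does not run them. It reuses the flags from the command earlier in this thread; the function name, the log file names, and the model and dictionary paths are placeholders to adapt.

```python
import shlex


def sweep_commands(wav, keyphrase, hmm, dictionary,
                   exponents=(1, *range(10, 101, 10))):
    """Build one pocketsphinx_continuous command per threshold 1e-<exponent>."""
    commands = []
    for exp in exponents:
        threshold = f"1e-{exp}"
        commands.append([
            "pocketsphinx_continuous",
            "-infile", wav,
            "-keyphrase", keyphrase,
            "-kws_threshold", threshold,
            "-hmm", hmm,
            "-dict", dictionary,
            "-logfn", f"kws_{threshold}.log",  # hypothetical log name, one per run
        ])
    return commands


# Placeholder paths: point these at your own recording, model, and dictionary.
commands = sweep_commands("testSpeech.wav", "SPEECH INTERFACE",
                          "model/en-us/en-us", "5041.dic")
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run` as is. It is worth sweeping on a recording made with the same microphone you plan to use, since a threshold tuned on synthesized audio may not carry over.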

       
  • Charishma

    Charishma - 2017-03-03

    Thank you for the fast reply. It means a lot. :)

    I have tried varying the threshold values. It is somewhat better, but it recognizes only some voices.

    My locale is Indian English. How can I add more alternative pronunciations? Do you mean I should extend the dictionary as in the tutorial?

    If I add the pronunciations, do I also need to train the acoustic model?

    Is it possible to add all kinds of English accents to the dictionary and acoustic model?

    Please guide me on how to proceed further.

     
    • Arseniy Gorin

      Arseniy Gorin - 2017-03-03

      Indian English is heavily accented. You likely need a special acoustic model (trained on a fairly large data set of Indian English). I don't think such a model is freely available.

      But since you need just one phrase, a hack that might help is to extend the dictionary a little by adding pronunciations that sound closer to your accent. It really is a hack and probably will not work well.

      It is like your example, where the word INTERFACE has 2 pronunciations (with T omitted and with T present). You can add more pronunciations, but you can only use sounds from the acoustic model's phone set.
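
As a rough sketch of that hack, a small helper can number new variants and reject phones the model does not know. The phone list below is the 39-phone, stress-free CMU set that the en-us dictionary appears to use; treat it as an assumption and check it against your own model. The function name and the extra pronunciation are invented for illustration, not a recommended transcription.

```python
# Assumed phone set of the en-us model; verify against your acoustic model.
EN_US_PHONES = set(
    "AA AE AH AO AW AY B CH D DH EH ER EY F G HH IH IY JH K L M N NG "
    "OW OY P R S SH T TH UH UW V W Y Z ZH".split()
)


def add_variant(entries, word, phones, phone_set=EN_US_PHONES):
    """Append an alternative pronunciation of `word` and return its dictionary head."""
    unknown = [p for p in phones if p not in phone_set]
    if unknown:
        raise ValueError(f"phones not in the acoustic model: {unknown}")
    # Count existing entries for this word: WORD, WORD(2), WORD(3), ...
    count = sum(1 for head, _ in entries
                if head == word or head.startswith(word + "("))
    head = word if count == 0 else f"{word}({count + 1})"
    entries.append((head, list(phones)))
    return head


entries = [
    ("SPEECH", "S P IY CH".split()),
    ("INTERFACE", "IH N T ER F EY S".split()),
    ("INTERFACE(2)", "IH N ER F EY S".split()),
]
# Hypothetical extra variant, only to show the numbering.
new_head = add_variant(entries, "INTERFACE", "IH N T AH R F EY S".split())
for head, phones in entries:
    print(head, " ".join(phones))
```

Note that the variant marker is written directly after the word with no space, which is what caused the original error in this thread.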

       
  • Charishma

    Charishma - 2017-03-03

    Thank you so much for taking time to guide a noob like me :)

    I'll try it out and get back to you if I need any further help.

     
  • Charishma

    Charishma - 2017-03-07

    Hi,

    Can you please explain the significance of the threshold value? I read somewhere that it is used to reduce false detections. Is that right?

    Previously you told me that reducing the threshold leads to better spotting but more false alarms. What exactly does "false alarms" mean?

    Coming to my project,
    The word "speech" is spotted well at threshold 30 and "interface" at 50 for me. But I can't figure out a threshold value for the phrase "speech interface". Could you kindly help me out?

    Link to my dictionary and acoustic model:
    https://drive.google.com/drive/u/0/folders/0Bx7zZtnMgeIlaG1EU1VHUm1yMU0

    Thanks in advance.
    
     
    • Arseniy Gorin

      Arseniy Gorin - 2017-03-07

      You usually need smaller thresholds for phrases.

      A false alarm is when you say something unrelated like "blah" instead of "speech interface", and the system still reports that it detected "speech interface". This happens if the threshold is too low.
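
Since single words and phrases tend to need different thresholds, it can help to keep them in a keyword list file with one threshold per line instead of a single `-kws_threshold`. A minimal sketch follows; the threshold values are placeholders to be tuned, not recommendations, and the function and file names are made up.

```python
def format_kws_list(thresholds):
    """Render {keyphrase: threshold} as keyword list lines, e.g. 'SPEECH /1e-30/'."""
    return "".join(f"{phrase} /{value}/\n" for phrase, value in thresholds.items())


# Placeholder thresholds; tune each one on your own recordings.
kws_text = format_kws_list({
    "SPEECH INTERFACE": "1e-40",
    "SPEECH": "1e-20",
})
print(kws_text, end="")
```

Saved to a file such as keyphrase.list, this can be passed to pocketsphinx_continuous with `-kws keyphrase.list` in place of `-keyphrase` and `-kws_threshold`. Every word in the list still has to be present in the dictionary.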

       
