CMU Sphinx / Forums / Help: Poor recognition accuracy using pocketsphinx, possibly caused by bad LM.

Wojcvch - 2018-10-28

Hello,

I'm training a model to recognize a small set of commands in Polish. I created all the necessary files mentioned in the tutorial, ran the training, and got ~90% WER and SER using a JSGF grammar. Before that I tried an LM generated by the LMP tool, but got 100% error with it (not a single word was recognized). My audio data consists of 800+ audio files of commands spoken over and over by one person (about 0.8 hours).

Only 2 sentences are recognized; the rest are either falsely recognized as one of those two sentences or not recognized at all ("fsg_search.c", line 940: Final result does not match the grammar in frame 255).

My audio files are 16-bit little-endian, 16 kHz mono. I am using sphinxtrain, pocketsphinx and sphinxbase from GitHub (however, I have also used the latest sources downloaded from SourceForge).

The same data used with Kaldi ASR produced a model with a very small error (below 10% in both SER and WER), so I suspect something is wrong with my language model (both the JSGF grammar and the LM), but I can't really pinpoint what it is.

What could be causing such a high error rate? Is my data or configuration garbage?

etc.7z

logdir.7z

- Nickolay V. Shmyrev - 2018-10-28
  
  use LM generated by LMP tool but got 100% error with it
  
  LMTool supports only English; you should have used SRILM. The words in your current LM do not match the lowercase words in the dictionary.
  
  I get only 2 sentences recognized, the rest of sentences is either falsely recognized as one of mentioned sentences or not recognized at all
  
  Your JSGF grammar is not constructed correctly. It has too many public rules; only the last one is actually used.
  
  - Wojcvch - 2018-10-28
    
    That was a silly question on my part; sorry to waste your time on something that obvious.
    
    Thank you kindly for your help.
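
For readers hitting the same 100% error with an n-gram LM: the reply above points to a mismatch between the words in the LM and the lowercase words in the dictionary. A minimal sketch of how such a mismatch could be checked is below. It assumes the LM is a plain-text ARPA file and the dictionary uses the usual "word phone phone ..." layout; the function names and the sample words are hypothetical, not taken from the attached files.

```python
import re

def arpa_unigrams(lines):
    # Collect the words listed in the "\1-grams:" section of an ARPA LM.
    words, in_unigrams = set(), False
    for line in lines:
        line = line.strip()
        if line.startswith("\\"):
            # Section markers such as \data\, \1-grams:, \2-grams:, \end\
            in_unigrams = (line == "\\1-grams:")
            continue
        if in_unigrams and line:
            fields = line.split()
            if len(fields) >= 2:
                words.add(fields[1])  # fields: logprob, word, [backoff]
    return words

def dictionary_words(lines):
    # Collect headwords, stripping alternate-pronunciation suffixes like "(2)".
    words = set()
    for line in lines:
        fields = line.split()
        if fields:
            words.add(re.sub(r"\(\d+\)$", "", fields[0]))
    return words

def missing_from_dictionary(lm_lines, dict_lines):
    # LM words with no exact (case-sensitive) match in the dictionary.
    markers = {"<s>", "</s>", "<unk>", "<UNK>"}
    return sorted(arpa_unigrams(lm_lines) - dictionary_words(dict_lines) - markers)

if __name__ == "__main__":
    # Hypothetical data: uppercase LM words against a lowercase dictionary.
    lm = ["\\data\\", "ngram 1=4", "", "\\1-grams:",
          "-0.5\t<s>\t-0.3", "-0.5\t</s>",
          "-0.7\tLEWO\t-0.3", "-0.7\tPRAWO\t-0.3", "", "\\end\\"]
    dic = ["lewo l e v o", "prawo p r a v o"]
    print(missing_from_dictionary(lm, dic))
```

If the returned list is non-empty, the LM contains words the decoder cannot look up under that exact spelling, and the LM and dictionary need to be regenerated with consistent casing.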
    
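
On the grammar side, the reply above says that only the last public rule is actually used. One way to avoid that is to put every command into a single public rule as alternatives. The sketch below generates such a grammar as text; the grammar name, the rule name `<command>`, and the sample phrases are hypothetical, since the original grammar is only available in the attachment.

```python
def build_jsgf(grammar_name, phrases):
    # Join all command phrases as alternatives of ONE public rule,
    # grouping each phrase in parentheses so multi-word commands stay intact.
    alternatives = " | ".join(f"( {phrase} )" for phrase in phrases)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <command> = {alternatives};\n"
    )

if __name__ == "__main__":
    # Hypothetical command set; the words must also exist in the dictionary.
    print(build_jsgf("commands", ["lewo", "prawo", "w lewo"]))
```

The words used in the grammar have to match the dictionary entries exactly, including case, for the same reason as with the n-gram LM.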
