
N-gram search recognises my phrases every time, but KWS always fails

Help
Q3Varnam
2018-04-11
2018-05-04
  • Q3Varnam

    Q3Varnam - 2018-04-11

    I have created my own model for a non-English language (Tamil). I have trained it with around 30 minutes of data. It's a small-vocabulary model.

    When I run pocketsphinx_continuous in n-gram search mode, it detects the words spoken into the mic with 99% accuracy.

    However, when I run it in keyword search mode, nothing gets detected at all.

    I have also tested it with a recorded WAV file to rule out ambient noise and similar issues.

    Even with the WAV file, n-gram search detects the phrase, whereas keyword search fails.

    pocketsphinx_continuous -hmm  ./model_parameters/tamilbus.ci_cont  -dict ./etc/tamilbus.dic -infile ./wav/speaker_1/kk_ticket_40.wav -kws kwlist 
    

    My keyword list file (kwlist) contains the following (roughly "Conductor, issue a ticket" and "Two tickets to Thuraiyur"):

    கண்டக்டர் டிக்கெட் போடு/1e-50/
    துறையூருக்கு ரெண்டு டிக்கெட் /1e-49/
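As far as I know, keyword search can only spot words that have an entry in the pronunciation dictionary, so one way to rule out formatting problems is to check the kwlist against `tamilbus.dic`. This is a minimal Python sketch of such a check; `parse_kwlist`, `read_dict_words` and `check` are hypothetical helpers, not part of PocketSphinx. It also flags keyphrases with no whitespace before `/threshold/`: the examples in the CMU Sphinx documentation put a space there and the first kwlist line above has none, but whether PocketSphinx's own parser actually mishandles that is an assumption worth testing.

```python
import re
from dataclasses import dataclass

# Hypothetical checker, not part of PocketSphinx. It reads kwlist lines of
# the form "keyphrase /threshold/" and a pronunciation dictionary (.dic).
KW_LINE = re.compile(r"^(?P<phrase>.*?)(?P<sep>\s*)/(?P<thr>[^/]+)/$")
ALT_PRON = re.compile(r"\(\d+\)$")  # strips "(2)" from alternate entries like "word(2)"


@dataclass
class KeyPhrase:
    words: list[str]
    threshold: float
    space_before_slash: bool


def parse_kwlist(text: str) -> list[KeyPhrase]:
    """Parse kwlist text into keyphrases, raising ValueError on bad lines."""
    phrases = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        m = KW_LINE.match(line)
        if m is None:
            raise ValueError(f"line {lineno}: expected 'keyphrase /threshold/'")
        words = m.group("phrase").split()
        if not words:
            raise ValueError(f"line {lineno}: empty keyphrase")
        phrases.append(KeyPhrase(
            words=words,
            threshold=float(m.group("thr")),  # e.g. "1e-50" -> 1e-50
            space_before_slash=bool(m.group("sep")),
        ))
    return phrases


def read_dict_words(text: str) -> set[str]:
    """Collect headwords from .dic text (each line: word followed by phones)."""
    words = set()
    for line in text.splitlines():
        fields = line.split()
        if fields:
            words.add(ALT_PRON.sub("", fields[0]))
    return words


def check(kwlist_text: str, dic_text: str) -> list[str]:
    """Return human-readable problems found in the keyword list."""
    vocab = read_dict_words(dic_text)
    problems = []
    for kp in parse_kwlist(kwlist_text):
        phrase = " ".join(kp.words)
        if not kp.space_before_slash:
            problems.append(f"{phrase!r}: no whitespace before /threshold/")
        for word in kp.words:
            if word not in vocab:
                problems.append(f"{phrase!r}: {word!r} not in dictionary")
    return problems


if __name__ == "__main__":
    kwlist = "கண்டக்டர் டிக்கெட் போடு/1e-50/\nதுறையூருக்கு ரெண்டு டிக்கெட் /1e-49/\n"
    # Toy dictionary: the phone strings are placeholders, not tamilbus.dic.
    dic = "கண்டக்டர் K A N T A K T A R\nடிக்கெட் T I K K E T\nபோடு P O T U\n"
    for problem in check(kwlist, dic):
        print(problem)
```

Run against the real `kwlist` and `tamilbus.dic`, an empty result would at least rule out missing words and the spacing question as causes.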
    

    The outputs of keyword search and n-gram search are attached as plain-text files.

    Is there anything specific that needs to be done when training a model for keyword search?

    EDIT: To clarify, keyword search works with the supplied US English model. So is the problem specific to my training settings or configuration parameters?

     

    Last edit: Q3Varnam 2018-04-11
    • Nickolay V. Shmyrev

      > I have trained it with around 30 minutes of data.

      This dataset size is way below our recommendation for model training.

      > It's a small-vocabulary model.

      Keyword spotting requires a large-vocabulary model.

       
  • Q3Varnam

    Q3Varnam - 2018-04-11

    Many thanks for your prompt reply. I can live with grammar search and implement my own keyword spotting on top of my small vocabulary.

    Can a JSGF file be used instead of an *.lm file in the training config file, i.e. in this setting?
    $CFG_LANGUAGEMODEL = "$CFG_LIST_DIR/$CFG_DB_NAME.lm";

    It is not explicitly stated in the tutorial.

    Secondly, if that is allowed and the JSGF file contains two rules, will both rules be used?

    I ask because I just ran pocketsphinx_continuous with a JSGF file that had two rules:
    <wakeup> <command1>

    The detector only picked up the rule that occurs first. Is this how it is supposed to work?
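If the aim is to have both rules active at once, one common approach is a single public rule that references the others. As far as I know, PocketSphinx starts decoding from only one rule: its `-toprule` option is described as "Start rule for JSGF (first public rule is default)", which would explain only the first rule being picked. This Python sketch is a hypothetical helper (the names `build_jsgf`, `utterance` and `tamilbus` are illustrative, not from this thread or from PocketSphinx) that keeps every rule private and adds one public rule that is their union; the phrases reuse the kwlist entries above purely as placeholders.

```python
# Hypothetical generator, not part of PocketSphinx: writes a JSGF grammar in
# which every rule is private and a single public rule is the union of them.
def build_jsgf(name: str, rules: dict[str, list[str]], top: str = "utterance") -> str:
    """Return JSGF text whose only public rule <top> accepts any rule in `rules`."""
    if not rules:
        raise ValueError("need at least one rule")
    if top in rules:
        raise ValueError(f"top rule name <{top}> clashes with an existing rule")
    lines = ["#JSGF V1.0;", f"grammar {name};", ""]
    for rule, phrases in rules.items():
        if not phrases:
            raise ValueError(f"rule <{rule}> has no phrases")
        # Alternatives inside one rule are separated by '|'.
        lines.append(f"<{rule}> = {' | '.join(phrases)};")
    # The one public rule the decoder starts from references every other rule.
    refs = " | ".join(f"<{rule}>" for rule in rules)
    lines.append(f"public <{top}> = {refs};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    grammar = build_jsgf("tamilbus", {
        "wakeup": ["கண்டக்டர்"],
        "command1": ["டிக்கெட் போடு", "துறையூருக்கு ரெண்டு டிக்கெட்"],
    })
    print(grammar)
```

The generated file would be passed to pocketsphinx_continuous with `-jsgf` in place of `-lm`. A union like this accepts either rule on its own; if the wake-up word is always followed by a command, adding `<wakeup> <command1>` as a further alternative of the public rule would accept the sequence as well.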

     

    Last edit: Q3Varnam 2018-04-11
