
Pocketsphinx russian model: twice slower than the english one

Вадим
2013-02-07
2015-03-15
1 2 > >> (Page 1 of 2)
  • Вадим

    Вадим - 2013-02-07

    Hi Nickolay!
    I am testing both the English hub4wsj_sc_8k and the Russian msu_ru_nsh acoustic and language models on pocketsphinx (both on Android and on PC), and have noticed that the decoder with the English model both initializes and runs twice as fast as with the Russian one - why does this happen? Is it due to the bigger amount of data used for making the language model, or is the acoustic model itself slower?
    And 4 more questions:

    1) how did you make the msu_ru_nsh.lm.dmp? is it made from just a list of the most frequent words from the dictionary, like

    <s> один </s>
    <s> одна </s>
    ....

    or made of various sentences?
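For reference, the sentence-per-line format with <s>/</s> markers shown above is what the CMU LM tools expect as a training corpus. A minimal sketch of preparing such a file - the file name and sentence list here are made up for illustration:

```python
# Sketch: write an LM training corpus, one sentence per line,
# wrapped in <s> ... </s> markers. File name is arbitrary.
sentences = ["один", "одна", "иди вперед", "иди назад"]

with open("corpus.txt", "w", encoding="utf-8") as f:
    for s in sentences:
        f.write("<s> %s </s>\n" % s)
```

The resulting file can then be fed to the LM training tools mentioned further down the thread.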

    2) in the russian acoustic model i didn't find fillers like "noise", "cough", "knock" etc., only SIL. How can i easily add this stuff to the russian acoustic model, or maybe there is a ready russian solution? If not, perhaps i need to record these sounds and make an acoustic model adaptation? If so, do i need to add the fillers like ordinary words, except for the pronunciation, which should be
    ++laugh++ +LAUGH+
    ++lipsmack++ +LIPSMACK+
    ++cough++ +COUGH+
    ++breath++ +BREATHE+
    ?

    3) how do i add the fillers to the language model? (to check whether these fillers are appropriate for my task, and to watch the utterance and see the timestamps where the fillers were recognized)? Should I add them just like an ordinary word, for example in jsgf:
    public = (иди вперед | иди назад | ++breathe++ | ++cough++ );

    and when creating the .DMP:

    <s> иди вперед </s>
    <s> иди назад </s>
    <s> ++breathe++ </s>

    Is this correct?
    And are the fillers automatically detected when launching ps_process_raw(), but just not shown?

    4) a bit offtopic, but too small to create a new topic for: where can i find the chinese models? Is the zh_CN model for mainland Chinese, zh_TW for Taiwan, and is the acoustic model shared between them - zh/tdt_sc_8k?

    sorry for such a big message...

     

    Last edit: Вадим 2013-02-07
    • Nickolay V. Shmyrev

      and have noticed that the decoder with the English model both initializes and runs twice as fast as with the Russian one - why does this happen?

      The English model is semi-continuous, and moreover it is compressed for mobile applications (with pocketsphinx/scripts/prune_mixw.py and pocketsphinx/scripts/quantize_mixw.py). You need to do the same with the Russian one in order to achieve the same performance.

      Actually the zero Russian acoustic models are significantly better than the voxforge ones, I'd recommend you try them. Unfortunately they need a different dictionary.

      1) how did you make the msu_ru_nsh.lm.dmp? is it made from just a list of the most frequent words from the dictionary, like

      I trained it on a subset of the http://lib.ru corpus. I have a newer model which is trained on the whole of lib.ru. It's better than the older lm.dmp, which is more like a test model than a real model.

      For lm training tools see http://cmusphinx.sourceforge.net/wiki/tutoriallm

      in the russian acoustic model i didn't find fillers like "noise", "cough", "knock" etc., only SIL. How can i easily add this stuff to the russian acoustic model, or maybe there is a ready russian solution? If not, perhaps i need to record these sounds and make an acoustic model adaptation? If so, do i need to add the fillers like ordinary words, except for the pronunciation, which should be

      There is no easy way to add fillers to the acoustic model. Fillers are missing because they were not present in the training database. You need to add fillers to the training database transcription and retrain the model.

      how do i need to add the fillers to the language model?

      Fillers are added to a grammar automatically with the -fsgusefiller option. You should not add them to the grammar itself.

      4) a bit offtopic, but too small for a new topic to create: where i can find the chinese models? Is the zh_CN model for Chinese, zh_TW for Taiwan, and the acoustic is shared chinese - zh/tdt_sc_8k?

      Yes

       
      • Вадим

        Вадим - 2013-02-08

        Hi Nickolay,
        Thanks very much for such a detailed response.

        with pocketsphinx/scripts/prune_mixw.py and pocketsphinx/scripts/quantize_mixw.py). You need to do the same with the Russian one in order to achieve the same performance.

        I found these scripts in sphinxtrain, but i still don't understand how to run them, or where to pass my acoustic model as a parameter?

        Actually the zero Russian acoustic models are significantly better than the voxforge ones, I'd recommend you try them. Unfortunately they need a different dictionary.

        I have tried this, and it works faster, but has worse accuracy than msu_ru_nsh (if i choose the zero acoustic model together with your msu_ru_nsh.lm.dmp). If i change the lm to zero's, the recognition time increases a lot...

        I trained it on a subset of the http://lib.ru corpus. I have a newer model which is trained on the whole of lib.ru. It's better than the older lm.dmp, which is more like a test model than a real model.

        I have looked it up and found a really great corpus base.. Will you make your new language model from the whole of lib.ru downloadable for others? or could you send it to me so i could test it)

        Fillers are added to a grammar automatically with -fsgusefiller option. You should not add them in the grammar itself.

        I made a simple jsgf file with a (yes | no) rule and enabled -fsgusefiller, but i still cannot get any fillers other than sil - no ++cough++, ++laugh++ or anything else.. I would like to implement an ability to register somebody coughing/laughing to handle it somehow in my application, so i am really interested in this...

         

        Last edit: Вадим 2013-02-08
  • Вадим

    Вадим - 2013-02-08

    Hi Nickolay,
    Could you please help me with the problem in the message above?..

     
  • Вадим

    Вадим - 2013-02-09

    Nickolay, are you here?...

     
  • Nickolay V. Shmyrev

    how to run them, or where to pass my acoustic model as a parameter?

    quantize_mixw.py mixture_weights

    I have tried this, and it works faster, but has worse accuracy than msu_ru_nsh (if i choose the zero acoustic model together with your msu_ru_nsh.lm.dmp).

    Probably you need to perform more experiments. Also, you can train a semi-continuous model from the Voxforge data.

    Will you make your new language model from whole lib.ru downloadable for others?

    Not right now

    I would like to implement an ability to register somebody coughing/laughing to handle it somehow in my application, so i am really interested in this...

    Russian acoustic models do not have filler sounds, so they are not recognized. You need to train another model with fillers if you need them, or create a tool to copy fillers from the English models.

     
    • John8394

      John8394 - 2015-03-14

      Hello, sir.
      Is it possible to see your tuned dictionaries?
      I've tried to translate http://www.youtube.com/watch?v=3tqBH7lhlOw (00:01:00-00:02:30)
      voxforge-ru-0.2: http://pastebin.com/x7BKfDP2
      zero_ru_cont_8k_v2: http://pastebin.com/uU2sC7w4
      Both attempts have very poor results; of course my poor knowledge of sphinx is the main problem here.
      I can use sphinx on desktop computer, no need in small device optimization.
      Thanks in advance.

       

      Last edit: John8394 2015-03-14
      • Nickolay V. Shmyrev

        Large vocabulary Russian speech recognition is not possible with current models, they are too small.

         
  • Вадим

    Вадим - 2013-02-09

    Thanks very much for your reply.

    quantize_mixw.py mixture_weights

    I will try this.

    create a tool to copy fillers from English models.

    Does this mean making it possible to use the Russian acoustic model and the English one simultaneously, but only returning the fillers? Or what?..

    Russian acoustic models do not have filler sounds, so they are not recognized. You need to train another model with fillers if you need them or create a tool to copy fillers from English models.

    I meant that i tried the English model (with pocketsphinx_continuous.exe) with the
    "-fsgusefiller yes" parameter, using a simple jsgf model with a (yes|no) rule - and couldn't catch ++cough++, ++laugh++, or other fillers, only silences (sil). Perhaps i didn't set the option up properly?

    my pocketsphinx_continuous.exe parameters:
    -hmm model\hmm\en_US\hub4wsj_sc_8k -jsgf model\lm\en_US\fillers.gram -samprate 8000 -dict model\lm\en_US\cmu07a.dic -fsgusefiller yes -backtrace yes -bestpath no -nbest no

    my fillers.gram file:


    #JSGF V1.0;

    grammar fillers;

    public <idle> = (no | yes);


    and here are the logs:


    INFO: jsgf.c(581): Defined rule: <fillers.g00000>
    INFO: jsgf.c(581): Defined rule: PUBLIC <fillers.idle>
    INFO: fsg_model.c(215): Computing transitive closure for null transitions
    INFO: fsg_model.c(270): 2 null transitions added
    INFO: fsg_model.c(421): Adding silence transitions for <sil> to FSG
    INFO: fsg_model.c(441): Added 6 silence word transitions
    INFO: fsg_model.c(421): Adding silence transitions for ++NOISE++ to FSG
    INFO: fsg_model.c(441): Added 6 silence word transitions
    INFO: fsg_model.c(421): Adding silence transitions for ++BREATH++ to FSG
    INFO: fsg_model.c(441): Added 6 silence word transitions
    INFO: fsg_model.c(421): Adding silence transitions for ++SMACK++ to FSG
    INFO: fsg_model.c(441): Added 6 silence word transitions
    INFO: fsg_model.c(421): Adding silence transitions for ++COUGH++ to FSG
    INFO: fsg_model.c(441): Added 6 silence word transitions
    INFO: fsg_model.c(421): Adding silence transitions for ++LAUGH++ to FSG
    INFO: fsg_model.c(441): Added 6 silence word transitions
    INFO: fsg_model.c(421): Adding silence transitions for ++TONE++ to FSG
    INFO: fsg_model.c(441): Added 6 silence word transitions
    INFO: fsg_model.c(421): Adding silence transitions for ++UH++ to FSG
    INFO: fsg_model.c(441): Added 6 silence word transitions
    INFO: fsg_model.c(421): Adding silence transitions for ++UM++ to FSG
    INFO: fsg_model.c(441): Added 6 silence word transitions
    INFO: fsg_search.c(366): Added 0 alternate word transitions
    INFO: fsg_lextree.c(108): Allocated 612 bytes (0 KiB) for left and right context phones
    INFO: fsg_lextree.c(253): 59 HMM nodes in lextree (56 leaves)
    INFO: fsg_lextree.c(255): Allocated 6372 bytes (6 KiB) for all lextree nodes
    INFO: fsg_lextree.c(258): Allocated 6048 bytes (5 KiB) for lextree leafnodes
    INFO: continuous.c(371): bin\Release\pocketsphinx_continuous.exe COMPILED ON: Dec 13 2012, AT: 17:37:32

    Allocating 32 buffers of 2500 samples each
    READY....
    Listening...
    Stopped listening, please wait...
    INFO: fsg_search.c(1032): 110 frames, 1028 HMMs (9/fr), 3453 senones (31/fr), 783 history entries (7/fr)

    INFO: pocketsphinx.c(851): 000000000: no (-12386)
    INFO: word start end pprob ascr lscr lback
    INFO: (NULL) -1 -1 1.000 0 0 1
    INFO: sil 0 32 1.000 314409 -317862 1
    INFO: no 33 38 1.000 5692 -6930 1
    INFO: (NULL) 38 38 1.000 0 0 1
    INFO: sil 39 109 1.000 310167 -317862 1
    000000000: no
    READY....
    Listening...
    Stopped listening, please wait...
    INFO: fsg_search.c(1032): 53 frames, 494 HMMs (9/fr), 1665 senones (31/fr), 231 history entries (4/fr)

    INFO: pocketsphinx.c(851): 000000001: no (-6632)
    INFO: word start end pprob ascr lscr lback
    INFO: (NULL) -1 -1 1.000 0 0 1
    INFO: sil 0 31 1.000 314404 -317862 1
    INFO: no 32 37 1.000 5627 -6930 1
    INFO: (NULL) 37 37 1.000 0 0 1
    INFO: sil 38 52 1.000 315991 -317862 1
    000000001: no


    so, if i try to cough, or laugh, or cough+yes, cough+no, etc., i receive something like this:

    INFO: pocketsphinx.c(851): 000000001: no (-6632)
    INFO: word start end pprob ascr lscr lback
    INFO: (NULL) -1 -1 1.000 0 0 1
    INFO: sil 0 31 1.000 314404 -317862 1
    INFO: no 32 37 1.000 5627 -6930 1
    INFO: (NULL) 37 37 1.000 0 0 1
    INFO: sil 38 52 1.000 315991 -317862 1
    000000001: no

    and if i laugh a bit longer:

    INFO: word start end pprob ascr lscr lback
    INFO: (NULL) -1 -1 1.000 0 0 1
    INFO: sil 0 128 1.000 296296 -317862 1
    INFO: sil 129 254 1.000 298816 -317862 1
    INFO: sil 255 286 1.000 314725 -317862 1
    INFO: yes 287 334 1.000 2324 -6930 1
    INFO: (NULL) 334 334 1.000 0 0 1
    000000010: yes
    READY....

    maybe i have configured something wrong?...

     

    Last edit: Вадим 2013-02-09
  • Nickolay V. Shmyrev

    "-fsgusefiller yes" parameter, using a simple jsgf model with a (yes|no) rule - and couldn't catch ++cough++, ++laugh++, or other fillers, only silences (sil). Perhaps i didn't set the option up properly?

    You also need to increase fillprob, to something like

    -fillprob 0.1
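A rough sketch of why the flag alone was not enough: the decoder penalizes each filler insertion by log(fillprob), and the default value is tiny (assumed here to be 1e-8; check your build's defaults), so filler hypotheses almost never win against real words. Raising it to 0.1 shrinks the penalty by roughly 16 nats:

```python
import math

# Illustrative log-domain comparison of filler entry penalties.
# 1e-8 is an assumed default for -fillprob; verify with your pocketsphinx build.
default_penalty = math.log(1e-8)  # about -18.4
raised_penalty = math.log(0.1)    # about -2.3

# Difference: how much cheaper each filler insertion becomes.
print(raised_penalty - default_penalty)  # about 16.1
```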
    
     
    • Вадим

      Вадим - 2013-02-09

      wow, it works!) thanks a lot, and what about the

      Does this mean making it possible to use the Russian acoustic model and the English one simultaneously, but only returning the fillers? Or what?..
      ?...

       
  • Вадим

    Вадим - 2013-02-09

    quantize_mixw.py mixture_weights

    but i receive this:
    Traceback (most recent call last):
    File "quantize_mixw.py", line 11, in <module>
    import sphinxbase
    ImportError: No module named sphinxbase

    i have built sphinxbase, pocketsphinx and sphinxtrain in one parent folder in cygwin for windows, and did

    export PATH=/usr/local/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/lib
    export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

    could you give me some advice?..

     
  • Nickolay V. Shmyrev

    could you give me some advice?..

    Use Linux

     
    • Вадим

      Вадим - 2013-02-09

      )) ok, i will try on a JVM

       
    • Вадим

      Вадим - 2013-02-09

      I tried, it says "no module named numpy"...

       
  • Вадим

    Вадим - 2013-02-09

    On windows i tried sphinxbase/python/setup_win32.py build + install, and now i get this:
    Traceback (most recent call last):
    File "quantize_mixw.py", line 183, in <module>
    ifn, ofn = sys.argv[1:]
    ValueError: need more than 1 value to unpack
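The traceback is plain Python tuple unpacking: the script does something like `ifn, ofn = sys.argv[1:]`, so it needs exactly two file arguments after the script name. A minimal reproduction of the failure:

```python
# Reproduce the unpacking error: with only one argument after the
# script name, the two-variable unpack fails with ValueError.
argv = ["quantize_mixw.py", "mixture_weights"]
try:
    ifn, ofn = argv[1:]
except ValueError as e:
    print("unpack failed:", e)
```

Passing both an input and an output file (as discovered further down the thread) makes the unpack succeed.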

     
    • Вадим

      Вадим - 2013-02-09

      oh, it worked when i added the output file as the second parameter, didn't know... it works now

       

      Last edit: Вадим 2013-02-09
  • Вадим

    Вадим - 2013-02-10

    Thanks Nickolay, I have almost all my questions solved, and only some are left:
    1) do i need to use prune_mixw.py for mixture1 to become mixture2, then quantize_mixw.py for mixture2 to become mixture3?

    1.1) do i need to use cluster_mixw.py? (it seems to be used also with mixtures)

    1.2) what do these procedures do to the acoustic model? do they make it 'semi'? and why is 'semi' appropriate for the android demo - because it is special for pocketsphinx, which is used on the smartphone, while cont is for sphinx4?

    1.3) i have also noticed that 'semi' models have sendump and no mixture_weights, while 'cont' models have mixture_weights but no sendump... where can i get sendump? or is it my "mixture3" file?

    2)

    create a tool to copy fillers from English models.

    Does this mean to make it possible to use Russian acoustic, and simultaneously the English one, but only returning the fillers? Or what?..

     

    Last edit: Вадим 2013-02-10
  • Вадим

    Вадим - 2013-02-10

    i've made sendump from mixture_weights, but pocketsphinx fails when reading the senone mixture weights; it still tries to look for mixture_weights...

     
  • Вадим

    Вадим - 2013-02-10

    Hi Nickolay,
    I still cannot find a way to tell the russian acoustic model that it should look at the sendump file, and i am still looking forward to your reply to the previous 2 messages...
    Vadim

     
  • Nickolay V. Shmyrev

    I still cannot find a way to tell the russian acoustic model that it should look at the sendump file

    Sendump is used for semi-continuous models only. Semi-continuous models are better for mobile devices because they are faster in decoding.
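Back-of-the-envelope arithmetic on why semi-continuous decoding is faster - the counts below are illustrative assumptions, not values read from the actual models:

```python
# A continuous model evaluates every senone's own Gaussians each frame;
# a semi-continuous model evaluates one shared codebook and then only
# looks up mixture weights. Numbers here are illustrative assumptions.
n_senones = 1156       # senone count, as seen in the logs in this thread
gauss_per_senone = 8   # assumed density count per senone (continuous)
codebook_size = 256    # assumed shared codebook size (semi-continuous)

continuous_evals = n_senones * gauss_per_senone  # 9248 per frame
semi_evals = codebook_size                       # 256 per frame

print(continuous_evals // semi_evals)  # roughly 36x fewer Gaussian evals
```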

     
    • Вадим

      Вадим - 2013-02-11

      Ok.. but what do i need to do then with quantize_mixw and prune_mixw to make my russian acoustic model appropriate for mobile devices?

      The English model is semi-continuous, and moreover it is compressed for mobile applications (with pocketsphinx/scripts/prune_mixw.py and pocketsphinx/scripts/quantize_mixw.py). You need to do the same with the Russian one in order to achieve the same performance.

       
      • Nickolay V. Shmyrev

        Those scripts reduce the model size in order to make it better fit the limited memory of a mobile device.

        For more details you can read:

        http://www.cs.cmu.edu/~dhuggins/Publications/mixw_quant.pdf
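Very roughly, the idea from that paper is to store each mixture weight as a small integer code instead of a full value. A toy 4-bit uniform quantizer - a sketch of the concept only, not what quantize_mixw.py literally implements:

```python
# Toy uniform quantizer: map each weight onto one of 2**bits levels
# spanning the observed range, so each weight stores in `bits` bits.
def quantize(weights, bits=4):
    levels = (1 << bits) - 1
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / levels
    codes = [round((w - lo) / step) for w in weights]
    approx = [lo + c * step for c in codes]  # decoded approximations
    return codes, approx

weights = [0.01, 0.02, 0.05, 0.1, 0.3, 0.52]
codes, approx = quantize(weights)
print(codes)  # [0, 0, 1, 3, 9, 15]
```

Each decoded weight differs from the original by at most half a quantization step, which is the trade-off the paper measures against recognition accuracy.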

         
        • Вадим

          Вадим - 2013-02-12

          Thanks, Nickolay, i understand now the purpose of these scripts..
          But still, can you tell me step by step what i need to do to compress the model? Do I need to apply both scripts, and how?
          And which two integers do i need to pass as arguments to prune_mixw.py?
          (both scripts return sendump, but i need a modified mixture_weights to use in msu_ru_nsh, which is 'cont'...)

          P.S. i've tried prune_mixw.py mixture_weights mixture_weights2,
          then sendump.py mixture_weights2 mixture_weights3 (as i understand, sendump.py makes mixture_weights from sendump), it told

          $ python sendump.py mixture_weights mixture_weights2
          rows (model_count): 8, columns (mixture_count): 1156, features (feature_count): 1
          

          and when i try to use it in pocketsphinx (as mixture_weights):

          INFO: ms_senone.c(149): Reading senone mixture weights: model\hmm\msu_ru_nsh/mixture_weights
          INFO: ms_senone.c(200): Truncating senone logs3(pdf) values by 10 bits
          INFO: ms_senone.c(207): Not transposing mixture weights in memory
          INFO: ms_senone.c(266): Read mixture weights for 1156 senones: 1 features x 8 codewords
          INFO: ms_senone.c(320): Mapping senones to individual codebooks
          FATAL_ERROR: "ms_mgau.c", line 134: Senones need more codebooks (1156) than present (1153)
          
           

          Last edit: Вадим 2013-02-12
          • Nickolay V. Shmyrev

            both the scripts return sendump, but i need modified mixture_weights to use in msu_ru_nsh which is 'cont'..

            You can only compress a semi-continuous model.

             
