
Pocketsphinx russian model: twice slower than the english one

Вадим
2013-02-07
2015-03-15
1 2 > >> (Page 1 of 2)
  • Вадим

    Вадим - 2013-02-07

    Hi Nickolay!
    I am testing both the English hub4wsj_sc_8k and the Russian msu_ru_nsh acoustic and language models on pocketsphinx (both on Android and on PC), and have noticed that the decoder with the English model both initializes and runs twice as fast as with the Russian one - why does this happen? Is it due to the bigger amount of data used for making the language model, or is the acoustic model itself slower?
    And 4 more questions:

    1) how did you make the msu_ru_nsh.lm.dmp? is it made from just a list of the most frequent words from the dictionary, like

    <s> один </s>
    <s> одна </s>
    ....

    or made of various sentences?
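For reference, the sentence-per-line format with <s>/</s> markers shown above is what the CMU LM tools expect as a training corpus. A minimal sketch of preparing such a file - the file name and sentence list here are made up for illustration:

```python
# Sketch: write an LM training corpus, one sentence per line,
# wrapped in <s> ... </s> markers. File name is arbitrary.
sentences = ["один", "одна", "иди вперед", "иди назад"]

with open("corpus.txt", "w", encoding="utf-8") as f:
    for s in sentences:
        f.write("<s> %s </s>\n" % s)
```

The resulting file can then be fed to the LM training tools mentioned further down the thread.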

    2) in the russian acoustic model i didn't find fillers like "noise", "cough", "knock" etc., only SIL. How can i easily add this stuff to the russian acoustic model, or maybe there is a ready russian solution? If not, perhaps i need to record these sounds and make an acoustic model adaptation? If so, do i need to add the fillers like ordinary words, except for the pronunciation, which should be
    ++laugh++ +LAUGH+
    ++lipsmack++ +LIPSMACK+
    ++cough++ +COUGH+
    ++breath++ +BREATHE+
    ?

    3) how do i add the fillers to the language model? (to check whether these fillers are appropriate for my task, and to watch the utterance and see the timestamps where the fillers were recognized)? Should I add them just like an ordinary word, for example in jsgf:
    public = (иди вперед | иди назад | ++breathe++ | ++cough++ );

    and when creating the .DMP:

    <s> иди вперед </s>
    <s> иди назад </s>
    <s> ++breathe++ </s>

    Is this correct?
    And are the fillers automatically detected when launching ps_process_raw(), but just not shown?

    4) a bit offtopic, but too small to create a new topic for: where can i find the chinese models? Is the zh_CN model for mainland Chinese, zh_TW for Taiwan, and is the acoustic model shared between them - zh/tdt_sc_8k?

    sorry for such a big message...

     

    Last edit: Вадим 2013-02-07
    • Nickolay V. Shmyrev

      and have noticed that the decoder with the English model both initializes and runs twice as fast as with the Russian one - why does this happen?

      The English model is semi-continuous, and moreover it is compressed for mobile applications (with pocketsphinx/scripts/prune_mixw.py and pocketsphinx/scripts/quantize_mixw.py). You need to do the same with the Russian one in order to achieve the same performance.

      Actually the zero Russian acoustic models are significantly better than the voxforge ones, I'd recommend you try them. Unfortunately they need a different dictionary.

      1) how did you make the msu_ru_nsh.lm.dmp? is it made from just a list of the most frequent words from the dictionary, like

      I trained it on a subset of the http://lib.ru corpus. I have a newer model which is trained on the whole of lib.ru. It's better than the older lm.dmp, which is more like a test model than a real model.

      For lm training tools see http://cmusphinx.sourceforge.net/wiki/tutoriallm

      in the russian acoustic model i didn't find fillers like "noise", "cough", "knock" etc., only SIL. How can i easily add this stuff to the russian acoustic model, or maybe there is a ready russian solution? If not, perhaps i need to record these sounds and make an acoustic model adaptation? If so, do i need to add the fillers like ordinary words, except for the pronunciation, which should be

      There is no easy way to add fillers to the acoustic model. Fillers are missing because they were not present in the training database. You need to add fillers to the training database transcription and retrain the model.

      how do i need to add the fillers to the language model?

      Fillers are added to a grammar automatically with the -fsgusefiller option. You should not add them to the grammar itself.

      4) a bit offtopic, but too small for a new topic to create: where i can find the chinese models? Is the zh_CN model for Chinese, zh_TW for Taiwan, and the acoustic is shared chinese - zh/tdt_sc_8k?

      Yes

       
      • Вадим

        Вадим - 2013-02-08

        Hi Nickolay,
        Thanks very much for such a detailed response.

        with pocketsphinx/scripts/prune_mixw.py and pocketsphinx/scripts/quantize_mixw.py). You need to do the same with the Russian one in order to achieve the same performance.

        I found these scripts in sphinxtrain, but i still don't understand how to run them, or where to pass my acoustic model as a parameter?

        Actually the zero Russian acoustic models are significantly better than the voxforge ones, I'd recommend you try them. Unfortunately they need a different dictionary.

        I have tried this, and it works faster, but has worse accuracy than msu_ru_nsh (if i choose the zero acoustic model together with your msu_ru_nsh.lm.dmp). If i change the lm to zero's, the recognition time increases a lot...

        I trained it on a subset of the http://lib.ru corpus. I have a newer model which is trained on the whole of lib.ru. It's better than the older lm.dmp, which is more like a test model than a real model.

        I have looked it up and found a really great corpus base.. Will you make your new language model from the whole of lib.ru downloadable for others? or could you send it to me so i could test it)

        Fillers are added to a grammar automatically with -fsgusefiller option. You should not add them in the grammar itself.

        I made a simple jsgf file with a (yes | no) rule and enabled -fsgusefiller, but i still cannot get any fillers other than sil - no ++cough++, ++laugh++ or anything else.. I would like to implement an ability to register somebody coughing/laughing to handle it somehow in my application, so i am really interested in this...

         

        Last edit: Вадим 2013-02-08
  • Вадим

    Вадим - 2013-02-08

    Hi Nickolay,
    Could you please help me with the problem in the message above?..

     
  • Вадим

    Вадим - 2013-02-09

    Nickolay, are you here?...

     
  • Nickolay V. Shmyrev

    how to run them, or where to pass my acoustic model as a parameter?

    quantize_mixw.py mixture_weights

    I have tried this, and it works faster, but has worse accuracy than msu_ru_nsh (if i choose the zero acoustic model together with your msu_ru_nsh.lm.dmp).

    Probably you need to perform more experiments. Also, you can train a semi-continuous model from the Voxforge data.

    Will you make your new language model from whole lib.ru downloadable for others?

    Not right now

    I would like to implement an ability to register somebody coughing/laughing to handle it somehow in my application, so i am really interested in this...

    Russian acoustic models do not have filler sounds, so they are not recognized. You need to train another model with fillers if you need them, or create a tool to copy fillers from the English models.

     
    • John8394

      John8394 - 2015-03-14

      Hello, sir.
      Is it possible to see your tuned dictionaries?
      I've tried to translate http://www.youtube.com/watch?v=3tqBH7lhlOw (00:01:00-00:02:30)
      voxforge-ru-0.2: http://pastebin.com/x7BKfDP2
      zero_ru_cont_8k_v2: http://pastebin.com/uU2sC7w4
      Both attempts have very poor results; of course my poor knowledge of sphinx is the main problem here.
      I can use sphinx on desktop computer, no need in small device optimization.
      Thanks in advance.

       

      Last edit: John8394 2015-03-14
      • Nickolay V. Shmyrev

        Large vocabulary Russian speech recognition is not possible with current models, they are too small.

         
  • Вадим

    Вадим - 2013-02-09

    Thanks very much for your reply.

    quantize_mixw.py mixture_weights

    I will try this.

    create a tool to copy fillers from English models.

    Does this mean making it possible to use the Russian acoustic model and the English one simultaneously, but only returning the fillers? Or what?..

    Russian acoustic models do not have filler sounds, so they are not recognized. You need to train another model with fillers if you need them or create a tool to copy fillers from English models.

    I meant that i tried the English model (with pocketsphinx_continuous.exe) with the
    "-fsgusefiller yes" parameter, using a simple jsgf model with a (yes|no) rule - and couldn't catch ++cough++, ++laugh++, or other fillers, only silences (sil). Perhaps i didn't set the option up properly?

    my pocketsphinx_continuous.exe parameters:
    -hmm model\hmm\en_US\hub4wsj_sc_8k -jsgf model\lm\en_US\fillers.gram -samprate 8000 -dict model\lm\en_US\cmu07a.dic -fsgusefiller yes -backtrace yes -bestpath no -nbest no

    my fillers.gram file:


    #JSGF V1.0;

    grammar fillers;

    public <idle> = (no | yes);


    and here are the logs:


    INFO: jsgf.c(581): Defined rule: <fillers.g00000>
    INFO: jsgf.c(581): Defined rule: PUBLIC <fillers.idle>
    INFO: fsg_model.c(215): Computing transitive closure for null transitions
    INFO: fsg_model.c(270): 2 null transitions added
    INFO: fsg_model.c(421): Adding silence transitions for <sil> to FSG
    INFO: fsg_model.c(441): Added 6 silence word transitions
    INFO: fsg_model.c(421): Adding silence transitions for ++NOISE++ to FSG
    INFO: fsg_model.c(441): Added 6 silence word transitions
    INFO: fsg_model.c(421): Adding silence transitions for ++BREATH++ to FSG
    INFO: fsg_model.c(441): Added 6 silence word transitions
    INFO: fsg_model.c(421): Adding silence transitions for ++SMACK++ to FSG
    INFO: fsg_model.c(441): Added 6 silence word transitions
    INFO: fsg_model.c(421): Adding silence transitions for ++COUGH++ to FSG
    INFO: fsg_model.c(441): Added 6 silence word transitions
    INFO: fsg_model.c(421): Adding silence transitions for ++LAUGH++ to FSG
    INFO: fsg_model.c(441): Added 6 silence word transitions
    INFO: fsg_model.c(421): Adding silence transitions for ++TONE++ to FSG
    INFO: fsg_model.c(441): Added 6 silence word transitions
    INFO: fsg_model.c(421): Adding silence transitions for ++UH++ to FSG
    INFO: fsg_model.c(441): Added 6 silence word transitions
    INFO: fsg_model.c(421): Adding silence transitions for ++UM++ to FSG
    INFO: fsg_model.c(441): Added 6 silence word transitions
    INFO: fsg_search.c(366): Added 0 alternate word transitions
    INFO: fsg_lextree.c(108): Allocated 612 bytes (0 KiB) for left and right context phones
    INFO: fsg_lextree.c(253): 59 HMM nodes in lextree (56 leaves)
    INFO: fsg_lextree.c(255): Allocated 6372 bytes (6 KiB) for all lextree nodes
    INFO: fsg_lextree.c(258): Allocated 6048 bytes (5 KiB) for lextree leafnodes
    INFO: continuous.c(371): bin\Release\pocketsphinx_continuous.exe COMPILED ON: Dec 13 2012, AT: 17:37:32

    Allocating 32 buffers of 2500 samples each
    READY....
    Listening...
    Stopped listening, please wait...
    INFO: fsg_search.c(1032): 110 frames, 1028 HMMs (9/fr), 3453 senones (31/fr), 783 history entries (7/fr)

    INFO: pocketsphinx.c(851): 000000000: no (-12386)
    INFO: word start end pprob ascr lscr lback
    INFO: (NULL) -1 -1 1.000 0 0 1
    INFO: sil 0 32 1.000 314409 -317862 1
    INFO: no 33 38 1.000 5692 -6930 1
    INFO: (NULL) 38 38 1.000 0 0 1
    INFO: sil 39 109 1.000 310167 -317862 1
    000000000: no
    READY....
    Listening...
    Stopped listening, please wait...
    INFO: fsg_search.c(1032): 53 frames, 494 HMMs (9/fr), 1665 senones (31/fr), 231 history entries (4/fr)

    INFO: pocketsphinx.c(851): 000000001: no (-6632)
    INFO: word start end pprob ascr lscr lback
    INFO: (NULL) -1 -1 1.000 0 0 1
    INFO: sil 0 31 1.000 314404 -317862 1
    INFO: no 32 37 1.000 5627 -6930 1
    INFO: (NULL) 37 37 1.000 0 0 1
    INFO: sil 38 52 1.000 315991 -317862 1
    000000001: no


    so, if i try to cough, or laugh, or cough+yes, cough+no, etc., i receive something like this:

    INFO: pocketsphinx.c(851): 000000001: no (-6632)
    INFO: word start end pprob ascr lscr lback
    INFO: (NULL) -1 -1 1.000 0 0 1
    INFO: sil 0 31 1.000 314404 -317862 1
    INFO: no 32 37 1.000 5627 -6930 1
    INFO: (NULL) 37 37 1.000 0 0 1
    INFO: sil 38 52 1.000 315991 -317862 1
    000000001: no

    and if i laugh a bit longer:

    INFO: word start end pprob ascr lscr lback
    INFO: (NULL) -1 -1 1.000 0 0 1
    INFO: sil 0 128 1.000 296296 -317862 1
    INFO: sil 129 254 1.000 298816 -317862 1
    INFO: sil 255 286 1.000 314725 -317862 1
    INFO: yes 287 334 1.000 2324 -6930 1
    INFO: (NULL) 334 334 1.000 0 0 1
    000000010: yes
    READY....

    maybe i have configured something wrong?...

     

    Last edit: Вадим 2013-02-09
  • Nickolay V. Shmyrev

    "-fsgusefiller yes" parameter, using a simple jsgf model with a (yes|no) rule - and couldn't catch ++cough++, ++laugh++, or other fillers, only silences (sil). Perhaps i didn't set the option up properly?

    You also need to increase fillprob, to something like

    -fillprob 0.1
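A rough sketch of why the flag alone was not enough: the decoder penalizes each filler insertion by log(fillprob), and the default value is tiny (assumed here to be 1e-8; check your build's defaults), so filler hypotheses almost never win against real words. Raising it to 0.1 shrinks the penalty by roughly 16 nats:

```python
import math

# Illustrative log-domain comparison of filler entry penalties.
# 1e-8 is an assumed default for -fillprob; verify with your pocketsphinx build.
default_penalty = math.log(1e-8)  # about -18.4
raised_penalty = math.log(0.1)    # about -2.3

# Difference: how much cheaper each filler insertion becomes.
print(raised_penalty - default_penalty)  # about 16.1
```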
    
     
    • Вадим

      Вадим - 2013-02-09

      wow, it works!) thanks a lot, and what about the

      Does this mean making it possible to use the Russian acoustic model and the English one simultaneously, but only returning the fillers? Or what?..
      ?...

       
  • Вадим

    Вадим - 2013-02-09

    quantize_mixw.py mixture_weights

    but i receive this:
    Traceback (most recent call last):
    File "quantize_mixw.py", line 11, in <module>
    import sphinxbase
    ImportError: No module named sphinxbase

    i have built sphinxbase, pocketsphinx and sphinxtrain in one parent folder in cygwin for windows, and did

    export PATH=/usr/local/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/lib
    export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

    could you give me some advice?..

     
  • Nickolay V. Shmyrev

    could you give me some advice?..

    Use Linux

     
    • Вадим

      Вадим - 2013-02-09

      )) ok, i will try on a JVM

       
    • Вадим

      Вадим - 2013-02-09

      I tried, it says "no module named numpy"...

       
  • Вадим

    Вадим - 2013-02-09

    On windows i tried sphinxbase/python/setup_win32.py build + install, and now i get this:
    Traceback (most recent call last):
    File "quantize_mixw.py", line 183, in <module>
    ifn, ofn = sys.argv[1:]
    ValueError: need more than 1 value to unpack
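The traceback is plain Python tuple unpacking: the script does something like `ifn, ofn = sys.argv[1:]`, so it needs exactly two file arguments after the script name. A minimal reproduction of the failure:

```python
# Reproduce the unpacking error: with only one argument after the
# script name, the two-variable unpack fails with ValueError.
argv = ["quantize_mixw.py", "mixture_weights"]
try:
    ifn, ofn = argv[1:]
except ValueError as e:
    print("unpack failed:", e)
```

Passing both an input and an output file (as discovered further down the thread) makes the unpack succeed.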

     
    • Вадим

      Вадим - 2013-02-09

      oh, it worked when i added the output file as the second parameter, didn't know... it works now

       

      Last edit: Вадим 2013-02-09
  • Вадим

    Вадим - 2013-02-10

    Thanks Nickolay, I have almost all my questions solved, and only some are left:
    1) do i need to use prune_mixw.py for mixture1 to become mixture2, then quantize_mixw.py for mixture2 to become mixture3?

    1.1) do i need to use cluster_mixw.py? (it seems to be used also with mixtures)

    1.2) what do these procedures do to the acoustic model? do they make it 'semi'? and why is 'semi' appropriate for the android demo - because it is special for pocketsphinx, which is used on the smartphone, while cont is for sphinx4?

    1.3) i have also noticed that 'semi' models have sendump and no mixture_weights, while 'cont' models have mixture_weights but no sendump... where can i get sendump? or is it my "mixture3" file?

    2)

    create a tool to copy fillers from English models.

    Does this mean to make it possible to use Russian acoustic, and simultaneously the English one, but only returning the fillers? Or what?..

     

    Last edit: Вадим 2013-02-10
  • Вадим

    Вадим - 2013-02-10

    i've made sendump from mixture_weights, but pocketsphinx fails when reading the senone mixture weights; it still tries to look for mixture_weights...

     
  • Вадим

    Вадим - 2013-02-10

    Hi Nickolay,
    I still cannot find a way to tell the russian acoustic model that it should look at the sendump file, and i am still looking forward to your reply to the previous 2 messages...
    Vadim

     
  • Nickolay V. Shmyrev

    I still cannot find a way to tell the russian acoustic model that it should look at the sendump file

    Sendump is used for semi-continuous models only. Semi-continuous models are better for mobile devices because they are faster in decoding.
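Back-of-the-envelope arithmetic on why semi-continuous decoding is faster - the counts below are illustrative assumptions, not values read from the actual models:

```python
# A continuous model evaluates every senone's own Gaussians each frame;
# a semi-continuous model evaluates one shared codebook and then only
# looks up mixture weights. Numbers here are illustrative assumptions.
n_senones = 1156       # senone count, as seen in the logs in this thread
gauss_per_senone = 8   # assumed density count per senone (continuous)
codebook_size = 256    # assumed shared codebook size (semi-continuous)

continuous_evals = n_senones * gauss_per_senone  # 9248 per frame
semi_evals = codebook_size                       # 256 per frame

print(continuous_evals // semi_evals)  # roughly 36x fewer Gaussian evals
```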

     
    • Вадим

      Вадим - 2013-02-11

      Ok.. but what do i need to do then with quantize_mixw and prune_mixw to make my russian acoustic model appropriate for mobile devices?

      The English model is semi-continuous, and moreover it is compressed for mobile applications (with pocketsphinx/scripts/prune_mixw.py and pocketsphinx/scripts/quantize_mixw.py). You need to do the same with the Russian one in order to achieve the same performance.

       
      • Nickolay V. Shmyrev

        Those scripts reduce the model size in order to make it better fit the limited memory of a mobile device.

        For more details you can read:

        http://www.cs.cmu.edu/~dhuggins/Publications/mixw_quant.pdf
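Very roughly, the idea from that paper is to store each mixture weight as a small integer code instead of a full value. A toy 4-bit uniform quantizer - a sketch of the concept only, not what quantize_mixw.py literally implements:

```python
# Toy uniform quantizer: map each weight onto one of 2**bits levels
# spanning the observed range, so each weight stores in `bits` bits.
def quantize(weights, bits=4):
    levels = (1 << bits) - 1
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / levels
    codes = [round((w - lo) / step) for w in weights]
    approx = [lo + c * step for c in codes]  # decoded approximations
    return codes, approx

weights = [0.01, 0.02, 0.05, 0.1, 0.3, 0.52]
codes, approx = quantize(weights)
print(codes)  # [0, 0, 1, 3, 9, 15]
```

Each decoded weight differs from the original by at most half a quantization step, which is the trade-off the paper measures against recognition accuracy.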

         
        • Вадим

          Вадим - 2013-02-12

          Thanks, Nickolay, i understand now the purpose of these scripts..
          But still, can you tell me step by step what i need to do to compress the model? Do I need to apply both scripts, and how?
          And which two integers do i need to pass as arguments to prune_mixw.py?
          (both scripts return sendump, but i need a modified mixture_weights to use in msu_ru_nsh, which is 'cont'...)

          P.S. i've tried prune_mixw.py mixture_weights mixture_weights2,
          then sendump.py mixture_weights2 mixture_weights3 (as i understand, sendump.py makes mixture_weights from sendump), it told

          $ python sendump.py mixture_weights mixture_weights2
          rows (model_count): 8, columns (mixture_count): 1156, features (feature_count): 1
          

          and when i try to use it in pocketsphinx (as mixture_weights):

          INFO: ms_senone.c(149): Reading senone mixture weights: model\hmm\msu_ru_nsh/mixture_weights
          INFO: ms_senone.c(200): Truncating senone logs3(pdf) values by 10 bits
          INFO: ms_senone.c(207): Not transposing mixture weights in memory
          INFO: ms_senone.c(266): Read mixture weights for 1156 senones: 1 features x 8 codewords
          INFO: ms_senone.c(320): Mapping senones to individual codebooks
          FATAL_ERROR: "ms_mgau.c", line 134: Senones need more codebooks (1156) than present (1153)
          
           

          Last edit: Вадим 2013-02-12
          • Nickolay V. Shmyrev

            both the scripts return sendump, but i need modified mixture_weights to use in msu_ru_nsh which is 'cont'..

            You can only compress a semi-continuous model.

             
