Hi Nickolay!
I am testing both the English hub4wsj_sc_8k and the Russian msu_ru_nsh acoustic and language models on pocketsphinx (both on Android and on PC), and have noticed that the decoder with the English model initializes and works twice as fast as the Russian one - why does this happen? Is it due to the bigger amount of data used to build the language model, or is the acoustic model slower?
And just 4 more questions:
1) how did you make msu_ru_nsh.lm.dmp? Is it made from just a list of the most frequent words from the dictionary, like
<s> один </s>
<s> одна </s>
....
or is it made of various sentences?
2) in the Russian acoustic model I didn't find fillers like "noise", "cough", "knock" etc., only SIL. How can I easily add this stuff to the Russian acoustic model, or maybe there is a ready Russian solution? If not, perhaps I need to record these sounds and do an acoustic model adaptation? If so, do I need to add the fillers like ordinary words, except for the pronunciation, which should be
++laugh++ +LAUGH+
++lipsmack++ +LIPSMACK+
++cough++ +COUGH+
++breath++ +BREATHE+
?
3) how do I need to add the fillers to the language model? (to check whether these fillers are appropriate for my task, to watch the utterance and see the timestamps where the fillers were recognized) Should I add them just like an ordinary word, for example in jsgf:
public = (иди вперед | иди назад | ++breathe++ | ++cough++ );
and when creating the .DMP:
<s> иди вперед </s>
<s> иди назад </s>
<s> ++breathe++ </s>
Is this correct?
And are the fillers automatically detected when launching ps_process_raw(), but just not shown?
4) a bit offtopic, but too small to create a new topic for: where can I find the Chinese models? Is the zh_CN model for Chinese, zh_TW for Taiwan, and the acoustic model shared Chinese - zh/tdt_sc_8k?
sorry for such a big message...
Last edit: Вадим 2013-02-07
The English model is semi-continuous, and moreover it's compressed for mobile applications (with pocketsphinx/scripts/prune_mixw.py and pocketsphinx/scripts/quantize_mixw.py). You need to do the same with the Russian one in order to reach the same performance point.
Actually the zero_ru Russian acoustic models are significantly better than the voxforge ones, I'd recommend you to try them. Unfortunately they need a different dictionary.
I trained it on a subset of the http://lib.ru corpus. I have a newer model which is trained on the whole of lib.ru. It's better than the older lm.dmp, which is more of a test model than a real model.
For lm training tools see http://cmusphinx.sourceforge.net/wiki/tutoriallm
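For reference, the usual pipeline from that tutorial, based on CMUCLMtk, looks roughly like this (a sketch, not the exact commands used for msu_ru_nsh; corpus.txt stands for your own sentence corpus, one <s> ... </s> wrapped sentence per line):
text2wfreq < corpus.txt | wfreq2vocab > corpus.vocab
text2idngram -vocab corpus.vocab -idngram corpus.idngram < corpus.txt
idngram2lm -vocab_type 0 -idngram corpus.idngram -vocab corpus.vocab -arpa corpus.arpa
sphinx_lm_convert -i corpus.arpa -o corpus.lm.dmp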
There is no easy way to add fillers to the acoustic model. Fillers are missing because they were not present in the training database. You need to add the fillers to the training database transcription and retrain the model.
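For illustration, in a sphinxtrain setup that means roughly two things (a sketch; your_db is a placeholder database name, the filler phone +COUGH+ must also be listed in the phone set, and the training audio must actually contain coughing in the marked places):
in your_db.filler:
++COUGH++ +COUGH+
in your_db.transcription:
<s> ++COUGH++ иди вперед </s> (utt001)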
Fillers are added to a grammar automatically with the -fsgusefiller option. You should not add them to the grammar itself.
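As for the ps_process_raw() part of the question: recognized fillers do appear in the word segmentation, so you can read their timestamps once the utterance has been decoded. A minimal sketch against the 0.8-era pocketsphinx C API (assuming ps is an initialized ps_decoder_t and ps_end_utt() has already been called; later pocketsphinx versions dropped the score argument of ps_seg_iter()):
#include <stdio.h>
#include <pocketsphinx.h>

/* Print every word in the best hypothesis with its start/end frame.
   Fillers such as ++COUGH++ show up as ordinary entries here. */
static void print_segmentation(ps_decoder_t *ps)
{
    int32 score;
    ps_seg_t *seg;
    for (seg = ps_seg_iter(ps, &score); seg != NULL; seg = ps_seg_next(seg)) {
        int sf, ef; /* start and end frame, 100 frames per second by default */
        ps_seg_frames(seg, &sf, &ef);
        printf("%s %d %d\n", ps_seg_word(seg), sf, ef);
    }
}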
Yes
Hi Nickolay,
Thanks very much for such a detailed response.
I found these scripts in sphinxtrain, but I still don't understand how I need to run them and where to pass my acoustic model as a parameter?
I have tried this; it works faster, but has worse accuracy than msu_ru_nsh (if I choose the zero acoustic model and your msu_ru_nsh.lm.dmp). If I change the LM to zero's, the recognition time multiplies a lot...
I have looked it up and found a really great corpus base.. Will you make your new language model from the whole of lib.ru downloadable for others? Or can you send it to me, so I could test it?)
I made a simple jsgf file with a (yes | no) rule and enabled -fsgusefiller, but I still cannot get any fillers like ++cough++ or ++laugh++, only sil.. I would like to implement the ability to register somebody coughing/laughing and handle it somehow in my application, so I am really interested in this...
Last edit: Вадим 2013-02-08
Hi Nickolay,
Could you please help me with the problem in the message above?..
Nickolay, are you here?...
quantize_mixw.py mixture_weights
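(As it turns out later in this thread, the script actually takes an input and an output file, so assuming msu_ru_nsh is the model directory, something like:
python quantize_mixw.py msu_ru_nsh/mixture_weights msu_ru_nsh/sendump
where mixture_weights sits inside the acoustic model directory; the output file name here is illustrative.)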
Probably you need to perform more experiments. Also, you can train a semi-continuous model from the Voxforge data.
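For a semi-continuous training run, the relevant lines of sphinxtrain's sphinx_train.cfg look roughly like this (a sketch with typical small-model values, not tuned recommendations):
$CFG_HMM_TYPE = '.semi.';   # instead of '.cont.'
$CFG_FINAL_NUM_DENSITIES = 256;   # codebook size for semi-continuous models
$CFG_N_TIED_STATES = 1000;   # number of tied states (senones)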
Not right now
Russian acoustic models do not have filler sounds, so they are not recognized. You need to train another model with fillers if you need them, or create a tool to copy the fillers from the English models.
Hello, sir.
Is it possible to see your tuned dictionaries?
I've tried to transcribe http://www.youtube.com/watch?v=3tqBH7lhlOw (00:01:00-00:02:30)
voxforge-ru-0.2: http://pastebin.com/x7BKfDP2
zero_ru_cont_8k_v2: http://pastebin.com/uU2sC7w4
Both attempts have very poor results; of course my poor knowledge of sphinx is the main problem here.
I can use sphinx on a desktop computer, there is no need for small-device optimization.
Thanks in advance.
Last edit: John8394 2015-03-14
Large vocabulary Russian speech recognition is not possible with the current models; they are too small.
Thanks very much for your reply.
I will try this.
Does this mean making it possible to use the Russian acoustic model and the English one simultaneously, with the English one only returning the fillers? Or what?..
I meant that I tried the English model (with pocketsphinx_continuous.exe) with the
"-fsgusefiller yes" parameter, using a simple jsgf model with a (yes|no) rule - and couldn't catch ++cough++, ++laugh++ or the other fillers, only silences (sil). Perhaps I didn't set the option up well?
my pocketsphinx_continuous.exe parameters:
-hmm model\hmm\en_US\hub4wsj_sc_8k -jsgf model\lm\en_US\fillers.gram -samprate 8000 -dict model\lm\en_US\cmu07a.dic -fsgusefiller yes -backtrace yes -bestpath no -nbest no
my fillers.gram file:
JSGF V1.0;
grammar fillers;
public <idle> = (no | yes);
and here are the logs:
INFO: jsgf.c(581): Defined rule: <fillers.g00000>
INFO: jsgf.c(581): Defined rule: PUBLIC <fillers.idle>
INFO: fsg_model.c(215): Computing transitive closure for null transitions
INFO: fsg_model.c(270): 2 null transitions added
INFO: fsg_model.c(421): Adding silence transitions for <sil> to FSG
INFO: fsg_model.c(441): Added 6 silence word transitions
INFO: fsg_model.c(421): Adding silence transitions for ++NOISE++ to FSG
INFO: fsg_model.c(441): Added 6 silence word transitions
INFO: fsg_model.c(421): Adding silence transitions for ++BREATH++ to FSG
INFO: fsg_model.c(441): Added 6 silence word transitions
INFO: fsg_model.c(421): Adding silence transitions for ++SMACK++ to FSG
INFO: fsg_model.c(441): Added 6 silence word transitions
INFO: fsg_model.c(421): Adding silence transitions for ++COUGH++ to FSG
INFO: fsg_model.c(441): Added 6 silence word transitions
INFO: fsg_model.c(421): Adding silence transitions for ++LAUGH++ to FSG
INFO: fsg_model.c(441): Added 6 silence word transitions
INFO: fsg_model.c(421): Adding silence transitions for ++TONE++ to FSG
INFO: fsg_model.c(441): Added 6 silence word transitions
INFO: fsg_model.c(421): Adding silence transitions for ++UH++ to FSG
INFO: fsg_model.c(441): Added 6 silence word transitions
INFO: fsg_model.c(421): Adding silence transitions for ++UM++ to FSG
INFO: fsg_model.c(441): Added 6 silence word transitions
INFO: fsg_search.c(366): Added 0 alternate word transitions
INFO: fsg_lextree.c(108): Allocated 612 bytes (0 KiB) for left and right context phones
INFO: fsg_lextree.c(253): 59 HMM nodes in lextree (56 leaves)
INFO: fsg_lextree.c(255): Allocated 6372 bytes (6 KiB) for all lextree nodes
INFO: fsg_lextree.c(258): Allocated 6048 bytes (5 KiB) for lextree leafnodes
INFO: continuous.c(371): bin\Release\pocketsphinx_continuous.exe COMPILED ON: Dec 13 2012, AT: 17:37:32
Allocating 32 buffers of 2500 samples each
READY....
Listening...
Stopped listening, please wait...
INFO: fsg_search.c(1032): 110 frames, 1028 HMMs (9/fr), 3453 senones (31/fr), 783 history entries (7/fr)
INFO: pocketsphinx.c(851): 000000000: no (-12386)
INFO: word start end pprob ascr lscr lback
INFO: (NULL) -1 -1 1.000 0 0 1
INFO: sil 0 32 1.000 314409 -317862 1
INFO: no 33 38 1.000 5692 -6930 1
INFO: (NULL) 38 38 1.000 0 0 1
INFO: sil 39 109 1.000 310167 -317862 1
000000000: no
READY....
Listening...
Stopped listening, please wait...
INFO: fsg_search.c(1032): 53 frames, 494 HMMs (9/fr), 1665 senones (31/fr), 231 history entries (4/fr)
INFO: pocketsphinx.c(851): 000000001: no (-6632)
INFO: word start end pprob ascr lscr lback
INFO: (NULL) -1 -1 1.000 0 0 1
INFO: sil 0 31 1.000 314404 -317862 1
INFO: no 32 37 1.000 5627 -6930 1
INFO: (NULL) 37 37 1.000 0 0 1
INFO: sil 38 52 1.000 315991 -317862 1
000000001: no
so, if I try to cough, or laugh, or cough+yes, cough+no, etc., I receive something like this:
INFO: pocketsphinx.c(851): 000000001: no (-6632)
INFO: word start end pprob ascr lscr lback
INFO: (NULL) -1 -1 1.000 0 0 1
INFO: sil 0 31 1.000 314404 -317862 1
INFO: no 32 37 1.000 5627 -6930 1
INFO: (NULL) 37 37 1.000 0 0 1
INFO: sil 38 52 1.000 315991 -317862 1
000000001: no
and the same if I laugh a bit longer:
maybe I have made wrong settings?...
Last edit: Вадим 2013-02-09
You also need to increase the filler probability, something like
-fillprob 0.1
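For example, applied to the command line quoted earlier in this thread (same parameters, just with the raised filler probability added):
pocketsphinx_continuous.exe -hmm model\hmm\en_US\hub4wsj_sc_8k -jsgf model\lm\en_US\fillers.gram -samprate 8000 -dict model\lm\en_US\cmu07a.dic -fsgusefiller yes -fillprob 0.1 -backtrace yes -bestpath no -nbest no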
wow, it works!) thanks a lot. And what about quantize_mixw.py? I've tried it, but I receive this:
Traceback (most recent call last):
File "quantize_mixw.py", line 11, in <module>
import sphinxbase
ImportError: No module named sphinxbase
I have built sphinxbase, pocketsphinx, sphinxtrain in one parent folder in cygwin for windows, made
could you give me some advice?..
Use Linux
)) ok, I will try on a JVM
I tried, it says "no module named numpy"...
On Windows I tried sphinxbase/python/setup_win32.py build + install, and now I get this:
Traceback (most recent call last):
File "quantize_mixw.py", line 183, in <module>
ifn, ofn = sys.argv[1:]
ValueError: need more than 1 value to unpack
ow, it worked when I added the output file as the second parameter, didn't know that... now it works
Last edit: Вадим 2013-02-09
Thanks Nickolay, I have almost got my many questions solved, and I have only some left:
1) do I need to use prune_mixw.py for mixture1 to become mixture2, then quantize_mixw.py for mixture2 to become mixture3?
1.1) do I need to use cluster_mixw.py? (it also seems to work on the mixtures)
1.2) what do these procedures do to the acoustic model? Do they make it 'semi'? And why is 'semi' appropriate for the android demo - because it is special for pocketsphinx, which is used by the smartphone, while 'cont' is for sphinx4?
1.3) I have also noticed that 'semi' models have a sendump and no mixture_weights, while 'cont' models have mixture_weights but no sendump... where can I get a sendump? Or is it my "mixture3" file?
2)
Does this mean making it possible to use the Russian acoustic model and the English one simultaneously, with the English one only returning the fillers? Or what?..
Last edit: Вадим 2013-02-10
I've made a sendump from mixture_weights, but pocketsphinx fails when reading the senone mixture weights; it still tries to look for mixture_weights...
Hi Nickolay,
I still cannot find a way to tell the Russian acoustic model that it should read the sendump file, and I am still looking forward to your reply to the previous 2 messages...
Vadim
Sendump is used for semi-continuous models only. Semi-continuous models are better for mobile devices because they are faster in decoding.
Ok.. but what do I need to do then with quantize_mixw and prune_mixw to make my Russian acoustic model appropriate for mobile devices?
Those scripts reduce the model size in order to make it better fit the limited memory of the mobile device.
For more details you can read:
http://www.cs.cmu.edu/~dhuggins/Publications/mixw_quant.pdf
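Putting the thread together, the flow is apparently two steps (a sketch; the file names are illustrative, prune_mixw.py's exact pruning arguments are not spelled out in this thread, and, as observed below, the output ends up in sendump format, i.e. only usable with a semi-continuous model):
python prune_mixw.py mixture_weights mixture_weights.pruned
python quantize_mixw.py mixture_weights.pruned sendump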
Thanks, Nickolay, now I understand the purpose of these scripts..
But still, can you tell me step by step what I need to do to compress the model? Do I need to apply both scripts, and how?
And which two integers do I need to pass as arguments to prune_mixw.py?
(both scripts return a sendump, but I need a modified mixture_weights to use in msu_ru_nsh, which is 'cont'...)
P.S. I've tried prune_mixw.py mixture_weights mixture_weights2,
then sendump.py mixture_weights2 mixture_weights3 (as I understand, sendump.py makes mixture_weights from a sendump), and it told
and when I try to use it in pocketsphinx (as mixture_weights):
Last edit: Вадим 2013-02-12
You can only compress a semi-continuous model.