Hi Nickolay,
I am having a problem running my own acoustic model (a Swahili acoustic model, not attached) and an4 (created from <http://cmusphinx.sourceforge.net/wiki/tutorialam>) in the pocketsphinx Android demo. These models performed well in a sphinx4 application (<http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4>), but in the pocketsphinx Android demo their accuracy is very poor.
I have managed to run the demo with the tidigits acoustic model, which was recorded at an 8 kHz sample rate, and the accuracy is good. Now I am beginning to wonder whether the sample rate matters in the Android demo, because both an4 and my acoustic model use a 16 kHz sample rate.
The other thing is that the tidigits acoustic model has six files, namely mdef, feat.params, means, sendump, transition_matrices and variances (../pocketsphinx/model/hmm/en/tidigits/). All are binary files except for feat.params. In an4/myAcoustic (<https://www.dropbox.com/sh/qvnb3k8dl1lohl1/dfsAYEOL1m>), however, there are seven files: mdef, feat.params, means, transition_matrices, noisedict, variances and mixture_weights. There, mdef is in text format, and mixture_weights is used instead of sendump. Could these differences cause my acoustic model or the an4 one to perform poorly? Do I need to process my acoustic models or the an4 one further for them to perform better in the Android pocketsphinx demo?
Your help will be highly appreciated.
Regards,
Alexander
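Does the sample rate matter in the Android demo?
Yes
Could the differences in the model files cause the poor performance?
No
Do I need to process my acoustic models further?
Depends on what you mean by "process"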
Thanks for the quick response.
When you say it does depend on the sample rate, do you mean that in the Android demo the sample rate must be set strictly to 8 kHz and 16 kHz is not allowed?
By "process" I meant: when you create an acoustic model with sphinxtrain using this tutorial (http://cmusphinx.sourceforge.net/wiki/tutorialam), can you use it directly in the Android demo as long as it is of good quality? I created one and used it in a sphinx4 application with good recognition results. Unfortunately, when I tried to use it in the Android demo, recognition was very poor.
Last edit: Nickolay V. Shmyrev 2013-06-24
The sample rate configured in Android must match the sample rate used during model training.
Yes, you can use the model directly in the Android demo.
The Android demo is configured to use an 8 kHz model by default.
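For anyone wondering why the mismatch hurts so much, here is a rough arithmetic sketch in plain Java. It uses the -upperf and -wlen values from the pocketsphinx.log posted further down in this thread. FrontEndCheck and its methods are hypothetical helpers written for illustration, not part of pocketsphinx, and the rounding is my assumption rather than the library's exact computation.

```java
// Sketch: sanity-check front-end parameters against a capture sample rate.
// FrontEndCheck is a hypothetical helper, not a pocketsphinx class.
final class FrontEndCheck {
    /** Highest frequency that audio sampled at sampleRateHz can represent. */
    static double nyquistHz(int sampleRateHz) {
        return sampleRateHz / 2.0;
    }

    /** True when the filterbank's upper edge lies at or below the Nyquist limit. */
    static boolean filterbankFits(double upperfHz, int sampleRateHz) {
        return upperfHz <= nyquistHz(sampleRateHz);
    }

    /** Number of samples covered by one analysis window (rounded to nearest). */
    static int windowSamples(double wlenSeconds, int sampleRateHz) {
        return (int) Math.round(wlenSeconds * sampleRateHz);
    }

    public static void main(String[] args) {
        double upperf = 6855.4976; // -upperf value from the log in this thread
        double wlen = 0.025625;    // -wlen default from the log in this thread
        for (int rate : new int[] {8000, 16000}) {
            System.out.println(rate + " Hz: nyquist=" + nyquistHz(rate)
                    + ", filterbank fits=" + filterbankFits(upperf, rate)
                    + ", window samples=" + windowSamples(wlen, rate));
        }
    }
}
```

The point is that a filterbank reaching up to about 6855 Hz only makes sense for audio sampled at 16 kHz. Audio captured at 8 kHz carries nothing above 4000 Hz, so features computed with the 16 kHz settings will not look like the training data.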
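My acoustic model is 16 kHz, and I changed the setting to 16 kHz with:
c.setFloat("-samprate", 16000.0);
but it still gives me poor results. Can you check my pocketsphinx.log: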
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-bghist no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm
-lmctl
-lmname default default
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf -1 -1
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-5 1.000000e-05
-pl_window 0 0
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-usewdphones no no
-uw 1.0 1.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: cmd_ln.c(691): Parsing command line:
\
-alpha 0.97 \
-doublebw no \
-nfilt 40 \
-ncep 13 \
-lowerf 133.33334 \
-upperf 6855.4976 \
-nfft 512 \
-wlen 0.0256 \
-transform legacy \
-feat 1s_c_d_dd \
-agc none \
-cmn current \
-varnorm no
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-dither no no
-doublebw no no
-feat 1s_c_d_dd 1s_c_d_dd
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 0
-logspec no no
-lowerf 133.33334 1.333333e+02
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-smoothspec no no
-svspec
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.560000e-02
INFO: acmod.c(246): Parsed model-specific feature parameters from
/mnt/sdcard/Android/data/edu.cmu.pocketsphinx/hmm/sw/gelas/feat.params
INFO: feat.c(713): Initializing feature stream to type: '1s_c_d_dd',
ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: mdef.c(517): Reading model definition:
/mnt/sdcard/Android/data/edu.cmu.pocketsphinx/hmm/sw/gelas/mdef
INFO: bin_mdef.c(179): Allocating 35423 * 8 bytes (276 KiB) for CD tree
INFO: tmat.c(205): Reading HMM transition probability matrices:
/mnt/sdcard/Android/data/edu.cmu.pocketsphinx/hmm/sw/gelas/transition_matrices
INFO: acmod.c(121): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/mnt/sdcard/Android/data/edu.cmu.pocketsphinx/hmm/sw/gelas/means
INFO: ms_gauden.c(292): 1093 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/mnt/sdcard/Android/data/edu.cmu.pocketsphinx/hmm/sw/gelas/variances
INFO: ms_gauden.c(292): 1093 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(354): 11 variance values floored
INFO: acmod.c(123): Attempting to use PTHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/mnt/sdcard/Android/data/edu.cmu.pocketsphinx/hmm/sw/gelas/means
INFO: ms_gauden.c(292): 1093 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/mnt/sdcard/Android/data/edu.cmu.pocketsphinx/hmm/sw/gelas/variances
INFO: ms_gauden.c(292): 1093 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(354): 11 variance values floored
INFO: ptm_mgau.c(800): Number of codebooks exceeds 256: 1093
INFO: acmod.c(125): Falling back to general multi-stream GMM computation
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/mnt/sdcard/Android/data/edu.cmu.pocketsphinx/hmm/sw/gelas/means
INFO: ms_gauden.c(292): 1093 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/mnt/sdcard/Android/data/edu.cmu.pocketsphinx/hmm/sw/gelas/variances
INFO: ms_gauden.c(292): 1093 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(354): 11 variance values floored
INFO: ms_senone.c(160): Reading senone mixture weights:
/mnt/sdcard/Android/data/edu.cmu.pocketsphinx/hmm/sw/gelas/mixture_weights
INFO: ms_senone.c(211): Truncating senone logs3(pdf) values by 10 bits
INFO: ms_senone.c(218): Not transposing mixture weights in memory
INFO: ms_senone.c(277): Read mixture weights for 1093 senones: 1 features x
8 codewords
INFO: ms_senone.c(331): Mapping senones to individual codebooks
INFO: ms_mgau.c(141): The value of topn: 4
INFO: dict.c(317): Allocating 4209 * 20 bytes (82 KiB) for word entries
INFO: dict.c(332): Reading main dictionary:
/mnt/sdcard/Android/data/edu.cmu.pocketsphinx/lm/sw/testDict.dic
INFO: dict.c(211): Allocated 0 KiB for strings, 1 KiB for phones
INFO: dict.c(335): 110 words read
INFO: dict.c(341): Reading filler dictionary:
/mnt/sdcard/Android/data/edu.cmu.pocketsphinx/hmm/sw/gelas/noisedict
INFO: dict.c(211): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(344): 3 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(404): Allocating 31^3 * 2 bytes (58 KiB) for word-initial
triphones
INFO: dict2pid.c(131): Allocated 11656 bytes (11 KiB) for word-final
triphones
INFO: dict2pid.c(195): Allocated 11656 bytes (11 KiB) for single-phone word
triphones
INFO: fsg_search.c(145): FSG(beam: -1080, pbeam: -1080, wbeam: -634; wip:
-26, pip: 0)
INFO: jsgf.c(581): Defined rule: <hello.g00000>
INFO: jsgf.c(581): Defined rule: <hello.g00001>
INFO: jsgf.c(581): Defined rule: PUBLIC <hello.onedigit>
INFO: jsgf.c(353): Right recursion <hello.g00001> 6 => 2
INFO: fsg_model.c(215): Computing transitive closure for null transitions
INFO: fsg_model.c(270): 60 null transitions added
INFO: fsg_model.c(421): Adding silence transitions for <sil> to FSG
INFO: fsg_model.c(441): Added 17 silence word transitions
INFO: fsg_search.c(366): Added 0 alternate word transitions
INFO: fsg_lextree.c(108): Allocated 1088 bytes (1 KiB) for left and right
context phones
INFO: fsg_lextree.c(253): 97 HMM nodes in lextree (67 leaves)
INFO: fsg_lextree.c(255): Allocated 10476 bytes (10 KiB) for all lextree
nodes
INFO: fsg_lextree.c(258): Allocated 7236 bytes (7 KiB) for lextree leafnodes
Could it be due to how I configured my pocketsphinx demo?
Last edit: Nickolay V. Shmyrev 2013-06-24
You also need to set the 16 kHz sample rate in the audiorecorder constructor. To verify the sample rate, you can listen to the raw files collected on the SD card when the -rawlogdir option is uncommented in the sources.
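Besides listening, a quick numeric check on a raw file can help. The sketch below assumes the raw logs are headerless 16-bit mono PCM, which is my assumption and worth confirming on your device. RawAudioCheck is a hypothetical helper, not part of pocketsphinx. It computes how long a file would last at a given sample rate, so you can compare that with how long you actually spoke.

```java
import java.io.File;

// Sketch: estimate the duration of a raw audio file at a given sample rate.
// Assumes headerless 16-bit mono PCM; RawAudioCheck is a hypothetical helper.
final class RawAudioCheck {
    static final int BYTES_PER_SAMPLE = 2; // assumption: 16-bit samples, one channel

    /** Duration in seconds implied by the file size and the sample rate. */
    static double durationSeconds(long fileSizeBytes, int sampleRateHz) {
        if (sampleRateHz <= 0) {
            throw new IllegalArgumentException("sample rate must be positive");
        }
        return (double) fileSizeBytes / (BYTES_PER_SAMPLE * (long) sampleRateHz);
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java RawAudioCheck <file.raw>");
            return;
        }
        long size = new File(args[0]).length();
        System.out.println("as 8 kHz:  " + durationSeconds(size, 8000) + " s");
        System.out.println("as 16 kHz: " + durationSeconds(size, 16000) + " s");
    }
}
```

If you spoke for about five seconds and only the 8 kHz figure comes out near five, the recorder is most likely still capturing at 8 kHz even though -samprate says 16000.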
Thank you very much, Nickolay. It worked. :) Actually, the results are much better in the pocketsphinx Android demo than in the sphinx4 application. I managed to change the sample rate in the audiorecorder constructor. This should really be pointed out, because I believe many people will only change the sample rate in:
c.setFloat("-samprate", 16000.0);
and forget to change it in the audiorecorder constructor as you suggested.
Thanks again, you have saved my day.
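One way to avoid changing only one of the two places is to keep a single sample-rate value and derive both settings from it. This is only a sketch in plain Java: RecognizerSettings is a hypothetical holder, not a class from the demo or from Android, and how you wire it into the demo's sources is up to you.

```java
// Sketch: one source of truth for the sample rate used by decoder and recorder.
// RecognizerSettings is a hypothetical holder, not a pocketsphinx or Android class.
final class RecognizerSettings {
    /** Sample rate the acoustic model was trained at, in Hz. */
    final int modelSampleRateHz;

    RecognizerSettings(int modelSampleRateHz) {
        if (modelSampleRateHz <= 0) {
            throw new IllegalArgumentException("sample rate must be positive");
        }
        this.modelSampleRateHz = modelSampleRateHz;
    }

    /** Value intended for c.setFloat("-samprate", ...). */
    float decoderSampleRate() {
        return (float) modelSampleRateHz;
    }

    /** Value intended for the sample-rate argument of the audio recorder constructor. */
    int recorderSampleRateHz() {
        return modelSampleRateHz;
    }

    /** True when a decoder rate and a recorder rate agree. */
    static boolean consistent(float decoderRate, int recorderRateHz) {
        return Math.round(decoderRate) == recorderRateHz;
    }

    public static void main(String[] args) {
        RecognizerSettings settings = new RecognizerSettings(16000);
        System.out.println("decoder -samprate: " + settings.decoderSampleRate());
        System.out.println("recorder rate:     " + settings.recorderSampleRateHz());
        // The situation from this thread: decoder at 16000, recorder left at 8000.
        System.out.println("16000 vs 8000 consistent: " + consistent(16000.0f, 8000));
    }
}
```

With something like this, the decoder configuration and the recorder both read from the same object, so switching to a model trained at a different rate means editing one number.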
Last edit: Nickolay V. Shmyrev 2013-06-25
Hello, how can I make my own acoustic model? Thanks for your answer.