Hi, I'm new here and I have a project: building a Java application that
recognizes words from the Arabic alphabet. Can you please help me figure out
the steps I need to take to create this application? I was reading the
tutorials here about training and adapting the acoustic model, but I don't
know what to do first.
Speech recognition is something I have never read about, which is why I have
no idea where to start.
I appreciate that you help people who are having trouble. Thank you.
First of all you need to collect the speech database. Then follow the
acoustic model training tutorial step by step to build the application you
need. You might also want to look at this project, which can give you some
references:
https://sourceforge.net/projects/arabisc/
Or you can collect your database and upload it somewhere, and I will help you
build the acoustic model, language model, and dictionary as well.
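As a sketch of what "collecting the speech database" means in practice: the
training tutorial expects a directory of recordings plus a fileids list and a
transcription file. Every name below (arabic_db, speaker1, alif_001, ...) is
an illustrative placeholder, not a name the tutorial prescribes:

```shell
# Hypothetical database layout for acoustic model training.
mkdir -p arabic_db/etc arabic_db/wav/speaker1

# fileids: one recording per line, path relative to wav/, no .wav extension
cat > arabic_db/etc/arabic_train.fileids <<'EOF'
speaker1/alif_001
speaker1/ba_001
EOF

# transcription: the words spoken, then the utterance id in parentheses
cat > arabic_db/etc/arabic_train.transcription <<'EOF'
<s> alif </s> (alif_001)
<s> ba </s> (ba_001)
EOF

ls arabic_db/etc
```

Each fileid then needs a matching 16 kHz, 16-bit mono recording under wav/,
which is the recording-format question that comes up later in this thread.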
Hi, thank you.
I was testing the pocketsphinx tutorial. I thought the application would tell
me to say something and then recognize it, but the output was this:
hyt-med@Hyt-Med:~/Desktop$ gcc -o tst tst.c -DMODELDIR=\"`pkg-config --variable=modeldir pocketsphinx`\" `pkg-config --cflags --libs pocketsphinx sphinxbase`
hyt-med@Hyt-Med:~/Desktop$
hyt-med@Hyt-Med:~/Desktop$ ./tst
INFO: cmd_ln.c(691): Parsing command line:
\
-hmm /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k \
-lm /usr/local/share/pocketsphinx/model/lm/en/turtle.DMP \
-dict /usr/local/share/pocketsphinx/model/lm/en/turtle.dic
Current configuration:
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-bghist no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict /usr/local/share/pocketsphinx/model/lm/en/turtle.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm /usr/local/share/pocketsphinx/model/lm/en/turtle.DMP
-lmctl
-lmname default default
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf -1 -1
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-5 1.000000e-05
-pl_window 0 0
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-usewdphones no no
-uw 1.0 1.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: cmd_ln.c(691): Parsing command line:
\
-nfilt 20 \
-lowerf 1 \
-upperf 4000 \
-wlen 0.025 \
-transform dct \
-round_filters no \
-remove_dc yes \
-svspec 0-12/13-25/26-38 \
-feat 1s_c_d_dd \
-agc none \
-cmn current \
-cmninit 56,-3,1 \
-varnorm no
Current configuration:
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ceplen 13 13
-cmn current current
-cmninit 8.0 56,-3,1
-dither no no
-doublebw no no
-feat 1s_c_d_dd 1s_c_d_dd
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 0
-logspec no no
-lowerf 133.33334 1.000000e+00
-ncep 13 13
-nfft 512 512
-nfilt 40 20
-remove_dc no yes
-round_filters yes no
-samprate 16000 1.600000e+04
-seed -1 -1
-smoothspec no no
-svspec 0-12/13-25/26-38
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 4.000000e+03
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.500000e-02
INFO: acmod.c(246): Parsed model-specific feature parameters from
/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/feat.params
INFO: feat.c(713): Initializing feature stream to type: '1s_c_d_dd',
ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(167): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(517): Reading model definition:
/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: mdef.c(528): Found byte-order mark BMDF, assuming this is a binary mdef
file
INFO: bin_mdef.c(336): Reading binary model definition:
/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: bin_mdef.c(513): 50 CI-phone, 143047 CD-phone, 3 emitstate/phone, 150
CI-sen, 5150 Sen, 27135 Sen-Seq
INFO: tmat.c(205): Reading HMM transition probability matrices: /usr/local/sha
re/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/transition_matrices
INFO: acmod.c(121): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(354): 0 variance values floored
INFO: s2_semi_mgau.c(908): Loading senones from dump file
/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/sendump
INFO: s2_semi_mgau.c(932): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(1027): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1304): Maximum top-N: 4 Top-N beams: 0 0 0
INFO: dict.c(317): Allocating 4217 * 20 bytes (82 KiB) for word entries
INFO: dict.c(332): Reading main dictionary:
/usr/local/share/pocketsphinx/model/lm/en/turtle.dic
INFO: dict.c(211): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(335): 110 words read
INFO: dict.c(341): Reading filler dictionary:
/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/noisedict
INFO: dict.c(211): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(344): 11 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(404): Allocating 50^3 * 2 bytes (244 KiB) for word-initial
triphones
INFO: dict2pid.c(131): Allocated 30200 bytes (29 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 30200 bytes (29 KiB) for single-phone word
triphones
INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
INFO: ngram_model_dmp.c(142): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(196): ngrams 1=91, 2=212, 3=177
INFO: ngram_model_dmp.c(242): 91 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(291): 212 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(317): 177 = LM.trigrams read
INFO: ngram_model_dmp.c(342): 20 = LM.prob2 entries read
INFO: ngram_model_dmp.c(362): 12 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(382): 12 = LM.prob3 entries read
INFO: ngram_model_dmp.c(410): 1 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(466): 91 = ascii word strings read
INFO: ngram_search_fwdtree.c(99): 67 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 15 single-
phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 15
single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 328
INFO: ngram_search_fwdtree.c(338): after: 67 root, 200 non-root channels, 14
single-phone words
INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: cmn.c(175): CMN: 37.32 -0.91 0.57 0.52 -0.62 0.13 -0.06 0.28 0.39 0.59
0.12 -0.16 0.18
INFO: ngram_search_fwdtree.c(1549): 2000 words recognized (7/fr)
INFO: ngram_search_fwdtree.c(1551): 140003 senones evaluated (502/fr)
INFO: ngram_search_fwdtree.c(1553): 67926 channels searched (243/fr), 17687
1st, 27508 last
INFO: ngram_search_fwdtree.c(1557): 4342 words for which last channels
evaluated (15/fr)
INFO: ngram_search_fwdtree.c(1560): 4207 candidate words for entering last
phone (15/fr)
INFO: ngram_search_fwdtree.c(1562): fwdtree 0.05 CPU 0.017 xRT
INFO: ngram_search_fwdtree.c(1565): fwdtree 0.15 wall 0.054 xRT
INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 24 words
INFO: ngram_search_fwdflat.c(940): 535 words recognized (2/fr)
INFO: ngram_search_fwdflat.c(942): 47071 senones evaluated (169/fr)
INFO: ngram_search_fwdflat.c(944): 37023 channels searched (132/fr)
INFO: ngram_search_fwdflat.c(946): 2159 words searched (7/fr)
INFO: ngram_search_fwdflat.c(948): 1551 word transitions (5/fr)
INFO: ngram_search_fwdflat.c(951): fwdflat 0.02 CPU 0.007 xRT
INFO: ngram_search_fwdflat.c(954): fwdflat 0.02 wall 0.007 xRT
INFO: ngram_search.c(1266): lattice start node <s>.0 end node </s>.213
INFO: ngram_search.c(1294): Eliminated 0 nodes before end node
INFO: ngram_search.c(1399): Lattice has 70 nodes, 21 links
INFO: ps_lattice.c(1365): Normalizer P(O) = alpha(:213:277) = -1409852
INFO: ps_lattice.c(1403): Joint P(O,S) = -1409968 P(S|O) = -116
INFO: ngram_search.c(888): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(891): bestpath 0.00 wall 0.000 xRT
Recognized: go forward ten meters
INFO: cmn_prior.c(121): cmn_prior_update: from < 37.32 -0.91 0.57 0.52 -0.62
0.13 -0.06 0.28 0.39 0.59 0.12 -0.16 0.18 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 37.32 -0.91 0.57 0.52 -0.62
0.13 -0.06 0.28 0.39 0.59 0.12 -0.16 0.18 >
INFO: ngram_search_fwdtree.c(1549): 2000 words recognized (7/fr)
INFO: ngram_search_fwdtree.c(1551): 140208 senones evaluated (503/fr)
INFO: ngram_search_fwdtree.c(1553): 67926 channels searched (243/fr), 17687
1st, 27508 last
INFO: ngram_search_fwdtree.c(1557): 4342 words for which last channels
evaluated (15/fr)
INFO: ngram_search_fwdtree.c(1560): 4207 candidate words for entering last
phone (15/fr)
INFO: ngram_search_fwdtree.c(1562): fwdtree 0.04 CPU 0.016 xRT
INFO: ngram_search_fwdtree.c(1565): fwdtree 0.05 wall 0.017 xRT
INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 24 words
INFO: ngram_search_fwdflat.c(940): 535 words recognized (2/fr)
INFO: ngram_search_fwdflat.c(942): 47071 senones evaluated (169/fr)
INFO: ngram_search_fwdflat.c(944): 37023 channels searched (132/fr)
INFO: ngram_search_fwdflat.c(946): 2159 words searched (7/fr)
INFO: ngram_search_fwdflat.c(948): 1551 word transitions (5/fr)
INFO: ngram_search_fwdflat.c(951): fwdflat 0.02 CPU 0.007 xRT
INFO: ngram_search_fwdflat.c(954): fwdflat 0.02 wall 0.007 xRT
INFO: ngram_search.c(1266): lattice start node <s>.0 end node </s>.213
INFO: ngram_search.c(1294): Eliminated 0 nodes before end node
INFO: ngram_search.c(1399): Lattice has 70 nodes, 21 links
INFO: ps_lattice.c(1365): Normalizer P(O) = alpha(:213:277) = -1409852
INFO: ps_lattice.c(1403): Joint P(O,S) = -1409968 P(S|O) = -116
INFO: ngram_search.c(888): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(891): bestpath 0.00 wall 0.000 xRT
Recognized: go forward ten meters
INFO: ngram_search_fwdtree.c(430): TOTAL fwdtree 0.09 CPU 0.017 xRT
INFO: ngram_search_fwdtree.c(433): TOTAL fwdtree 0.20 wall 0.036 xRT
INFO: ngram_search_fwdflat.c(174): TOTAL fwdflat 0.04 CPU 0.007 xRT
INFO: ngram_search_fwdflat.c(177): TOTAL fwdflat 0.04 wall 0.007 xRT
INFO: ngram_search.c(317): TOTAL bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(320): TOTAL bestpath 0.00 wall 0.000 xRT
hyt-med@Hyt-Med:~/Desktop$
If I want to use my data, should I change something here:
config = cmd_ln_init(NULL, ps_args(), TRUE,
"-hmm", MODELDIR "/hmm/en_US/hub4wsj_sc_8k",
"-lm", MODELDIR "/lm/en/turtle.DMP",
"-dict", MODELDIR "/lm/en/turtle.dic",
I want to build an application that tells you to speak, where you press
"enter" or "ctrl-c" to stop speaking, and after that it shows you what you
just said.
You advised me to collect the speech database. I'm using WaveSurfer for this,
but I don't know if I need to configure it somehow, because I read something
about 16 kHz, 16-bit... or do I just open it and record?
Thank you.
To read data from the microphone you need a different API than the one used
in the tutorial. You can read pocketsphinx/src/programs/continuous.c for an
example of how to read data from the microphone using pocketsphinx.
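For reference, continuous.c is also built and installed as the
pocketsphinx_continuous program, so live microphone decoding can be tried
without writing any C first. Shown here as an echoed dry run (the model paths
are the same defaults that appear in the log above); drop the echo to
actually run it:

```shell
# Dry run: print the live-microphone decoding command without executing it.
# pocketsphinx_continuous is the installed build of src/programs/continuous.c.
MODELDIR=/usr/local/share/pocketsphinx/model
echo pocketsphinx_continuous \
    -hmm "$MODELDIR/hmm/en_US/hub4wsj_sc_8k" \
    -lm "$MODELDIR/lm/en/turtle.DMP" \
    -dict "$MODELDIR/lm/en/turtle.dic"
```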
You can resample files to the target format after recording.
Thank you, but what I meant is to use my own data (dic, lm, ...) and compare
it with the data from the microphone.
I think I have to change the path here:
config = cmd_ln_init(NULL, ps_args(), TRUE, "-hmm", MODELDIR
"/hmm/en_US/hub4wsj_sc_8k", "-lm", MODELDIR "/lm/en/turtle.DMP", "-dict",
MODELDIR "/lm/en/turtle.dic",
And how do I resample (.wav) files to the target format? A tutorial on
recording data for use in pocketsphinx or sphinx4 would be a great help, and
thank you so much for your patience.
You are correct.
You can use sox to resample files
http://sox.sourceforge.net
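A minimal sketch of the conversion discussed here, assuming sox is installed.
It first synthesizes a short 8 kHz stereo file so the snippet is
self-contained (with a real recording you would skip that step), then
converts it to the 16 kHz, 16-bit mono format the default models expect.
Note that modern sox spells the sample-size option -b; the older -w seen
elsewhere in this thread is the pre-14.x spelling:

```shell
# Generate a 0.2 s, 8 kHz, stereo, 16-bit test tone (stand-in for a recording).
sox -n -r 8000 -c 2 -b 16 original.wav synth 0.2 sine 440

# Convert to 16 kHz, mono, 16-bit; sox resamples and downmixes automatically.
sox original.wav -r 16000 -c 1 -b 16 converted.wav

# soxi -r prints the sample rate of the result.
soxi -r converted.wav
```

The same command works in either direction: sox resamples up or down
depending on the rate you ask for on the output file.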
Thank you. The output for a wav file in Audacity is:
Codec: Uncompressed 16-bit PCM audio
Channels: Stereo
Sample rate: 8000 Hz
Bitrate: N/A
Is there a way to make Audacity or WaveSurfer record at 16 kHz without
resampling? If there is no option, please tell me the command line to do it
with sox.
I tried this, but I think it's for downsampling :/
sox original.wav -c 1 -r 16000 -w downsampled.wav
http://www.voxforge.org/home/submitspeech/linux/step-2
@hiyassat
Can you help me do that, please? My mail is
toneemy@yahoo.com
Can you talk to me through mail or messenger? Kindly reply. If you do not
have time now to build it for me, just guide me on how to start, because I
followed the CMU tutorial and I do not get it. Help me understand and get
started, and I will continue.
Thanks in advance.