Hi Nick,
I am taking your advice to create a new thread, even though I have the same problem as the thread
https://sourceforge.net/projects/cmusphinx/forums/forum/5471/topic/4553620
(how to build an acoustic model).
I am creating an Android application that needs a small vocabulary (~100 words), but to keep it simple I started with the 10 digits (0-9).
I followed your advice from the forum link above, so here is what I came up with.
digit.dic:
EIGHT EY T
FIVE F AY V
FOUR F AO R
NINE N AY N
ONE W AH N
SEVEN S EH V AH N
SIX S IH K S
THREE TH R IY
TWO T UW
ZERO Z IH R OW
ZERO(2) Z IY R OW
digit.phone:
EH
EY
F
IH
IY
K
N
OW
R
S
SIL
T
TH
UW
V
W
Z
AH
AO
AY
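Before training, it is worth checking that the two files above are consistent: every phone used in digit.dic must also appear in digit.phone, or training will fail. A minimal Python sketch (the helper name `check_phones` is illustrative, not part of sphinxtrain):

```python
# Illustrative check, not a sphinxtrain tool: find phones that appear
# in the dictionary but are missing from the phone list.

def check_phones(dic_lines, phone_lines):
    """Return dictionary phones that are absent from the phone list."""
    phones = {p.strip() for p in phone_lines if p.strip()}
    used = set()
    for line in dic_lines:
        parts = line.split()
        if len(parts) > 1:
            used.update(parts[1:])  # everything after the word is a phone
    return used - phones

dic = ["ONE W AH N", "TWO T UW", "ZERO Z IH R OW", "ZERO(2) Z IY R OW"]
phones = ["W", "AH", "N", "T", "UW", "Z", "IH", "R", "OW", "IY", "SIL"]
print(check_phones(dic, phones))  # → set() (the files are consistent)
```

An empty set means every dictionary phone is covered; anything else is a phone you still need to add to digit.phone.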
digit_train.fileids:
speaker1/spk1_one
speaker1/spk1_two
speaker1/spk1_three
speaker1/spk1_four
speaker1/spk1_five
speaker1/spk1_six
speaker1/spk1_seven
speaker1/spk1_eight
speaker1/spk1_nine
speaker1/spk1_zero
speaker2/spk2_one
speaker2/spk2_two
speaker2/spk2_three
speaker2/spk2_four
speaker2/spk2_five
speaker2/spk2_six
speaker2/spk2_seven
speaker2/spk2_eight
speaker2/spk2_nine
speaker2/spk2_zero
speaker3/spk3_one
speaker3/spk3_two
speaker3/spk3_three
speaker3/spk3_four
speaker3/spk3_five
speaker3/spk3_six
speaker3/spk3_seven
speaker3/spk3_eight
speaker3/spk3_nine
speaker3/spk3_zero
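A common cause of training failures is a fileids entry that does not correspond to an actual audio file. A small sketch to catch that before running RunAll.pl (`missing_files` is an illustrative helper, not part of sphinxtrain; point it at your own wav directory):

```python
import os
import tempfile

def missing_files(fileids, wav_dir, ext=".wav"):
    """Entries in fileids whose audio file is absent under wav_dir."""
    return [f for f in fileids
            if not os.path.isfile(os.path.join(wav_dir, f + ext))]

# Tiny self-contained demo: one file present, one missing.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "speaker1"))
open(os.path.join(root, "speaker1", "spk1_one.wav"), "w").close()
print(missing_files(["speaker1/spk1_one", "speaker1/spk1_two"], root))
# → ['speaker1/spk1_two']
```

In practice you would read digit_train.fileids line by line and pass the list to this function.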
digit_train.transcription:
ONE (spk1_one)
TWO (spk1_two)
THREE (spk1_three)
FOUR (spk1_four)
FIVE (spk1_five)
SIX (spk1_six)
SEVEN (spk1_seven)
EIGHT (spk1_eight)
NINE (spk1_nine)
ZERO (spk1_zero)
ONE (spk2_one)
TWO (spk2_two)
THREE (spk2_three)
FOUR (spk2_four)
FIVE (spk2_five)
SIX (spk2_six)
SEVEN (spk2_seven)
EIGHT (spk2_eight)
NINE (spk2_nine)
ZERO (spk2_zero)
ONE (spk3_one)
TWO (spk3_two)
THREE (spk3_three)
FOUR (spk3_four)
FIVE (spk3_five)
SIX (spk3_six)
SEVEN (spk3_seven)
EIGHT (spk3_eight)
NINE (spk3_nine)
ZERO (spk3_zero)
After ./scripts_pl/RunAll.pl
I got this message:
Training for 2 Gaussian(s) completed after 6 iterations
MODULE: 60 Lattice Generation
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 61 Lattice Pruning
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 62 Lattice Format Conversion
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 65 MMIE Training
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 90 deleted interpolation
Skipped for continuous models
I assume that ./scripts_pl/RunAll.pl was successful, so I run
root@ubuntu:/home/hoangb/Projects/Android/v2text/digit#
./scripts_pl/decode/slave.pl
MODULE: DECODE Decoding using models previously trained
Decoding 30 segments starting at 0 (part 1 of 1)
0%
WARNING: This step had 0 ERROR messages and 1 WARNING messages. Please check
the log file for details.
Aligning results to find error rate
SENTENCE ERROR: 13.3% (4/30) WORD ERROR RATE: 13.3% (3/30)
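For reference, the word error rate reported by the aligner above is the minimum edit distance (substitutions, insertions, deletions) between the reference and hypothesis transcripts, divided by the number of reference words. A minimal sketch of that computation:

```python
def wer(ref, hyp):
    """Word error rate: word-level Levenshtein distance / reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between the first i ref words and first j hyp words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(r)][len(h)] / len(r)

print(wer("ONE TWO THREE", "ONE TOO THREE"))  # 1 error / 3 words ≈ 0.333
```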
I don't see a question anywhere, but your training data set has far too little audio.
I run
pocketsphinx_continuous -hmm model_parameters/digit.cd_cont_1000 -lm etc/digit.lm -dict etc/digit.dic
and start speaking:
INFO: acmod.c(242): Parsed model-specific feature parameters from
model_parameters/digit.cd_cont_1000/feat.params
INFO: feat.c(684): Initializing feature stream to type: '1s_c_d_dd',
ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: mdef.c(520): Reading model definition:
model_parameters/digit.cd_cont_1000/mdef
INFO: bin_mdef.c(173): Allocating 373 * 8 bytes (2 KiB) for CD tree
INFO: tmat.c(205): Reading HMM transition probability matrices:
model_parameters/digit.cd_cont_1000/transition_matrices
INFO: acmod.c(117): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
model_parameters/digit.cd_cont_1000/means
INFO: ms_gauden.c(292): 153 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
model_parameters/digit.cd_cont_1000/variances
INFO: ms_gauden.c(292): 153 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(354): 40932 variance values floored
INFO: acmod.c(119): Attempting to use PTHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
model_parameters/digit.cd_cont_1000/means
INFO: ms_gauden.c(292): 153 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
model_parameters/digit.cd_cont_1000/variances
INFO: ms_gauden.c(292): 153 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(354): 40932 variance values floored
INFO: ptm_mgau.c(804): Number of codebooks doesn't match number of ciphones,
doesn't look like PTM: 153 20
INFO: acmod.c(121): Falling back to general multi-stream GMM computation
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
model_parameters/digit.cd_cont_1000/means
INFO: ms_gauden.c(292): 153 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
model_parameters/digit.cd_cont_1000/variances
INFO: ms_gauden.c(292): 153 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(354): 40932 variance values floored
INFO: ms_senone.c(160): Reading senone mixture weights:
model_parameters/digit.cd_cont_1000/mixture_weights
INFO: ms_senone.c(211): Truncating senone logs3(pdf) values by 10 bits
INFO: ms_senone.c(218): Not transposing mixture weights in memory
INFO: ms_senone.c(277): Read mixture weights for 153 senones: 1 features x 8
codewords
INFO: ms_senone.c(331): Mapping senones to individual codebooks
INFO: ms_mgau.c(122): The value of topn: 4
INFO: dict.c(306): Allocating 4110 * 20 bytes (80 KiB) for word entries
INFO: dict.c(321): Reading main dictionary: etc/digit.dic
INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(324): 11 words read
INFO: dict.c(330): Reading filler dictionary:
model_parameters/digit.cd_cont_1000/noisedict
INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(333): 3 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(404): Allocating 20^3 * 2 bytes (15 KiB) for word-initial
triphones
INFO: dict2pid.c(131): Allocated 4880 bytes (4 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 4880 bytes (4 KiB) for single-phone word
triphones
INFO: ngram_model_arpa.c(477): ngrams 1=12, 2=20, 3=10
INFO: ngram_model_arpa.c(135): Reading unigrams
INFO: ngram_model_arpa.c(516): 12 = #unigrams created
INFO: ngram_model_arpa.c(195): Reading bigrams
INFO: ngram_model_arpa.c(533): 20 = #bigrams created
INFO: ngram_model_arpa.c(534): 3 = #prob2 entries
INFO: ngram_model_arpa.c(542): 3 = #bo_wt2 entries
INFO: ngram_model_arpa.c(292): Reading trigrams
INFO: ngram_model_arpa.c(555): 10 = #trigrams created
INFO: ngram_model_arpa.c(556): 2 = #prob3 entries
INFO: ngram_search_fwdtree.c(99): 11 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 4 single-phone
words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 4
single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 142
INFO: ngram_search_fwdtree.c(338): after: 11 root, 14 non-root channels, 3
single-phone words
INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: continuous.c(367): pocketsphinx_continuous COMPILED ON: Sep 11 2011, AT:
02:12:53
Warning: Could not find Mic element
READY....
Listening...
Recording is stopped, start recording with ad_start_rec
Stopped listening, please wait...
INFO: cmn_prior.c(121): cmn_prior_update: from < 8.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 13.76 -0.00 -0.24 0.02 -0.25
-0.06 -0.20 -0.11 -0.14 -0.14 -0.16 -0.12 -0.30 >
INFO: ngram_search_fwdtree.c(1549): 455 words recognized (1/fr)
INFO: ngram_search_fwdtree.c(1551): 8403 senones evaluated (22/fr)
INFO: ngram_search_fwdtree.c(1553): 3478 channels searched (9/fr), 2675 1st,
803 last
INFO: ngram_search_fwdtree.c(1557): 803 words for which last channels
evaluated (2/fr)
INFO: ngram_search_fwdtree.c(1560): 0 candidate words for entering last phone
(0/fr)
INFO: ngram_search_fwdtree.c(1562): fwdtree 0.05 CPU 0.014 xRT
INFO: ngram_search_fwdtree.c(1565): fwdtree 4.69 wall 1.230 xRT
INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 2 words
INFO: ngram_search_fwdflat.c(940): 281 words recognized (1/fr)
INFO: ngram_search_fwdflat.c(942): 1140 senones evaluated (3/fr)
INFO: ngram_search_fwdflat.c(944): 751 channels searched (1/fr)
INFO: ngram_search_fwdflat.c(946): 751 words searched (1/fr)
INFO: ngram_search_fwdflat.c(948): 76 word transitions (0/fr)
INFO: ngram_search_fwdflat.c(951): fwdflat 0.00 CPU 0.001 xRT
INFO: ngram_search_fwdflat.c(954): fwdflat 0.00 wall 0.001 xRT
INFO: ngram_search.c(1201): </s> not found in last frame, using <sil>.379 instead
INFO: ngram_search.c(1253): lattice start node <s>.0 end node <sil>.226
INFO: ngram_search.c(1281): Eliminated 0 nodes before end node
INFO: ngram_search.c(1386): Lattice has 9 nodes, 10 links
INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(<sil>:226:379) = -287518
INFO: ps_lattice.c(1390): Joint P(O,S) = -287518 P(S|O) = 0
INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(878): bestpath 0.00 wall 0.000 xRT
000000000:
READY....
Listening...
Recording is stopped, start recording with ad_start_rec
Stopped listening, please wait...
INFO: cmn_prior.c(121): cmn_prior_update: from < 13.76 -0.00 -0.24 0.02 -0.25
-0.06 -0.20 -0.11 -0.14 -0.14 -0.16 -0.12 -0.30 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 13.78 -0.04 -0.21 0.02 -0.23
-0.06 -0.19 -0.12 -0.15 -0.16 -0.17 -0.13 -0.30 >
INFO: ngram_search_fwdtree.c(1549): 181 words recognized (2/fr)
INFO: ngram_search_fwdtree.c(1551): 2406 senones evaluated (29/fr)
INFO: ngram_search_fwdtree.c(1553): 1003 channels searched (12/fr), 792 1st,
211 last
INFO: ngram_search_fwdtree.c(1557): 211 words for which last channels
evaluated (2/fr)
INFO: ngram_search_fwdtree.c(1560): 0 candidate words for entering last phone
(0/fr)
INFO: ngram_search_fwdtree.c(1562): fwdtree 0.02 CPU 0.019 xRT
INFO: ngram_search_fwdtree.c(1565): fwdtree 1.67 wall 2.010 xRT
INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 2 words
INFO: ngram_search_fwdflat.c(940): 127 words recognized (2/fr)
INFO: ngram_search_fwdflat.c(942): 246 senones evaluated (3/fr)
INFO: ngram_search_fwdflat.c(944): 303 channels searched (3/fr)
INFO: ngram_search_fwdflat.c(946): 303 words searched (3/fr)
INFO: ngram_search_fwdflat.c(948): 76 word transitions (0/fr)
INFO: ngram_search_fwdflat.c(951): fwdflat -0.00 CPU -0.000 xRT
INFO: ngram_search_fwdflat.c(954): fwdflat 0.00 wall 0.001 xRT
INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.42
INFO: ngram_search.c(1281): Eliminated 0 nodes before end node
INFO: ngram_search.c(1386): Lattice has 9 nodes, 4 links
INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(</s>:42:81) = -87429
INFO: ps_lattice.c(1390): Joint P(O,S) = -87429 P(S|O) = 0
INFO: ngram_search.c(875): bestpath -0.00 CPU -0.000 xRT
INFO: ngram_search.c(878): bestpath 0.00 wall 0.000 xRT
000000001:
READY....
Listening...
Recording is stopped, start recording with ad_start_rec
Stopped listening, please wait...
INFO: cmn_prior.c(121): cmn_prior_update: from < 13.78 -0.04 -0.21 0.02 -0.23
-0.06 -0.19 -0.12 -0.15 -0.16 -0.17 -0.13 -0.30 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 13.86 -0.09 -0.18 0.06 -0.23
-0.08 -0.19 -0.14 -0.17 -0.18 -0.18 -0.14 -0.28 >
INFO: ngram_search_fwdtree.c(1549): 189 words recognized (2/fr)
INFO: ngram_search_fwdtree.c(1551): 2745 senones evaluated (29/fr)
INFO: ngram_search_fwdtree.c(1553): 1131 channels searched (11/fr), 902 1st,
229 last
INFO: ngram_search_fwdtree.c(1557): 229 words for which last channels
evaluated (2/fr)
INFO: ngram_search_fwdtree.c(1560): 0 candidate words for entering last phone
(0/fr)
INFO: ngram_search_fwdtree.c(1562): fwdtree 0.02 CPU 0.021 xRT
INFO: ngram_search_fwdtree.c(1565): fwdtree 1.86 wall 1.933 xRT
INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 2 words
INFO: ngram_search_fwdflat.c(940): 133 words recognized (1/fr)
INFO: ngram_search_fwdflat.c(942): 285 senones evaluated (3/fr)
INFO: ngram_search_fwdflat.c(944): 334 channels searched (3/fr)
INFO: ngram_search_fwdflat.c(946): 334 words searched (3/fr)
INFO: ngram_search_fwdflat.c(948): 71 word transitions (0/fr)
INFO: ngram_search_fwdflat.c(951): fwdflat -0.00 CPU -0.000 xRT
INFO: ngram_search_fwdflat.c(954): fwdflat 0.00 wall 0.001 xRT
INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.90
INFO: ngram_search.c(1281): Eliminated 0 nodes before end node
INFO: ngram_search.c(1386): Lattice has 11 nodes, 12 links
INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(</s>:90:94) = -133883
INFO: ps_lattice.c(1390): Joint P(O,S) = -134584 P(S|O) = -701
INFO: ngram_search.c(875): bestpath -0.00 CPU -0.000 xRT
INFO: ngram_search.c(878): bestpath 0.00 wall 0.000 xRT
000000002:
READY....
Listening...
^CINFO: ngram_search_fwdtree.c(430): TOTAL fwdtree 0.09 CPU 0.016 xRT
INFO: ngram_search_fwdtree.c(433): TOTAL fwdtree 8.21 wall 1.474 xRT
INFO: ngram_search_fwdflat.c(174): TOTAL fwdflat 0.00 CPU 0.001 xRT
INFO: ngram_search_fwdflat.c(177): TOTAL fwdflat 0.00 wall 0.001 xRT
INFO: ngram_search.c(317): TOTAL bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(320): TOTAL bestpath 0.00 wall 0.000 xRT
root@ubuntu:/home/hoangb/Projects/Android/v2text/digit#
I use this script file to record voice on Linux:

for i in `seq 1 10`; do
    read sent
    echo "$i. $sent"
    fn=`printf %s ${sent,,}`
    rec -r 8000 -e signed-integer -b 16 -c 1 $fn.wav 2>/dev/null
done < corpus.txt
And I play the wav file to verify the sample rate. It is 8000 Hz.
root@ubuntu:/home/hoangb/Projects/Android/v2text/digit# play
wav/speaker1/spk1_one.wav
wav/speaker1/spk1_one.wav:
File Size: 49.2k Bit Rate: 128k
Encoding: Signed PCM
Channels: 1 @ 16-bit
Samplerate: 8000Hz
Replaygain: off
Duration: 00:00:03.07
In:100% 00:00:03.07 Out:24.6k Clip:0
Done.
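Besides playing the file, the header can be checked programmatically; Python's standard wave module reads RIFF WAV headers directly (`wav_info` is an illustrative helper). The goal is to confirm 16-bit mono audio at the sample rate configured for training, here 8000 Hz:

```python
import os
import tempfile
import wave

def wav_info(path):
    """Return (sample_rate, channels, sample_width_bytes) of a RIFF WAV file."""
    with wave.open(path, "rb") as w:
        return w.getframerate(), w.getnchannels(), w.getsampwidth()

# Self-contained demo: write one second of 8 kHz, mono, 16-bit silence.
path = os.path.join(tempfile.mkdtemp(), "test.wav")
with wave.open(path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)        # 2 bytes = 16-bit
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 8000)
print(wav_info(path))  # → (8000, 1, 2)
```

Running this over every file listed in digit_train.fileids would catch any recording that slipped through at the wrong rate.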
feat.params:
-alpha 0.97
-samprate 8000.0
-doublebw no
-nfilt 31
-ncep 13
-lowerf 200.00
-upperf 3500.00
-dither yes
-nfft 512
-wlen 0.0256
-transform legacy
-feat CFG_FEATURE
-svspec CFG_SVSPEC
-agc CFG_AGC
-cmn CFG_CMN
-varnorm CFG_VARNORM
Eliasmajic, you are quick!
The question is: the application does not seem to recognize my voice on either Linux or Android. What have I done incorrectly?
Eliasmajic, **how many people do you think would be enough? I am looking for more people to help with recording, but before doing that, I would like you to check whether anything is wrong with my audio files.**
I set $CFG_WAVFILE_TYPE = 'mswav', but I record voice using Linux. Does that conflict with my current setting?
Audio waveform and feature file information
$CFG_WAVFILES_DIR = "$CFG_BASE_DIR/wav";
$CFG_WAVFILE_EXTENSION = 'wav';
$CFG_WAVFILE_TYPE = 'mswav'; # one of nist, mswav, raw
$CFG_FEATFILES_DIR = "$CFG_BASE_DIR/feat";
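On the mswav question: 'mswav' in sphinx_train.cfg simply means standard RIFF/WAVE files, and sox's rec writes RIFF WAV on any OS, so recording on Linux should not conflict with this setting. A quick way to confirm is to look for the RIFF magic bytes at the start of each file (`is_riff_wav` is an illustrative helper):

```python
import os
import tempfile
import wave

def is_riff_wav(path):
    """True if the file starts with the RIFF/WAVE magic bytes."""
    with open(path, "rb") as f:
        h = f.read(12)
    return h[:4] == b"RIFF" and h[8:12] == b"WAVE"

# Demo on a freshly written WAV file.
path = os.path.join(tempfile.mkdtemp(), "t.wav")
with wave.open(path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 100)
print(is_riff_wav(path))  # → True
```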
sphinx_train.cfg says to set $CFG_HMM_TYPE = '.semi.' for PocketSphinx and Sphinx II, but I see most people set $CFG_HMM_TYPE = '.cont.', so I set mine to '.cont.'.
Below are my current settings... can you please check?
$CFG_HMM_TYPE = '.cont.'; # Sphinx III
$CFG_HMM_TYPE = '.semi.'; # PocketSphinx and Sphinx II
$CFG_HMM_TYPE = '.ptm.'; # PocketSphinx (larger data sets)
...
$CFG_FINAL_NUM_DENSITIES = 2;
...
$CFG_N_TIED_STATES = 200;
In RecognizerTask.java I set:
c.setString("-hmm",
"/sdcard/Android/data/edu.cmu.pocketsphinx/hmm/tv/digit.cd_cont_200");
c.setString("-dict",
"/sdcard/Android/data/edu.cmu.pocketsphinx/lm/tv/digit.dic");
c.setString("-lm",
"/sdcard/Android/data/edu.cmu.pocketsphinx/lm/tv/digit.lm.DMP");
c.setString("-rawlogdir", "/sdcard/Android/data/edu.cmu.pocketsphinx");
c.setFloat("-samprate", 8000.0);
c.setInt("-maxhmmpf", 2000);
c.setInt("-maxwpf", 10);
c.setInt("-pl_window", 2);
c.setBoolean("-backtrace", true);
c.setBoolean("-bestpath", false);
You need hours of audio. The site below gives good numbers, but... why not use an existing model? You are just doing simple digits...
http://cmusphinx.sourceforge.net/wiki/tutorialam
Hi eliasmajic,
I am building it for an application that only needs ~100 commands. I did try the existing model included in the pocketsphinx directory: it takes a long time to load because it is large, and it does not transcribe my speech correctly either (totally wrong), so I did not adapt it. I tried the models below without success:
US English WSJ5K
US English HUB4
Can you suggest a good one that can be used on Android?
Do I need to adapt the acoustic model before using it?
I tried the TIDIGITS model, and it loads fast and is accurate. That is why I decided to create a new model. Well, I am in the learning process, so I would like to try anything that works first.
Thank you very much for your help.