
Problem adapting acoustic model in pocketsphinx

Help
2012-08-16
2012-09-22
  • hidayu kamarudin

    Hi,
    can anyone with experience here offer some advice?
    I'm building a speech recognizer for Arabic and have successfully created the
    acoustic model. However, whenever I try to adapt the acoustic model with the
    wav files I have created, it does not give any result.
    I'm using pocketsphinx 0.7, sphinxbase 0.7 and sphinxtrain 0.7; my OS is
    Fedora Linux.

    Maybe I need to recheck the audio files. Currently the wav files are recorded
    with Audacity at 8000 Hz, 16-bit PCM. Please advise; your help is much
    appreciated.

    Here is the output:

    $ pocketsphinx_continuous -hmm model/hmm/hijaiyah.cd_cont_1000 \
        -lm model/lm/hijaiyah.dmp -dict model/lm/hijaiyah.dic
    INFO: cmd_ln.c(691): Parsing command line:
    pocketsphinx_continuous \
    -hmm model/hmm/hijaiyah.cd_cont_1000 \
    -lm model/lm/hijaiyah.dmp \
    -dict model/lm/hijaiyah.dic

    Current configuration:

    -adcdev
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -alpha 0.97 9.700000e-01
    -argfile
    -ascale 20.0 2.000000e+01
    -aw 1 1
    -backtrace no no
    -beam 1e-48 1.000000e-48
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+00
    -bghist no no
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -compallsen no no
    -debug 0
    -dict model/lm/hijaiyah.dic
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams
    -fillprob 1e-8 1.000000e-08
    -frate 100 100
    -fsg
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-64
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+00
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-29
    -fwdtree yes yes
    -hmm model/hmm/hijaiyah.cd_cont_1000
    -infile
    -input_endian little little
    -jsgf
    -kdmaxbbi -1 -1
    -kdmaxdepth 0 0
    -kdtree
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lextreedump 0 0
    -lifter 0 0
    -lm model/lm/hijaiyah.dmp
    -lmctl
    -lmname default default
    -logbase 1.0001 1.000100e+00
    -logfn
    -logspec no no
    -lowerf 133.33334 1.333333e+02
    -lpbeam 1e-40 1.000000e-40
    -lponlybeam 7e-29 7.000000e-29
    -lw 6.5 6.500000e+00
    -maxhmmpf -1 -1
    -maxnewoov 20 20
    -maxwpf -1 -1
    -mdef
    -mean
    -mfclogdir
    -min_endfr 0 0
    -mixw
    -mixwfloor 0.0000001 1.000000e-07
    -mllr
    -mmap yes yes
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 40
    -nwpen 1.0 1.000000e+00
    -pbeam 1e-48 1.000000e-48
    -pip 1.0 1.000000e+00
    -pl_beam 1e-10 1.000000e-10
    -pl_pbeam 1e-5 1.000000e-05
    -pl_window 0 0
    -rawlogdir
    -remove_dc no no
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -sendump
    -senlogdir
    -senmgau
    -silprob 0.005 5.000000e-03
    -smoothspec no no
    -svspec
    -time no no
    -tmat
    -tmatfloor 0.0001 1.000000e-04
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy legacy
    -unit_area yes yes
    -upperf 6855.4976 6.855498e+03
    -usewdphones no no
    -uw 1.0 1.000000e+00
    -var
    -varfloor 0.0001 1.000000e-04
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 7.000000e-29
    -wip 0.65 6.500000e-01
    -wlen 0.025625 2.562500e-02

    INFO: cmd_ln.c(691): Parsing command line:
    \
    -alpha 0.97 \
    -dither yes \
    -doublebw no \
    -nfilt 40 \
    -ncep 13 \
    -lowerf 133.33334 \
    -upperf 6855.4976 \
    -nfft 512 \
    -wlen 0.0256 \
    -transform legacy \
    -feat 1s_c_d_dd \
    -agc none \
    -cmn current \
    -varnorm no

    Current configuration:

    -agc none none
    -agcthresh 2.0 2.000000e+00
    -alpha 0.97 9.700000e-01
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -dither no yes
    -doublebw no no
    -feat 1s_c_d_dd 1s_c_d_dd
    -frate 100 100
    -input_endian little little
    -lda
    -ldadim 0 0
    -lifter 0 0
    -logspec no no
    -lowerf 133.33334 1.333333e+02
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 40
    -remove_dc no no
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -smoothspec no no
    -svspec
    -transform legacy legacy
    -unit_area yes yes
    -upperf 6855.4976 6.855498e+03
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wlen 0.025625 2.560000e-02

    INFO: acmod.c(242): Parsed model-specific feature parameters from
    model/hmm/hijaiyah.cd_cont_1000/feat.params
    INFO: fe_interface.c(299): You are using the internal mechanism to generate
    the seed.
    INFO: feat.c(713): Initializing feature stream to type: '1s_c_d_dd',
    ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(142): mean= 12.00, mean= 0.0
    INFO: mdef.c(520): Reading model definition:
    model/hmm/hijaiyah.cd_cont_1000/mdef
    INFO: bin_mdef.c(173): Allocating 104 * 8 bytes (0 KiB) for CD tree
    INFO: tmat.c(205): Reading HMM transition probability matrices:
    model/hmm/hijaiyah.cd_cont_1000/transition_matrices
    INFO: acmod.c(117): Attempting to use SCHMM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    model/hmm/hijaiyah.cd_cont_1000/means
    INFO: ms_gauden.c(292): 51 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 8x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    model/hmm/hijaiyah.cd_cont_1000/variances
    INFO: ms_gauden.c(292): 51 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 8x39
    INFO: ms_gauden.c(354): 10242 variance values floored
    INFO: acmod.c(119): Attempting to use PTHMM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    model/hmm/hijaiyah.cd_cont_1000/means
    INFO: ms_gauden.c(292): 51 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 8x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    model/hmm/hijaiyah.cd_cont_1000/variances
    INFO: ms_gauden.c(292): 51 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 8x39
    INFO: ms_gauden.c(354): 10242 variance values floored
    INFO: ptm_mgau.c(804): Number of codebooks doesn't match number of ciphones,
    doesn't look like PTM: 51 9
    INFO: acmod.c(121): Falling back to general multi-stream GMM computation
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    model/hmm/hijaiyah.cd_cont_1000/means
    INFO: ms_gauden.c(292): 51 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 8x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    model/hmm/hijaiyah.cd_cont_1000/variances
    INFO: ms_gauden.c(292): 51 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 8x39
    INFO: ms_gauden.c(354): 10242 variance values floored
    INFO: ms_senone.c(160): Reading senone mixture weights:
    model/hmm/hijaiyah.cd_cont_1000/mixture_weights
    INFO: ms_senone.c(211): Truncating senone logs3(pdf) values by 10 bits
    INFO: ms_senone.c(218): Not transposing mixture weights in memory
    INFO: ms_senone.c(277): Read mixture weights for 51 senones: 1 features x 8
    codewords
    INFO: ms_senone.c(331): Mapping senones to individual codebooks
    INFO: ms_mgau.c(122): The value of topn: 4
    INFO: dict.c(306): Allocating 4103 * 32 bytes (128 KiB) for word entries
    INFO: dict.c(321): Reading main dictionary: model/lm/hijaiyah.dic
    INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(324): 4 words read
    INFO: dict.c(330): Reading filler dictionary:
    model/hmm/hijaiyah.cd_cont_1000/noisedict
    INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(333): 3 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(404): Allocating 9^3 * 2 bytes (1 KiB) for word-initial
    triphones
    INFO: dict2pid.c(131): Allocated 2016 bytes (1 KiB) for word-final triphones
    INFO: dict2pid.c(195): Allocated 2016 bytes (1 KiB) for single-phone word
    triphones
    INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
    INFO: ngram_model_dmp.c(142): Will use memory-mapped I/O for LM file
    INFO: ngram_model_dmp.c(196): ngrams 1=6, 2=8, 3=4
    INFO: ngram_model_dmp.c(242): 6 = LM.unigrams(+trailer) read
    INFO: ngram_model_dmp.c(291): 8 = LM.bigrams(+trailer) read
    INFO: ngram_model_dmp.c(317): 4 = LM.trigrams read
    INFO: ngram_model_dmp.c(342): 3 = LM.prob2 entries read
    INFO: ngram_model_dmp.c(362): 3 = LM.bo_wt2 entries read
    INFO: ngram_model_dmp.c(382): 2 = LM.prob3 entries read
    INFO: ngram_model_dmp.c(410): 1 = LM.tseg_base entries read
    INFO: ngram_model_dmp.c(466): 6 = ascii word strings read
    INFO: ngram_search_fwdtree.c(99): 4 unique initial diphones
    INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 4 single-phone
    words
    INFO: ngram_search_fwdtree.c(186): Creating search tree
    INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 4
    single-phone words
    INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 128
    INFO: ngram_search_fwdtree.c(338): after: 4 root, 0 non-root channels, 3
    single-phone words
    INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25
    INFO: continuous.c(367): pocketsphinx_continuous COMPILED ON: Aug 16 2012, AT:
    13:39:35

    Warning: Could not find Mic element
    READY....
    Listening...
    Recording is stopped, start recording with ad_start_rec
    Stopped listening, please wait...
    INFO: cmn_prior.c(121): cmn_prior_update: from < 8.00 0.00 0.00 0.00 0.00 0.00
    0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
    INFO: cmn_prior.c(139): cmn_prior_update: to < 12.78 -0.69 -0.31 -0.11 -0.22
    -0.13 -0.17 -0.22 -0.20 -0.17 -0.16 -0.13 0.03 >
    INFO: ngram_search_fwdtree.c(1549): 0 words recognized (0/fr)
    INFO: ngram_search_fwdtree.c(1551): 921 senones evaluated (3/fr)
    INFO: ngram_search_fwdtree.c(1553): 308 channels searched (0/fr), 0 1st, 308
    last
    INFO: ngram_search_fwdtree.c(1557): 308 words for which last channels
    evaluated (0/fr)
    INFO: ngram_search_fwdtree.c(1560): 0 candidate words for entering last phone
    (0/fr)
    INFO: ngram_search_fwdtree.c(1562): fwdtree 0.02 CPU 0.005 xRT
    INFO: ngram_search_fwdtree.c(1565): fwdtree 4.51 wall 1.460 xRT
    INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 0 words
    INFO: ngram_search_fwdflat.c(940): 0 words recognized (0/fr)
    INFO: ngram_search_fwdflat.c(942): 924 senones evaluated (3/fr)
    INFO: ngram_search_fwdflat.c(944): 308 channels searched (0/fr)
    INFO: ngram_search_fwdflat.c(946): 308 words searched (0/fr)
    INFO: ngram_search_fwdflat.c(948): 0 word transitions (0/fr)
    INFO: ngram_search_fwdflat.c(951): fwdflat 0.00 CPU 0.001 xRT
    INFO: ngram_search_fwdflat.c(954): fwdflat 0.00 wall 0.000 xRT
    ERROR: "ngram_search.c", line 1144: Couldn't find <s> in first frame
    000000000: (null)
    READY....
    Listening...
    Recording is stopped, start recording with ad_start_rec
    Stopped listening, please wait...
    INFO: cmn_prior.c(121): cmn_prior_update: from < 12.78 -0.69 -0.31 -0.11 -0.22
    -0.13 -0.17 -0.22 -0.20 -0.17 -0.16 -0.13 0.03 >
    INFO: cmn_prior.c(139): cmn_prior_update: to < 13.39 -0.74 -0.38 -0.17 -0.26
    -0.07 -0.18 -0.24 -0.15 -0.14 -0.15 -0.14 -0.01 >
    INFO: ngram_search_fwdtree.c(1549): 0 words recognized (0/fr)
    INFO: ngram_search_fwdtree.c(1551): 315 senones evaluated (3/fr)
    INFO: ngram_search_fwdtree.c(1553): 105 channels searched (0/fr), 0 1st, 105
    last
    INFO: ngram_search_fwdtree.c(1557): 105 words for which last channels
    evaluated (0/fr)
    INFO: ngram_search_fwdtree.c(1560): 0 candidate words for entering last phone
    (0/fr)
    INFO: ngram_search_fwdtree.c(1562): fwdtree 0.01 CPU 0.008 xRT
    INFO: ngram_search_fwdtree.c(1565): fwdtree 2.88 wall 2.715 xRT
    INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 0 words
    INFO: ngram_search_fwdflat.c(940): 0 words recognized (0/fr)
    INFO: ngram_search_fwdflat.c(942): 315 senones evaluated (3/fr)
    INFO: ngram_search_fwdflat.c(944): 105 channels searched (0/fr)
    INFO: ngram_search_fwdflat.c(946): 105 words searched (0/fr)
    INFO: ngram_search_fwdflat.c(948): 0 word transitions (0/fr)
    INFO: ngram_search_fwdflat.c(951): fwdflat 0.00 CPU 0.001 xRT
    INFO: ngram_search_fwdflat.c(954): fwdflat 0.00 wall 0.000 xRT
    ERROR: "ngram_search.c", line 1144: Couldn't find <s> in first frame
    000000001: (null)
    READY....
    Listening...
    Recording is stopped, start recording with ad_start_rec
    Stopped listening, please wait...
    INFO: cmn_prior.c(121): cmn_prior_update: from < 13.39 -0.74 -0.38 -0.17 -0.26
    -0.07 -0.18 -0.24 -0.15 -0.14 -0.15 -0.14 -0.01 >
    INFO: cmn_prior.c(139): cmn_prior_update: to < 14.01 -0.76 -0.40 -0.23 -0.28
    -0.08 -0.20 -0.25 -0.14 -0.14 -0.16 -0.15 -0.05 >
    INFO: ngram_search_fwdtree.c(1549): 0 words recognized (0/fr)
    INFO: ngram_search_fwdtree.c(1551): 333 senones evaluated (3/fr)
    INFO: ngram_search_fwdtree.c(1553): 111 channels searched (0/fr), 0 1st, 111
    last
    INFO: ngram_search_fwdtree.c(1557): 111 words for which last channels
    evaluated (0/fr)
    INFO: ngram_search_fwdtree.c(1560): 0 candidate words for entering last phone
    (0/fr)
    INFO: ngram_search_fwdtree.c(1562): fwdtree 0.01 CPU 0.007 xRT
    INFO: ngram_search_fwdtree.c(1565): fwdtree 2.68 wall 2.391 xRT
    INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 0 words
    INFO: ngram_search_fwdflat.c(940): 0 words recognized (0/fr)
    INFO: ngram_search_fwdflat.c(942): 333 senones evaluated (3/fr)
    INFO: ngram_search_fwdflat.c(944): 111 channels searched (0/fr)
    INFO: ngram_search_fwdflat.c(946): 111 words searched (0/fr)
    INFO: ngram_search_fwdflat.c(948): 0 word transitions (0/fr)
    INFO: ngram_search_fwdflat.c(951): fwdflat 0.00 CPU 0.000 xRT
    INFO: ngram_search_fwdflat.c(954): fwdflat 0.00 wall 0.000 xRT
    ERROR: "ngram_search.c", line 1144: Couldn't find <s> in first frame
    000000002: (null)
    READY....
    Listening...
    Recording is stopped, start recording with ad_start_rec
    Stopped listening, please wait...
    INFO: cmn_prior.c(121): cmn_prior_update: from < 14.01 -0.76 -0.40 -0.23 -0.28
    -0.08 -0.20 -0.25 -0.14 -0.14 -0.16 -0.15 -0.05 >
    INFO: cmn_prior.c(139): cmn_prior_update: to < 14.25 -0.79 -0.37 -0.25 -0.28
    -0.08 -0.19 -0.23 -0.13 -0.14 -0.16 -0.16 -0.06 >
    INFO: ngram_search_fwdtree.c(1549): 0 words recognized (0/fr)
    INFO: ngram_search_fwdtree.c(1551): 291 senones evaluated (3/fr)
    INFO: ngram_search_fwdtree.c(1553): 97 channels searched (0/fr), 0 1st, 97
    last
    INFO: ngram_search_fwdtree.c(1557): 97 words for which last channels evaluated
    (0/fr)
    INFO: ngram_search_fwdtree.c(1560): 0 candidate words for entering last phone
    (0/fr)
    INFO: ngram_search_fwdtree.c(1562): fwdtree 0.01 CPU 0.008 xRT
    INFO: ngram_search_fwdtree.c(1565): fwdtree 2.68 wall 2.731 xRT
    INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 0 words
    INFO: ngram_search_fwdflat.c(940): 0 words recognized (0/fr)
    INFO: ngram_search_fwdflat.c(942): 291 senones evaluated (3/fr)
    INFO: ngram_search_fwdflat.c(944): 97 channels searched (0/fr)
    INFO: ngram_search_fwdflat.c(946): 97 words searched (0/fr)
    INFO: ngram_search_fwdflat.c(948): 0 word transitions (0/fr)
    INFO: ngram_search_fwdflat.c(951): fwdflat 0.00 CPU 0.000 xRT
    INFO: ngram_search_fwdflat.c(954): fwdflat 0.00 wall 0.000 xRT
    ERROR: "ngram_search.c", line 1144: Couldn't find <s> in first frame
    000000003: (null)
    READY....
    Listening...
    Recording is stopped, start recording with ad_start_rec
    Stopped listening, please wait...
    INFO: cmn_prior.c(121): cmn_prior_update: from < 14.69 -0.81 -0.36 -0.29 -0.28
    -0.10 -0.19 -0.22 -0.12 -0.13 -0.18 -0.16 -0.08 >
    INFO: cmn_prior.c(139): cmn_prior_update: to < 14.67 -0.80 -0.36 -0.29 -0.28
    -0.09 -0.19 -0.22 -0.12 -0.13 -0.18 -0.16 -0.08 >
    INFO: ngram_search_fwdtree.c(1549): 0 words recognized (0/fr)
    INFO: ngram_search_fwdtree.c(1551): 564 senones evaluated (3/fr)
    INFO: ngram_search_fwdtree.c(1553): 188 channels searched (0/fr), 0 1st, 188
    last
    INFO: ngram_search_fwdtree.c(1557): 188 words for which last channels
    evaluated (0/fr)
    INFO: ngram_search_fwdtree.c(1560): 0 candidate words for entering last phone
    (0/fr)
    INFO: ngram_search_fwdtree.c(1562): fwdtree 0.01 CPU 0.005 xRT
    INFO: ngram_search_fwdtree.c(1565): fwdtree 2.68 wall 1.418 xRT
    INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 0 words
    INFO: ngram_search_fwdflat.c(940): 0 words recognized (0/fr)
    INFO: ngram_search_fwdflat.c(942): 564 senones evaluated (3/fr)
    INFO: ngram_search_fwdflat.c(944): 188 channels searched (0/fr)
    INFO: ngram_search_fwdflat.c(946): 188 words searched (0/fr)
    INFO: ngram_search_fwdflat.c(948): 0 word transitions (0/fr)
    INFO: ngram_search_fwdflat.c(951): fwdflat 0.00 CPU 0.001 xRT
    INFO: ngram_search_fwdflat.c(954): fwdflat 0.00 wall 0.000 xRT
    ERROR: "ngram_search.c", line 1144: Couldn't find <s> in first frame
    000000004: (null)
    READY....
    Listening...
    Recording is stopped, start recording with ad_start_rec
    Stopped listening, please wait...
    INFO: cmn_prior.c(121): cmn_prior_update: from < 14.67 -0.80 -0.36 -0.29 -0.28
    -0.09 -0.19 -0.22 -0.12 -0.13 -0.18 -0.16 -0.08 >
    INFO: cmn_prior.c(139): cmn_prior_update: to < 14.65 -0.73 -0.34 -0.27 -0.27
    -0.11 -0.19 -0.22 -0.13 -0.13 -0.15 -0.15 -0.08 >
    INFO: ngram_search_fwdtree.c(1549): 0 words recognized (0/fr)
    INFO: ngram_search_fwdtree.c(1551): 267 senones evaluated (3/fr)
    INFO: ngram_search_fwdtree.c(1553): 89 channels searched (0/fr), 0 1st, 89
    last
    INFO: ngram_search_fwdtree.c(1557): 89 words for which last channels evaluated
    (0/fr)
    INFO: ngram_search_fwdtree.c(1560): 0 candidate words for entering last phone
    (0/fr)
    INFO: ngram_search_fwdtree.c(1562): fwdtree 0.01 CPU 0.007 xRT
    INFO: ngram_search_fwdtree.c(1565): fwdtree 1.35 wall 1.500 xRT
    INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 0 words
    INFO: ngram_search_fwdflat.c(940): 0 words recognized (0/fr)
    INFO: ngram_search_fwdflat.c(942): 267 senones evaluated (3/fr)
    INFO: ngram_search_fwdflat.c(944): 89 channels searched (0/fr)
    INFO: ngram_search_fwdflat.c(946): 89 words searched (0/fr)
    INFO: ngram_search_fwdflat.c(948): 0 word transitions (0/fr)
    INFO: ngram_search_fwdflat.c(951): fwdflat 0.00 CPU 0.000 xRT
    INFO: ngram_search_fwdflat.c(954): fwdflat 0.00 wall 0.000 xRT
    ERROR: "ngram_search.c", line 1144: Couldn't find <s> in first frame
    000000005: (null)
    READY....

     
  • Nickolay V. Shmyrev

    > I'm building a speech recognizer for Arabic and have successfully created
    > the acoustic model. However, whenever I try to adapt the acoustic model with
    > the wav files I have created, it does not give any result.

    So you have issues in your adaptation process.

    > Maybe I need to recheck the audio files. Currently the wav files are
    > recorded with Audacity at 8000 Hz, 16-bit PCM. Please advise; your help
    > is much appreciated.

    In order to get help, you need to provide more information on what you have
    already done. The more information you provide, the faster you will solve the
    problem.

     
  • hidayu kamarudin

    Hi nshmyrev,

    Thank you for your reply. What kind of files should I post here so you can
    help me with this? I'm not very experienced with Sphinx and pocketsphinx and
    am still learning. I hope you can help me. Thanks again.

     
  • bic-user

    bic-user - 2012-08-27

    > Currently the wav file is recorded using Audacity at 8000 Hz, 16-bit PCM

    But the decoder is configured to recognize 16 kHz audio. Use -samprate 8000
    and come back with the results. By the way, which files did you use for
    adaptation? Were they also 8 kHz?
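    bic-user's point can be checked directly against the WAV header before
    decoding. A minimal Python sketch using the standard `wave` module, assuming
    the decoder expects 16-bit mono PCM at the -samprate value shown in the
    configuration dump (16000 here); `describe_wav` and `check_matches_decoder`
    are hypothetical helpers, not part of pocketsphinx:

```python
import sys
import wave


def describe_wav(path):
    """Return (sample_rate_hz, bits_per_sample, channels) read from a WAV header."""
    with wave.open(path, "rb") as w:
        return w.getframerate(), 8 * w.getsampwidth(), w.getnchannels()


def check_matches_decoder(path, samprate=16000):
    """List the ways a WAV file differs from what the decoder expects.

    Assumption: the decoder wants 16-bit mono PCM at `samprate` Hz, where
    `samprate` is the -samprate value from the pocketsphinx configuration dump.
    """
    rate, bits, channels = describe_wav(path)
    problems = []
    if rate != samprate:
        problems.append(f"sample rate is {rate} Hz, decoder expects {samprate} Hz")
    if bits != 16:
        problems.append(f"{bits}-bit samples, expected 16-bit")
    if channels != 1:
        problems.append(f"{channels} channels, expected mono")
    return problems


if __name__ == "__main__":
    # Usage: python check_wav.py file1.wav [file2.wav ...]
    for p in sys.argv[1:]:
        issues = check_matches_decoder(p)
        print(p, "OK" if not issues else "; ".join(issues))
```

    Running it over both the adaptation recordings and the test recordings
    makes it easy to spot a file recorded at a different rate from the rest.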

     
  • hidayu kamarudin

    Thank you bic-user and nshmyrev for your replies.

    By the way, I have re-recorded the audio as 16-bit PCM at 16000 Hz, and
    again, when running pocketsphinx_continuous, it gives the same error.
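    Besides the recordings themselves, the model's feature parameters must also
    suit the sample rate. One quick consistency check is that the filterbank's
    upper edge (-upperf) lies at or below the Nyquist frequency, half the sample
    rate. With the value in these logs, upperf = 6855.4976 fits 16 kHz audio
    (Nyquist 8000 Hz) but not 8 kHz audio (Nyquist 4000 Hz). A Python sketch,
    assuming feat.params holds one "-name value" pair per line as in the
    parameter dump above; both helper names are hypothetical:

```python
def parse_feat_params(text):
    """Parse feat.params-style text ('-name value', one per line) into a dict."""
    params = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].startswith("-"):
            params[parts[0].lstrip("-")] = parts[1]
    return params


def filterbank_fits(params, samprate):
    """True if the mel filterbank's upper edge (-upperf) is at or below Nyquist.

    Nyquist frequency = samprate / 2; no energy above it exists in the audio.
    """
    return float(params["upperf"]) <= samprate / 2


if __name__ == "__main__":
    # Values copied from the feature-parameter dump in this thread.
    example = "-lowerf 133.33334\n-upperf 6855.4976\n-nfilt 40\n"
    p = parse_feat_params(example)
    for rate in (8000, 16000):
        print(rate, "OK" if filterbank_fits(p, rate) else "upperf above Nyquist")
```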

    Here is the log from the logdir:

    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 1.000000e-40
    -wip 0.65 2.000000e-01
    -wlen 0.025625 2.562500e-02

    INFO: cmd_ln.c(512): Parsing command line:
    \
    -alpha 0.97 \
    -dither yes \
    -doublebw no \
    -nfilt 40 \
    -ncep 13 \
    -lowerf 133.33334 \
    -upperf 6855.4976 \
    -nfft 512 \
    -wlen 0.0256 \
    -transform legacy \
    -feat 1s_c_d_dd \
    -agc none \
    -cmn current \
    -varnorm no

    Current configuration:

    -agc none none
    -agcthresh 2.0 2.000000e+00
    -alpha 0.97 9.700000e-01
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -dither no yes
    -doublebw no no
    -feat 1s_c_d_dd 1s_c_d_dd
    -frate 100 100
    -input_endian little little
    -lda
    -ldadim 0 0
    -lifter 0 0
    -logspec no no
    -lowerf 133.33334 1.333333e+02
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 40
    -remove_dc no no
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -smoothspec no no
    -svspec
    -transform legacy legacy
    -unit_area yes yes
    -upperf 6855.4976 6.855498e+03
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wlen 0.025625 2.560000e-02

    INFO: acmod.c(238): Parsed model-specific feature parameters from
    /home/hidayu/ayu3/model_parameters/hijaiyah.cd_cont_1000/feat.params
    INFO: fe_interface.c(288): You are using the internal mechanism to generate
    the seed.
    INFO: feat.c(848): Initializing feature stream to type: '1s_c_d_dd',
    ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(142): mean= 12.00, mean= 0.0
    INFO: mdef.c(520): Reading model definition:
    /home/hidayu/ayu3/model_parameters/hijaiyah.cd_cont_1000/mdef
    INFO: bin_mdef.c(173): Allocating 104 * 8 bytes (0 KiB) for CD tree
    INFO: tmat.c(205): Reading HMM transition probability matrices:
    /home/hidayu/ayu3/model_parameters/hijaiyah.cd_cont_1000/transition_matrices
    INFO: acmod.c(117): Attempting to use SCHMM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /home/hidayu/ayu3/model_parameters/hijaiyah.cd_cont_1000/means
    INFO: ms_gauden.c(292): 51 codebook, 1 feature, size
    8x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /home/hidayu/ayu3/model_parameters/hijaiyah.cd_cont_1000/variances
    INFO: ms_gauden.c(292): 51 codebook, 1 feature, size
    8x39
    INFO: ms_gauden.c(356): 8122 variance values floored
    INFO: acmod.c(119): Attempting to use PTHMM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /home/hidayu/ayu3/model_parameters/hijaiyah.cd_cont_1000/means
    INFO: ms_gauden.c(292): 51 codebook, 1 feature, size
    8x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /home/hidayu/ayu3/model_parameters/hijaiyah.cd_cont_1000/variances
    INFO: ms_gauden.c(292): 51 codebook, 1 feature, size
    8x39
    INFO: ms_gauden.c(356): 8122 variance values floored
    INFO: ptm_mgau.c(671): Reading mixture weights file
    '/home/hidayu/ayu3/model_parameters/hijaiyah.cd_cont_1000/mixture_weights'
    INFO: ptm_mgau.c(765): Read 51 x 1 x 8 mixture weights
    INFO: ptm_mgau.c(831): Maximum top-N: 4
    INFO: dict.c(294): Allocating 4103 * 32 bytes (128 KiB) for word entries
    INFO: dict.c(306): Reading main dictionary: /home/hidayu/ayu3/etc/hijaiyah.dic
    INFO: dict.c(206): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(309): 4 words read
    INFO: dict.c(314): Reading filler dictionary:
    /home/hidayu/ayu3/model_parameters/hijaiyah.cd_cont_1000/noisedict
    INFO: dict.c(206): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(317): 3 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(405): Allocating 9^3 * 2 bytes (1 KiB) for word-initial
    triphones
    INFO: dict2pid.c(131): Allocated 2016 bytes (1 KiB) for word-final triphones
    INFO: dict2pid.c(195): Allocated 2016 bytes (1 KiB) for single-phone word
    triphones
    ERROR: "ngram_model_arpa.c", line 76: No \data\ mark in LM file
    INFO: ngram_model_dmp.c(141): Will use memory-mapped I/O for LM file
    INFO: ngram_model_dmp.c(195): ngrams 1=6, 2=8, 3=4
    INFO: ngram_model_dmp.c(241): 6 = LM.unigrams(+trailer) read
    INFO: ngram_model_dmp.c(289): 8 = LM.bigrams(+trailer) read
    INFO: ngram_model_dmp.c(314): 4 = LM.trigrams read
    INFO: ngram_model_dmp.c(338): 3 = LM.prob2 entries read
    INFO: ngram_model_dmp.c(357): 3 = LM.bo_wt2 entries read
    INFO: ngram_model_dmp.c(377): 2 = LM.prob3 entries read
    INFO: ngram_model_dmp.c(405): 1 = LM.tseg_base entries read
    INFO: ngram_model_dmp.c(461): 6 = ascii word strings read
    INFO: ngram_search_fwdtree.c(99): 4 unique initial diphones
    INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 4 single-phone
    words
    INFO: ngram_search_fwdtree.c(186): Creating search tree
    INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 4
    single-phone words
    INFO: ngram_search_fwdtree.c(324): after: max nonroot chan increased to 128
    INFO: ngram_search_fwdtree.c(333): after: 4 root, 0 non-root channels, 3
    single-phone words
    INFO: ngram_search_fwdflat.c(153): fwdflat: min_ef_width = 4, max_sf_win = 25
    INFO: cmn.c(175): CMN: 13.72 -0.77 -0.33 -0.34 -0.23 -0.09 -0.11 -0.15 -0.11
    -0.12 -0.16 -0.17 -0.16
    Wed Aug 29 10:53:18 2012

    The result when running pocketsphinx:

    INFO: ngram_search_fwdtree.c(1549): 109 words recognized (1/fr)
    INFO: ngram_search_fwdtree.c(1551): 1536 senones evaluated (10/fr)
    INFO: ngram_search_fwdtree.c(1553): 674 channels searched (4/fr), 355 1st, 319
    last
    INFO: ngram_search_fwdtree.c(1557): 319 words for which last channels
    evaluated (2/fr)
    INFO: ngram_search_fwdtree.c(1560): 0 candidate words for entering last phone
    (0/fr)
    INFO: ngram_search_fwdtree.c(1562): fwdtree 0.01 CPU 0.007 xRT
    INFO: ngram_search_fwdtree.c(1565): fwdtree 2.78 wall 1.760 xRT
    INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 1 words
    INFO: ngram_search_fwdflat.c(940): 72 words recognized (0/fr)
    INFO: ngram_search_fwdflat.c(942): 471 senones evaluated (3/fr)
    INFO: ngram_search_fwdflat.c(944): 434 channels searched (2/fr)
    INFO: ngram_search_fwdflat.c(946): 434 words searched (2/fr)
    INFO: ngram_search_fwdflat.c(948): 26 word transitions (0/fr)
    INFO: ngram_search_fwdflat.c(951): fwdflat 0.00 CPU 0.001 xRT
    INFO: ngram_search_fwdflat.c(954): fwdflat 0.00 wall 0.000 xRT
    INFO: ngram_search.c(1201): </s> not found in last frame, using <sil>.153
    instead
    INFO: ngram_search.c(1253): lattice start node <s>.0 end node <sil>.139
    INFO: ngram_search.c(1281): Eliminated 0 nodes before end node
    INFO: ngram_search.c(1386): Lattice has 5 nodes, 2 links
    INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(<sil>:139:153) = -116922
    INFO: ps_lattice.c(1390): Joint P(O,S) = -116922 P(S|O) = 0
    INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(878): bestpath 0.00 wall 0.000 xRT
    000000015:
    READY....
    Listening...
    Recording is stopped, start recording with ad_start_rec
    Stopped listening, please wait...
    INFO: cmn_prior.c(121): cmn_prior_update: from < 16.96 -0.72 -0.39 -0.41 -0.35
    -0.12 -0.21 -0.19 -0.07 -0.14 -0.21 -0.25 -0.21 >
    INFO: cmn_prior.c(139): cmn_prior_update: to < 16.82 -0.70 -0.39 -0.38 -0.34
    -0.13 -0.22 -0.19 -0.08 -0.13 -0.20 -0.24 -0.21 >
    INFO: ngram_search_fwdtree.c(1549): 90 words recognized (1/fr)
    INFO: ngram_search_fwdtree.c(1551): 759 senones evaluated (11/fr)
    INFO: ngram_search_fwdtree.c(1553): 344 channels searched (5/fr), 188 1st, 156
    last
    INFO: ngram_search_fwdtree.c(1557): 156 words for which last channels
    evaluated (2/fr)
    INFO: ngram_search_fwdtree.c(1560): 0 candidate words for entering last phone
    (0/fr)
    INFO: ngram_search_fwdtree.c(1562): fwdtree 0.00 CPU 0.008 xRT
    INFO: ngram_search_fwdtree.c(1565): fwdtree 1.27 wall 1.925 xRT
    INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 2 words
    INFO: ngram_search_fwdflat.c(940): 70 words recognized (1/fr)
    INFO: ngram_search_fwdflat.c(942): 195 senones evaluated (3/fr)
    INFO: ngram_search_fwdflat.c(944): 218 channels searched (3/fr)
    INFO: ngram_search_fwdflat.c(946): 218 words searched (3/fr)
    INFO: ngram_search_fwdflat.c(948): 57 word transitions (0/fr)
    INFO: ngram_search_fwdflat.c(951): fwdflat 0.00 CPU 0.000 xRT
    INFO: ngram_search_fwdflat.c(954): fwdflat 0.00 wall 0.001 xRT
    INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.63
    INFO: ngram_search.c(1281): Eliminated 0 nodes before end node
    INFO: ngram_search.c(1386): Lattice has 13 nodes, 15 links
    INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(</s>:63:64) = -138654
    INFO: ps_lattice.c(1390): Joint P(O,S) = -138989 P(S|O) = -335
    INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(878): bestpath 0.00 wall 0.000 xRT
    000000016: </sil>

    • Is the error below what prevents any output? If so, what is the solution?

    ERROR: "ngram_model_arpa.c", line 76: No \data\ mark in LM file

    Thank you very much to everyone who answers.

     
  • Nickolay V. Shmyrev

    > Is the error below what prevents any output?

    No, this error message comes from an old pocketsphinx version; you should use
    a newer one.
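    The log itself is consistent with this: the decoder first tries to read
    hijaiyah.dmp as an ARPA text model, which must contain a `\data\` line,
    reports that the mark is missing, and then loads the same file as a binary
    DMP model (the next lines report "Will use memory-mapped I/O for LM file"
    and the n-gram counts 1=6, 2=8, 3=4). A rough Python heuristic for telling
    the two formats apart, assuming that scanning the start of the file for the
    `\data\` line is enough; `looks_like_arpa` is a hypothetical helper, not a
    sphinxbase API:

```python
import sys


def looks_like_arpa(path, probe_bytes=65536):
    """Heuristic: ARPA-format LMs are text files containing a line '\\data\\'.

    Only the first `probe_bytes` bytes are scanned; the header normally sits
    near the top of the file, after optional comment lines.
    """
    with open(path, "rb") as f:
        head = f.read(probe_bytes)
    return any(line.strip() == b"\\data\\" for line in head.splitlines())


if __name__ == "__main__":
    # Usage: python lm_format.py model/lm/hijaiyah.dmp
    for p in sys.argv[1:]:
        print(p, "ARPA text" if looks_like_arpa(p) else "not ARPA (e.g. binary DMP)")
```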

    > If so, what is the solution?

    The solution is described in the testing section of the adaptation tutorial:

    http://cmusphinx.sourceforge.net/wiki/tutorialadapt#testing_the_adaptation

    You need to test the adaptation. Once you have run the test, it is possible
    to diagnose and solve the problem. To get the fastest answer, you need to
    provide the test results along with the test and adaptation data.
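    Testing an adaptation generally means decoding held-out recordings with both
    the original and the adapted model and comparing word error rate (WER)
    against reference transcriptions. The sketch below is not the tutorial's own
    tooling; it is a minimal Python illustration of the metric, assuming
    references and hypotheses are plain whitespace-separated word strings, with
    WER computed as word-level edit distance (substitutions + insertions +
    deletions) divided by the number of reference words:

```python
def word_errors(ref, hyp):
    """Levenshtein distance between two word sequences (subs + ins + dels)."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))  # distance from empty reference prefix
    for i, rw in enumerate(r, 1):
        cur = [i] + [0] * len(h)
        for j, hw in enumerate(h, 1):
            cur[j] = min(
                prev[j] + 1,               # deletion: reference word missing
                cur[j - 1] + 1,            # insertion: extra hypothesis word
                prev[j - 1] + (rw != hw),  # substitution, or match at no cost
            )
        prev = cur
    return prev[len(h)]


def wer(refs, hyps):
    """Corpus word error rate: total edit errors / total reference words."""
    errors = sum(word_errors(r, h) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words if words else 0.0
```

    For instance, wer(["a b c"], ["a x c"]) is 1/3. A lower WER for the adapted
    model than for the original one on the same test set indicates the
    adaptation helped; an empty hypothesis for every utterance, as in the logs
    above, gives a WER of 1.0.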

     
