CMU Sphinx / Forums / Help: Error Converting From M4A

I'm using pocketsphinx to transcribe an audio file recorded on a Motorola Droid.
Since the Droid can only upload M4A files, I'm converting the file to WAV
with mplayer. When I run pocketsphinx on the WAV file, I get this output:

-7.vp.tg.lm.DMP -dict /usr/share/pocketsphinx/model/lm/wsj/wlist5o.dic
INFO: cmd_ln.c(506): Parsing command line:
/home/zach/utils/wav2text \
    -infile /home/zach/Desktop/test.wav \
    -hmm /usr/share/pocketsphinx/model/hmm/wsj1 \
    -lm /usr/share/pocketsphinx/model/lm/wsj/wlist5o.3e-7.vp.tg.lm.DMP \
    -dict /usr/share/pocketsphinx/model/lm/wsj/wlist5o.dic

Current configuration:
[NAME]      [DEFLT]     [VALUE]
-agc        none        none
-agcthresh  2.0     2.000000e+00
-alpha      0.97        9.700000e-01
-ascale     20.0        2.000000e+01
-backtrace  no      no
-beam       1e-48       1.000000e-48
-bestpath   yes     yes
-bestpathlw 9.5     9.500000e+00
-cep2spec   no      no
-ceplen     13      13
-cmn        current     current
-cmninit    8.0     8.0
-compallsen no      no
-dict               /usr/share/pocketsphinx/model/lm/wsj/wlist5o.dic
-dictcase   no      no
-dither     no      no
-doublebw   no      no
-ds     1       1
-fdict              
-feat       1s_c_d_dd   1s_c_d_dd
-featparams         
-fillprob   1e-8        1.000000e-08
-frate      100     100
-fsg                
-fsgusealtpron  yes     yes
-fsgusefiller   yes     yes
-fwdflat    yes     yes
-fwdflatbeam    1e-64       1.000000e-64
-fwdflatefwid   4       4
-fwdflatlw  8.5     8.500000e+00
-fwdflatsfwin   25      25
-fwdflatwbeam   7e-29       7.000000e-29
-fwdtree    yes     yes
-hmm                /usr/share/pocketsphinx/model/hmm/wsj1
-infile             /home/zach/Desktop/test.wav
-input_endian   little      little
-jsgf               
-kdmaxbbi   -1      -1
-kdmaxdepth 0       0
-kdtree             
-latsize    5000        5000
-lda                
-ldadim     0       0
-lifter     0       0
-lm             /usr/share/pocketsphinx/model/lm/wsj/wlist5o.3e-7.vp.tg.lm.DMP
-lmctl              
-lmname     default     default
-logbase    1.0001      1.000100e+00
-logfn              
-logspec    no      no
-lowerf     133.33334   1.333333e+02
-lpbeam     1e-40       1.000000e-40
-lponlybeam 7e-29       7.000000e-29
-lw     6.5     6.500000e+00
-maxhistpf  100     100
-maxhmmpf   -1      -1
-maxnewoov  20      20
-maxwpf     -1      -1
-mdef               
-mean               
-mfclogdir          
-mixw               
-mixwfloor  0.0000001   1.000000e-07
-mmap       yes     yes
-ncep       13      13
-nfft       512     512
-nfilt      40      40
-nwpen      1.0     1.000000e+00
-pbeam      1e-48       1.000000e-48
-pip        1.0     1.000000e+00
-rawlogdir          
-remove_dc  no      no
-round_filters  yes     yes
-samprate   16000       1.600000e+04
-sdmap              
-seed       -1      -1
-sendump            
-silprob    0.005       5.000000e-03
-smoothspec no      no
-spec2cep   no      no
-svspec             
-tmat               
-tmatfloor  0.0001      1.000000e-04
-topn       4       4
-toprule            
-transform  legacy      legacy
-unit_area  yes     yes
-upperf     6855.4976   6.855498e+03
-usewdphones    no      no
-uw     1.0     1.000000e+00
-var                
-varfloor   0.0001      1.000000e-04
-varnorm    no      no
-verbose    no      no
-warp_params            
-warp_type  inverse_linear  inverse_linear
-wbeam      7e-29       7.000000e-29
-wip        0.65        6.500000e-01
-wlen       0.025625    2.562500e-02

INFO: cmd_ln.c(506): Parsing command line:
\
    -lowerf 1 \
    -upperf 4000 \
    -nfilt 20 \
    -transform dct \
    -round_filters no \
    -remove_dc yes \
    -feat s2_4x

Current configuration:
[NAME]      [DEFLT]     [VALUE]
-agc        none        none
-agcthresh  2.0     2.000000e+00
-alpha      0.97        9.700000e-01
-cep2spec   no      no
-ceplen     13      13
-cmn        current     current
-cmninit    8.0     8.0
-dither     no      no
-doublebw   no      no
-feat       1s_c_d_dd   s2_4x
-frate      100     100
-input_endian   little      little
-lda                
-ldadim     0       0
-lifter     0       0
-logfn              
-logspec    no      no
-lowerf     133.33334   1.000000e+00
-mfclogdir          
-ncep       13      13
-nfft       512     512
-nfilt      40      20
-rawlogdir          
-remove_dc  no      yes
-round_filters  yes     no
-samprate   16000       1.600000e+04
-seed       -1      -1
-smoothspec no      no
-spec2cep   no      no
-svspec             
-transform  legacy      dct
-unit_area  yes     yes
-upperf     6855.4976   4.000000e+03
-varnorm    no      no
-verbose    no      no
-warp_params            
-warp_type  inverse_linear  inverse_linear
-wlen       0.025625    2.562500e-02

INFO: acmod.c(82): Parsed model-specific feature parameters from /usr/share/pocketsphinx/model/hmm/wsj1/feat.params
INFO: mdef.c(520): Reading model definition: /usr/share/pocketsphinx/model/hmm/wsj1/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(301): Reading binary model definition: /usr/share/pocketsphinx/model/hmm/wsj1/mdef
INFO: bin_mdef.c(480): 44 CI-phone, 66516 CD-phone, 5 emitstate/phone, 220 CI-sen, 5220 Sen, 18660 Sen-Seq
INFO: tmat.c(204): Reading HMM transition probability matrices: /usr/share/pocketsphinx/model/hmm/wsj1/transition_matrices
INFO: acmod.c(114): Attempting to use SCGMM computation module
INFO: s2_semi_mgau.c(981): Reading S3 mixture gaussian file '/usr/share/pocketsphinx/model/hmm/wsj1/means'
INFO: s2_semi_mgau.c(1080): 1 mixture Gaussians, 256 components, 4 feature streams, veclen 51
INFO: s2_semi_mgau.c(981): Reading S3 mixture gaussian file '/usr/share/pocketsphinx/model/hmm/wsj1/variances'
INFO: s2_semi_mgau.c(1080): 1 mixture Gaussians, 256 components, 4 feature streams, veclen 51
INFO: s2_semi_mgau.c(748): Loading senones from dump file /usr/share/pocketsphinx/model/hmm/wsj1/sendump
INFO: s2_semi_mgau.c(764): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(793): Rows: 256, Columns: 5220
INFO: s2_semi_mgau.c(801): Using memory-mapped I/O for senones
INFO: kdtree.c(231): Reading tree for feature 0
INFO: kdtree.c(249): n_density 256 n_comp 12 n_level 8 threshold 0.200000
INFO: kdtree.c(186): Read 255 nodes
INFO: kdtree.c(231): Reading tree for feature 1
INFO: kdtree.c(249): n_density 256 n_comp 24 n_level 8 threshold 0.200000
INFO: kdtree.c(186): Read 255 nodes
INFO: kdtree.c(231): Reading tree for feature 2
INFO: kdtree.c(249): n_density 256 n_comp 3 n_level 8 threshold 0.200000
INFO: kdtree.c(186): Read 255 nodes
INFO: kdtree.c(231): Reading tree for feature 3
INFO: kdtree.c(249): n_density 256 n_comp 12 n_level 8 threshold 0.200000
INFO: kdtree.c(186): Read 255 nodes
INFO: feat.c(849): Initializing feature stream to type: 's2_4x', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: dict.c(232): Allocating 20 placeholders for new OOVs
INFO: dict.c(494):   6270 = words in file [/usr/share/pocketsphinx/model/lm/wsj/wlist5o.dic]
WARNING: "dict.c", line 435: Skipping duplicate definition of <s>
WARNING: "dict.c", line 435: Skipping duplicate definition of </s>
WARNING: "dict.c", line 435: Skipping duplicate definition of <sil>
INFO: dict.c(494):      3 = words in file [/usr/share/pocketsphinx/model/hmm/wsj1/noisedict]
INFO: dict.c(349): LEFT CONTEXT TABLES
INFO: dict.c(1013): Entry Context table contains
       450 entries
INFO: dict.c(1014):      19800 possible cross word triphones.
INFO: dict.c(1052):      17920 triphones
      1792 pseudo diphones
        88 uniphones
INFO: dict.c(1099): Exit Context table contains
       450 entries
INFO: dict.c(1100):      19800 possible cross word triphones.
INFO: dict.c(1166):      17920 triphones
      1792 pseudo diphones
        88 uniphones
INFO: dict.c(1168):       7653 right context entries
INFO: dict.c(1169):         17 ave entries per exit context
INFO: dict.c(355): RIGHT CONTEXT TABLES
INFO: dict.c(1013): Entry Context table contains
       416 entries
INFO: dict.c(1014):      18304 possible cross word triphones.
INFO: dict.c(1052):      17388 triphones
       828 pseudo diphones
        88 uniphones
INFO: dict.c(1099): Exit Context table contains
       416 entries
INFO: dict.c(1100):      18304 possible cross word triphones.
INFO: dict.c(1166):      17388 triphones
       828 pseudo diphones
        88 uniphones
INFO: dict.c(1168):       8753 right context entries
INFO: dict.c(1169):         21 ave entries per exit context
WARNING: "listelem_alloc.c", line 89: List item size (20) not multiple of sizeof(void *), rounding to 24
ERROR: "ngram_model_arpa.c", line 155: No \data\ mark in LM file
INFO: ngram_model_dmp.c(141): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(190): ngrams 1=5002, 2=338656, 3=291318
INFO: ngram_model_dmp.c(236):     5002 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(286):   338656 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(313):   291318 = LM.trigrams read
INFO: ngram_model_dmp.c(338):    32470 = LM.prob2 entries read
INFO: ngram_model_dmp.c(358):    13795 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(379):    31136 = LM.prob3 entries read
INFO: ngram_model_dmp.c(408):      662 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(467):     5002 = ascii word strings read
INFO: ngram_search_fwdtree.c(156): 0 root, 0 non-root channels, 37 single-phone words
INFO: ngram_search_fwdtree.c(195): Creating search tree
INFO: ngram_search_fwdtree.c(203): 0 root, 0 non-root channels, 37 single-phone words
INFO: ngram_search_fwdtree.c(325): max nonroot chan increased to 13871
INFO: ngram_search_fwdtree.c(334): 443 root, 13743 non-root channels, 17 single-phone words
INFO: ngram_search_fwdflat.c(95): fwdflat: min_ef_width = 4, max_sf_win = 25
FATAL_ERROR: "cont.c", line 108: cont_ad_calib failed

If, however, I record a WAV file in Audacity directly on my desktop,
pocketsphinx works. The behavior is consistent across ~10 Droid-generated
files (which fail) and ~10 Audacity-generated files (which work). The
conversion is taking place on a 64-bit Ubuntu 10.04 machine.

An example file that doesn't work (from Droid) is here:
http://zachrattner.com/wav/test.wav
An example file that does work (from Audacity) is here:
http://zachrattner.com/wav/test2.wav
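
One check that might narrow this down is comparing the WAV headers of the two
example files against the decoder's -samprate 16000 setting shown in the log.
Below is a minimal Python sketch using the standard wave module. The function
names are made up for illustration, the expectation of 16-bit mono audio is my
assumption (the log only states the sample rate), and a format mismatch being
the cause of the cont_ad_calib failure is a guess, not something the log
confirms.

```python
import sys
import wave


def describe_wav(path):
    """Read the header of a PCM WAV file and return its basic parameters."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate if rate else 0.0,
        }


def mismatches(info, expected_rate=16000, expected_channels=1, expected_width=2):
    """List the header fields that differ from the expected format.

    expected_rate comes from the -samprate value in the log; the channel
    count and sample width defaults are assumptions, not taken from the log.
    """
    problems = []
    if info["sample_rate_hz"] != expected_rate:
        problems.append(
            "sample rate is %d Hz, expected %d Hz"
            % (info["sample_rate_hz"], expected_rate)
        )
    if info["channels"] != expected_channels:
        problems.append(
            "%d channel(s), expected %d" % (info["channels"], expected_channels)
        )
    if info["sample_width_bytes"] != expected_width:
        problems.append(
            "%d byte(s) per sample, expected %d"
            % (info["sample_width_bytes"], expected_width)
        )
    return problems


if __name__ == "__main__":
    # Usage: python check_wav.py test.wav test2.wav
    for path in sys.argv[1:]:
        try:
            info = describe_wav(path)
        except (wave.Error, EOFError) as exc:
            # The wave module rejects files it cannot parse as PCM WAV.
            print("%s: could not read as PCM WAV (%s)" % (path, exc))
            continue
        print("%s: %s" % (path, info))
        for problem in mismatches(info):
            print("  mismatch: " + problem)
```

Running it on both test.wav and test2.wav would show whether the
mplayer-converted file differs from the Audacity one in sample rate, channel
count, or sample width. If it does, the converted file would presumably need
to be resampled before decoding.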

If anyone could shed some light on what I'm doing wrong, I'd appreciate it.

Thanks,
Zach
