Error Converting From M4A

Forum: Help
Creator: z
Created: 2010-09-10
Updated: 2012-09-22
  • z

    z - 2010-09-10

    I'm using pocketsphinx to transcribe an audio file recorded on a Motorola
    Droid. Since the Droid can only upload M4A files, I'm converting the file to
    WAV with mplayer. When I feed the WAV into pocketsphinx, I get this output:

    -7.vp.tg.lm.DMP -dict /usr/share/pocketsphinx/model/lm/wsj/wlist5o.dic
    INFO: cmd_ln.c(506): Parsing command line:
    /home/zach/utils/wav2text \
        -infile /home/zach/Desktop/test.wav \
        -hmm /usr/share/pocketsphinx/model/hmm/wsj1 \
        -lm /usr/share/pocketsphinx/model/lm/wsj/wlist5o.3e-7.vp.tg.lm.DMP \
        -dict /usr/share/pocketsphinx/model/lm/wsj/wlist5o.dic
    
    Current configuration:
    [NAME]      [DEFLT]     [VALUE]
    -agc        none        none
    -agcthresh  2.0     2.000000e+00
    -alpha      0.97        9.700000e-01
    -ascale     20.0        2.000000e+01
    -backtrace  no      no
    -beam       1e-48       1.000000e-48
    -bestpath   yes     yes
    -bestpathlw 9.5     9.500000e+00
    -cep2spec   no      no
    -ceplen     13      13
    -cmn        current     current
    -cmninit    8.0     8.0
    -compallsen no      no
    -dict               /usr/share/pocketsphinx/model/lm/wsj/wlist5o.dic
    -dictcase   no      no
    -dither     no      no
    -doublebw   no      no
    -ds     1       1
    -fdict              
    -feat       1s_c_d_dd   1s_c_d_dd
    -featparams         
    -fillprob   1e-8        1.000000e-08
    -frate      100     100
    -fsg                
    -fsgusealtpron  yes     yes
    -fsgusefiller   yes     yes
    -fwdflat    yes     yes
    -fwdflatbeam    1e-64       1.000000e-64
    -fwdflatefwid   4       4
    -fwdflatlw  8.5     8.500000e+00
    -fwdflatsfwin   25      25
    -fwdflatwbeam   7e-29       7.000000e-29
    -fwdtree    yes     yes
    -hmm                /usr/share/pocketsphinx/model/hmm/wsj1
    -infile             /home/zach/Desktop/test.wav
    -input_endian   little      little
    -jsgf               
    -kdmaxbbi   -1      -1
    -kdmaxdepth 0       0
    -kdtree             
    -latsize    5000        5000
    -lda                
    -ldadim     0       0
    -lifter     0       0
    -lm             /usr/share/pocketsphinx/model/lm/wsj/wlist5o.3e-7.vp.tg.lm.DMP
    -lmctl              
    -lmname     default     default
    -logbase    1.0001      1.000100e+00
    -logfn              
    -logspec    no      no
    -lowerf     133.33334   1.333333e+02
    -lpbeam     1e-40       1.000000e-40
    -lponlybeam 7e-29       7.000000e-29
    -lw     6.5     6.500000e+00
    -maxhistpf  100     100
    -maxhmmpf   -1      -1
    -maxnewoov  20      20
    -maxwpf     -1      -1
    -mdef               
    -mean               
    -mfclogdir          
    -mixw               
    -mixwfloor  0.0000001   1.000000e-07
    -mmap       yes     yes
    -ncep       13      13
    -nfft       512     512
    -nfilt      40      40
    -nwpen      1.0     1.000000e+00
    -pbeam      1e-48       1.000000e-48
    -pip        1.0     1.000000e+00
    -rawlogdir          
    -remove_dc  no      no
    -round_filters  yes     yes
    -samprate   16000       1.600000e+04
    -sdmap              
    -seed       -1      -1
    -sendump            
    -silprob    0.005       5.000000e-03
    -smoothspec no      no
    -spec2cep   no      no
    -svspec             
    -tmat               
    -tmatfloor  0.0001      1.000000e-04
    -topn       4       4
    -toprule            
    -transform  legacy      legacy
    -unit_area  yes     yes
    -upperf     6855.4976   6.855498e+03
    -usewdphones    no      no
    -uw     1.0     1.000000e+00
    -var                
    -varfloor   0.0001      1.000000e-04
    -varnorm    no      no
    -verbose    no      no
    -warp_params            
    -warp_type  inverse_linear  inverse_linear
    -wbeam      7e-29       7.000000e-29
    -wip        0.65        6.500000e-01
    -wlen       0.025625    2.562500e-02
    
    INFO: cmd_ln.c(506): Parsing command line:
    \
        -lowerf 1 \
        -upperf 4000 \
        -nfilt 20 \
        -transform dct \
        -round_filters no \
        -remove_dc yes \
        -feat s2_4x
    
    Current configuration:
    [NAME]      [DEFLT]     [VALUE]
    -agc        none        none
    -agcthresh  2.0     2.000000e+00
    -alpha      0.97        9.700000e-01
    -cep2spec   no      no
    -ceplen     13      13
    -cmn        current     current
    -cmninit    8.0     8.0
    -dither     no      no
    -doublebw   no      no
    -feat       1s_c_d_dd   s2_4x
    -frate      100     100
    -input_endian   little      little
    -lda                
    -ldadim     0       0
    -lifter     0       0
    -logfn              
    -logspec    no      no
    -lowerf     133.33334   1.000000e+00
    -mfclogdir          
    -ncep       13      13
    -nfft       512     512
    -nfilt      40      20
    -rawlogdir          
    -remove_dc  no      yes
    -round_filters  yes     no
    -samprate   16000       1.600000e+04
    -seed       -1      -1
    -smoothspec no      no
    -spec2cep   no      no
    -svspec             
    -transform  legacy      dct
    -unit_area  yes     yes
    -upperf     6855.4976   4.000000e+03
    -varnorm    no      no
    -verbose    no      no
    -warp_params            
    -warp_type  inverse_linear  inverse_linear
    -wlen       0.025625    2.562500e-02
    
    INFO: acmod.c(82): Parsed model-specific feature parameters from /usr/share/pocketsphinx/model/hmm/wsj1/feat.params
    INFO: mdef.c(520): Reading model definition: /usr/share/pocketsphinx/model/hmm/wsj1/mdef
    INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
    INFO: bin_mdef.c(301): Reading binary model definition: /usr/share/pocketsphinx/model/hmm/wsj1/mdef
    INFO: bin_mdef.c(480): 44 CI-phone, 66516 CD-phone, 5 emitstate/phone, 220 CI-sen, 5220 Sen, 18660 Sen-Seq
    INFO: tmat.c(204): Reading HMM transition probability matrices: /usr/share/pocketsphinx/model/hmm/wsj1/transition_matrices
    INFO: acmod.c(114): Attempting to use SCGMM computation module
    INFO: s2_semi_mgau.c(981): Reading S3 mixture gaussian file '/usr/share/pocketsphinx/model/hmm/wsj1/means'
    INFO: s2_semi_mgau.c(1080): 1 mixture Gaussians, 256 components, 4 feature streams, veclen 51
    INFO: s2_semi_mgau.c(981): Reading S3 mixture gaussian file '/usr/share/pocketsphinx/model/hmm/wsj1/variances'
    INFO: s2_semi_mgau.c(1080): 1 mixture Gaussians, 256 components, 4 feature streams, veclen 51
    INFO: s2_semi_mgau.c(748): Loading senones from dump file /usr/share/pocketsphinx/model/hmm/wsj1/sendump
    INFO: s2_semi_mgau.c(764): BEGIN FILE FORMAT DESCRIPTION
    INFO: s2_semi_mgau.c(793): Rows: 256, Columns: 5220
    INFO: s2_semi_mgau.c(801): Using memory-mapped I/O for senones
    INFO: kdtree.c(231): Reading tree for feature 0
    INFO: kdtree.c(249): n_density 256 n_comp 12 n_level 8 threshold 0.200000
    INFO: kdtree.c(186): Read 255 nodes
    INFO: kdtree.c(231): Reading tree for feature 1
    INFO: kdtree.c(249): n_density 256 n_comp 24 n_level 8 threshold 0.200000
    INFO: kdtree.c(186): Read 255 nodes
    INFO: kdtree.c(231): Reading tree for feature 2
    INFO: kdtree.c(249): n_density 256 n_comp 3 n_level 8 threshold 0.200000
    INFO: kdtree.c(186): Read 255 nodes
    INFO: kdtree.c(231): Reading tree for feature 3
    INFO: kdtree.c(249): n_density 256 n_comp 12 n_level 8 threshold 0.200000
    INFO: kdtree.c(186): Read 255 nodes
    INFO: feat.c(849): Initializing feature stream to type: 's2_4x', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: dict.c(232): Allocating 20 placeholders for new OOVs
    INFO: dict.c(494):   6270 = words in file [/usr/share/pocketsphinx/model/lm/wsj/wlist5o.dic]
    WARNING: "dict.c", line 435: Skipping duplicate definition of <s>
    WARNING: "dict.c", line 435: Skipping duplicate definition of </s>
    WARNING: "dict.c", line 435: Skipping duplicate definition of <sil>
    INFO: dict.c(494):      3 = words in file [/usr/share/pocketsphinx/model/hmm/wsj1/noisedict]
    INFO: dict.c(349): LEFT CONTEXT TABLES
    INFO: dict.c(1013): Entry Context table contains
           450 entries
    INFO: dict.c(1014):      19800 possible cross word triphones.
    INFO: dict.c(1052):      17920 triphones
          1792 pseudo diphones
            88 uniphones
    INFO: dict.c(1099): Exit Context table contains
           450 entries
    INFO: dict.c(1100):      19800 possible cross word triphones.
    INFO: dict.c(1166):      17920 triphones
          1792 pseudo diphones
            88 uniphones
    INFO: dict.c(1168):       7653 right context entries
    INFO: dict.c(1169):         17 ave entries per exit context
    INFO: dict.c(355): RIGHT CONTEXT TABLES
    INFO: dict.c(1013): Entry Context table contains
           416 entries
    INFO: dict.c(1014):      18304 possible cross word triphones.
    INFO: dict.c(1052):      17388 triphones
           828 pseudo diphones
            88 uniphones
    INFO: dict.c(1099): Exit Context table contains
           416 entries
    INFO: dict.c(1100):      18304 possible cross word triphones.
    INFO: dict.c(1166):      17388 triphones
           828 pseudo diphones
            88 uniphones
    INFO: dict.c(1168):       8753 right context entries
    INFO: dict.c(1169):         21 ave entries per exit context
    WARNING: "listelem_alloc.c", line 89: List item size (20) not multiple of sizeof(void *), rounding to 24
    ERROR: "ngram_model_arpa.c", line 155: No \data\ mark in LM file
    INFO: ngram_model_dmp.c(141): Will use memory-mapped I/O for LM file
    INFO: ngram_model_dmp.c(190): ngrams 1=5002, 2=338656, 3=291318
    INFO: ngram_model_dmp.c(236):     5002 = LM.unigrams(+trailer) read
    INFO: ngram_model_dmp.c(286):   338656 = LM.bigrams(+trailer) read
    INFO: ngram_model_dmp.c(313):   291318 = LM.trigrams read
    INFO: ngram_model_dmp.c(338):    32470 = LM.prob2 entries read
    INFO: ngram_model_dmp.c(358):    13795 = LM.bo_wt2 entries read
    INFO: ngram_model_dmp.c(379):    31136 = LM.prob3 entries read
    INFO: ngram_model_dmp.c(408):      662 = LM.tseg_base entries read
    INFO: ngram_model_dmp.c(467):     5002 = ascii word strings read
    INFO: ngram_search_fwdtree.c(156): 0 root, 0 non-root channels, 37 single-phone words
    INFO: ngram_search_fwdtree.c(195): Creating search tree
    INFO: ngram_search_fwdtree.c(203): 0 root, 0 non-root channels, 37 single-phone words
    INFO: ngram_search_fwdtree.c(325): max nonroot chan increased to 13871
    INFO: ngram_search_fwdtree.c(334): 443 root, 13743 non-root channels, 17 single-phone words
    INFO: ngram_search_fwdflat.c(95): fwdflat: min_ef_width = 4, max_sf_win = 25
    FATAL_ERROR: "cont.c", line 108: cont_ad_calib failed
    

    If, however, I record a WAV file in Audacity directly on my desktop,
    pocketsphinx works. The issue is consistent across ~10 Droid-generated
    files and ~10 Audacity-generated files. The conversion is taking place
    on a 64-bit Ubuntu 10.04 machine.

    An example file that doesn't work (from Droid) is here:
    http://zachrattner.com/wav/test.wav
    An example file that does work (from Audacity) is here:
    http://zachrattner.com/wav/test2.wav

    If anyone could shed some light on what I'm doing wrong, I'd appreciate it.

    Thanks,
    Zach

     
  • marekl

    marekl - 2010-09-10

    Hi Zach,

    Sphinx expects uncompressed 16-bit PCM files, while your test file is a
    32-bit IEEE float file. Change the output options in mplayer to 16-bit PCM.
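
One way to confirm the sample format described here is to read the WAV header directly. The following is a minimal sketch, assuming Python 3 is available (the thread does not mention it); the function name `read_wav_format` is hypothetical. It walks the RIFF chunks until it finds `fmt ` and returns the format tag (1 for integer PCM, 3 for IEEE float), channel count, sample rate, and bits per sample.

```python
import struct


def read_wav_format(path):
    """Return (format_tag, channels, sample_rate, bits_per_sample).

    format_tag 1 means integer PCM; 3 means IEEE float.
    """
    with open(path, "rb") as f:
        riff, _size, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no fmt chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                fmt = f.read(chunk_size)
                # The first 16 bytes of the fmt chunk have a fixed layout.
                tag, channels, rate, _byte_rate, _align, bits = struct.unpack(
                    "<HHIIHH", fmt[:16]
                )
                return tag, channels, rate, bits
            # Skip any other chunk; chunks are padded to an even length.
            f.seek(chunk_size + (chunk_size & 1), 1)
```

Calling `read_wav_format("test.wav")` on a file that suits the default settings in the log above should give a tag of 1, 1 channel, a rate of 16000, and 16 bits per sample.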

     
  • z

    z - 2010-09-11

    Thanks, marekl0. Now I have an 8 kHz, stereo, 16-bit PCM WAV file and I'm
    still getting the same error. Do you know if there are any other limitations
    on what kind of WAV I can use?

    Thanks,
    Zach

     
  • marekl

    marekl - 2010-09-11

    In general, you should use an uncompressed 16-bit PCM mono WAV. Most acoustic
    models are trained on 16 kHz audio, so if you use a different sampling rate,
    you have to make sure the acoustic model you are using was created for that
    rate.

    Marek
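
For the mono requirement, a rough sketch of a stereo-to-mono downmix using only the Python standard library is shown below. It assumes the input is already uncompressed 16-bit PCM; the function name `downmix_to_mono` is hypothetical. It averages the channels of each frame and does not resample, so an 8 kHz input stays at 8 kHz and still has to match the acoustic model's sampling rate.

```python
import array
import sys
import wave


def downmix_to_mono(src_path, dst_path):
    """Average all channels of a 16-bit PCM WAV into one channel.

    Returns the (unchanged) sample rate of the file.
    """
    with wave.open(src_path, "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        channels = src.getnchannels()
        rate = src.getframerate()
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))

    # WAV data is little-endian; swap on big-endian hosts.
    if sys.byteorder == "big":
        samples.byteswap()

    # One output sample per frame: the integer mean of its channels.
    mono = array.array(
        "h",
        (
            sum(samples[i:i + channels]) // channels
            for i in range(0, len(samples) - channels + 1, channels)
        ),
    )

    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
    return rate
```

The `wave` module refuses non-PCM input such as 32-bit float, so a failure when opening the source file is itself a hint that the conversion step needs a different output format.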

     
  • marekl

    marekl - 2010-09-11

    Moreover (I forgot to add), your current pocketsphinx settings are for 16 kHz
    files (see the -samprate option).

     
