Format mismatch, pocketsphinx

2016-02-02
2016-02-09
  • Andreas Ravndal

    Andreas Ravndal - 2016-02-02

    Hi, I get this error when I try to make pocketsphinx recognize an audio file with an 8000 Hz sample rate:
    ERROR: "continuous.c", line 136: Input audio file has sample rate [8000], but decoder expects [16000]
    FATAL: "continuous.c", line 165: Failed to process file '/home/andreas/Documents/Taledatabase/wav/soundfile_1.wav' due to format mismatch.
    I trained my acoustic model on audio files with an 8000 Hz sample rate, so why do I get this error?
    Here is the output from my terminal window:

    andreas@andreas-MS-7817:~$ pocketsphinx_continuous -hmm /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200 -lm /home/andreas/Documents/Taledatabase/etc/tesLm.lm.DMP -dict /home/andreas/Documents/Taledatabase/etc/test.dic -infile /home/andreas/Documents/Taledatabase/wav/soundfile_1.wav
    INFO: pocketsphinx.c(145): Parsed model-specific feature parameters from /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/feat.params
    Current configuration:
    [NAME]          [DEFLT]     [VALUE]
    -agc            none        none
    -agcthresh      2.0     2.000000e+00
    -allphone               
    -allphone_ci        no      no
    -alpha          0.97        9.700000e-01
    -ascale         20.0        2.000000e+01
    -aw         1       1
    -backtrace      no      no
    -beam           1e-48       1.000000e-48
    -bestpath       yes     yes
    -bestpathlw     9.5     9.500000e+00
    -ceplen         13      13
    -cmn            current     current
    -cmninit        8.0     8.0
    -compallsen     no      no
    -debug                  0
    -dict                   /home/andreas/Documents/Taledatabase/etc/test.dic
    -dictcase       no      no
    -dither         no      no
    -doublebw       no      no
    -ds         1       1
    -fdict                  /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/noisedict
    -feat           1s_c_d_dd   1s_c_d_dd
    -featparams             /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/feat.params
    -fillprob       1e-8        1.000000e-08
    -frate          100     100
    -fsg                    
    -fsgusealtpron      yes     yes
    -fsgusefiller       yes     yes
    -fwdflat        yes     yes
    -fwdflatbeam        1e-64       1.000000e-64
    -fwdflatefwid       4       4
    -fwdflatlw      8.5     8.500000e+00
    -fwdflatsfwin       25      25
    -fwdflatwbeam       7e-29       7.000000e-29
    -fwdtree        yes     yes
    -hmm                    /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200
    -input_endian       little      little
    -jsgf                   
    -keyphrase              
    -kws                    
    -kws_delay      10      10
    -kws_plp        1e-1        1.000000e-01
    -kws_threshold      1       1.000000e+00
    -latsize        5000        5000
    -lda                    
    -ldadim         0       0
    -lifter         0       22
    -lm                 /home/andreas/Documents/Taledatabase/etc/tesLm.lm.DMP
    -lmctl                  
    -lmname                 
    -logbase        1.0001      1.000100e+00
    -logfn                  
    -logspec        no      no
    -lowerf         133.33334   2.000000e+02
    -lpbeam         1e-40       1.000000e-40
    -lponlybeam     7e-29       7.000000e-29
    -lw         6.5     6.500000e+00
    -maxhmmpf       30000       30000
    -maxwpf         -1      -1
    -mdef                   /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/mdef
    -mean                   /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/means
    -mfclogdir              
    -min_endfr      0       0
    -mixw                   /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/mixture_weights
    -mixwfloor      0.0000001   1.000000e-07
    -mllr                   
    -mmap           yes     yes
    -ncep           13      13
    -nfft           512     512
    -nfilt          40      15
    -nwpen          1.0     1.000000e+00
    -pbeam          1e-48       1.000000e-48
    -pip            1.0     1.000000e+00
    -pl_beam        1e-10       1.000000e-10
    -pl_pbeam       1e-10       1.000000e-10
    -pl_pip         1.0     1.000000e+00
    -pl_weight      3.0     3.000000e+00
    -pl_window      5       5
    -rawlogdir              
    -remove_dc      no      no
    -remove_noise       yes     yes
    -remove_silence     yes     yes
    -round_filters      yes     yes
    -samprate       16000       1.600000e+04
    -seed           -1      -1
    -sendump                
    -senlogdir              
    -senmgau                
    -silprob        0.005       5.000000e-03
    -smoothspec     no      no
    -svspec                 
    -tmat                   /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/transition_matrices
    -tmatfloor      0.0001      1.000000e-04
    -topn           4       4
    -topn_beam      0       0
    -toprule                
    -transform      legacy      dct
    -unit_area      yes     yes
    -upperf         6855.4976   3.500000e+03
    -uw         1.0     1.000000e+00
    -vad_postspeech     50      50
    -vad_prespeech      20      20
    -vad_startspeech    10      10
    -vad_threshold      2.0     2.000000e+00
    -var                    /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/variances
    -varfloor       0.0001      1.000000e-04
    -varnorm        no      no
    -verbose        no      no
    -warp_params                
    -warp_type      inverse_linear  inverse_linear
    -wbeam          7e-29       7.000000e-29
    -wip            0.65        6.500000e-01
    -wlen           0.025625    2.562500e-02
    
    INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: mdef.c(518): Reading model definition: /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/mdef
    INFO: bin_mdef.c(181): Allocating 146535 * 8 bytes (1144 KiB) for CD tree
    INFO: tmat.c(206): Reading HMM transition probability matrices: /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/transition_matrices
    INFO: acmod.c(117): Attempting to use PTM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/means
    INFO: ms_gauden.c(292): 374 codebook, 1 feature, size: 
    INFO: ms_gauden.c(294):  8x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/variances
    INFO: ms_gauden.c(292): 374 codebook, 1 feature, size: 
    INFO: ms_gauden.c(294):  8x39
    INFO: ms_gauden.c(354): 80 variance values floored
    INFO: ptm_mgau.c(801): Number of codebooks exceeds 256: 374
    INFO: acmod.c(119): Attempting to use semi-continuous computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/means
    INFO: ms_gauden.c(292): 374 codebook, 1 feature, size: 
    INFO: ms_gauden.c(294):  8x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/variances
    INFO: ms_gauden.c(292): 374 codebook, 1 feature, size: 
    INFO: ms_gauden.c(294):  8x39
    INFO: ms_gauden.c(354): 80 variance values floored
    INFO: acmod.c(121): Falling back to general multi-stream GMM computation
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/means
    INFO: ms_gauden.c(292): 374 codebook, 1 feature, size: 
    INFO: ms_gauden.c(294):  8x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/variances
    INFO: ms_gauden.c(292): 374 codebook, 1 feature, size: 
    INFO: ms_gauden.c(294):  8x39
    INFO: ms_gauden.c(354): 80 variance values floored
    INFO: ms_senone.c(149): Reading senone mixture weights: /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/mixture_weights
    INFO: ms_senone.c(200): Truncating senone logs3(pdf) values by 10 bits
    INFO: ms_senone.c(207): Not transposing mixture weights in memory
    INFO: ms_senone.c(268): Read mixture weights for 374 senones: 1 features x 8 codewords
    INFO: ms_senone.c(320): Mapping senones to individual codebooks
    INFO: ms_mgau.c(141): The value of topn: 4
    INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
    INFO: dict.c(320): Allocating 60406 * 32 bytes (1887 KiB) for word entries
    INFO: dict.c(333): Reading main dictionary: /home/andreas/Documents/Taledatabase/etc/test.dic
    INFO: dict.c(213): Allocated 500 KiB for strings, 819 KiB for phones
    INFO: dict.c(336): 56302 words read
    INFO: dict.c(358): Reading filler dictionary: /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/noisedict
    INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(361): 8 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(406): Allocating 58^3 * 2 bytes (381 KiB) for word-initial triphones
    INFO: dict2pid.c(132): Allocated 81200 bytes (79 KiB) for word-final triphones
    INFO: dict2pid.c(196): Allocated 81200 bytes (79 KiB) for single-phone word triphones
    INFO: ngram_model_trie.c(456): Trying to read LM in trie binary format
    INFO: ngram_model_trie.c(467): Header doesn't match
    INFO: ngram_model_trie.c(189): Trying to read LM in arpa format
    INFO: ngram_model_trie.c(205): LM of order 3
    INFO: ngram_model_trie.c(207): #1-grams: 16002
    INFO: ngram_model_trie.c(207): #2-grams: 65457
    INFO: ngram_model_trie.c(207): #3-grams: 95363
    INFO: lm_trie.c(399): Training quantizer
    INFO: lm_trie.c(407): Building LM trie
    INFO: ngram_search_fwdtree.c(99): 900 unique initial diphones
    INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 58 single-phone words
    INFO: ngram_search_fwdtree.c(186): Creating search tree
    INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 58 single-phone words
    INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 71380
    INFO: ngram_search_fwdtree.c(339): after: 885 root, 71252 non-root channels, 56 single-phone words
    INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
    INFO: continuous.c(305): pocketsphinx_continuous COMPILED ON: Nov  8 2015, AT: 19:42:05
    
    ERROR: "continuous.c", line 136: Input audio file has sample rate [8000], but decoder expects [16000]
    FATAL: "continuous.c", line 165: Failed to process file '/home/andreas/Documents/Taledatabase/wav/soundfile_1.wav' due to format mismatch.
    
     
    • Nickolay V. Shmyrev

      You need to add -samprate 8000 to configure decoder to process 8khz data.

      It has no relation to the model.
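
A minimal sketch of that fix, assuming Python's standard `wave` module: read the sample rate from the WAV header and pass it to the decoder as `-samprate`. The helper names `wav_sample_rate` and `build_decoder_args` are hypothetical and not part of pocketsphinx; the flags are only the ones already used in this thread.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a WAV file's header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def build_decoder_args(infile, hmm, lm, dictionary):
    """Build a pocketsphinx_continuous argument list whose -samprate
    matches the input file (hypothetical helper, not part of pocketsphinx)."""
    rate = wav_sample_rate(infile)
    return [
        "pocketsphinx_continuous",
        "-hmm", hmm,
        "-lm", lm,
        "-dict", dictionary,
        "-infile", infile,
        "-samprate", str(rate),
    ]
```

The resulting list can be handed to `subprocess.run`. For an 8 kHz file it ends with `-samprate 8000`, which is the flag suggested above.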

       
      • Andreas Ravndal

        Andreas Ravndal - 2016-02-03

        Thanks! That worked, but why is the accuracy so low when I got 100% correct on this audio file in the decoding stage of my training?

        har du  noen gang sett stokkmaur ogsaa kalt hestemaur plageaanden som liker aa  spise seg inn i   treverket  (user-SOUNDFILE_1)
        har du  noen gang sett stokkmaur ogsaa kalt hestemaur plageaanden som liker aa  spise seg inn i   treverket  (user-SOUNDFILE_1)
        Words: 18 Correct: 18 Errors: 0 Percent correct = 100.00% Error = 0.00% Accuracy = 100.00%
        Insertions: 0 Deletions: 0 Substitutions: 0
        

        Here is what I get using pocketsphinx on the same audio file:
        FOR AA GJOERE OM PAA HOEYRE OG ER FOR DAARLIG REGJERINGEN
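
To put a number on the gap between the two results, the word error rate can be computed from a reference transcript and a decoder hypothesis. The sketch below uses a plain word-level Levenshtein distance; the function names are hypothetical, and the scoring script used during training may count or normalise words differently. Comparison is case-sensitive, so lowercase both strings first if the transcript and the decoder output differ in case, as they do here.

```python
def word_errors(reference, hypothesis):
    """Minimum number of word substitutions, insertions and deletions
    needed to turn `reference` into `hypothesis` (Levenshtein distance)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # previous[j] = edits needed to turn the reference words seen so far into hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_error_rate(reference, hypothesis):
    """Errors divided by the number of reference words."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    return word_errors(reference, hypothesis) / n_ref
```

An identical reference and hypothesis, as in the training report above, gives an error rate of 0.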

         
        • Nickolay V. Shmyrev

          You need to configure the cmninit value in feat.params in the model: open the file in a text editor and add the line -cmninit 40,3,-1.
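
The same edit can be scripted. This is a sketch under the assumption that feat.params holds one `-name value` pair per line, which matches the line suggested above; `set_feat_param` is a hypothetical helper, not a pocketsphinx or SphinxTrain function.

```python
def set_feat_param(text, name, value):
    """Return feat.params content with `name` set to `value`.

    An existing line for `name` is replaced; otherwise a new line is appended.
    """
    new_line = f"{name} {value}"
    lines = text.splitlines()
    for index, line in enumerate(lines):
        fields = line.split()
        if fields and fields[0] == name:
            lines[index] = new_line
            break
    else:
        # No existing entry was found, so add one at the end.
        lines.append(new_line)
    return "\n".join(lines) + "\n"
```

Read the file, pass its content through `set_feat_param(content, "-cmninit", "40,3,-1")`, and write the result back. Keeping a backup of the original feat.params is a sensible precaution.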

           
          • Andreas Ravndal

            Andreas Ravndal - 2016-02-03

            OK, I tried that and it did not make any difference. I still get FOR AA GJOERE OM PAA HOEYRE OG ER FOR DAARLIG REGJERINGEN

             
            • Nickolay V. Shmyrev

              OK, you are welcome to provide all the files: the model training folder, the audio file you are trying to decode, and the pocketsphinx log.

               
  • Andreas Ravndal

    Andreas Ravndal - 2016-02-03

    OK, here are the files. Thank you so much for your help, Nickolay! May I ask what cmninit defines and how you came up with these numbers?

     
    • Nickolay V. Shmyrev

      You'd better share the model training folder too.

       
      • Andreas Ravndal

        Andreas Ravndal - 2016-02-03

        When you say model training folder, do you mean the model_parameters folder? Or all the folders generated by the training (feat, trees, qmanager...)?

         
        • Nickolay V. Shmyrev

          All the folders

           
          • Andreas Ravndal

            Andreas Ravndal - 2016-02-04

            You can download the folders here: http://we.tl/eB4dh89mkF

             
          • Andreas Ravndal

            Andreas Ravndal - 2016-02-08

            Are all the folders you need in the link? Or have I forgotten something this time as well? XD

             

            Last edit: Andreas Ravndal 2016-02-08
            • Nickolay V. Shmyrev

              Hello

              The following line should give you good results:

               pocketsphinx_continuous -infile soundfile_1.wav -lm tesLm.lm.DMP -dict test.dic -hmm test.cd_cont_200 -samprate 8000 -cmninit 40,3,-1 -beam 1e-80 -pbeam 1e-80 -wbeam 1e-40 -lw 8.0 -vad_prespeech 50 -vad_postspeech 100
              

              The thing is that your model is pretty small and not very stable in unseen conditions. The second issue is that batch training handles silence quite differently from continuous decoding. Both try to remove silence, but the effect can differ slightly: training handles your file as a single utterance, while continuous decoding tries to split it into many utterances.

              Basically, you need more training data and a bigger model.

               
              • Andreas Ravndal

                Andreas Ravndal - 2016-02-09

                I know, but my resources are limited, since there are not many freely available speech databases for Norwegian out there =/ So I just have to make the best of the resources I have. Would you recommend training a PTM model instead of a continuous model?

                 
                • Nickolay V. Shmyrev

                  There is a very big Norwegian database available here:

                  http://www.nb.no/sprakbanken/show?serial=oai%3Anb.no%3Asbr-13&lang=en

                  You could work on that, there are also Swedish and Danish corpora.

                   
                  • Andreas Ravndal

                    Andreas Ravndal - 2016-02-09

                    I know, but that database is incomplete (since NST went bankrupt), and some students tried to work it out last year but did not succeed in training an acoustic model with it. Of course I could give it another shot, and I probably will, since the speech database I use now (from the same resource site you linked above, produced by a company named Lingit) seems to be too small for any practical use. Thank you for your help so far!

                     
