First decode and second decode don't match

skatz_teyp
2007-11-23
2012-09-22
  • skatz_teyp

    skatz_teyp - 2007-11-23

    hello again,

    Now that I have finished creating my language model (thanks, Nickolay, for your help), I am using it with sphinx3_continuous. I tried decoding a file, but the hypothesis is far from the original transcript (let's call this hypothesis A). So I tried decoding the file together with a copy of it in one batch, which gives two hypotheses. The first is the same as when I decode only the one file (hypothesis A). The second (let's call it hypothesis B) differs from the first, and its accuracy is higher. I also tried more than two copies: the first hypothesis is still hypothesis A, while the second through the last are all identical and more accurate (hypothesis B). So now I wonder what is going on during the first decode?

     
    • Nickolay V. Shmyrev

      Is something wrong with the initial states in your LM? I suspect they are first initialized with some values, and on the second utterance the latest n-gram state is used instead.

       
      • skatz_teyp

        skatz_teyp - 2007-11-23

        Umm... I am using both the language model I created and the language model that can be downloaded from the CMU Sphinx Open Source Model website. With both language models the first and second results don't match. I am using sphinx3_continuous without any code modification.

         
        • Nickolay V. Shmyrev

          Can you please share wav files and scripts to start sphinx3_continuous so we can try them?

           
          • skatz_teyp

            skatz_teyp - 2007-12-03

            Umm, Nickolay, have you tried it using the files and configuration I gave you? Did you find anything?

             
            • Nickolay V. Shmyrev

              Yes, I tried it, but I have had no time to look for an explanation yet, sorry. I'll try to look this week.

               
              • skatz_teyp

                skatz_teyp - 2007-12-04

                Okay, thanks... I'll keep track of this forum then.

                 
                • Nickolay V. Shmyrev

                  OK, I've looked at this. It looks like the random dither noise applied during feature extraction is the reason. If you decode MFC files instead, the result will be the same every time.

                   
          • skatz_teyp

            skatz_teyp - 2007-11-24

            This is the raw audio file (16 kHz sample rate, 16 bits per sample, 1 channel, mono):
            http://rapidshare.com/files/71864506/arctic_a0001-sin.raw

            the configuration file looks like this:

            -mdef .\hub4opensrc.6000.mdef
            -senmgau .cont.
            -mean .\means
            -var .\variances
            -mixw .\mixture_weights
            -tmat .\transition_matrices
            -feat 1s_c_d_dd
            -wbeam 1e-100
            -dict .\cmudict.06d
            -fdict .\fillerdict
            -lm .\language_model.arpaformat.DMP
            -ctloffset 0
            -ctlcount 600
            -agc none
            -varnorm no
            -lw 13
            -wip 0.2
            -hyp .\test.match
            -cmn current
            -hypseg .\test.hypseg

            The acoustic model is the hub4 open source model with 6000 senones, and the language model, dictionary and filler dictionary are the same as the ones found in the open source language model on the CMU Sphinx site.

            My control file contains this:
            arctic_a0001-sin
            arctic_a0001-sin2 // a copy of the original file, renamed to this

            So basically, the control file, the config file and the raw audio file above are used to produce this output:

            WELL THERE COULD YOUR TRAIL PHILLIPS DEALS THAT ARE OUT (arctic_a0001-sin_0.624)
            A THIRD OF THE DANGER TRAIL PHILIP'S DEALS THAT CETERA (arctic_a0001-sin2_0.624)

            and the correct transcript for this is

            AUTHOR OF THE DANGER TRAIL PHILIP STEELS ET CETERA

            By the way, I'm compiling sphinx3_continuous using Visual Studio 2005 (if that makes a difference) and am running on Windows XP SP2.
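
The "accuracy" comparison between the two hypotheses quoted above can be made concrete by counting word errors against the reference transcript. The following is a minimal Python sketch, not part of Sphinx: the helper names `parse_match_line` and `word_edit_distance` are made up for illustration, and it assumes each line of the match file has the form shown in the post, that is, the words followed by a parenthesised utterance label.

```python
import re


def parse_match_line(line):
    """Split 'WORD WORD ... (label)' into (list of words, label)."""
    m = re.match(r"^(.*?)\s*\((\S+)\)\s*$", line)
    if m is None:
        raise ValueError(f"unexpected line format: {line!r}")
    return m.group(1).split(), m.group(2)


def word_edit_distance(ref, hyp):
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # match or substitution
        prev = cur
    return prev[-1]


# Reference transcript and match lines copied from the post above.
reference = "AUTHOR OF THE DANGER TRAIL PHILIP STEELS ET CETERA".split()
match_lines = [
    "WELL THERE COULD YOUR TRAIL PHILLIPS DEALS THAT ARE OUT (arctic_a0001-sin_0.624)",
    "A THIRD OF THE DANGER TRAIL PHILIP'S DEALS THAT CETERA (arctic_a0001-sin2_0.624)",
]

results = {}
for line in match_lines:
    words, label = parse_match_line(line)
    errors = word_edit_distance(reference, words)
    results[label] = errors
    print(f"{label}: {errors} word errors, WER {errors / len(reference):.1%}")
```

Word error rate here is the edit distance divided by the number of reference words, so it can exceed 100% when the hypothesis contains many insertions.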

             
    • skatz_teyp

      skatz_teyp - 2007-12-19

      So all I have to do is input MFC files instead of raw files? But sphinx3_continuous only accepts raw files. How can I change it to accept MFC files?

       
      • Nickolay V. Shmyrev

        It depends on what you are trying to achieve. If you want to use MFC files, you can use another variant of the decoder, for example sphinx3_decode. But it won't give you much beyond getting the same output from the same data. Alternatively, you can disable dither, and you'll get the same result from the same raw files.

        The point is that recognition is very unstable: even a small amount of noise affects quality.
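
The dither explanation above can be illustrated with a toy Python sketch. This is not the Sphinx front end: the dither scheme (adding -1, 0 or +1 to each sample), the per-frame log energy used as a stand-in "feature", and the function names are all assumptions made for illustration. It only shows how a random generator that keeps running between utterances gives a second copy of the same audio different noise, and how restarting from a fixed seed makes the features reproducible.

```python
import math
import random


def add_dither(samples, rng):
    """Add a small random offset to each sample (toy dither scheme)."""
    return [s + rng.choice((-1, 0, 1)) for s in samples]


def log_energy(frame):
    """Stand-in feature: log of the frame energy."""
    return math.log(sum(x * x for x in frame) + 1.0)


def features(samples, rng, frame_len=160):
    """Dither the audio, then compute one feature per non-overlapping frame."""
    dithered = add_dither(samples, rng)
    return [log_energy(dithered[i:i + frame_len])
            for i in range(0, len(dithered) - frame_len + 1, frame_len)]


# Synthetic 0.1 s tone at 16 kHz standing in for the raw audio file.
samples = [int(1000 * math.sin(2 * math.pi * 440 * n / 16000))
           for n in range(1600)]

# One generator shared across "utterances", as in a single batch run.
rng = random.Random(1234)
first = features(samples, rng)
second = features(samples, rng)  # same audio, but the generator has moved on

# Restarting from the same seed reproduces the first result exactly.
reseeded = features(samples, random.Random(1234))

print("first == second:  ", first == second)
print("first == reseeded:", first == reseeded)
```

The feature differences in this sketch are tiny, which fits the closing remark in the thread: the decoder's output can change even when the input noise is very small.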

         
