
Maximum size for raw audio format?

Help
chotty
2007-11-29
2012-09-22
  • chotty

    chotty - 2007-11-29

    Hi all,

    is there a maximum size for my audio input file which I want to have a transcript of?
    sphinx3_livepretend ends with the message
    lt-sphinx3_livepretend: lextree.c:1262: lextree_hmm_eval: Assertion `((hmm_t *)(ln))->frame == frm' failed.
    when I use bigger input files (>150 MB), so I wonder if this could be the reason for the error.

    Any ideas?

    Thanks

     
    • David Huggins-Daines

      Hi, you probably want to use sphinx3_continuous instead - it will segment your input into individual utterances and do recognition on these independently.

      There is an actual hard-coded limit, and I'm a bit surprised that sphinx3 isn't complaining about it before it reaches that assertion. I believe the limit is 150 seconds, which is well beyond the amount of speech any human can produce without pausing to breathe :-)
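To relate the file size in the question to a limit expressed in seconds, here is a minimal sketch that converts a headerless raw audio file size into a duration. The 16 kHz, 16-bit, mono format is an assumption (the thread does not state the recording format), and `raw_audio_duration_seconds` is a hypothetical helper name.

```python
def raw_audio_duration_seconds(size_bytes, sample_rate=16000,
                               bytes_per_sample=2, channels=1):
    """Duration of headerless PCM audio, given its size in bytes.

    The defaults (16 kHz, 16-bit, mono) are assumptions, not values
    taken from the thread.
    """
    return size_bytes / (sample_rate * bytes_per_sample * channels)


size = 150 * 1024 * 1024  # a 150 MB input file
duration = raw_audio_duration_seconds(size)
print(f"{duration:.1f} s, about {duration / 60:.0f} minutes")
```

Under these assumptions a 150 MB file holds well over an hour of audio, far more than a single utterance, which is why the input has to be segmented before decoding.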

       
    • chotty

      chotty - 2007-11-30

      Thanks for the hint!
      I assume I have to execute sphinx3_continuous using the following parameters?

      ctrlfile - file with list of input files (batch)
      rawdir - directory where the files in the above list are to be found.
      cfgfile - file with config params.

      Unfortunately, when I do so, the system stops without starting the recognition. My last output is:

      [...]
      INFO: Operation Mode = 4, Operation Name = fwdtree
      INFO:
      INFO: s3_decode.c(267): Input data will NOT be byte swapped
      INFO: s3_decode.c(272): Partial hypothesis WILL be dumped
      ERROR: "cont_ad_base.c", line 718: cont_ad_read requires buffer of at least 83845 samples
      INFO: corpus.c(647): 05-11-07: -0.0 sec CPU, 0.0 sec Clk; TOT: -0.0 sec CPU, 0.0 sec Clk

      INFO: stat.c(223): SUMMARY: 0 fr , No report

      Any ideas what could be the problem?

       
      • Nickolay V. Shmyrev

        Hm, this is reproducible for me. It really looks like a bug; at least, it worked before.

         
        • Nickolay V. Shmyrev

          I've just committed a fix to sphinxbase, please update from svn and try again, now everything should be fine.

           
          • chotty

            chotty - 2007-12-03

            Thanks, it's running now without any error message. But what exactly does it do now? Does it automatically segment my file into smaller utterances, or does it just stop once it reaches the maximum size of an utterance and ignore the rest?
            Right now my recognition is too poor to judge from the output... Just looking at the number of recognised words, I doubt that it goes through the whole audio file!?

             
            • Nickolay V. Shmyrev

              For me it automatically segments the file at pauses in speech and decodes each chunk. If it doesn't for you, please provide the audio file and the model parameters you are using.

               
              • chotty

                chotty - 2007-12-04

                Just noticed that I was actually wrong. The system now automatically divides my input into the lattice files
                30-10-07_0.256.lat.gz
                30-10-07_10.912.lat.gz
                30-10-07_195.984.lat.gz
                30-10-07_290.064.lat.gz
                30-10-07_31.984.lat.gz
                30-10-07_78.000.lat.gz
                but still crashes with

                ..lt-sphinx3_continuous: lextree.c:1262: lextree_hmm_eval: Assertion `((hmm_t *)(ln))->frame == frm' failed.

                (Find my output file including my settings here: http://us.share.geocities.com/ww.ranger/30-10-07.txt)

                I don't know what I am doing wrong :( Do you mind trying with my settings and my audio file (http://tinyurl.com/39vsfv right mouse click and download, approx. 60 MB) on your system?

                 
                • chotty

                  chotty - 2007-12-04

                  Mh, the output file link is broken now, try this one please http://tinyurl.com/2naqxo

                   
                  • Nickolay V. Shmyrev

                    Oh, your file has a lot of music and noise. Decoding such files will be a very, very challenging task. Probably someone else will suggest something, but I need some time to think :)

                    About splitting: clean speech usually has pauses with very significant energy drops. sphinx3_continuous uses these silence regions to split the file into utterances and decodes each one. If your file has music, the chunks will be too large for a single decoder pass, so you might try segmenting the speech manually and passing small chunks to the decoder. For example, try the adintool from the Julius speech recognizer; it will probably split the speech into chunks better.
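The idea of splitting at energy drops can be illustrated with a minimal sketch. This is not the endpointer that sphinx3_continuous or sphinxbase actually uses; the function names, the fixed threshold, and the 10 ms frame length (160 samples at an assumed 16 kHz) are all illustrative assumptions.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each complete frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def split_on_silence(samples, frame_len=160, threshold=1e4,
                     min_silence_frames=30):
    """Return (start, end) sample ranges, split at long low-energy runs."""
    segments = []
    start = None      # first frame of the segment currently being built
    silent_run = 0    # consecutive low-energy frames inside that segment
    energies = frame_energies(samples, frame_len)
    for idx, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = idx
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Close the segment just after its last loud frame.
                end = idx - silent_run + 1
                segments.append((start * frame_len, end * frame_len))
                start = None
                silent_run = 0
    if start is not None:
        # Audio ended inside a segment; drop any trailing quiet frames.
        end = len(energies) - silent_run
        segments.append((start * frame_len, end * frame_len))
    return segments


# Two bursts separated by a long pause are split into two segments.
speech = [1000] * 1600 + [0] * 8000 + [1000] * 1600
print(split_on_silence(speech))

# Continuous loud audio (for example music) never drops below the
# threshold, so it comes back as one oversized segment.
music = [1000] * 16000
print(split_on_silence(music))
```

The second call shows the failure mode described above: without pauses there is nothing to split on, so the whole stretch reaches the decoder as a single utterance.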

                     
                    • chotty

                      chotty - 2007-12-06

                      Hi guys,

                      first of all, thanks a lot for your permanent feedback!!

                      Now, to my question ;-)
                      I decided to make a simple test run and divide the video in a "dumb" way, every 30 seconds. Do I have to divide my wav file physically for that, or can I just specify in my control file where it should be divided?

                      On http://cmusphinx.sourceforge.net/sphinx3/doc/s3_description.html#sec_ctl , it says I can use the format

                      AudioFile [ StartFrame EndFrame UttID ]

                      However, if I do so, StartFrame, EndFrame and UttID are completely ignored and the tool takes the whole audio file as input (and then divides it based on silence).

                      What am I doing wrong there now?

                      Thanks!!
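For the "physically divide the file" option, here is a minimal sketch that cuts a WAV file into fixed 30-second pieces using Python's standard `wave` module. The function name and the output naming scheme are hypothetical, and fixed-length cuts can land in the middle of a word, so this only suits a rough test run.

```python
import os
import wave


def split_wav(path, out_dir, chunk_seconds=30):
    """Write consecutive fixed-length chunks of a WAV file; return their paths."""
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(path))[0]
    written = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * chunk_seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of input
            out_path = os.path.join(out_dir, f"{base}_{index:04d}.wav")
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width and rate; the frame count in
                # the header is corrected when the file is closed.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(out_path)
            index += 1
    return written
```

Calling `split_wav("30-10-07.wav", "chunks")` (the file name here is only an example) would produce `chunks/30-10-07_0000.wav`, `chunks/30-10-07_0001.wav`, and so on, which can then be listed one per line in a control file.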

                       
                    • David Huggins-Daines

                      Ahh. Yes, you need something more sophisticated than the simple speech/silence based endpointer used by sphinx3_continuous.

                      LIUM in France has contributed a segmenter, which they have used successfully in French broadcast news transcription, which you can find at:
                      http://www-lium.univ-lemans.fr/tools/index.php?option=com_content&task=blogcategory&id=29&Itemid=56

                      However I haven't actually tried to use it on anything yet so I can't answer any questions about it at the moment.

                       
