Strange segment frame bounds

  • kriomant

    kriomant - 2016-05-14

    Hello. I use keyword search and read the word bounds after the keyword is found. But the frame indices returned by ps_seg_frames are too large compared to the number of frames processed.

    I have made a small program reproducing the problem: https://yadi.sk/i/_JTvOD10ri6dt
    It uses language data from https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Russian/zero_ru_cont_8k_v3.tar.gz/download
    and this raw sound file: https://yadi.sk/d/EApMmoEari6du

    Program output on my machine is:

    hyp: железяка
    n_frames: 515
    samples total: 48128
    sample rate: 8000.000000
    frame rate: 100
    word: железяка, frames: 887-940

    I.e. the word's frame indices are larger than the number of frames processed.
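
    (To spell out the arithmetic from the output above: 48128 samples at 8000 Hz with a frame rate of 100 frames/s gives at most 48128 / 8000 × 100 ≈ 601 frames, and n_frames reports only 515, yet the reported segment ends at frame 940.)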

     
    • Nickolay V. Shmyrev

      There was no need to start another thread; you could have continued in https://sourceforge.net/p/cmusphinx/discussion/help/thread/8d848336/

      To calculate the offsets properly you need to track speech/silence and restart utterances with ps_get_in_speech/ps_end_utt/ps_start_utt, as in continuous.c (see the sketch at the end of this reply).

      Also, in your code you do not need

      "-lm", "/Users/kriomant/Downloads/zero_ru_cont_8k_v3/ru.lm",
      

      It is not used by keyword search and only slows down loading.
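
      A minimal sketch of such a loop, modeled on continuous.c (this assumes the pocketsphinx 5prealpha API; the model paths, keyphrase threshold, buffer size, and raw file name are illustrative, not taken from your program):

      #include <stdio.h>
      #include <pocketsphinx.h>

      int main(void)
      {
          cmd_ln_t *config = cmd_ln_init(NULL, ps_args(), TRUE,
              "-hmm", "zero_ru_cont_8k_v3/zero_ru.cd_cont_4000", /* assumed path */
              "-dict", "zero_ru_cont_8k_v3/ru.dic",              /* assumed path */
              "-kws_threshold", "1e-20",                         /* illustrative value */
              NULL);
          ps_decoder_t *ps = ps_init(config);

          /* Keyword search; no -lm needed. */
          ps_set_keyphrase(ps, "kws", "железяка");
          ps_set_search(ps, "kws");

          FILE *fh = fopen("audio.raw", "rb");   /* 8 kHz, 16-bit mono raw audio (assumed name) */
          if (fh == NULL)
              return 1;

          int16 buf[512];
          size_t nread;
          int utt_started = FALSE;

          ps_start_utt(ps);
          while ((nread = fread(buf, sizeof(int16), 512, fh)) > 0) {
              ps_process_raw(ps, buf, nread, FALSE, FALSE);
              if (ps_get_in_speech(ps)) {
                  utt_started = TRUE;
              } else if (utt_started) {
                  /* Silence after speech: close the utterance, read the result,
                     then start a fresh utterance so decoding state and frame
                     counters start over. */
                  ps_end_utt(ps);
                  if (ps_get_hyp(ps, NULL) != NULL) {
                      int sf, ef;
                      ps_seg_t *seg;
                      for (seg = ps_seg_iter(ps); seg; seg = ps_seg_next(seg)) {
                          ps_seg_frames(seg, &sf, &ef);
                          printf("word: %s, frames: %d-%d\n", ps_seg_word(seg), sf, ef);
                      }
                  }
                  ps_start_utt(ps);
                  utt_started = FALSE;
              }
          }
          ps_end_utt(ps);
          fclose(fh);
          ps_free(ps);
          cmd_ln_free_r(config);
          return 0;
      }

      Results are read only after ps_end_utt(); each ps_start_utt() begins a new utterance, so later frame counts refer to that utterance.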

       
  • kriomant

    kriomant - 2016-05-15

    I did look at continuous.c, but I still don't see where I am wrong. I also call ps_start_utt and ps_end_utt, and stop processing after the first hypothesis. If the frame offsets were lower than expected, I might have thought that I have to ignore the data for which ps_get_in_speech returns 0, or something like that. But the frame offsets are larger than ps_get_n_frames and larger than (number_of_samples_fed / sample_rate * frame_rate); how can that be?

     
    • Nickolay V. Shmyrev

      This is how pocketsphinx works; you need to restart the utterance on every silence.

       
      • kriomant

        kriomant - 2016-05-16

        Thanks, this seems to resolve my problem. Is this true for every mode, not only keyword search? Where is this documented?

         
      • kriomant

        kriomant - 2016-05-27

        Something strange happens.

        After switching to this approach (restarting the utterance after each silence), my program stopped recognizing the keyword. Every time silence is detected, the hypothesis is empty, whereas the keyword is recognized if the data are fed continuously until a hypothesis is found.

        test.cc — start_end() restarts the utterance after silence, cont() doesn't.
        synth.wav — source audio

         
