Hello. I use keyword search and get word bounds after it is found. But the frames returned by ps_seg_frames are too big compared to the number of frames processed. I have made a small program reproducing the problem: https://yadi.sk/i/_JTvOD10ri6dt
It uses language data from https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Russian/zero_ru_cont_8k_v3.tar.gz/download
and this raw sound file: https://yadi.sk/d/EApMmoEari6du
Program output on my machine is:
hyp: железяка
n_frames: 515
samples total: 48128
sample rate: 8000.000000
frame rate: 100
word: железяка, frames: 887-940
I.e. the word frame offsets are larger than the number of frames processed.
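To make the mismatch concrete, here is a trivial check with the numbers from the output above (the variable names are just for illustration, they are not from the test program):

    /* Rough sanity check using the figures printed above. */
    int    samples_total = 48128;
    double sample_rate   = 8000.0;   /* Hz */
    int    frame_rate    = 100;      /* frames per second */
    double max_expected  = samples_total / sample_rate * frame_rate;
    /* max_expected is about 601.6 frames (and the program reports n_frames: 515),
       yet ps_seg_frames() returns 887-940 for the keyword. */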
There was no need to start another thread, you could continue in https://sourceforge.net/p/cmusphinx/discussion/help/thread/8d848336/
To calculate the offsets properly you need to track speech/silence and restart utterances with ps_get_in_speech/ps_end_utt/ps_start_utt, like in continuous.c.
Also, in your code you do not need [...]; it just slows down loading.
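Roughly, that loop looks like this (a condensed sketch modeled on continuous.c; it assumes the 5prealpha API, i.e. ps_start_utt() without an uttid and ps_get_in_speech(), plus an already initialized decoder ps reading raw 16-bit, 8 kHz audio from fh):

    int16 buf[512];
    size_t nread;
    uint8 in_speech, utt_started = FALSE;

    ps_start_utt(ps);
    while ((nread = fread(buf, sizeof(int16), 512, fh)) > 0) {
        ps_process_raw(ps, buf, nread, FALSE, FALSE);
        in_speech = ps_get_in_speech(ps);
        if (in_speech && !utt_started)
            utt_started = TRUE;
        if (!in_speech && utt_started) {
            /* Speech just ended: close the utterance, read the result,
               then start a fresh one so that frame offsets stay meaningful. */
            ps_end_utt(ps);
            if (ps_get_hyp(ps, NULL) != NULL) {
                ps_seg_t *seg;
                for (seg = ps_seg_iter(ps); seg; seg = ps_seg_next(seg)) {
                    int sf, ef;
                    ps_seg_frames(seg, &sf, &ef);
                    printf("word: %s, frames: %d-%d\n", ps_seg_word(seg), sf, ef);
                }
            }
            ps_start_utt(ps);
            utt_started = FALSE;
        }
    }
    ps_end_utt(ps);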
I did look at continuous.c, but I still don't see where I am wrong. I also call ps_start_utt and ps_end_utt, and stop processing after the first hypothesis. If the frame offsets were lower than expected, I might have thought that I have to ignore data for which ps_get_in_speech returns 0, or something like this. But the frame offsets are larger than ps_n_frames and larger than (number_of_samples_fed / sample_rate * frame_rate); how can that be?

This is how pocketsphinx works, you need to restart the utterance on every silence.
Thanks, this seems to resolve my problem. Is this true for every mode, not only keyword search? Where is this documented?
Something strange happens.
After using this approach (restarting the utterance after each silence) my program stopped recognizing the keyword. Every time, after silence is detected the hypothesis is empty, while it does recognize the keyword if data are fed continuously until a hypothesis is found.
test.cc: start_end() restarts the utterance after silence and cont() doesn't.
synth.wav: the source audio