These are the sphinx3_align results. Now I am trying to convert the frame numbers to the corresponding onset and offset times of the words.
I have not changed any configuration settings. By default, these are the PocketSphinx parameters for feature vector computation:
Frame rate = 100 frames/s
Window length = 0.0256 s
From these I calculated the number of samples per frame: 0.0256 * 16000 = 409.6, i.e. about 410 samples/frame.
Window shift = Fs / frame rate = 16000 / 100 = 160 samples
So the window shifts every 160 samples. For the last word, </s>, I get an onset of 4.6 s and an offset of 4.8 s, but the total length of the audio signal is shown as around 6.195 s. There is a mismatch. Please tell me if I am wrong. Thank you.
Last edit: Diwakar.G 2017-01-10
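For reference, the arithmetic in the post above can be written out as a small Python sketch. It uses only the values quoted in the post (frame rate 100, window length 0.0256 s, Fs 16000 Hz). The function names, the frame numbers 460 and 479, and the convention that a word ends at the start of the frame following its last frame are assumptions made for illustration, not something sphinx3_align guarantees. The conversion also assumes that frames are contiguous from the start of the file, which does not hold if the feature extractor dropped frames (for example through silence removal).

```python
FRAME_RATE = 100        # frames per second, as quoted in the post
WINDOW_LENGTH = 0.0256  # seconds, as quoted in the post
SAMPLE_RATE = 16000     # Hz, the Fs used in the post's arithmetic


def window_samples(sample_rate=SAMPLE_RATE, window_length=WINDOW_LENGTH):
    """Samples covered by one analysis window (0.0256 * 16000 = 409.6, rounded)."""
    return round(window_length * sample_rate)


def shift_samples(sample_rate=SAMPLE_RATE, frame_rate=FRAME_RATE):
    """Samples between the starts of two consecutive frames."""
    return sample_rate // frame_rate


def word_times(start_frame, end_frame, frame_rate=FRAME_RATE):
    """Convert a word's first and last frame index (0-based) to seconds.

    Assumption: the word ends where the frame after end_frame begins.
    """
    onset = start_frame / frame_rate
    offset = (end_frame + 1) / frame_rate
    return onset, offset


# Hypothetical frame numbers chosen to match the 4.6 s / 4.8 s in the post.
print(window_samples(), shift_samples())
print(word_times(460, 479))
```

If the times computed this way end well before the end of the audio, the frame count itself is worth checking against the file duration before suspecting the conversion.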
There seems to be a problem with feature extraction from WAV files. When I extract features from .sph files with a NIST header, the length of the audio and the number of feature vectors match, but for .wav files with a RIFF header they do not.
The WAV file is actually 5.12 s long, but instead of 512 or 511 frames I am getting only 401 frames. The same problem occurred for all WAV files. Please help me.
Last edit: Diwakar.G 2017-01-10
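A quick way to quantify the mismatch described above is to compare the frame count implied by the file duration with the frame count actually produced. The sketch below is a rough estimate only: it reads the duration with Python's standard wave module and assumes one frame per 1/100 s, ignoring the few frames lost at the end of the file to the window length. The function names are hypothetical, and 401 is the frame count reported in the post.

```python
import wave


def wav_duration_seconds(source):
    """Duration of a RIFF/WAV file; source is a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def expected_frames(duration_s, frame_rate=100):
    """Approximate number of feature frames for a given duration."""
    return round(duration_s * frame_rate)


def missing_audio_seconds(duration_s, actual_frames, frame_rate=100):
    """Seconds of audio that are not represented by any feature frame."""
    return (expected_frames(duration_s, frame_rate) - actual_frames) / frame_rate


# Values from the post: a 5.12 s file that produced 401 frames.
print(expected_frames(5.12))
print(missing_audio_seconds(5.12, 401))
```

With the numbers from the post, roughly 1.1 s of audio has no corresponding frames, which suggests that frames are being dropped rather than the header being misread.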
By default, the feature extractor removes silence. You can add -remove_silence no to the sphinx_fe command line to disable that, but long stretches of silence in the audio are harmful for other reasons.
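As an illustration of the suggestion above, here is a minimal Python sketch that builds a sphinx_fe command line with silence removal disabled. Only -remove_silence no comes from this thread. The -i, -o and -mswav options and the file names are assumptions based on sphinx_fe's usual option list and should be checked against the output of sphinx_fe -help for the installed version. The helper name is hypothetical, and the sketch only builds and prints the command without running it.

```python
import shlex


def sphinx_fe_command(wav_path, mfc_path, remove_silence=False):
    """Build an argument list for sphinx_fe (options other than
    -remove_silence are assumptions; verify with sphinx_fe -help)."""
    return [
        "sphinx_fe",
        "-i", wav_path,            # input audio file (assumed option)
        "-o", mfc_path,            # output feature file (assumed option)
        "-mswav", "yes",           # input has a RIFF/WAV header (assumed option)
        "-remove_silence", "yes" if remove_silence else "no",
    ]


# Hypothetical file names, for illustration only.
cmd = sphinx_fe_command("utt01.wav", "utt01.mfc")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run once the options have been confirmed. With silence removal off, the frame count should be close to the file duration multiplied by the frame rate.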