a weird problem about detected boundaries

Speech Recognition Toolkit

Forum: Help

Creator: tfpeach

Created: 2016-02-18

Updated: 2016-02-19

tfpeach - 2016-02-18

Hi, dear all,

I trained a monophone (context-independent, CI stage) ASR system at the syllable level. The training data is only about 0.15 hours, which is too little to train a triphone system, and the basic unit is the syllable rather than the phoneme. The performance is OK, about 6.7% WER. However, the detected boundaries seem to have a weird problem: they are shifted ahead of the audio.

Please look at the spectrograms of three samples and their corresponding detections, shown in the picture in the next post. You can see that the first syllable "PA" always leads the audio. Starting from the second syllable "TA", the boundaries seem to fit the audio track nicely. The detected boundaries seem to be shifted by one segment, leading the audio track.
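
One way to put a number on the shift described above is to compare each detected segment with a hand-labeled reference and print the offset per syllable. The following is only a minimal sketch: the segment times are made-up placeholders (not values from the recordings in the screenshot), and `boundary_offsets` is a hypothetical helper, not part of any toolkit.

```python
# Hypothetical (label, start_sec, end_sec) segments; illustrative values only.
detected = [("PA", 0.00, 0.12), ("TA", 0.20, 0.41), ("KA", 0.41, 0.63)]
reference = [("PA", 0.08, 0.20), ("TA", 0.20, 0.41), ("KA", 0.41, 0.63)]


def boundary_offsets(detected, reference):
    """Return (label, start_offset, end_offset) per segment.

    Offsets are detected minus reference, in seconds, so a negative
    value means the detected boundary comes before the reference one.
    """
    if len(detected) != len(reference):
        raise ValueError("segment counts differ")
    offsets = []
    for (d_lab, d_start, d_end), (r_lab, r_start, r_end) in zip(detected, reference):
        if d_lab != r_lab:
            raise ValueError(f"label mismatch: {d_lab} vs {r_lab}")
        offsets.append((d_lab, round(d_start - r_start, 3), round(d_end - r_end, 3)))
    return offsets


offsets = boundary_offsets(detected, reference)
for label, start_off, end_off in offsets:
    print(f"{label}: start {start_off:+.3f} s, end {end_off:+.3f} s")
```

If the offsets are nonzero only for the first syllable, the problem is probably local to the start of the utterance; if they are roughly constant across all syllables, a global timing offset would be the more likely suspect.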

Could anyone tell me why this happens? Is it because I only have a CI system rather than a CD system?

Thank you.

Last edit: tfpeach 2016-02-18

Screenshot from 2016-02-18 15:59:00.png
