I'm looking into tools for automatic segmentation - i.e. automatic phonemic segmentation of a speech file. How feasible/sensible would it be to use Sphinx to do this?
I envisage training an AM with some phonologically transcribed data (I would hack the pronunciation dictionary to allow phonological transcription). Sphinx 2 would do a timestamped phonemic segmentation by running it in allphone and timealign modes. Do Sphinxes 3 & 4 have these modes (or equivalent)?
How much training data would I need to get a usable transcription? (It doesn't have to be perfect, but it should be better than nothing.)
Is it just a stupid idea?
Are there any tools for autosegmentation out there (apart from SFS)?
NB: The training data has been transcribed with timestamps (e.g. with Praat). Can SphinxTrain use the time information, or should I cut the speech data into lots of tiny files (say, a second long)?
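In case it helps clarify what I mean by the second option: a rough sketch of cutting one WAV file into per-segment files from a list of (start, end, label) tuples, using only Python's standard `wave` module. The `cut_wav` helper and the segment list are hypothetical illustrations (reading the tuples out of a Praat TextGrid is not shown), not anything Sphinx provides.

```python
import os
import tempfile
import wave

def cut_wav(src_path, segments, out_dir):
    """Cut a mono WAV into one file per (start_s, end_s, label) tuple.

    The timestamps would come from the existing transcriptions
    (e.g. exported from a Praat TextGrid); parsing them is not shown.
    """
    out_paths = []
    with wave.open(src_path, "rb") as w:
        rate = w.getframerate()
        params = w.getparams()
        for i, (start, end, label) in enumerate(segments):
            first = int(start * rate)
            n_frames = int(end * rate) - first
            w.setpos(first)
            frames = w.readframes(n_frames)
            out_path = os.path.join(out_dir, "%04d_%s.wav" % (i, label))
            with wave.open(out_path, "wb") as out:
                out.setparams(params)  # nframes is corrected on close
                out.writeframes(frames)
            out_paths.append(out_path)
    return out_paths

# Demo on a synthetic 1-second, 16 kHz mono file of silence.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "utt.wav")
with wave.open(src, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000)

paths = cut_wav(src, [(0.0, 0.4, "aa"), (0.4, 1.0, "s")], tmp)
```

This loses the cross-segment context, of course, which is part of why I'd rather have the trainer consume the timestamps directly if it can.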
All comments welcome.
Thanks
Ivan