Ivan Uemlianin - 2004-10-06

I'm looking into tools for automatic segmentation, i.e. automatic phonemic segmentation of a speech file. How feasible/sensible would it be to use Sphinx to do this?

I envisage training an acoustic model (AM) with some phonologically transcribed data (I would hack the pronunciation dictionary to allow phonological transcription). Sphinx 2 would then do a timestamped phonemic segmentation when run in allphone and timealign modes. Do Sphinx 3 and Sphinx 4 have these modes (or equivalents)?

How much training data would I need to get a usable transcription? (It doesn't have to be perfect, but it should be better than nothing.)

Is it just a stupid idea?

Are there any tools for autosegmentation out there (apart from SFS)?

NB The training data has been transcribed with timestamps (e.g. with Praat). Can SphinxTrain use the time information, or should I cut the speech data into lots of tiny files (say, a second long each)?
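
A minimal sketch of the cutting option, assuming the timestamps have already been exported from Praat as (start, end, label) tuples in seconds and that the audio is an uncompressed PCM WAV file. The function name cut_wav and the output file naming scheme are hypothetical, chosen for illustration; the code uses only Python's standard wave module and says nothing about what SphinxTrain itself expects.

```python
import wave


def cut_wav(src_path, segments, out_prefix):
    """Write one WAV file per (start_sec, end_sec, label) tuple.

    Returns the list of output paths, in the same order as `segments`.
    """
    paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for i, (start, end, label) in enumerate(segments):
            # Convert times in seconds to frame offsets.
            first = int(round(start * rate))
            count = int(round(end * rate)) - first
            src.setpos(first)
            frames = src.readframes(count)
            # Hypothetical naming scheme: prefix, running index, label.
            out_path = f"{out_prefix}_{i:04d}_{label}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channel count, sample width and rate from the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            paths.append(out_path)
    return paths
```

Called as cut_wav("utt.wav", [(0.00, 0.25, "a"), (0.25, 1.00, "b")], "seg"), this would write seg_0000_a.wav and seg_0001_b.wav, each holding only the frames between its start and end times.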

All comments welcome.

Thanks

Ivan