I'm looking into tools for automatic segmentation - i.e. automatic phonemic segmentation of a speech file. How feasible/sensible would it be to use Sphinx to do this?
I envisage training an AM with some phonologically transcribed data (I would hack the pronunciation dictionary to allow phonological transcription). Sphinx 2 would do a timestamped phonemic segmentation by running it in allphone and timealign modes. Do Sphinxes 3 & 4 have these modes (or equivalent)?
How much training data would I need to get a usable transcription? (It doesn't have to be perfect, but it should be better than nothing.)
Is it just a stupid idea?
Are there any tools for autosegmentation out there (apart from SFS)?
NB: The training data has been transcribed with timestamps (e.g. with Praat). Can SphinxTrain use the time information, or should I cut the speech data into lots of tiny files (say, a second long)?
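In case it helps clarify what I mean by the second option: a rough sketch of cutting one WAV file into per-segment files from a list of (start, end, label) tuples, using only Python's standard `wave` module. The `cut_wav` helper and the segment list are hypothetical illustrations (reading the tuples out of a Praat TextGrid is not shown), not anything Sphinx provides.

```python
import os
import tempfile
import wave

def cut_wav(src_path, segments, out_dir):
    """Cut a mono WAV into one file per (start_s, end_s, label) tuple.

    The timestamps would come from the existing transcriptions
    (e.g. exported from a Praat TextGrid); parsing them is not shown.
    """
    out_paths = []
    with wave.open(src_path, "rb") as w:
        rate = w.getframerate()
        params = w.getparams()
        for i, (start, end, label) in enumerate(segments):
            first = int(start * rate)
            n_frames = int(end * rate) - first
            w.setpos(first)
            frames = w.readframes(n_frames)
            out_path = os.path.join(out_dir, "%04d_%s.wav" % (i, label))
            with wave.open(out_path, "wb") as out:
                out.setparams(params)  # nframes is corrected on close
                out.writeframes(frames)
            out_paths.append(out_path)
    return out_paths

# Demo on a synthetic 1-second, 16 kHz mono file of silence.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "utt.wav")
with wave.open(src, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000)

paths = cut_wav(src, [(0.0, 0.4, "aa"), (0.4, 1.0, "s")], tmp)
```

This loses the cross-segment context, of course, which is part of why I'd rather have the trainer consume the timestamps directly if it can.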
All comments welcome.
Thanks
Ivan