How can I time-align a given audio file using sphinx-3? I was using sphinx-2 before, and it had sphinx2-align to do this. What is the equivalent in sphinx-3?
I have the audio files and the corresponding text. I want to time align the words.
thanks
mj
Anonymous
-
2003-03-07
I have successfully used s3align for force-aligning; is that the same (or close enough) to what you mean by time-aligning? See my 2003-02-21 posting under "Force-aligning for Sphinx2 models?" in this forum.
Alternatively, I believe there is a timealign program in one of the Sphinx3 decoders, but I don't know details.
> Alternatively, I believe there is a timealign program in one of the Sphinx3 decoders, but I don't know details.
I tried using time-align with Sphinx2 models. Carl Quillen provided me with some Sphinx3 scripts which I was hoping to use to build Sphinx2-format models, but I couldn't get it to work. I also tried hacking some of the code and scripts that build the Sphinx2 models to get it to work with time-align, but I couldn't get that to work either. (It had to do with matrix dimension problems. I could never get the dimensions just right and probably would actually need to understand the algorithms to do that :-(.)
I'm hoping to find time to try the s3align thing.
Anonymous
-
2003-03-07
I don't need to do any recognition. I have a bunch of sound files and their corresponding transcripts (the sentences uttered in those files), and I need to get the time stamps for each word in a sound file.
sphinx2-batch had options for specifying the file names in a control file (-ctlfn) and also the corresponding transcripts in another file (-tactlfn).
madan.
Anonymous
-
2003-03-07
See the archive_s3/et94-align.csh file for arguments for running s3align. Use -insentfn for the transcriptions and -ctlfn for the list of utterances. Use -wdsegdir to specify a directory in which to write word segmentations, which look like:
SFrm EFrm SegAScr Word
0 15 1375896 <s>
16 55 3918595 WE
56 101 4015687 WE
102 143 5338447 <sil>
144 165 304749 HAVE
166 218 2888821 FARM
219 242 537819 TWO
243 286 5335998 HOURS
287 339 5461385 AWAY
340 356 1970228 </s>
Total score: 31147625
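If you want the time stamps in seconds rather than frames, a small script can convert the segmentation above. This is only a sketch, not part of Sphinx: the names parse_wdseg and WordSegment are made up for illustration, and the 100 frames per second rate (10 ms per frame) is an assumption that you should check against the frame rate used by your front end. It also assumes EFrm is inclusive, which is what the sample suggests, since each segment starts one frame after the previous one ends.

```python
from dataclasses import dataclass

# Assumption: 100 frames per second (10 ms frame shift). Adjust to your setup.
FRAMES_PER_SECOND = 100

# Sample copied from the word segmentation shown above.
SAMPLE = """SFrm EFrm SegAScr Word
0 15 1375896 <s>
16 55 3918595 WE
56 101 4015687 WE
102 143 5338447 <sil>
144 165 304749 HAVE
166 218 2888821 FARM
219 242 537819 TWO
243 286 5335998 HOURS
287 339 5461385 AWAY
340 356 1970228 </s>
Total score: 31147625
"""


@dataclass
class WordSegment:
    """One row of the segmentation, with frames converted to seconds."""
    word: str
    start_sec: float
    end_sec: float
    score: int


def parse_wdseg(text, frames_per_second=FRAMES_PER_SECOND):
    """Parse word segmentation text into a list of WordSegment objects.

    Lines that are not 'SFrm EFrm SegAScr Word' rows (the header and the
    'Total score' line) are skipped.
    """
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue
        try:
            start_frame = int(fields[0])
            end_frame = int(fields[1])
            score = int(fields[2])
        except ValueError:
            continue  # header row or anything else non-numeric
        segments.append(
            WordSegment(
                word=fields[3],
                start_sec=start_frame / frames_per_second,
                # EFrm is treated as inclusive, so the segment ends
                # at the start of the following frame.
                end_sec=(end_frame + 1) / frames_per_second,
                score=score,
            )
        )
    return segments


if __name__ == "__main__":
    for seg in parse_wdseg(SAMPLE):
        # Skip filler entries such as <s>, </s> and <sil>.
        if seg.word.startswith("<"):
            continue
        print(f"{seg.start_sec:.2f}\t{seg.end_sec:.2f}\t{seg.word}")
```

To use it on real output, read each file from the directory given to -wdsegdir and pass its contents to parse_wdseg.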