I'm trying to use an audio file and a text transcript to create a caption file by using Sphinx4 to determine the time each word is said. I see that I can get frame numbers from the nodes in the resulting lattice. How do I convert frames into milliseconds? Is there a frame length property somewhere that I am missing?
Also, is it possible to have the recognizer return a single result set for a two minute audio clip? Right now I get a half dozen results or so, which I can work with, but a single lattice would be easier.
Thanks,
Terry Luedtke
> I see that I can get frame numbers from the nodes in the resulting lattice. How do I convert frames into milliseconds?
1 frame = 10 milliseconds
> Also, is it possible to have the recognizer return a single result set for a two minute audio clip? Right now I get a half dozen results or so, which I can work with, but a single lattice would be easier.
That's not a good idea because of memory usage.
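A minimal sketch in Java of turning frame numbers into caption timestamps, using the 1 frame = 10 ms figure given in this thread. It assumes you have already pulled (word, start frame, end frame) triples out of each result; `WordTiming` and `CaptionSketch` are hypothetical helpers, not Sphinx4 classes. It also assumes the frame numbers count from the start of the audio stream rather than restarting with each result, so the half dozen results can simply be processed one after another instead of requesting one large lattice. The output uses the SubRip (.srt) timestamp format `HH:MM:SS,mmm`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CaptionSketch {

    // Assumption from this thread: each frame advances the audio by 10 ms.
    static final long MS_PER_FRAME = 10L;

    // Hypothetical holder for one recognized word and its frame span.
    static final class WordTiming {
        final String word;
        final int startFrame;
        final int endFrame;

        WordTiming(String word, int startFrame, int endFrame) {
            this.word = word;
            this.startFrame = startFrame;
            this.endFrame = endFrame;
        }
    }

    // Converts a frame index to milliseconds from the start of the audio.
    static long frameToMillis(int frame) {
        return frame * MS_PER_FRAME;
    }

    // Formats milliseconds as an SRT timestamp: HH:MM:SS,mmm.
    static String formatSrtTime(long millis) {
        long hours = millis / 3_600_000L;
        long minutes = (millis / 60_000L) % 60;
        long seconds = (millis / 1_000L) % 60;
        long ms = millis % 1_000L;
        return String.format(Locale.ROOT, "%02d:%02d:%02d,%03d", hours, minutes, seconds, ms);
    }

    // Walks the per-result word lists in order and emits one numbered cue per word.
    // Assumes frame numbers are absolute across results (not reset per result).
    static String toSrt(List<List<WordTiming>> results) {
        StringBuilder sb = new StringBuilder();
        int index = 1;
        for (List<WordTiming> result : results) {
            for (WordTiming w : result) {
                sb.append(index++).append('\n')
                  .append(formatSrtTime(frameToMillis(w.startFrame)))
                  .append(" --> ")
                  .append(formatSrtTime(frameToMillis(w.endFrame)))
                  .append('\n')
                  .append(w.word).append("\n\n");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up frame numbers, standing in for two separate recognition results.
        List<List<WordTiming>> results = new ArrayList<>();
        results.add(List.of(new WordTiming("hello", 12, 45), new WordTiming("world", 46, 90)));
        results.add(List.of(new WordTiming("again", 310, 352)));
        System.out.print(toSrt(results));
    }
}
```

One cue per word keeps the sketch short; a real caption file would usually group several consecutive words into each cue, taking the first word's start time and the last word's end time. If it turns out your frame numbers do restart with each result, you would need to add each result's starting offset before converting.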
Thank you Nickolay for the information.