I just tried out the demos and am wondering how accurately Sphinx can perform.
The Transcriber demo is very accurate with the numbers (the .wav file
10001-90210-01803.wav). However, the LatticeDemo interpreted the same wav file
very inaccurately. I am using the wsj5kc language model.
My goal is to generate subtitles for movies. Is Sphinx accurate enough for
this task? I'm wondering whether a larger language model like HUB4 would
help.
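For reference, here is roughly how I am invoking it, following the pattern of
the Transcriber demo (just a sketch: the config.xml file name and the component
names "recognizer" and "audioFileDataSource" are whatever my local
configuration defines, so treat them as assumptions):

    import java.io.File;
    import java.net.URL;

    import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
    import edu.cmu.sphinx.recognizer.Recognizer;
    import edu.cmu.sphinx.result.Result;
    import edu.cmu.sphinx.util.props.ConfigurationManager;

    public class TranscribeFile {
        public static void main(String[] args) throws Exception {
            // Load the Sphinx4 XML configuration, same style as the Transcriber demo.
            ConfigurationManager cm =
                new ConfigurationManager(new File("config.xml").toURI().toURL());

            // Look up and allocate the recognizer defined in config.xml.
            Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
            recognizer.allocate();

            // Point the front end at the wav file to transcribe.
            AudioFileDataSource dataSource =
                (AudioFileDataSource) cm.lookup("audioFileDataSource");
            dataSource.setAudioFile(new URL("file:10001-90210-01803.wav"), null);

            // Decode utterance by utterance until the file is exhausted.
            Result result;
            while ((result = recognizer.recognize()) != null) {
                System.out.println(result.getBestFinalResultNoFiller());
            }

            recognizer.deallocate();
        }
    }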
You forgot to define what "enough" means.
Hi mshmyrev,
I am hoping it is accurate enough to generate subtitles for movies, or at
least to catch phrases in the dialogue. But the lattice demo is not able to
find any correct phrases, so I am wondering whether it underperforms because
the language model is too small, or whether it has already reached its limit.
Is Sphinx capable of doing this?
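To make "enough" a bit more concrete on my side, I plan to score the
recognizer output against the real subtitle text with a simple word error rate
check. This is just my own rough sketch of a standard word-level edit
distance, not anything from the Sphinx distribution:

    public class WordErrorRate {
        // Word error rate = (substitutions + insertions + deletions) / reference length,
        // computed with a standard Levenshtein distance over words.
        public static double wer(String reference, String hypothesis) {
            String[] ref = reference.toLowerCase().trim().split("\\s+");
            String[] hyp = hypothesis.toLowerCase().trim().split("\\s+");
            int[][] d = new int[ref.length + 1][hyp.length + 1];
            for (int i = 0; i <= ref.length; i++) d[i][0] = i;
            for (int j = 0; j <= hyp.length; j++) d[0][j] = j;
            for (int i = 1; i <= ref.length; i++) {
                for (int j = 1; j <= hyp.length; j++) {
                    int cost = ref[i - 1].equals(hyp[j - 1]) ? 0 : 1;
                    d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,      // deletion
                                                d[i][j - 1] + 1),     // insertion
                                       d[i - 1][j - 1] + cost);       // substitution
                }
            }
            return (double) d[ref.length][hyp.length] / ref.length;
        }

        public static void main(String[] args) {
            // One substitution out of four reference words -> WER 0.25.
            System.out.println(wer("show me the money", "show me the monkey"));
        }
    }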
The sound files that I used for testing are taken from here:
http://www.wavlist.com/movies/043/index.html
They were actually MP3 files. I converted them into 16-bit audio before
feeding them into the lattice demo.
(Audio format as reported by ffmpeg: Stream #0.0: Audio: pcm_s16le, 16000 Hz, 1
channels, s16, 256 kb/s)
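Before handing a converted file to Sphinx I also sanity-check it with the
standard javax.sound.sampled classes. This is only my own little check; the
16 kHz / 16-bit / mono target and the clip.wav file name are my assumptions
about what the acoustic model expects:

    import java.io.File;
    import javax.sound.sampled.AudioFormat;
    import javax.sound.sampled.AudioSystem;

    public class CheckWav {
        public static void main(String[] args) throws Exception {
            // Read only the header of the converted file and print its format.
            AudioFormat fmt =
                AudioSystem.getAudioFileFormat(new File("clip.wav")).getFormat();
            System.out.println(fmt);

            // My models assume 16 kHz, 16-bit, mono PCM, so warn if the file differs.
            boolean ok = fmt.getSampleRate() == 16000f
                    && fmt.getSampleSizeInBits() == 16
                    && fmt.getChannels() == 1;
            System.out.println(ok ? "format looks fine for Sphinx"
                                  : "format mismatch, reconvert");
        }
    }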
Movie files generally have background music and their quality seems low, so I
don't think the results will be good enough for you.