Is there a way to speed up sphinx3_decode? I have 4 GB of RAM and a 4-core Intel processor. I am looking for solutions apart from changing the beam width.
Dude, you could ask something easier.
There are twenty relatively easy ways to make the decoding faster. Our famous Arthur even developed a so-called 4-level framework to categorize all possible methods:
http://www.cs.cmu.edu/~archan/papers/icslp2004.ps
It all depends on the vocabulary, your programming skills, and many other factors. Things like phoneme lookahead or kd-trees are very useful. And I'm not even sure you are using s3 properly; there are many configuration tweaks that are required before you start to optimize for speed.
The first step would probably be to get the configuration right and then profile the decoder to find out which component takes more time than it should. Sphinx3 provides very detailed reports on the time spent in each decoding component; you need to study those first of all.
Sorry for sounding like an idiot, but I'm looking for "dumb" ways to speed up decoding. I've got 20k files for which decoding takes hours.
I was thinking along the lines of breaking up the control file (*.fileid) into small chunks, running a separate instance of sphinx3_decode on each of them, and finally combining all the hyp files. Does this sound OK?
(I do appreciate information you gave about the finer ways to speed up, I will look into them.)
yes
Hey Dovark,
Sorry for the late reply.
First of all, I noticed that your problem seems to be batch processing a set of files. You should be careful: in those cases, lossy techniques such as pruning can increase the word error rate (WER). If you are still developing your system, it might not be the best way. Chances are your sped-up numbers would blur your understanding of the WER.
For that reason, sphinx3 is tuned to be slower than 1xRT out of the box, because my interest has been in running it slowly and understanding its accuracy characteristics.
Now, if you are okay with this speed-accuracy trade-off, then tuning sphinx3_decode will give you quite a bit of speed-up. Besides the paper Nick quoted (thanks!), you should also read a more practical guide such as "The Incomplete Guide to Sphinx-3 Performance Tuning": http://cmusphinx.sourceforge.net/wiki/decodertuning . I found it very close to what I did back then. If I get a chance, I might write up what I really did in the paper.
You also mentioned cutting the file into smaller chunks. That is also a valid approach, but it depends on the speech density of your waveforms: if speech fills the entire wavefiles, it probably doesn't help much in terms of processing time.
Of course, breaking down the speech also means you are not applying the language model across the different subchunks. Again, that will affect your WER.
In any case, hope this answers your question. Good Luck! :)
Arthur