I managed to build sphinxbase and pocketsphinx, and pocketsphinx recognizes my voice with perhaps 30-50% accuracy. I need to improve on that, so I started to look at sphinxtrain. I seem to have built it successfully, but I'm not able to run it.
I should note that when I run pocketsphinx, I manually set LD_LIBRARY_PATH to ensure that it finds the supporting elements.
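For reference, the LD_LIBRARY_PATH workaround can be sketched roughly as below. The directory /usr/local/lib is an assumption (it is the usual default prefix for a plain "make install", but the actual location may differ), and the helper name prepend_lib_path is made up for illustration.

```shell
# Prepend a directory to LD_LIBRARY_PATH unless it is already listed.
# prepend_lib_path is a hypothetical helper, not part of any Sphinx tool.
prepend_lib_path() {
    dir="$1"
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*)
            # Already present: leave the variable unchanged.
            ;;
        *)
            # Put the new directory first, keeping any existing entries.
            LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
            ;;
    esac
    export LD_LIBRARY_PATH
}

# Assumed install location; adjust to where the libraries actually are.
# prepend_lib_path /usr/local/lib
```

Sourcing this from a shell startup file and calling it once avoids retyping the variable before each pocketsphinx run.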
Also, I'm guessing at how to run sphinxtrain, as I have not found any documentation (if you can point me at that, that would be a good thing, I'm sure!).
So far, I have attempted to run <build>/scripts/sphinxtrain, both on its own and with the argument "run".
When I run it solo, it prints out a usage message (which is how I came to the argument "run"). When I run it with "run" it complains:
Failed to find sphinxtrain binaries. Check your installation
But I built it with:
./autogen.sh && make && sudo make install
(which is also how I built all the other Sphinx-related binaries).
If I run it under strace, I see this error:
Can't open perl script "SPHINXTRAINDIR/scripts/000.comp_feat/make_feats.pl": No such file or directory
Can't open perl script "SPHINXTRAINDIR/scripts/000.comp_feat/make_feats.pl": No such file or directory
This suggests that there is some variable (sphinxtrain_dir?) that I need to set. The scripts themselves are present in the same subdirectory as the sphinxtrain script.
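The literal text SPHINXTRAINDIR in the failing path looks like a placeholder that was never replaced with a real directory. That is only a guess from the error message, not something confirmed by sphinxtrain's documentation. A minimal sketch for checking whether a given script still contains the placeholder follows; the function name has_placeholder and the example path are hypothetical.

```shell
# Report whether a file still contains a literal, unexpanded placeholder.
# Usage: has_placeholder FILE [PLACEHOLDER]
# has_placeholder is a made-up helper for illustration only.
has_placeholder() {
    file="$1"
    placeholder="${2:-SPHINXTRAINDIR}"
    # -F matches the placeholder as a fixed string, not a regex.
    if grep -q -F -- "$placeholder" "$file"; then
        echo "yes"
    else
        echo "no"
    fi
}

# Hypothetical path; substitute the script you are actually running.
# has_placeholder ./scripts/sphinxtrain
```

If the answer is "yes" for the copy in the build tree, it may be worth comparing against whatever copy "sudo make install" placed on the system, since the two can differ.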
Anyway, assuming I get past this point, I would also like to understand:
1) How much training am I likely to have to give to get this up to 80-plus-percent accuracy? Is that likely to be possible by this route? If it's not going to get there, then I should probably not invest the time trying.
2) How do I train it? Will it be obvious when I get this command running? Since I don't see documentation, I'm a bit concerned that there might be some complex training process that I won't know about if it's not GUI led.
3) Will I be able to do the training based on recorded files and text templates? I'd prefer that if possible, since I have a lot of recorded material already, and some of it has already been manually transcribed (which is what I want this to do for me in the long run)...
Thanks for any assistance!
Cheers,
Simon
Hello Simon
First of all, you need to provide the audio you are trying to recognize. Further steps to improve accuracy depend on that.
I don't think you need to proceed with training before you get some good accuracy with your recordings.