Hi,
I have a pocketsphinx use case that is giving me some problems. There are several parts to it, any of which may be contributing to the failure to produce accurate text output.
The WAV files are generated from a Software Defined Radio (SDR) by Trunk Recorder (https://github.com/robotastic/trunk-recorder), an application built on GNU Radio (https://www.gnuradio.org/). The files it creates are 16-bit, 8000 Hz, little-endian WAV files.
The 'file' command tells me this:
I have tried feeding the files in with the 8k model using the command:
I have also tried converting them to 16-bit, 16 kHz with FFmpeg using the command:
Given the origin of the files (GNU Radio), the fact that the audio was originally a digital signal (as opposed to audio captured by a microphone), the artificial voice, and the file-format conversion, I feel as though I am missing something that may be obvious to someone else.
BTW, I have tried adapting a language model, but that didn't seem to improve the results.
I can't seem to attach a sample of the WAV file.
Any thoughts?
Victor
You need to share the file.
Nickolay,
My apologies. I wanted to attach it when I originally posted but didn't see a way to. I can now.
Attached is a file that is recognised as:
"for for for","to eat to you hear it in your shoes during eighty five good for a new piece he talks friends are fast pack corn","fool fool pool","where que forty forty forty one for forty find comfort for thirty two she forty three he had collapsed and and kept the direction of highway for twenty seven the north end and after eat well west pac warm
While some of the words are correct, the transcription as a whole is not.
Victor
Last edit: Victor Biro 2017-12-26
The default model is not supposed to decode this, but with enough adaptation data (30 minutes) and an 8 kHz acoustic model it should be 100% accurate.