CMU Sphinx / Forums / Help: Problem Converting TTS audio back to text

Victor Biro - 2017-12-22

Hi,

I am using pocketsphinx for a task that is giving me some problems. The pipeline has multiple parts, and any of them may be contributing to the inaccurate text output.

The WAV files are generated from a Software Defined Radio (SDR) by Trunk Recorder (https://github.com/robotastic/trunk-recorder), an application built on GNU Radio (https://www.gnuradio.org/). The files it creates are 16-bit, 8000 Hz, little-endian WAV files.

The 'file' command tells me this:

1401-1513771892_7.70856e+08.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 8000 Hz
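Python's standard `wave` module can check the same header fields programmatically, which is handy before feeding a batch of files to a decoder whose `-samprate` must match the audio. A minimal sketch, assuming plain PCM WAV input (the `wave` module rejects other encodings); the function names are hypothetical.

```python
import sys
import wave


def describe_wav(path):
    """Return (channels, bits_per_sample, sample_rate) from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth() * 8, wav.getframerate()


def matches_8k_model(path):
    """True if the file is mono, 16-bit PCM at 8000 Hz."""
    return describe_wav(path) == (1, 16, 8000)


if __name__ == "__main__":
    # Usage: python check_wav.py file1.wav file2.wav ...
    for p in sys.argv[1:]:
        channels, bits, rate = describe_wav(p)
        print(f"{p}: {channels} ch, {bits} bit, {rate} Hz, 8k-ready={matches_8k_model(p)}")
```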

I have tried decoding the files with the 8 kHz model using this command:

pocketsphinx_continuous -infile file.wav -samprate 8000 -hmm /lib/sphinx/hmm/en-8k/
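Since Trunk Recorder produces one file per transmission, it may help to run the same command over a whole directory. The sketch below only wraps the flags already shown above (`-infile`, `-samprate`, `-hmm`) with Python's `subprocess`; the model path is copied from the command, the helper names are hypothetical, and the assumption that the hypothesis text arrives on standard output (with log messages on standard error) should be checked against your pocketsphinx build.

```python
import subprocess
from pathlib import Path

# Model path copied from the command above; adjust to your installation.
HMM_8K = "/lib/sphinx/hmm/en-8k/"


def decode_command(wav_path, hmm=HMM_8K, samprate=8000):
    """Build the pocketsphinx_continuous argument list for one file."""
    return [
        "pocketsphinx_continuous",
        "-infile", str(wav_path),
        "-samprate", str(samprate),
        "-hmm", hmm,
    ]


def decode_directory(directory):
    """Decode every .wav in a directory; return {file name: decoder stdout}.

    Assumes the hypothesis is written to stdout and logs to stderr.
    """
    results = {}
    for wav in sorted(Path(directory).glob("*.wav")):
        proc = subprocess.run(
            decode_command(wav), capture_output=True, text=True, check=True
        )
        results[wav.name] = proc.stdout.strip()
    return results
```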

I have also tried converting the files to 16-bit, 16 kHz with FFmpeg, using this command:

ffmpeg -i infile.wav -acodec pcm_s16le -ac 1 -ar 16000 outfile.wav
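To illustrate what `-ar 16000` does to this audio, the sketch below doubles the sample rate of a mono 16-bit WAV by inserting the midpoint between neighbouring samples. This is a swapped-in, simplified technique (linear interpolation); FFmpeg's own resampler filters the signal properly. Either way, audio sampled at 8 kHz contains nothing above 4 kHz, and upsampling cannot recreate that band, which is one reason an 8 kHz acoustic model is usually a better match for this kind of recording than a 16 kHz model fed upsampled audio. Function names are hypothetical.

```python
import array
import sys
import wave


def upsample_2x_linear(samples):
    """Double the rate of 16-bit samples by inserting linear midpoints.

    Illustration only: no content above the original Nyquist limit
    (4 kHz for 8 kHz audio) is recovered.
    """
    out = array.array("h")
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append(s)
        out.append((s + nxt) // 2)  # midpoint between neighbouring samples
    return out


def resample_file_2x(src, dst):
    """Read a mono 16-bit WAV and write it at twice the sample rate."""
    with wave.open(src, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        rate = wav.getframerate()
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":  # WAV PCM data is little-endian
        samples.byteswap()
    out = upsample_2x_linear(samples)
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(dst, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate * 2)
        wav.writeframes(out.tobytes())
```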

Given the origin of the files (GNU Radio), the fact that the audio started out as a digital signal (as opposed to audio captured by a microphone), the artificial voice, and the file format conversion, I feel I may be missing something that would be obvious to someone else.

BTW, I have tried adapting a language model, and it didn't seem to improve the results.

I can't seem to attach a sample of the WAV file.

Any thoughts?
Victor
- Nickolay V. Shmyrev - 2017-12-24
  
  You need to share the file.
  

Victor Biro - 2017-12-26

Nickolay,

My apologies. I wanted to attach one when I originally posted but didn't see a way to. Now I do.

Attached is a file that is recognised as:

"for for for","to eat to you hear it in your shoes during eighty five good for a new piece he talks friends are fast pack corn","fool fool pool","where que forty forty forty one for forty find comfort for thirty two she forty three he had collapsed and and kept the direction of highway for twenty seven the north end and after eat well west pac warm

Some of the words are correct, but the transcription as a whole is not.

Victor

Last edit: Victor Biro 2017-12-26

1401-1514147120_7.71356e+08.wav
- Nickolay V. Shmyrev - 2017-12-29
  
  The default model is not supposed to decode this, but with enough adaptation data (30 minutes) and an 8 kHz acoustic model it should be 100% accurate.
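  As a rough sketch of preparing that adaptation data, the helpers below total the duration of a set of recordings and write the `.fileids` and `.transcription` lists in the layout used by the CMU Sphinx adaptation tutorial (`<s> words </s> (fileid)`). The helper names are hypothetical, the transcripts must be produced by hand for each recording, and every word must exist in the pronunciation dictionary used with the acoustic model.

```python
import wave
from pathlib import Path

TARGET_SECONDS = 30 * 60  # 30 minutes of adaptation audio, per the reply above


def total_duration_seconds(wav_paths):
    """Sum the playing time of the given WAV files, in seconds."""
    total = 0.0
    for path in wav_paths:
        with wave.open(str(path), "rb") as wav:
            total += wav.getnframes() / wav.getframerate()
    return total


def has_enough_audio(wav_paths, target=TARGET_SECONDS):
    """True once the recordings add up to at least `target` seconds."""
    return total_duration_seconds(wav_paths) >= target


def write_adaptation_lists(transcripts, out_prefix):
    """Write <out_prefix>.fileids and <out_prefix>.transcription.

    `transcripts` maps a file id (the WAV name without ".wav") to its
    hand-checked text.
    """
    fileids = sorted(transcripts)
    Path(f"{out_prefix}.fileids").write_text("".join(f"{fid}\n" for fid in fileids))
    Path(f"{out_prefix}.transcription").write_text(
        "".join(f"<s> {transcripts[fid].strip()} </s> ({fid})\n" for fid in fileids)
    )
```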
  
