CMU Sphinx / Forums / Help: Quick and dirty brainless Linux step-by-step tutorial for wav to text English conversion

c1672140 - 2014-12-17

I am a student of sociology and my group needs to convert, as part of a group project, a wav file to text. We are not Linux gurus (Kubuntu 14.04) and probably never will be. We are not speech recognition gurus and probably never will be. We just want to convert a file from wav to text quick and dirty.

We can't find a step-by-step cut-and-paste simple brainless tutorial on how to just install the darn thing, on Linux.

Here is our best attempt at making that tutorial but it fails, we think, because we can't figure out WHERE to put the English library files, and how then to make the call for the conversion.

Would you kindly explain our error?

HOW TO CONVERT A WAV TO TEXT ON LINUX (KUBUNTU 14.04):
REF: http://sourceforge.net/projects/cmusphinx/

SUMMARY:
You apparently need four things:
1. sphinxbase
2. pocketsphinx
3. a language database (English is all we care about)
4. wav file (e.g., /usr/share/sounds/alsa/Front_Center.wav)

Obtain sphinxbase-0.8 from http://cmusphinx.sourceforge.net/wiki/download/
Namely: http://sourceforge.net/projects/cmusphinx/files/sphinxbase/0.8
NOTE: It would be preferable to have a "wget" command inserted here.
$ tar -xvzf sphinxbase-0.8.tar.gz
$ mv /tmp/sphinxbase-0.8 /tmp/sphinxbase
$ cd /tmp/sphinxbase
$ sudo apt-get install libtool bison (dependencies that we had needed)
$ view README
$ view INSTALL
$ ./autogen.sh
$ ./configure
$ make check (optional)
$ make installcheck (optional)
$ make clean all (if this is not the 1st time)
$ make
$ su root
# make install

Obtain pocketsphinx from http://sourceforge.net/projects/cmusphinx/files/pocketsphinx/0.8/
NOTE: It would be preferable to have a "wget" command inserted here.
$ tar -xvzf pocketsphinx-0.8.tar.gz
$ mv /tmp/pocketsphinx-0.8 /tmp/pocketsphinx
$ cd /tmp/pocketsphinx
$ view README
$ view INSTALL
$ ./autogen.sh
$ ./configure
$ make check (optional)
$ make test
$ make installcheck (optional)
$ make clean all (if this is not the 1st time)
$ make
$ su root
# make install
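
Regarding the two "wget" notes above, here is a minimal sketch of how the download links could be built. The helper name sf_tarball_url is hypothetical, and the "/download" suffix is an assumption borrowed from the language-model link that appears later in this thread; the directory and tarball names are the ones already used in the steps above.

```shell
# Hypothetical helper: builds a SourceForge download URL from the project
# directory and the tarball name used in the tar commands above.
# ASSUMPTION: the "/download" suffix works for these files as it does for
# the language-model link given later in this thread.
sf_tarball_url() {
    name=$1      # e.g. sphinxbase
    version=$2   # e.g. 0.8
    printf 'http://sourceforge.net/projects/cmusphinx/files/%s/%s/%s-%s.tar.gz/download\n' \
        "$name" "$version" "$name" "$version"
}

# Usage (run from /tmp; not executed here):
#   wget -O sphinxbase-0.8.tar.gz   "$(sf_tarball_url sphinxbase 0.8)"
#   wget -O pocketsphinx-0.8.tar.gz "$(sf_tarball_url pocketsphinx 0.8)"
```

The -O option only fixes the local file name, so the tar commands above can stay unchanged.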

Obtain US English generic acoustic models:
http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/

Note: One or more of these English databases seem to be needed:
$ tar -xvzf /tmp/en-us.tar.gz
$ tar -xvzf /tmp/en-us-8khz.tar.gz
$ tar -xvzf /tmp/en-us-semi.tar.gz
$ tar -xvzf /tmp/en-us-semi-full.tar.gz
NOTE: We just want a brainless pick of what will work on a "hello world" style file.

Convert a WAV file to text on Linux:
This seems to be needed because ours seems to have installed in a different place than the compiled binary is looking for it (which is odd but we're ok since it's a simple step):
$ sudo ln -s /usr/local/share/pocketsphinx /usr/share/pocketsphinx

Here are some universal "hello world" style test files:
$ cp /usr/share/sounds/alsa/Front_Center.wav file1.wav
$ cp /usr/share/sounds/alsa/Front_Right.wav file2.wav
$ cp /usr/share/sounds/alsa/Rear_Right.wav file3.wav

This is the best we can come up with to date for a basic dumb first-time-ever running of the program on the simplest of all test files:
$ pocketsphinx_continuous -infile file1.wav -hmm en-us -lm en-us.lm.dmp 2> pocketsphinx.log

Unfortunately, that command fails every single time. We think this is mostly because we have no guidance on where to find the thing called "en-us" (which is in /tmp/) and the thing called "en-us.lm.dmp" (which doesn't seem to exist yet).

We can't find a decent tutorial that just says what to do (nothing else is desired or needed). No choices. Just do this. Do that. And it will work, is what we want. (We are not gurus and don't even want to be gurus. We just want it to work.)

From the man pages:
Note: The -hmm directory and -dict file arguments are always required.
Note: Either -lm or -fsg is required, depending on whether you are using a statistical language model or a finite-state grammar.
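
Since the man page says these arguments are required, a small pre-flight check might catch a wrong path before the decoder runs. This is only a sketch: check_models is a hypothetical helper, not part of pocketsphinx, and the paths in the usage comment are placeholders.

```shell
# Hypothetical helper: confirm that the -hmm directory and the -dict and -lm
# files exist before calling the decoder. Reports every missing path.
check_models() {
    hmm=$1; dict=$2; lm=$3
    status=0
    [ -d "$hmm" ]  || { echo "missing -hmm directory: $hmm" >&2; status=1; }
    [ -f "$dict" ] || { echo "missing -dict file: $dict" >&2; status=1; }
    [ -f "$lm" ]   || { echo "missing -lm file: $lm" >&2; status=1; }
    return $status
}

# Usage (placeholder paths; not executed here):
#   check_models /path/to/hmm-dir /path/to/words.dic /path/to/model.lm.dmp &&
#   pocketsphinx_continuous -infile file1.wav -hmm /path/to/hmm-dir \
#       -dict /path/to/words.dic -lm /path/to/model.lm.dmp
```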

What step are we doing wrong for a quick and dirty brainless Linux Kubuntu 14.04 installation and "hello world" style simplest-possible test run?

Last edit: c1672140 2014-12-17
- Nickolay V. Shmyrev - 2014-12-17
  
  The steps look OK. What problem are you having?
  
  - Alexander Solovets - 2014-12-17
    
    The values passed to the -dict, -hmm and -lm arguments must point to
    existing files or directories. Your command is at least missing the
    -dict option, and I'm not sure the other paths are valid. In
    general, it should be -dict <dictionary-file> (usually comes with the
    language model and has a .dict extension), -hmm <audio-model-directory>
    (the one where you extracted en-us-semi.tar.gz), and -lm
    <language-model-file> (has either a .lm or a .dmp extension; the generic
    English one is at https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Language%20Model/).
    

c1672140 - 2014-12-17

Thank you Nickolay V. Shmyrev, for looking at the steps, and especially thank you Alexander for conveying that we are missing the basic-English files that the command needs.

Arbitrary wav file can be created easily with the following command:
$ arecord -f S16_LE -r 8000 -D default > record.wav

Audio Model directory we will use is: -hmm /tmp/en-us-semi-full/
https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Language%20Model

Dictionary file, we don't have one yet: -dict /tmp/????

Language Model file, we don't have one yet: -lm /tmp/????

So, we're stuck in that we don't know where to find the most basic of dictionary or language model files for English on the sourceforge site.

Where do we get these two missing files on the net for basic English?
- Nickolay V. Shmyrev - 2014-12-17
  
  Arbitrary wav file can be created easily with the following command:
  $ arecord -f S16_LE -r 8000 -D default > record.wav
  
  The sample rate must be 16 kHz, so use -r 16000
  
  Audio Model directory we will use is: -hmm /tmp/en-us-semi-full/
  https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Language%20Model
  
  You need to use en-us, not en-us-semi-full
  
  Dictionary file, we don't have one yet: -dict /tmp/????
  
  You can use
  
  https://raw.githubusercontent.com/cmusphinx/pocketsphinx/master/model/lm/en_US/cmu07a.dic
  
  Language Model file, we don't have one yet: -lm /tmp/????
  
  You can take
  
  http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Language%20Model/cmusphinx-5.0-en-us.lm.dmp/download
  

c1672140 - 2014-12-17

Hi Alexander,
Thank you for correcting the four items, and for your patience. We are very new to this process, so we realize our questions are very basic; we just want a "hello world" test to work for sociology students.

WAV file: -infile /tmp/record.wav
We can create our own English language wav file using the following command:
$ arecord -f S16_LE -r 16000 -D default > /tmp/record.wav
Or, we can find an existing WAV file on our Linux system:
$ cp /usr/share/sounds/alsa/Front_Center.wav /tmp/record.wav

Audio Model directory: -hmm /tmp/en-us/
Obtain and unpack that "en-us" directory from:
$ wget http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/en-us.tar.gz

Dictionary file: -dict /tmp/cmu07a.dic
Obtain that "cmu07a.dic" file from:
$ wget https://raw.githubusercontent.com/cmusphinx/pocketsphinx/master/model/lm/en_US/cmu07a.dic

Language Model file: -lm /tmp/cmusphinx-5.0-en-us.lm.dmp
Obtain that "cmusphinx-5.0-en-us.lm.dmp" file from:
$ wget http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Language%20Model/cmusphinx-5.0-en-us.lm.dmp
$ wget http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Language%20Model/cmusphinx-5.0-en-us.lm.gz
Note: I am not sure why there are two different files but the latter file above unpacks to a single ASCII "cmusphinx-5.0-en-us.lm" file.
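
The list above downloads en-us.tar.gz but never shows the unpack step, so here is a rough sketch of it. unpack_model is a hypothetical helper, and it assumes the tarball contains a top-level directory with the given name (en-us here, matching the -hmm /tmp/en-us/ path above).

```shell
# Hypothetical helper: unpack a model tarball into a target directory and
# confirm that the expected model directory appeared.
# ASSUMPTION: the tarball holds a single top-level directory called $name.
unpack_model() {
    tarball=$1   # e.g. /tmp/en-us.tar.gz
    dest=$2      # e.g. /tmp
    name=$3      # e.g. en-us
    tar -xzf "$tarball" -C "$dest" || return 1
    [ -d "$dest/$name" ] || { echo "no $name directory in $tarball" >&2; return 1; }
}

# Usage (not executed here):
#   unpack_model /tmp/en-us.tar.gz /tmp en-us
```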

Create a WAV file:
$ cp /usr/share/sounds/alsa/Front_Center.wav /tmp/record.wav

Run the program:
$ pocketsphinx_continuous -infile /tmp/record.wav -hmm /tmp/en-us -dict /tmp/cmu07a.dic -lm /tmp/cmusphinx-5.0-en-us.lm.dmp 2> pocketsphinx.log
000000000: only (should be "Front")
000000001: fisherman (should be "Center")

Does this seem like I performed the task correctly?
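
For a longer recording, the numbered result lines could be reduced to a plain transcript. This is a sketch only: hyp_text is a hypothetical name, and it assumes every result line has the "digits, colon, space, words" shape shown in the output above.

```shell
# Hypothetical filter: keep only lines that look like "000000000: words"
# and strip the numeric utterance prefix. Other lines are dropped.
hyp_text() {
    sed -n 's/^[0-9][0-9]*: //p'
}

# Usage (not executed here):
#   pocketsphinx_continuous -infile /tmp/record.wav -hmm /tmp/en-us \
#       -dict /tmp/cmu07a.dic -lm /tmp/cmusphinx-5.0-en-us.lm.dmp \
#       2> pocketsphinx.log | hyp_text > transcript.txt
```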

Last edit: c1672140 2014-12-18
- Nickolay V. Shmyrev - 2014-12-21
  
  Does this seem like I performed the task correctly?
  
  No, you made a mistake with the audio file
  
  Or, we can find an existing WAV file on our Linux system:
  $ cp /usr/share/sounds/alsa/Front_Center.wav /tmp/record.wav
  
  You cannot use the file as is; you need to resample it to 16 kHz mono:
  
  sox /usr/share/sounds/alsa/Front_Center.wav -r 16000 -c 1 /tmp/record.wav
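
To see whether a file needs that resampling step, the WAV header can be inspected directly. The sketch below uses hypothetical helper names and assumes the canonical 44-byte PCM header layout and a little-endian machine; files with extra header chunks would need a real tool (if sox is installed, soxi reports the same information).

```shell
# Hypothetical helper: read an unsigned integer field from a WAV header.
# ASSUMPTIONS: canonical 44-byte PCM header, little-endian host.
wav_field() {
    file=$1; offset=$2; width=$3
    od -An -t "u$width" -j "$offset" -N "$width" "$file" | tr -d ' \n'
}

# Succeeds only for 16000 Hz, mono, 16-bit files; prints what it found.
is_16k_mono() {
    channels=$(wav_field "$1" 22 2)   # bytes 22-23: channel count
    rate=$(wav_field "$1" 24 4)       # bytes 24-27: sample rate
    bits=$(wav_field "$1" 34 2)       # bytes 34-35: bits per sample
    echo "$1: ${rate} Hz, ${channels} channel(s), ${bits} bits"
    [ "$rate" = 16000 ] && [ "$channels" = 1 ] && [ "$bits" = 16 ]
}

# Usage (not executed here):
#   is_16k_mono /tmp/record.wav ||
#       sox /usr/share/sounds/alsa/Front_Center.wav -r 16000 -c 1 /tmp/record.wav
```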
  