I am a student of sociology and my group needs to convert, as part of a group project, a wav file to text. We are not Linux gurus (Kubuntu 14.04) and probably never will be. We are not speech recognition gurus and probably never will be. We just want to convert a file from wav to text quick and dirty.
We can't find a step-by-step cut-and-paste simple brainless tutorial on how to just install the darn thing, on Linux.
Here is our best attempt at making that tutorial but it fails, we think, because we can't figure out WHERE to put the English library files, and how then to make the call for the conversion.
Would you kindly explain our error?
HOW TO CONVERT A WAV TO TEXT ON LINUX (KUBUNTU 14.04):
REF: http://sourceforge.net/projects/cmusphinx/
SUMMARY:
You apparently need four things:
1. sphinxbase
2. pocketsphinx
3. a language database (English is all we care about)
4. wav file (e.g., /usr/share/sounds/alsa/Front_Center.wav)
Obtain sphinxbase-0.8 from http://cmusphinx.sourceforge.net/wiki/download/
Namely: http://sourceforge.net/projects/cmusphinx/files/sphinxbase/0.8
NOTE: It would be preferable to have a "wget" command inserted here.
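One possible way to fill in the wget step, offered as a sketch rather than a tested recipe. It assumes SourceForge serves each file at <folder URL>/<file name>/download, which is the pattern of the language model link that appears later in this thread. The function names sf_download_url and fetch_tarball are made up for this example.

```shell
# Build a direct-download URL from a SourceForge folder URL and a file name.
# Assumption: files are served at <folder-url>/<file-name>/download.
sf_download_url() {
    # ${1%/} drops one trailing slash so the result never contains "//" mid-path
    printf '%s/%s/download\n' "${1%/}" "$2"
}

# Download a tarball into /tmp under its own name (hypothetical helper).
# -O is needed because the URL itself ends in "download".
fetch_tarball() {
    wget -O "/tmp/$2" "$(sf_download_url "$1" "$2")"
}

# Usage (not run here; requires network access):
#   fetch_tarball http://sourceforge.net/projects/cmusphinx/files/sphinxbase/0.8 sphinxbase-0.8.tar.gz
#   fetch_tarball http://sourceforge.net/projects/cmusphinx/files/pocketsphinx/0.8/ pocketsphinx-0.8.tar.gz
```

With the tarball saved in /tmp, the tar and mv commands that follow are meant to be run from /tmp as well.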
$ tar -xvzf sphinxbase-0.8.tar.gz
$ mv /tmp/sphinxbase-0.8 /tmp/sphinxbase
$ cd /tmp/sphinxbase
$ sudo apt-get install libtool bison (dependencies that we had needed)
$ view README
$ view INSTALL
$ ./autogen.sh
$ ./configure
$ make check (optional)
$ make installcheck (optional)
$ make clean all (if this is not the 1st time)
$ make
$ su root
# make install
Obtain pocketsphinx from http://sourceforge.net/projects/cmusphinx/files/pocketsphinx/0.8/
NOTE: It would be preferable to have a "wget" command inserted here.
$ tar -xvzf pocketsphinx-0.8.tar.gz
$ mv /tmp/pocketsphinx-0.8 /tmp/pocketsphinx
$ cd /tmp/pocketsphinx
$ view README
$ view INSTALL
$ ./autogen.sh
$ ./configure
$ make check (optional)
$ make test
$ make installcheck (optional)
$ make clean all (if this is not the 1st time)
$ make
$ su root
# make install
Obtain US English generic acoustic models:
http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/
Note: One or more of these English databases seem to be needed:
$ tar -xvzf /tmp/en-us.tar.gz
$ tar -xvzf /tmp/en-us-8khz.tar.gz
$ tar -xvzf /tmp/en-us-semi.tar.gz
$ tar -xvzf /tmp/en-us-semi-full.tar.gz
NOTE: We just want a brainless pick of what will work on a "hello world" style file.
Convert a WAV file to text on Linux:
This symlink seems to be needed because our install put the files in /usr/local/share/pocketsphinx, while the compiled binary looks for them in /usr/share/pocketsphinx (which is odd, but we're OK with it since it's a simple step):
$ sudo ln -s /usr/local/share/pocketsphinx /usr/share/pocketsphinx
Here are some universal "hello world" style test files:
$ cp /usr/share/sounds/alsa/Front_Center.wav file1.wav
$ cp /usr/share/sounds/alsa/Front_Right.wav file2.wav
$ cp /usr/share/sounds/alsa/Rear_Right.wav file3.wav
This is the best we can come up with to date for a basic dumb first-time-ever running of the program on the simplest of all test files:
$ pocketsphinx_continuous -infile file1.wav -hmm en-us -lm en-us.lm.dmp 2> pocketsphinx.log
Unfortunately, that command fails every single time, mostly, we think, because we have no guidance as to where to find the thing called "en-us" (which is in /tmp/) and the thing called "en-us.lm.dmp" (which doesn't seem to exist yet).
We can't find a decent tutorial that just says what to do (nothing else is desired or needed). No choices. Just do this. Do that. And it will work, is what we want. (We are not gurus and don't even want to be gurus. We just want it to work.)
From the man pages:
Note: The -hmm directory and -dict file arguments are always required.
Note: Either -lm or -fsg is required, depending on whether you are using a statistical language model or a finite-state grammar.
What step are we doing wrong for a quick and dirty brainless Linux Kubuntu 14.04 installation and "hello world" style simplest-possible test run?
Last edit: c1672140 2014-12-17
The steps look OK. What is the problem you have?
The values provided by -dict, -hmm and -lm arguments should point to
existing files or directories. At least your command is missing the
-dict option, and I'm not sure about validity of other paths. In
general, it should be -dict <dictionary-file> (usually comes with the
language model and has .dict extension), -hmm <audio-model-directory>
(the one where you extracted en-us-semi.tar.gz), -lm
<language-model-file> (has either .lm or .dmp extension; the generic
English one is https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Language%20Model/).
On Thu, Dec 18, 2014 at 5:57 AM, Nickolay V. Shmyrev
nshmyrev@users.sf.net wrote:
--
Sincerely, Alexander
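The point about -hmm, -dict and -lm needing to name things that really exist can be checked before the decoder is ever started. A minimal sketch follows, assuming a statistical language model (-lm rather than -fsg); check_model_paths is a hypothetical helper, not part of pocketsphinx.

```shell
# Verify that the three model arguments point at existing paths.
# $1: acoustic model directory (-hmm)
# $2: dictionary file (-dict)
# $3: language model file (-lm)
check_model_paths() {
    rc=0
    [ -d "$1" ] || { echo "missing -hmm directory: $1" >&2; rc=1; }
    [ -f "$2" ] || { echo "missing -dict file: $2" >&2; rc=1; }
    [ -f "$3" ] || { echo "missing -lm file: $3" >&2; rc=1; }
    return "$rc"
}

# Usage (paths are placeholders; substitute wherever the files were unpacked):
#   check_model_paths /path/to/hmm-dir /path/to/file.dic /path/to/file.lm.dmp \
#     && pocketsphinx_continuous -infile file1.wav \
#          -hmm /path/to/hmm-dir -dict /path/to/file.dic -lm /path/to/file.lm.dmp
```

If any path is wrong, the helper names it on stderr and the decoder is not run at all, which is easier to read than the decoder's own log.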
Thank you Nickolay V. Shmyrev, for looking at the steps, and especially thank you Alexander for conveying that we are missing the basic-English files that the command needs.
An arbitrary WAV file can be created easily with the following command:
$ arecord -f S16_LE -r 8000 -D default > record.wav
https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Language%20Model
So, we're stuck in that we don't know where to find the most basic of dictionary or language model files for English on the sourceforge site.
Where do we get these two missing files on the net for basic English?
The sample rate for arecord must be
-r 16000
For the acoustic model, you need to use en-us, not en-us-semi-full.
For the dictionary, you can use
https://raw.githubusercontent.com/cmusphinx/pocketsphinx/master/model/lm/en_US/cmu07a.dic
For the language model, you can take
http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Language%20Model/cmusphinx-5.0-en-us.lm.dmp/download
Hi Alexander,
Thank you for correcting the four items and for your patience, in that we are very new to this process so we realize our questions are very basic as we just want a "hello world" test to work for sociology students.
WAV file: -infile /tmp/record.wav
We can create our own English language wav file using the following command:
$ arecord -f S16_LE -r 16000 -D default > /tmp/record.wav
Or, we can find an existing WAV file on our Linux system:
$ cp /usr/share/sounds/alsa/Front_Center.wav /tmp/record.wav
Audio Model directory: -hmm /tmp/en-us/
Obtain and unpack that "en-us" directory from:
$ wget http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/en-us.tar.gz
Dictionary file: -dict /tmp/cmu07a.dic
Obtain that "cmu07a.dic" file from:
$ wget https://raw.githubusercontent.com/cmusphinx/pocketsphinx/master/model/lm/en_US/cmu07a.dic
Language Model file: -lm /tmp/cmusphinx-5.0-en-us.lm.dmp
Obtain that "cmusphinx-5.0-en-us.lm.dmp" file from:
$ wget http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Language%20Model/cmusphinx-5.0-en-us.lm.dmp
$ wget http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Language%20Model/cmusphinx-5.0-en-us.lm.gz
Note: I am not sure why there are two different files but the latter file above unpacks to a single ASCII "cmusphinx-5.0-en-us.lm" file.
Create a WAV file:
$ cp /usr/share/sounds/alsa/Front_Center.wav /tmp/record.wav
Run the program:
$ pocketsphinx_continuous -infile /tmp/record.wav -hmm /tmp/en-us -dict /tmp/cmu07a.dic -lm /tmp/cmusphinx-5.0-en-us.lm.dmp 2> pocketsphinx.log
000000000: only (should be "Front")
000000001: fisherman (should be "Center")
Does this seem like I performed the task correctly?
Last edit: c1672140 2014-12-18
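The two result lines in the post above pair an utterance number with the recognized words. For a transcript, only the words are wanted, so here is a small sketch that strips the numbering. It assumes every result line looks like the ones quoted (digits, a colon, a space, then text) and that results arrive on standard output; transcript_only is a made-up name.

```shell
# Keep only the recognized text from lines of the form "000000000: some words".
# Lines that do not start with digits followed by ": " are dropped.
transcript_only() {
    sed -n 's/^[0-9][0-9]*: //p'
}

# Usage (assumption: results are printed on stdout, log messages on stderr):
#   pocketsphinx_continuous -infile /tmp/record.wav -hmm /tmp/en-us \
#     -dict /tmp/cmu07a.dic -lm /tmp/cmusphinx-5.0-en-us.lm.dmp \
#     2> pocketsphinx.log | transcript_only > transcript.txt
```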
No, you made a mistake with the file.
You cannot take the file as is; you need to resample it to 16 kHz mono:
sox /usr/share/sounds/alsa/Front_Center.wav -r 16000 -c 1 /tmp/record.wav
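To tell in advance whether a given file needs this resampling step, the sample rate and channel count can be read from the WAV header. The sketch below is only that: it assumes the plain 44-byte PCM header layout, with the channel count at byte offset 22 and the sample rate at byte offset 24, both little-endian. Files with extra chunks before the format data would need a real tool (sox ships one called soxi). All function names here are hypothetical.

```shell
# Read an unsigned little-endian integer from a file.
# $1: file, $2: byte offset, $3: number of bytes
wav_field() {
    value=0
    mult=1
    # od prints each byte as a decimal number; the first byte is least significant
    for byte in $(od -An -t u1 -j "$2" -N "$3" "$1"); do
        value=$((value + $byte * $mult))
        mult=$((mult * 256))
    done
    echo "$value"
}

wav_channels() { wav_field "$1" 22 2; }
wav_rate()     { wav_field "$1" 24 4; }

# Succeeds (exit status 0) when the file is NOT already 16000 Hz mono.
needs_resample() {
    [ "$(wav_rate "$1")" != 16000 ] || [ "$(wav_channels "$1")" != 1 ]
}

# Resample with the sox command from the reply above, or just copy
# the file when it is already in the expected format.
to_16k_mono() {
    if needs_resample "$1"; then
        sox "$1" -r 16000 -c 1 "$2"
    else
        cp "$1" "$2"
    fi
}

# Usage (requires sox):
#   to_16k_mono /usr/share/sounds/alsa/Front_Center.wav /tmp/record.wav
```

Reading the bytes one at a time keeps the result independent of the machine's own byte order.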