My current task is to implement SpeechToText on Ubuntu 14.04.
I installed pocketsphinx from packages (python-pocketsphinx pocketsphinx-hmm-en-hub4wsj pocketsphinx-lm-en-hub4) and installed cmuclmtk-0.7 (needed to build the language model and dictionary).
corpus.txt, mklm.sh, mkdict.sh, vc.lm and vc.dic can be found in the attached zip. Note that cmu07a.dic also needs to be present in this folder, since mkdict.sh uses it (copy it from /usr/share/pocketsphinx/model/lm/en_US).
For the cmu-toolkit:
LD_LIBRARY_PATH=/usr/local/lib
export LD_LIBRARY_PATH
I created the language model (vc.lm) and the dictionary (vc.dic) with:
sh mklm.sh
sh mkdict.sh
Change /home/rob/CMUSphinx/vc.lm and /home/.../vc.dic in cmupyt.py to point to the vc.lm and vc.dic files on your computer.
The wav-files are all in the acmds folder (bigger, browser, center, close, down, email, keyboard, left, music, open, play, right, select, smaller, stop, tv, up)
For example:
python ./cmupyt.py acmds/browser.wav
correctly prints: "browser"
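For reference, the old-API script can be sketched roughly like this (the attached cmupyt.py is not reproduced here; the model path and the Decoder keyword arguments are assumptions based on the Ubuntu 14.04 packages):

```python
import sys

def decode(wav_path):
    """Decode one wav file with the old python-pocketsphinx binding."""
    # The acoustic-model path below is an assumption based on the
    # pocketsphinx-hmm-en-hub4wsj package; adjust hmm/lm/dict to your setup.
    import pocketsphinx
    decoder = pocketsphinx.Decoder(
        hmm='/usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k',
        lm='/home/rob/CMUSphinx/vc.lm',
        dict='/home/rob/CMUSphinx/vc.dic')
    with open(wav_path, 'rb') as f:
        f.seek(44)              # skip the 44-byte RIFF/WAV header
        decoder.decode_raw(f)
    hypstr, uttid, score = decoder.get_hyp()
    return hypstr

if __name__ == '__main__' and len(sys.argv) > 1:
    print(decode(sys.argv[1]))
```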
The old API (pocketsphinx installed from packages) correctly recognized all the words in the wav-files.
The problem is that the new API (installed from source) doesn't recognize "down", "open", "play", "stop", "up". It gives back an empty string for "down", "open" and "play". It returns "stop" for "select" and "stop" for "up".
I believe it should be at least as good as the old API, so I must be doing something wrong.
I know that it is better to have words with more than 3 syllables, but the old API recognizes all of them.
++++++New API (installation)
Install the build dependencies: sudo apt-get install autoconf automake libtool bison python-dev swig git libasound2-dev
mkdir voice_recognition
cd voice_recognition
git clone git://github.com/cmusphinx/sphinxbase.git
cd sphinxbase
./autogen.sh
make
sudo make install
cd ..
git clone git://github.com/cmusphinx/pocketsphinx.git
cd pocketsphinx
./autogen.sh
make
sudo make install
cd ..
The acoustic model, dictionary and language model are in /usr/local/share/pocketsphinx/model/en-us.
new.py is included in the attachment.
Copy vc.lm and vc.dic to /usr/local/share/pocketsphinx/model/en-us
+++Usage:
python ./new.py acmds/browser.wav
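The attached new.py is not shown here, but with the new swig-based API such a script typically looks roughly like the following sketch (the paths follow the en-us layout mentioned above and are assumptions):

```python
import sys

MODELDIR = '/usr/local/share/pocketsphinx/model/en-us'

def decode(wav_path, modeldir=MODELDIR):
    """Decode one wav file with the new swig-based pocketsphinx API."""
    # Import kept inside the function so the sketch can be read without
    # pocketsphinx installed; vc.lm and vc.dic were copied into modeldir.
    from pocketsphinx.pocketsphinx import Decoder
    config = Decoder.default_config()
    config.set_string('-hmm', modeldir + '/en-us')   # acoustic model dir
    config.set_string('-lm', modeldir + '/vc.lm')
    config.set_string('-dict', modeldir + '/vc.dic')
    decoder = Decoder(config)
    decoder.start_utt()
    with open(wav_path, 'rb') as f:
        f.seek(44)      # skip the WAV header; data must be 16 kHz 16-bit mono PCM
        while True:
            buf = f.read(1024)
            if not buf:
                break
            decoder.process_raw(buf, False, False)
    decoder.end_utt()
    hyp = decoder.hyp()
    return hyp.hypstr if hyp is not None else ''

if __name__ == '__main__' and len(sys.argv) > 1:
    print(decode(sys.argv[1]))
```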
I prefer the new API, and it is also the one that is recommended.
Any help is appreciated.
The scripts and wav-files.
I forgot to mention that I correctly uninstalled the old API before installing the new one.
It is "python ./new.py browser.wav".
Your audio files are too loud; the decoder needs some time to adapt to the volume of the speech it receives. If you decode all your files sequentially without resetting the decoder, or if you just set the -cmninit value to 66 in en-us/feat.params, it will decode all your utterances properly. Your files are even clipped to the maximum value; you had better reduce the recording level, and noise also has to be reduced.
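For the recording-level part, already-recorded files can be scaled down before decoding. A hypothetical helper (the function name and the scaling factor are illustration only, not from the attachment):

```python
import wave

import numpy as np

def scale_wav(in_path, out_path, factor=0.5):
    """Read a 16-bit PCM wav, scale its amplitude, and write it back."""
    with wave.open(in_path, 'rb') as win:
        params = win.getparams()
        frames = win.readframes(win.getnframes())
    samples = np.frombuffer(frames, dtype=np.int16)
    scaled = (samples * factor).astype(np.int16)   # attenuate; avoids clipping
    with wave.open(out_path, 'wb') as wout:
        wout.setparams(params)
        wout.writeframes(scaled.tobytes())
```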
Thank you for your answer, Nickolay.
I will try them.
Reducing audio volume (to 50%) solved the problem. Now it works very well.
Of course, noise can still be eliminated with the following steps:
1. Fourier transform
2. Set the low- and high-frequency components of the array to zero
3. Inverse Fourier transform
Maybe the audio level should also be adjusted between steps 2 and 3 (in decibels) by simply multiplying the amplitudes of the remaining frequencies.
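Those three steps can be sketched with numpy as follows (the cut-off frequencies and the gain factor are arbitrary example values, not from the thread):

```python
import numpy as np

def fft_bandpass(samples, rate, low_hz=300.0, high_hz=3400.0, gain=1.0):
    """Crude band-pass filter: FFT, zero out-of-band bins, inverse FFT."""
    # 1. Fourier transform of the (float) sample array
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    # 2. set the low- and high-frequency bins to zero
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    # optional level adjustment by scaling the remaining amplitudes
    spectrum *= gain
    # 3. inverse Fourier transform back to the time domain
    return np.fft.irfft(spectrum, n=len(samples))
```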
Again, thank you for your help.
CMU-Sphinx (pocketsphinx) is wonderful.
This has no effect and is harmful for accuracy. Pocketsphinx does filtering internally by itself, and it simply does not consider the frequencies you filter out.
Thank you for your answer, Nickolay.
I misunderstood your previous comment: "also noise has to be reduced".
I thought you meant to apply filtering.
Now it is clear.