
Old vs new API

Help
2015-10-06
2015-10-15
  • Robert Nagy

    Robert Nagy - 2015-10-06

    My current task is to implement SpeechToText on Ubuntu 14.04.
    I installed pocketsphinx from packages (python-pocketsphinx, pocketsphinx-hmm-en-hub4wsj, pocketsphinx-lm-en-hub4) and installed cmuclmtk-0.7, which I need to build the dictionary.
    The attached zip contains corpus.txt, mklm.sh, mkdict.sh, vc.lm and vc.dic. Note that cmu07a.dic must also be present in this folder, since mkdict.sh uses it (copy it from /usr/share/pocketsphinx/model/lm/en_US).
    For the CMU toolkit, set the library path:
    LD_LIBRARY_PATH=/usr/local/lib
    export LD_LIBRARY_PATH

    I created the language model (vc.lm) and the dictionary (vc.dic) with:
    sh mklm.sh
    sh mkdict.sh
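    The contents of mkdict.sh are not shown in this thread. As a rough illustration only, a dictionary for a small command vocabulary can be built by looking up every corpus word in cmu07a.dic and copying its pronunciations, including variants such as "word(2)". The Python 3 sketch below assumes that each cmu07a.dic line is a word followed by its phones and that lookups can be case-insensitive; load_dict and build_dict are hypothetical helpers, not the attached script.

```python
import re


def load_dict(dic_text):
    """Map lowercase base word -> list of dictionary lines (variants like word(2) included)."""
    entries = {}
    for line in dic_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, phones = parts
        base = re.sub(r"\(\d+\)$", "", word).lower()  # "up(2)" -> "up"
        entries.setdefault(base, []).append(f"{word.lower()}\t{phones.strip()}")
    return entries


def build_dict(corpus_text, dic_text):
    """Return (dictionary lines for the corpus words, sorted words missing from dic_text)."""
    entries = load_dict(dic_text)
    words = sorted({w.lower() for w in corpus_text.split()})
    lines, missing = [], []
    for w in words:
        if w in entries:
            lines.extend(entries[w])
        else:
            missing.append(w)  # needs a hand-written pronunciation
    return lines, missing
```

    Hypothetical usage: read corpus.txt and cmu07a.dic, pass their contents to build_dict, write the returned lines to a .dic file, and add a pronunciation by hand for any word reported as missing.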

    Change the paths /home/rob/CMUSphinx/vc.lm and /home/.../vc.dic in cmupyt.py so that they point to the vc.lm and vc.dic files on your computer.

    All the wav files are in the acmds folder (bigger, browser, center, close, down, email, keyboard, left, music, open, play, right, select, smaller, stop, tv, up).
    For example:
    python ./cmupyt.py acmds/browser.wav
    correctly prints: "browser"
    The old API (pocketsphinx installed from packages) correctly recognized all the words in the wav files.

    The problem is that the new API (installed from source) doesn't recognize "down", "open", "play", "stop" and "up". It returns an empty string for "down", "open" and "play", and "stop" for both "select" and "up".
    I believe it should be at least as good as the old API, so I must be doing something wrong.
    I know that it is better to use words with more than 3 syllables, but the old API recognizes all of them.
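    To compare the old and new setups on the same files, a small harness can run a recognizer script on every wav file in acmds and report the files whose result differs from the file name. This is a Python 3 sketch that assumes the script (cmupyt.py or new.py) prints the hypothesis as the last line of its standard output; expected_word, recognize_with_script, mismatches and run_all are hypothetical names.

```python
import glob
import os
import subprocess
import sys


def expected_word(wav_path):
    """The expected word is the file name without extension: acmds/up.wav -> 'up'."""
    return os.path.splitext(os.path.basename(wav_path))[0]


def recognize_with_script(script, wav_path):
    """Run `python <script> <wav_path>`; assume the last non-empty stdout line is the hypothesis."""
    result = subprocess.run([sys.executable, script, wav_path],
                            capture_output=True, text=True, check=False)
    lines = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return lines[-1] if lines else ""  # no output is treated as an empty hypothesis


def mismatches(hyps):
    """Given {wav_path: hypothesis}, keep only the entries that do not match the file name."""
    return {path: hyp for path, hyp in hyps.items()
            if hyp.strip().lower() != expected_word(path).lower()}


def run_all(script, folder="acmds"):
    """Recognize every .wav file in folder and return the wrong results."""
    paths = sorted(glob.glob(os.path.join(folder, "*.wav")))
    return mismatches({p: recognize_with_script(script, p) for p in paths})
```

    For example, run_all("./cmupyt.py") and run_all("./new.py") give the failing files for each API side by side.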

    ++++++New API (installation)
    Install the prerequisites: autoconf automake libtool bison python-dev swig git libasound2-dev

    mkdir voice_recognition
    cd voice_recognition

    git clone git://github.com/cmusphinx/sphinxbase.git
    cd sphinxbase
    ./autogen.sh
    make
    sudo make install
    cd ..
    git clone git://github.com/cmusphinx/pocketsphinx.git
    cd pocketsphinx
    ./autogen.sh
    make
    sudo make install
    cd ..

    The acoustic model, dictionary and language model are installed in /usr/local/share/pocketsphinx/model/en-us

    new.py is included in the attachment.
    Copy vc.lm and vc.dic to /usr/local/share/pocketsphinx/model/en-us

    +++Usage:
    python ./new.py acmds/browser.wav

    I prefer the new API, and it is also the recommended one.
    Any help is appreciated.

     
  • Robert Nagy

    Robert Nagy - 2015-10-06

    The scripts and wav files.

     
  • Robert Nagy

    Robert Nagy - 2015-10-06

    I forgot to mention that I properly uninstalled the old API before installing the new one.
    Correction: the usage is "python ./new.py browser.wav".

     
    • Nickolay V. Shmyrev

      Your audio files are too loud, and the decoder needs some time to adapt to the volume of the speech it receives. If you decode all your files sequentially without resetting the decoder, or simply set the -cmninit value to 66 in en-us/feat.params, it will decode all your utterances properly.
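      The -cmninit change can be scripted. This Python 3 sketch assumes that feat.params holds one "-name value" option per line, sets the -cmninit value to 66 as suggested above, and appends the line if it is missing; set_cmninit is a hypothetical helper, and the file it edits lives in the en-us model folder.

```python
import re


def set_cmninit(params_text, value="66"):
    """Return feat.params text with the -cmninit value replaced by `value`.

    Assumes the "-name value" per-line format; the line is appended if absent.
    """
    pattern = re.compile(r"^([ \t]*-cmninit[ \t]+)\S+", re.MULTILINE)
    if pattern.search(params_text):
        return pattern.sub(lambda m: m.group(1) + value, params_text, count=1)
    if params_text and not params_text.endswith("\n"):
        params_text += "\n"
    return params_text + f"-cmninit {value}\n"
```

      To use it, back up feat.params, read its text, pass it through set_cmninit, and write the result back.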

      Your files are even clipped at the maximum value. You should reduce the recording level, and the noise also has to be reduced.
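      Clipping in 16-bit recordings shows up as samples stuck at the int16 limits (32767 or -32768). A quick check, sketched here with the Python 3 standard library and assuming 16-bit PCM wav files, counts such samples and reports the peak; clipping_stats is a hypothetical name.

```python
import array
import sys
import wave


def clipping_stats(wav_file):
    """Return (clipped_samples, total_samples, peak) for a 16-bit PCM wav (path or file object)."""
    with wave.open(wav_file, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # wav sample data is little-endian
    clipped = sum(1 for s in samples if s >= 32767 or s <= -32768)
    peak = max((abs(s) for s in samples), default=0)
    return clipped, len(samples), peak
```

      A file with a noticeable number of clipped samples, or a peak at the limit, is a sign that the recording level should be lowered.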

       
  • Robert Nagy

    Robert Nagy - 2015-10-07

    Thank you for your answer Nickolay.
    I will try them.

     
  • Robert Nagy

    Robert Nagy - 2015-10-07

    Reducing the audio volume (to 50%) solved the problem. Now it works very well.
    Of course, noise can still be eliminated with the following steps:
    1. Apply a Fourier transform.
    2. Set the low and high frequencies to zero in the array.
    3. Apply the inverse Fourier transform.

    Maybe the audio level (in decibels) could also be adjusted between steps 2 and 3 by simply multiplying the amplitudes of the frequencies.
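    Because the Fourier transform is linear, multiplying every frequency amplitude by a constant is the same as multiplying every sample by that constant, so a volume change such as the 50% reduction above needs no FFT. This Python 3 sketch assumes 16-bit PCM wav input; scale_volume is a hypothetical helper that clamps results to the int16 range so that factors above 1 cannot wrap around.

```python
import array
import sys
import wave


def scale_volume(src, dst, factor=0.5):
    """Copy a 16-bit PCM wav from src to dst (paths or file objects), scaling every sample."""
    with wave.open(src, "rb") as r:
        params = r.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", r.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # wav samples are stored little-endian
    scaled = array.array("h", (max(-32768, min(32767, round(s * factor))) for s in samples))
    if sys.byteorder == "big":
        scaled.byteswap()  # back to little-endian for writing
    with wave.open(dst, "wb") as w:
        w.setparams(params)
        w.writeframes(scaled.tobytes())
```

    Note that scaling a file that is already clipped makes it quieter but cannot restore the flattened peaks; lowering the recording level avoids the clipping in the first place.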

    Again, thank you for your help.
    CMU-Sphinx (pocketsphinx) is wonderful.

     
    • Nickolay V. Shmyrev

      Of course, noise can still be eliminated with the following steps:

      This has no effect and is harmful to accuracy. Pocketsphinx does its own filtering internally and simply ignores the frequencies you would filter out.

       
  • Robert Nagy

    Robert Nagy - 2015-10-15

    Thank you for your answer, Nickolay.
    I misunderstood your previous comment: "also noise has to be reduced".
    I thought you meant to apply filtering.
    Now it is clear.

     
