About using PocketSphinx

2008-04-21
2012-09-22

    Eloy Garces - 2008-04-21

    Hello,

    We are an NGO that offers cultural resources to disabled people, and we would like to congratulate you on your speech recognition software and its speed and flexibility.

    We are developing an application based on the integration of Asterisk and PocketSphinx. To do this, we have relied on the guide at http://www.syednetworks.com/asterisk-integration-with-sphinx-voice-recognition-system#more-71 (although this page is no longer available), with the changes necessary to adapt it to PocketSphinx.

    In this integration process we have found it impossible to use some modules, such as Speech::Recognizer::SPX, which provides a bridge between Asterisk and Sphinx. Unable to install version 0.9, theoretically adapted for PocketSphinx, we have had to carry out the integration in the following way:

    • We use the AGI client written in Perl provided on the web, adapting the code to our needs. However, the part in charge of communicating with the server program remains the same: a socket, over which we pass the wav file captured by Asterisk.

    • We have written a server in C that receives the audio from the client over a socket. This server uses the PocketSphinx headers "fbs.h", "s2types.h", "err.h", "ad.h" and "cont_ad.h". After receiving the full content of the socket, we send it to PocketSphinx with the following parameters:

                        {"pocketsphinx_continuous",
                        "-live","0",
                        "-samprate","8000",
                        "-adcin","1",
                        "-ctloffset","0",
                        "-ctlcount","100000000",
                        "-cepdir","/usr/local/pocketsphinx/share/pocketsphinx/model/lm/meivox",
                        "-cepext",".wav",
                        "-agc","max",
                        "-beam","1e-20",
                        "-lponlybeam","7e-29",
                        "-fwdflatbeam","1e-64",
                        "-fwdflatwbeam","7e-29",
                        "-lpbeam","1e-40",
                        "-pbeam","1e-48",
                        "-wbeam","7e-29",
                        "-hmm", "/usr/local/pocketsphinx/share/pocketsphinx/model/hmm/wsj1",
                        "-lm", "/usr/local/pocketsphinx-0.4.1/share/pocketsphinx/model/lm/meivox/meivoxDiari.lm",
                        "-dict","/usr/local/pocketsphinx-0.4.1/share/pocketsphinx/model/lm/meivox/meivoxDiari.dic"};
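    Since the decoder above is started with "-samprate","8000", a mismatched wav file from Asterisk would silently degrade accuracy. As a hypothetical sanity check (not part of our code; a sketch only), the server could read the sample rate out of the RIFF header of the received buffer before handing it to PocketSphinx:

    ```c
    #include <stdint.h>
    #include <string.h>

    /* Read a little-endian 32-bit value from a byte buffer. */
    static uint32_t read_le32(const unsigned char *p)
    {
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
               ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    }

    /* Return the sample rate of a canonical RIFF/WAVE buffer, or 0 if the
     * buffer does not look like a WAV file.  Assumes the "fmt " chunk
     * immediately follows the RIFF header, as in files written by Asterisk. */
    uint32_t wav_sample_rate(const unsigned char *buf, size_t len)
    {
        if (len < 36)
            return 0;
        if (memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0)
            return 0;
        if (memcmp(buf + 12, "fmt ", 4) != 0)
            return 0;
        return read_le32(buf + 24);   /* dwSamplesPerSec field */
    }
    ```

    The server could then reject (or resample) anything where the returned rate is not 8000 before invoking the decoder.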
    

    • We have used the language model obtained from the web tool at http://www.speech.cs.cmu.edu/tools/lmtool.html .
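    For reference, lmtool takes a plain-text corpus, one utterance per line, and returns a matching .lm and .dic pair. The actual meivox vocabulary is in our files; a purely hypothetical corpus for a small command set would look something like:

    ```
    NEXT
    PREVIOUS
    STOP
    START
    REPEAT
    ```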

    Our goal is to recognize a word among a vocabulary of 10 words. After some tests, we have found that it works quite well. However, we have many doubts about the optimization of the system:

    1. Would you recommend adding small phonetic variants to the grammar?
    2. How do the parameters we pass at initialization affect recognition?
    3. Is there a formula for determining the optimal values?
    4. Are we approaching our solution correctly?

    In summary, we are trying to improve this aspect, and it would be very valuable to have the advice of the group that developed this technology. Perhaps some changes to the parameters, or the use of another type of grammar, would increase performance, but without the necessary knowledge it is very difficult.
    What we are trying to do with this application is give cultural support to a sector of the population as disadvantaged as disabled people, by offering them voice-controlled reading.
    Looking forward to hearing from you, we greatly thank you for your interest and collaboration.

    Eloy Garces

    AIDES

     
    • Nickolay V. Shmyrev

      Hi Eloy, it's great that you've chosen pocketsphinx for your application. I hope you'll make great progress with it.

      About docs, the proper docs are located here:

      http://www.voip-info.org/wiki-Sphinx

      The perl module from the cmusphinx svn should be compatible with pocketsphinx and at least work. Another minor note:

      > "-agc","max",

      It's better to use "-agc none", I suppose.

      But actually there are much more fundamental problems here. The whole idea of the interface as described in the docs found on the net is bad. The AGI usage is a plain hack that makes recognition almost unusable. The integration must be done completely differently. If you want a good example of ASR engine integration, I suggest you look at LumenVox and Asterisk's Generic Speech API.

      A proper integration with Asterisk will include:

      1. Design a user interface.
      2. Implement the generic speech API using pocketsphinx.
      3. Collect a database of test samples to test performance of your recognizer.
      4. Implement confidence score matching and add a bit of language understanding for endpointing.
      5. Join everything in a single application.
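
      As a rough illustration of step 4 (this is a sketch of mine, not pocketsphinx code; the vocabulary and threshold are made-up placeholders), a post-processing function could gate each hypothesis on its confidence score and on membership in the application's small command set:

      ```c
      #include <string.h>

      /* Illustrative command vocabulary and matching logic.  In a real
       * application the threshold would be tuned on the test database
       * collected in step 3. */
      static const char *VOCAB[] = { "yes", "no", "next", "previous", "stop" };
      static const int VOCAB_SIZE = (int)(sizeof VOCAB / sizeof VOCAB[0]);

      /* Return 1 if the hypothesis should be accepted, 0 if the caller
       * should reprompt the user. */
      int accept_hypothesis(const char *word, double confidence, double threshold)
      {
          int i;
          if (word == NULL || confidence < threshold)
              return 0;
          for (i = 0; i < VOCAB_SIZE; i++)
              if (strcmp(word, VOCAB[i]) == 0)
                  return 1;
          return 0;   /* out-of-vocabulary result */
      }
      ```

      Anything rejected here would trigger a reprompt in the dialplan instead of a wrong action.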

      It looks like rather a lot of work, but actually it isn't: I think it's a one- or two-month project. Only if you complete these steps will you be able to build a reliable interface. The present one using AGI is just ugly. If you'd like to discuss the details, you are welcome. First of all we need more information on your application, I suppose. How exactly are you going to control the dialplan?

       
