
Questions about the use of PocketSphinx in C

Help
Paul Rolin
2015-07-27
2015-08-19
  • Paul Rolin

    Paul Rolin - 2015-07-27

    Hi again,

    Thank you for your help on Adapting Acoustic Model. I have some more questions, but as they are not on the same topic, I thought it was better to create a new topic. Feel free to merge the two topics if you prefer.

    I'm trying to use the PocketSphinx library (written in C) so that I can call it from Lua code. Basically, I would like to create only a few methods, some of which will wrap PocketSphinx functions. However, I have some basic questions after reading the code:

    1) I don't really understand the difference between:

    char const *ps_get_hyp(ps_decoder_t *ps, int32 *out_best_score)
    char const *ps_get_hyp_final(ps_decoder_t *ps, int32 *out_is_final)
    char const *kws_search_hyp(ps_search_t * search, int32 * out_score,
                           int32 * out_is_final); // and  all_phone_search etc...
    

    Which one should I use to get the result of the decoder at the end of an utterance?

    2) I don't really understand how to create a new decoder with an Acoustic Model (given its path) and a Dictionary (given its path). I think I have to use this function:

    cmd_ln_t *cmd_ln_parse_r(cmd_ln_t *inout_cmdln,
                             arg_t const *defn,
                             int32 argc,
                             char *argv[],
                             int32 strict);
    

    and then do

    ps_init(config)
    

    But I have no idea how to use it. For example, I would like to give it the path of the Acoustic Model and of the Dictionary.
    For now, the only solution I found was to use

    cmd_ln_set_str_r(config, "-hmm", hmmpath);
    

    and

    cmd_ln_set_str_r(config, "-dict", dictpath);
    

    Maybe it is the right way to do it, but I don't understand the purpose of cmd_ln_parse_r.

    I also saw that we could do:

     cmd_ln_t *cmd_ln_init(cmd_ln_t *inout_cmdln, arg_t const *defn, int32 strict, ...);
    

    So I just don't know what's the right way to do it.

    Thanks in advance for your help,
    Paul

     

    Last edit: Paul Rolin 2015-07-27
    • Nickolay V. Shmyrev

      I don't really understand the difference between:
      char const *ps_get_hyp(ps_decoder_t *ps, int32 *out_best_score)
      char const *ps_get_hyp_final(ps_decoder_t *ps, int32 *out_is_final)

      Those two functions are described in the API docs: the first returns the hypothesis and the score, the second the hypothesis and a final flag. You can use either of them according to your needs.
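
      For illustration, a minimal sketch of getting the result at the end of an utterance with ps_get_hyp (assuming <pocketsphinx.h> and <stdio.h> are included and ps is an already initialized ps_decoder_t * that has been fed audio):

      int32 score;
      const char *hyp = ps_get_hyp(ps, &score);   /* hypothesis text; score is filled in */
      if (hyp != NULL)
          printf("Result: %s (score %d)\n", hyp, score);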

      char const *kws_search_hyp(ps_search_t * search, int32 * out_score,
      int32 * out_is_final); // and all_phone_search etc...

      This is an internal function and it is not present in the public headers. You cannot use it.

      I don't really understand how to create a new decoder with an Acoustic Model (given its path) and a Dictionary (given its path).

      You can use cmd_ln_init instead of cmd_ln_parse to quickly create a config from a set of string parameters:

              config =
                  cmd_ln_init(NULL, ps_args(), TRUE,
                      "-hmm", MODELDIR "/en-us/en-us",
                      "-lm", MODELDIR "/en-us/en-us.lm.bin",
                      "-dict", MODELDIR "/en-us/cmudict-en-us.dict",
                      NULL);
      

      All those functions are covered and explained in the PocketSphinx tutorial:

      http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx

      Please review it.
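
      As a side note on cmd_ln_parse_r: it is mainly meant for the case where you already have an argc/argv command line to parse, as in a standalone command-line tool. A rough sketch of that use, assuming an ordinary main():

      #include <pocketsphinx.h>

      int main(int argc, char *argv[])
      {
          /* Parses arguments such as "-hmm /path/to/model -dict /path/to/dict"
             against the standard pocketsphinx argument definitions. */
          cmd_ln_t *config = cmd_ln_parse_r(NULL, ps_args(), argc, argv, TRUE);
          if (config == NULL)
              return 1;
          ps_decoder_t *ps = ps_init(config);
          if (ps == NULL)
              return 1;
          /* ... decode ... */
          ps_free(ps);
          cmd_ln_free_r(config);
          return 0;
      }

      For a program that builds its configuration itself, cmd_ln_init as shown above is the more convenient choice.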

       
  • Paul Rolin

    Paul Rolin - 2015-07-27

    Ok, thank you for your answer. I had seen the tutorial, but as I was also reading the source files, I didn't know exactly which functions to use.

    I still don't really understand what you mean by score and final flag. (I read the tutorial and the API but I still don't understand, sorry...)

     
    • Nickolay V. Shmyrev

      I still don't really understand what you mean by score and final flag.

      Score is a log probability of the match between the acoustic model and the audio. It is not really useful; it is just there for historical reasons.

      A grammar recognition result is final when it fully matches the grammar. For example, if the grammar is:

          public <result> = hello world;
      

      then the result "hello" is partial. The result "hello world" is final because it fully matches the grammar. The final flag tells you whether the result is final.
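
      A small sketch of checking that flag (assuming a decoder ps that is running the grammar above):

      int32 is_final;
      const char *hyp = ps_get_hyp_final(ps, &is_final);
      if (hyp != NULL) {
          if (is_final)
              printf("Full match: %s\n", hyp);     /* e.g. "hello world" */
          else
              printf("Partial match: %s\n", hyp);  /* e.g. "hello" */
      }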

       
  • Paul Rolin

    Paul Rolin - 2015-07-27

    Ok, that's clear :) Thank you very much!

     
  • Paul Rolin

    Paul Rolin - 2015-07-28

    Hi again,

    I have a new question concerning how to get the audio stream on Android and iOS.
    Indeed, I've seen that in pocketsphinx there are the "ad" functions:

    /**
     * Open a specific audio device for recording.
     *
     * The device is opened in non-blocking mode and placed in idle state.
     *
     * @return pointer to read-only ad_rec_t structure if successful, NULL
     * otherwise.  The return value to be used as the first argument to
     * other recording functions.
     */
    SPHINXBASE_EXPORT
    ad_rec_t *ad_open_dev (
        const char *dev, /**< Device name (platform-specific) */
        int32 samples_per_sec /**< Samples per second */
        );
    
    /**
     * Open the default audio device with a given sampling rate.
     */
    SPHINXBASE_EXPORT
    ad_rec_t *ad_open_sps (
               int32 samples_per_sec /**< Samples per second */
               );
    
    
    /**
     * Open the default audio device.
     */
    SPHINXBASE_EXPORT
    ad_rec_t *ad_open ( void );
    

    However, I've also seen that in the Android demo you directly use the function from Android:

     public void startRecording()
    

    So I would like to know if I could use the Android/iOS functions to do all the recording and reading, and then give the buffers to the pocketsphinx functions.

    For example, with these 2 functions from Android:

     public void startRecording()
    

    and

    /**
     * Reads audio data from the audio hardware for recording into a direct buffer. If this buffer
     * is not a direct buffer, this method will always return 0.
     * Note that the value returned by {@link java.nio.Buffer#position()} on this buffer is
     * unchanged after a call to this method.
     * @param audioBuffer the direct buffer to which the recorded audio data is written.
     * @param sizeInBytes the number of requested bytes.
     * @return the number of bytes that were read or {@link #ERROR_INVALID_OPERATION}
     *    if the object wasn't properly initialized, or {@link #ERROR_BAD_VALUE} if
     *    the parameters don't resolve to valid data and indexes.
     *    The number of bytes will not exceed sizeInBytes.
     */
    public int read(ByteBuffer audioBuffer, int sizeInBytes) {
        if (mState != STATE_INITIALIZED) {
            return ERROR_INVALID_OPERATION;
        }
    
        if ( (audioBuffer == null) || (sizeInBytes < 0) ) {
            return ERROR_BAD_VALUE;
        }
    
        return native_read_in_direct_buffer(audioBuffer, sizeInBytes);
    }
    

    And then give the data to the pocketsphinx function:

    ps_process_raw(ps, adbuf, k, FALSE, FALSE);
    

    with

    adbuf = audioBuffer (from the read function from Android)
    k = sizeInBytes (from the read function from Android)
    

    Can I do that? Or should I use the "ad" functions, which would be the way to do the same thing on both Android and iOS, with:

    ad_rec_t *ad_open_dev (
    const char *dev, /**< Device name (platform-specific) */
    int32 samples_per_sec /**< Samples per second */
    );
    

    What would be the device name btw (for Android, for example)?

    Thanks a lot.
    Paul

     
    • Nickolay V. Shmyrev

      The ad functions are supported neither on iOS nor on Android. You have to record data with the Android tools and pass it to the decoder with ps_process_raw; the same goes for iOS. This approach is demonstrated in the demo.

      You can also implement ad in Android with OpenSL ES if you want to record audio without Java.
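
      A minimal sketch of the first approach in C (read_platform_audio is a hypothetical stand-in for whatever hands you the recorded buffer, such as AudioRecord.read on Android; 16-bit mono samples at the model's sampling rate and the 5prealpha ps_start_utt signature are assumed):

      int16 buf[2048];
      int32 nsamp;

      ps_start_utt(ps);
      while ((nsamp = read_platform_audio(buf, 2048)) > 0) {
          /* The third argument is the number of int16 samples, not bytes. */
          ps_process_raw(ps, buf, nsamp, FALSE, FALSE);
      }
      ps_end_utt(ps);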

       
  • Paul Rolin

    Paul Rolin - 2015-07-30

    Ok thank you.

    I have a new question:

    In the file continuous.c, you call ps_end_utt and then ps_get_hyp.
    In the Android demo, it seems that you first call get_hyp (Hypothesis var4 = SpeechRecognizer.this.decoder.hyp(); ) and then call endUtt.

    Is there a reason for that (maybe the difference between onPartialResult and onResult?)? Or am I wrong?

    And so another question: do we have to call end_utt before or after getting any hypothesis?

    Thanks
    Paul

     

    Last edit: Paul Rolin 2015-07-30
    • Nickolay V. Shmyrev

      You can retrieve the current hypothesis at any time, before or after the utterance is over.
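
      In code, a sketch (assuming a decoding loop that feeds buf/nsamp as in the earlier example):

      const char *hyp;
      int32 score;

      ps_process_raw(ps, buf, nsamp, FALSE, FALSE);
      hyp = ps_get_hyp(ps, &score);        /* partial result, may be NULL */
      if (hyp != NULL)
          printf("Partial: %s\n", hyp);

      ps_end_utt(ps);
      hyp = ps_get_hyp(ps, &score);        /* result for the whole utterance */
      if (hyp != NULL)
          printf("Final: %s\n", hyp);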

       
  • Paul Rolin

    Paul Rolin - 2015-08-19

    Hello again,

    I'm close to the end, but I still have a problem with setting up the decoder.

    My code is in C (in fact the main code is in Lua, but I call C code from it), and it seems that the "ps_init" call is blocking (i.e. when it's called, nothing after it is executed).

    Here is my code:

    ps_decoder_t *ps;
    cmd_ln_t *config;

    void InitDecoder (char *path_acmod, char *path_dict) {
        config = cmd_ln_init(NULL, ps_args(), FALSE, "-hmm", path_acmod, "-dict", path_dict, NULL);
        ps = ps_init(config);
        print("BLOP");
    }
    

    And then I do:

       InitDecoder(pathofmyacmod, pathofmydict)
    

    When I call ps_init, the code is "stopped": anything after it is not executed (for example, BLOP is never printed). Do you have an explanation?

    Thanks

     

    Last edit: Paul Rolin 2015-08-19
    • Nickolay V. Shmyrev

      You can find more details in the log output printed to the console when you run this code.
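
      If the logcat output is hard to follow, one option is to write the CMUSphinx log to a file with the -logfn option (the path below is just an example and assumes it is writable):

      config = cmd_ln_init(NULL, ps_args(), FALSE,
                           "-hmm", path_acmod,
                           "-dict", path_dict,
                           "-logfn", "/sdcard/pocketsphinx.log",
                           NULL);
      ps = ps_init(config);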

       
  • Paul Rolin

    Paul Rolin - 2015-08-19

    Ok, so here is the code I execute (in Lua):

    speechRecognizer:initDecoder(MOAIEnvironment.rootDirectory..'r/acmod', MOAIEnvironment.rootDirectory..'r/en.dict')
    

    initDecoder is:

    void InitDecoder (char *path_acmod, char *path_dict) {
        config = cmd_ln_init(NULL, ps_args(), FALSE, "-hmm", path_acmod, "-dict", path_dict, NULL);
        ps = ps_init(config);
        print("BLOP");
    }
    

    And in my logcat, I have:

    08-19 18:16:39.351  12382-12382/com.plumzi.app.beta I/cmusphinx INFO: cmd_ln.c(697): Parsing command line:
    08-19 18:16:39.351  12382-12382/com.plumzi.app.beta I/cmusphinx \
    08-19 18:16:39.351  12382-12382/com.plumzi.app.beta I/cmusphinx -hmm
    08-19 18:16:39.351  12382-12382/com.plumzi.app.beta I/cmusphinx /bundle/assets/lua/r/acmod
    08-19 18:16:39.351  12382-12382/com.plumzi.app.beta I/cmusphinx \
    08-19 18:16:39.351  12382-12382/com.plumzi.app.beta I/cmusphinx -dict
    08-19 18:16:39.351  12382-12382/com.plumzi.app.beta I/cmusphinx /bundle/assets/lua/r/en.dict
    08-19 18:16:39.351  12382-12382/com.plumzi.app.beta I/cmusphinx [ 08-19 18:16:39.351 12382:12382 I/cmusphinx ]
    Current configuration:
    08-19 18:16:46.363   1277-12352/? V/ConfigFetchTask ConfigFetchTask getDeviceDataVersionInfo(): ABFEt1UkwWhRUA_pbJpo9FqkPkKBmjcLzECx7Ge9zi_fzD_jDHRDLz9XxKc9VNrCspx-VaYu2NhHkki7Oyn3RmbvqzgijLoI9MxzQ6PyrWrsIiXWrx4_3O0P7bTggVnxaQMzvpmYVGyRu5HyCEu2RcyZcGV5JmYuTdlkiawaqgaOm8G2h4Fsd3vrQIdz-5aXnJYdTImk2hOAfchz8WGk8krmijOdifTxeKlXjvJIOjlSOxbolR7fRvgbyMCSPwJaw4OEMeS5qeb19ndIYyEc2sx9tN6-c-5YSRaRR1KpyYjwsNfs3UZ17f4QEij8CXCYflyG5so8XpSqZsvlLti3IqjLj-ngIDWzeEvqrzn1fE2xXWnTO9paZPs
    08-19 18:16:46.366   1277-12352/? I/GoogleURLConnFactory Using platform SSLCertificateSocketFactory
    
     
  • Paul Rolin

    Paul Rolin - 2015-08-19

    Okay, if I do:

     ps = ps_init(cmd_ln_init(NULL, ps_args(), FALSE, NULL));
    

    The log output is:

    INFO: cmd_ln.c(697): Parsing command line:
    08-19 18:44:41.063    2654-2654/com.plumzi.app.beta I/cmusphinx﹕ [ 08-19 18:44:41.064  2654: 2654 I/cmusphinx ]
    Current configuration:
    08-19 18:44:41.098    2654-2654/com.plumzi.app.beta I/cmusphinx INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    08-19 18:44:41.098    2654-2654/com.plumzi.app.beta I/cmusphinx INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
    08-19 18:44:41.098    2654-2654/com.plumzi.app.beta E/cmusphinx ERROR: "acmod.c", line 80: Acoustic model definition is not specified either with -mdef option or with -hmm
    

    But the program continues. (I know that the error comes from the fact that the program doesn't know where the files are, but it continues, whereas if I specify the path, it doesn't.)

    However, if I do:

    ps_config = cmd_ln_init(NULL, ps_args(), FALSE, "-hmm", path_acmod, "-dict", path_dict, NULL);
    ps = ps_init(ps_config);
    

    The log output is:

    08-19 18:48:20.507    6029-6029/com.plumzi.app.beta I/cmusphinx INFO: cmd_ln.c(697): Parsing command line:
    08-19 18:48:20.507    6029-6029/com.plumzi.app.beta I/cmusphinx \
    08-19 18:48:20.507    6029-6029/com.plumzi.app.beta I/cmusphinx -hmm
    08-19 18:48:20.507    6029-6029/com.plumzi.app.beta I/cmusphinx /bundle/assets/lua/r/acmod
    08-19 18:48:20.507    6029-6029/com.plumzi.app.beta I/cmusphinx \
    08-19 18:48:20.507    6029-6029/com.plumzi.app.beta I/cmusphinx -dict
    08-19 18:48:20.507    6029-6029/com.plumzi.app.beta I/cmusphinx /bundle/assets/lua/r/en.dict
    08-19 18:48:20.507    6029-6029/com.plumzi.app.beta I/cmusphinx [ 08-19 18:48:20.507  6029: 6029 I/cmusphinx ]
    Current configuration:
    

    And the program never gets past that point.

    In fact, my only solution is to use an AsyncTask in Android (which calls a C init function that only does ps_init(config)) in order to have something non-blocking, but I think it can be done differently (for example, in the continuous.c file there is a plain ps_init(config)).
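
    For reference, a sketch of getting the same non-blocking behaviour from the C side instead of an AsyncTask, by running ps_init on a pthread worker (InitDecoderAsync and the "check that ps is set" scheme are hypothetical, and a real implementation would need proper synchronization around ps):

    #include <pthread.h>
    #include <pocketsphinx.h>

    static ps_decoder_t *ps;       /* set by the worker thread when ready */
    static cmd_ln_t *config;

    static void *init_worker(void *arg)
    {
        (void)arg;
        ps = ps_init(config);      /* slow: loads the acoustic model and dictionary */
        return NULL;
    }

    void InitDecoderAsync(char *path_acmod, char *path_dict)
    {
        pthread_t tid;
        config = cmd_ln_init(NULL, ps_args(), FALSE,
                             "-hmm", path_acmod,
                             "-dict", path_dict,
                             NULL);
        pthread_create(&tid, NULL, init_worker, NULL);
        pthread_detach(tid);       /* the caller later checks that ps has been set */
    }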

     

    Last edit: Paul Rolin 2015-08-19
