
Questions about the use of PocketSphinx in C

Help
Paul Rolin
2015-07-27
2015-08-19
  • Paul Rolin

    Paul Rolin - 2015-07-27

    Hi again,

    Thank you for your help on Adapting Acoustic Model. I have some more questions, but as they are not on the same topic, I thought it was better to create a new topic. Feel free to merge the two topics if you prefer.

    I'm trying to use the PocketSphinx library (written in C) so that I can call it from Lua code. Basically, I would like to create only a few methods, some of which will wrap PocketSphinx functions. However, I have some basic questions after reading the code:

    1) I don't really understand the difference between:

    char const *ps_get_hyp(ps_decoder_t *ps, int32 *out_best_score)
    char const *ps_get_hyp_final(ps_decoder_t *ps, int32 *out_is_final)
    char const *kws_search_hyp(ps_search_t * search, int32 * out_score,
                           int32 * out_is_final); // and  all_phone_search etc...
    

    Which one should I use to get the result of the decoder at the end of an utterance?

    2) I don't really understand how to create a new decoder with an Acoustic Model (given its path) and a Dictionary (given its path). I think I have to use this function:

    cmd_ln_t *cmd_ln_parse_r(cmd_ln_t *inout_cmdln,
                             arg_t const *defn,
                             int32 argc,
                             char *argv[],
                             int32 strict);
    

    and then do

    ps_init(config)
    

    But I have no idea how to use it. For example, I would like to give it the path of the Acoustic Model and of the Dictionary.
    For now, the only solution I found was to use

    cmd_ln_set_str_r(config, "-hmm", hmmpath);
    

    and

    cmd_ln_set_str_r(config, "-dict", dictpath);
    

    Maybe it is the right way to do it, but I don't understand the purpose of cmd_ln_parse_r.

    I also saw that we could do:

     cmd_ln_t *cmd_ln_init(cmd_ln_t *inout_cmdln, arg_t const *defn, int32 strict, ...);
    

    So I just don't know what's the right way to do it.

    Thanks in advance for your help,
    Paul

     

    Last edit: Paul Rolin 2015-07-27
    • Nickolay V. Shmyrev

      I don't really understand the difference between:
      char const *ps_get_hyp(ps_decoder_t *ps, int32 *out_best_score)
      char const *ps_get_hyp_final(ps_decoder_t *ps, int32 *out_is_final)

      Those two functions are described in the API docs: the first returns the hypothesis and the score, the second the hypothesis and a final flag. You can use either of them according to your needs.
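
      For illustration, a minimal sketch of getting the result at the end of an utterance with ps_get_hyp (assuming <pocketsphinx.h> and <stdio.h> are included and ps is an already initialized ps_decoder_t * that has been fed audio):

      int32 score;
      const char *hyp = ps_get_hyp(ps, &score);   /* hypothesis text; score is filled in */
      if (hyp != NULL)
          printf("Result: %s (score %d)\n", hyp, score);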

      char const *kws_search_hyp(ps_search_t * search, int32 * out_score,
      int32 * out_is_final); // and all_phone_search etc...

      This is an internal function and it is not present in the public headers. You cannot use it.

      I don't really understand how to create a new decoder with an Acoustic Model (given its path) and a Dictionary (given its path).

      You can use cmd_ln_init instead of cmd_ln_parse to quickly create a config from a set of string parameters:

              config =
                  cmd_ln_init(NULL, ps_args(), TRUE,
                      "-hmm", MODELDIR "/en-us/en-us",
                      "-lm", MODELDIR "/en-us/en-us.lm.bin",
                      "-dict", MODELDIR "/en-us/cmudict-en-us.dict",
                      NULL);
      

      All those functions are covered and explained in the PocketSphinx tutorial:

      http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx

      Please review it.
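
      As a side note on cmd_ln_parse_r: it is mainly meant for the case where you already have an argc/argv command line to parse, as in a standalone command-line tool. A rough sketch of that use, assuming an ordinary main():

      #include <pocketsphinx.h>

      int main(int argc, char *argv[])
      {
          /* Parses arguments such as "-hmm /path/to/model -dict /path/to/dict"
             against the standard pocketsphinx argument definitions. */
          cmd_ln_t *config = cmd_ln_parse_r(NULL, ps_args(), argc, argv, TRUE);
          if (config == NULL)
              return 1;
          ps_decoder_t *ps = ps_init(config);
          if (ps == NULL)
              return 1;
          /* ... decode ... */
          ps_free(ps);
          cmd_ln_free_r(config);
          return 0;
      }

      For a program that builds its configuration itself, cmd_ln_init as shown above is the more convenient choice.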

       
  • Paul Rolin

    Paul Rolin - 2015-07-27

    Ok, thank you for your answer. I had seen the tutorial, but as I was also reading the source files, I didn't know exactly which functions to use.

    I still don't really understand what you mean by score and final flag. (I read the tutorial and the API but I still don't understand, sorry...)

     
    • Nickolay V. Shmyrev

      I still don't really understand what you mean by score and final flag.

      Score is a log probability of the match between the acoustic model and the audio. It is not really useful; it is just there for historical reasons.

      A grammar recognition result is final when it fully matches the grammar. For example, if the grammar is:

          public <result> = hello world;
      

      then the result "hello" is partial. The result "hello world" is final because it fully matches the grammar. The final flag tells you whether the result is final.
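
      A small sketch of checking that flag (assuming a decoder ps that is running the grammar above):

      int32 is_final;
      const char *hyp = ps_get_hyp_final(ps, &is_final);
      if (hyp != NULL) {
          if (is_final)
              printf("Full match: %s\n", hyp);     /* e.g. "hello world" */
          else
              printf("Partial match: %s\n", hyp);  /* e.g. "hello" */
      }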

       
  • Paul Rolin

    Paul Rolin - 2015-07-27

    Ok, that's clear :) Thank you very much!

     
  • Paul Rolin

    Paul Rolin - 2015-07-28

    Hi again,

    I have a new question concerning how to get the audio stream on Android and iOS.
    Indeed, I've seen that in pocketsphinx there are the "ad" functions:

    /**
     * Open a specific audio device for recording.
     *
     * The device is opened in non-blocking mode and placed in idle state.
     *
     * @return pointer to read-only ad_rec_t structure if successful, NULL
     * otherwise.  The return value to be used as the first argument to
     * other recording functions.
     */
    SPHINXBASE_EXPORT
    ad_rec_t *ad_open_dev (
        const char *dev, /**< Device name (platform-specific) */
        int32 samples_per_sec /**< Samples per second */
        );
    
    /**
     * Open the default audio device with a given sampling rate.
     */
    SPHINXBASE_EXPORT
    ad_rec_t *ad_open_sps (
               int32 samples_per_sec /**< Samples per second */
               );
    
    
    /**
     * Open the default audio device.
     */
    SPHINXBASE_EXPORT
    ad_rec_t *ad_open ( void );
    

    However, I've also seen that in the Android demo you directly use the function from Android:

     public void startRecording()
    

    So I would like to know if I could use the Android/iOS functions to do all the recording and reading, and then give the buffers to the pocketsphinx functions.

    For example, with these 2 functions from Android:

     public void startRecording()
    

    and

    /**
     * Reads audio data from the audio hardware for recording into a direct buffer. If this buffer
     * is not a direct buffer, this method will always return 0.
     * Note that the value returned by {@link java.nio.Buffer#position()} on this buffer is
     * unchanged after a call to this method.
     * @param audioBuffer the direct buffer to which the recorded audio data is written.
     * @param sizeInBytes the number of requested bytes.
     * @return the number of bytes that were read or {@link #ERROR_INVALID_OPERATION}
     *    if the object wasn't properly initialized, or {@link #ERROR_BAD_VALUE} if
     *    the parameters don't resolve to valid data and indexes.
     *    The number of bytes will not exceed sizeInBytes.
     */
    public int read(ByteBuffer audioBuffer, int sizeInBytes) {
        if (mState != STATE_INITIALIZED) {
            return ERROR_INVALID_OPERATION;
        }
    
        if ( (audioBuffer == null) || (sizeInBytes < 0) ) {
            return ERROR_BAD_VALUE;
        }
    
        return native_read_in_direct_buffer(audioBuffer, sizeInBytes);
    }
    

    And then give the data to the pocketsphinx function:

    ps_process_raw(ps, adbuf, k, FALSE, FALSE);
    

    with

    adbuf = audioBuffer (from the read function from Android)
    k = sizeInBytes (from the read function from Android)
    

    Can I do that? Or should I use the "ad" functions, which would be the way to do the same thing on both Android and iOS, with:

    ad_rec_t *ad_open_dev (
    const char *dev, /**< Device name (platform-specific) */
    int32 samples_per_sec /**< Samples per second */
    );
    

    What would be the device name btw (for Android, for example)?

    Thanks a lot.
    Paul

     
    • Nickolay V. Shmyrev

      The ad functions are supported neither on iOS nor on Android. You have to record data with the Android tools and pass it to the decoder with ps_process_raw; the same goes for iOS. This approach is demonstrated in the demo.

      You can also implement ad in Android with OpenSL ES if you want to record audio without Java.
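
      A minimal sketch of the first approach in C (read_platform_audio is a hypothetical stand-in for whatever hands you the recorded buffer, such as AudioRecord.read on Android; 16-bit mono samples at the model's sampling rate and the 5prealpha ps_start_utt signature are assumed):

      int16 buf[2048];
      int32 nsamp;

      ps_start_utt(ps);
      while ((nsamp = read_platform_audio(buf, 2048)) > 0) {
          /* The third argument is the number of int16 samples, not bytes. */
          ps_process_raw(ps, buf, nsamp, FALSE, FALSE);
      }
      ps_end_utt(ps);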

       
  • Paul Rolin

    Paul Rolin - 2015-07-30

    Ok thank you.

    I have a new question:

    In the file continuous.c, you call ps_end_utt and then ps_get_hyp.
    In the Android demo, it seems that you first call get_hyp (Hypothesis var4 = SpeechRecognizer.this.decoder.hyp(); ) and then call endUtt.

    Is there a reason for that (maybe the difference between onPartialResult and onResult?)? Or am I wrong?

    And so another question: do we have to call end_utt before or after getting any hypothesis?

    Thanks
    Paul

     

    Last edit: Paul Rolin 2015-07-30
    • Nickolay V. Shmyrev

      You can retrieve the current hypothesis at any time, before or after the utterance is over.
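
      In code, a sketch (assuming a decoding loop that feeds buf/nsamp as in the earlier example):

      const char *hyp;
      int32 score;

      ps_process_raw(ps, buf, nsamp, FALSE, FALSE);
      hyp = ps_get_hyp(ps, &score);        /* partial result, may be NULL */
      if (hyp != NULL)
          printf("Partial: %s\n", hyp);

      ps_end_utt(ps);
      hyp = ps_get_hyp(ps, &score);        /* result for the whole utterance */
      if (hyp != NULL)
          printf("Final: %s\n", hyp);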

       
  • Paul Rolin

    Paul Rolin - 2015-08-19

    Hello again,

    I'm close to the end, but I still have a problem with setting up the decoder.

    My code is in C (in fact the main code is in Lua, but I call C code from it), and it seems that the "ps_init" call is blocking (i.e. when it's called, nothing after it is executed).

    Here is my code:

    ps_decoder_t *ps;
    cmd_ln_t *config;

    void InitDecoder (char *path_acmod, char *path_dict) {
        config = cmd_ln_init(NULL, ps_args(), FALSE, "-hmm", path_acmod, "-dict", path_dict, NULL);
        ps = ps_init(config);
        print("BLOP");
    }
    

    And then I do:

       InitDecoder(pathofmyacmod, pathofmydict)
    

    When I call ps_init, the code is "stopped": anything after it is not executed (for example, BLOP is never printed). Do you have an explanation?

    Thanks

     

    Last edit: Paul Rolin 2015-08-19
    • Nickolay V. Shmyrev

      You can find more details in the log output printed to the console when you run this code.
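
      If the logcat output is hard to follow, one option is to write the CMUSphinx log to a file with the -logfn option (the path below is just an example and assumes it is writable):

      config = cmd_ln_init(NULL, ps_args(), FALSE,
                           "-hmm", path_acmod,
                           "-dict", path_dict,
                           "-logfn", "/sdcard/pocketsphinx.log",
                           NULL);
      ps = ps_init(config);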

       
  • Paul Rolin

    Paul Rolin - 2015-08-19

    Ok, so here is the code I execute (in Lua):

    speechRecognizer:initDecoder(MOAIEnvironment.rootDirectory..'r/acmod', MOAIEnvironment.rootDirectory..'r/en.dict')
    

    initDecoder is:

    void InitDecoder (char *path_acmod, char *path_dict) {
        config = cmd_ln_init(NULL, ps_args(), FALSE, "-hmm", path_acmod, "-dict", path_dict, NULL);
        ps = ps_init(config);
        print("BLOP");
    }
    

    And in my logcat, I have:

    08-19 18:16:39.351  12382-12382/com.plumzi.app.beta I/cmusphinx INFO: cmd_ln.c(697): Parsing command line:
    08-19 18:16:39.351  12382-12382/com.plumzi.app.beta I/cmusphinx \
    08-19 18:16:39.351  12382-12382/com.plumzi.app.beta I/cmusphinx -hmm
    08-19 18:16:39.351  12382-12382/com.plumzi.app.beta I/cmusphinx /bundle/assets/lua/r/acmod
    08-19 18:16:39.351  12382-12382/com.plumzi.app.beta I/cmusphinx \
    08-19 18:16:39.351  12382-12382/com.plumzi.app.beta I/cmusphinx -dict
    08-19 18:16:39.351  12382-12382/com.plumzi.app.beta I/cmusphinx /bundle/assets/lua/r/en.dict
    08-19 18:16:39.351  12382-12382/com.plumzi.app.beta I/cmusphinx [ 08-19 18:16:39.351 12382:12382 I/cmusphinx ]
    Current configuration:
    08-19 18:16:46.363   1277-12352/? V/ConfigFetchTask ConfigFetchTask getDeviceDataVersionInfo(): ABFEt1UkwWhRUA_pbJpo9FqkPkKBmjcLzECx7Ge9zi_fzD_jDHRDLz9XxKc9VNrCspx-VaYu2NhHkki7Oyn3RmbvqzgijLoI9MxzQ6PyrWrsIiXWrx4_3O0P7bTggVnxaQMzvpmYVGyRu5HyCEu2RcyZcGV5JmYuTdlkiawaqgaOm8G2h4Fsd3vrQIdz-5aXnJYdTImk2hOAfchz8WGk8krmijOdifTxeKlXjvJIOjlSOxbolR7fRvgbyMCSPwJaw4OEMeS5qeb19ndIYyEc2sx9tN6-c-5YSRaRR1KpyYjwsNfs3UZ17f4QEij8CXCYflyG5so8XpSqZsvlLti3IqjLj-ngIDWzeEvqrzn1fE2xXWnTO9paZPs
    08-19 18:16:46.366   1277-12352/? I/GoogleURLConnFactory Using platform SSLCertificateSocketFactory
    
     
  • Paul Rolin

    Paul Rolin - 2015-08-19

    Okay, if I do:

     ps = ps_init(cmd_ln_init(NULL, ps_args(), FALSE, NULL));
    

    The log output is:

    INFO: cmd_ln.c(697): Parsing command line:
    08-19 18:44:41.063    2654-2654/com.plumzi.app.beta I/cmusphinx﹕ [ 08-19 18:44:41.064  2654: 2654 I/cmusphinx ]
    Current configuration:
    08-19 18:44:41.098    2654-2654/com.plumzi.app.beta I/cmusphinx INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    08-19 18:44:41.098    2654-2654/com.plumzi.app.beta I/cmusphinx INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
    08-19 18:44:41.098    2654-2654/com.plumzi.app.beta E/cmusphinx ERROR: "acmod.c", line 80: Acoustic model definition is not specified either with -mdef option or with -hmm
    

    But the program continues. (I know that the error comes from the fact that the program doesn't know where the files are, but it continues, whereas if I specify the path, it doesn't.)

    However, if I do:

    ps_config = cmd_ln_init(NULL, ps_args(), FALSE, "-hmm", path_acmod, "-dict", path_dict, NULL);
    ps = ps_init(ps_config);
    

    The log output is:

    08-19 18:48:20.507    6029-6029/com.plumzi.app.beta I/cmusphinx INFO: cmd_ln.c(697): Parsing command line:
    08-19 18:48:20.507    6029-6029/com.plumzi.app.beta I/cmusphinx \
    08-19 18:48:20.507    6029-6029/com.plumzi.app.beta I/cmusphinx -hmm
    08-19 18:48:20.507    6029-6029/com.plumzi.app.beta I/cmusphinx /bundle/assets/lua/r/acmod
    08-19 18:48:20.507    6029-6029/com.plumzi.app.beta I/cmusphinx \
    08-19 18:48:20.507    6029-6029/com.plumzi.app.beta I/cmusphinx -dict
    08-19 18:48:20.507    6029-6029/com.plumzi.app.beta I/cmusphinx /bundle/assets/lua/r/en.dict
    08-19 18:48:20.507    6029-6029/com.plumzi.app.beta I/cmusphinx [ 08-19 18:48:20.507  6029: 6029 I/cmusphinx ]
    Current configuration:
    

    And the program never gets past that point.

    In fact, my only solution is to use an AsyncTask in Android (which calls a C init function that only does ps_init(config)) in order to have something non-blocking, but I think it can be done differently (for example, in the continuous.c file there is a plain ps_init(config)).
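
    For reference, a sketch of getting the same non-blocking behaviour from the C side instead of an AsyncTask, by running ps_init on a pthread worker (InitDecoderAsync and the "check that ps is set" scheme are hypothetical, and a real implementation would need proper synchronization around ps):

    #include <pthread.h>
    #include <pocketsphinx.h>

    static ps_decoder_t *ps;       /* set by the worker thread when ready */
    static cmd_ln_t *config;

    static void *init_worker(void *arg)
    {
        (void)arg;
        ps = ps_init(config);      /* slow: loads the acoustic model and dictionary */
        return NULL;
    }

    void InitDecoderAsync(char *path_acmod, char *path_dict)
    {
        pthread_t tid;
        config = cmd_ln_init(NULL, ps_args(), FALSE,
                             "-hmm", path_acmod,
                             "-dict", path_dict,
                             NULL);
        pthread_create(&tid, NULL, init_worker, NULL);
        pthread_detach(tid);       /* the caller later checks that ps has been set */
    }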

     

    Last edit: Paul Rolin 2015-08-19
