CMU Sphinx / Forums / Help: Creating Tutorial for PocketSphinx on Windows Phone

Toine db - 2014-07-25

Hi Nickolay,

You asked me: "I hope you can help us to make us a demo like android one."
Of course, that's the least I could do after you helped me get things working.

But I have some questions, because I can't run the Android version of Sphinx myself.

What does the demo do? (functionality/scenario)

What Sphinx source do you want me to use in the demo? (I have only used the trunk, and I don't know if there are any stable versions.)

Hope to hear from you so I can start writing, or continue researching the demo functionality.

Toine de Boer

Nickolay V. Shmyrev - 2014-07-25

What does the demo do? (functionality/scenario)

On start it displays "say oh mighty computer to activate" and listens for the keyword "oh mighty computer". Once the keyword occurs, it switches to grammar mode, recognizes digits from 0 to 10, and displays them on the screen. Then it switches back to keyword search mode.

What Sphinx source do you want me to use in the demo? (I have only used the trunk, and I don't know if there are any stable versions.)

Trunk of course.

- Toine db - 2014-07-26
  
  OK, tnx for the scenario.
  
  I will start with getting just the basics to work, the "oh mighty computer" part.
  
  Then switching between grammar and search mode, but I don't have a clue what that is...
  
  So I will be back; if you have any tips, please don't hesitate...
  
- Toine db - 2014-07-26
  
  Already my first question:
  
  How do I feed in a continuous stream of incoming microphone data?
  
  I'm looking at the following, but I don't think it is the right way to do that: http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx#decoding_audio_data_from_memory
  
  Can you give me some direction?
  
  I'm recording sound as:
  - 16 bits
  - 1 channel
  - Sample rate 16000
  
  PS: in case you want to look at the code I'm testing with ...
  https://onedrive.live.com/redir?resid=53DF68CA92747BA6%2132783 (27-7: added native recorder)
  
  Last edit: Toine db 2014-07-27
  
- Toine db - 2014-07-30
  
  Can you (or someone else) give me a hint about which methods I need to use in PocketSphinx to get your functionality to work?
  
  recognize "oh mighty computer"
  
  switches to grammar mode
  
  recognizes digits from 0 to 10
  
  switch back to ??? mode
  
  I have a constant stream of byte arrays from a 16-bit, 1-channel, 16 kHz source.
  

Nickolay V. Shmyrev - 2014-07-30

ps_init();
ps_set_kws("kws", keyword);
ps_set_jsgf("grammar", grammar);

ps_set_search("kws");
ps_start_utt();
while (true) {
    read_data(data);
    ps_process_raw(data);
    if (ps_get_hyp().equals(keyword)) {
        ps_end_utt();
        break;
    }
}

ps_set_search("grammar");
ps_start_utt();
while (true) {
    read_data(data);
    ps_process_raw(data);
    if (!ps_in_speech()) {
        ps_end_utt();
        update_result(ps_get_hyp());
        break;
    }
}

ps_set_search("kws");
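
The control flow of the pseudocode above can be tried out without the decoder. The following is only a sketch: `listener`, `chunk_result` and `listener_step` are hypothetical names, and each `chunk_result` stands in for what the decoder would report after one call to ps_process_raw (the current hypothesis and whether speech is in progress). It also makes one assumption that is not in the pseudocode: in grammar mode it waits until speech has actually been heard before treating silence as the end of the utterance.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { MODE_KEYWORD, MODE_GRAMMAR } search_mode;

/* Stand-in for what the decoder reports after one processed chunk. */
typedef struct {
    const char *hyp;   /* current hypothesis, NULL if there is none yet */
    int in_speech;     /* non-zero while speech is being detected */
} chunk_result;

typedef struct {
    search_mode mode;      /* which search is currently active */
    const char *keyword;   /* phrase that activates grammar mode */
    int heard_speech;      /* grammar mode only: has speech started yet? */
} listener;

/* Advance the listener by one chunk.
 * Returns 1 and stores the final hypothesis in *result when a grammar
 * utterance has ended; returns 0 otherwise. */
int listener_step(listener *l, const chunk_result *r, const char **result)
{
    assert(l != NULL && r != NULL && result != NULL);

    if (l->mode == MODE_KEYWORD) {
        if (r->hyp != NULL && strcmp(r->hyp, l->keyword) == 0) {
            /* Real code: ps_end_utt, ps_set_search("grammar"), ps_start_utt */
            l->mode = MODE_GRAMMAR;
            l->heard_speech = 0;
        }
        return 0;
    }

    if (r->in_speech) {
        l->heard_speech = 1;
        return 0;
    }
    if (!l->heard_speech)
        return 0;   /* still waiting for the user to start speaking */

    /* Silence after speech. Real code: ps_end_utt, read ps_get_hyp,
     * ps_set_search("kws"), ps_start_utt */
    *result = r->hyp;
    l->mode = MODE_KEYWORD;
    return 1;
}
```

In a real recognizer the caller would invoke something like `listener_step` once per audio chunk and raise its own "result" event whenever the function returns 1.
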
- Toine db - 2014-08-01
  
  Tnx, but this is still going to take a while because I need to discover each method separately.
  
  I'm not used to this kind of API documentation: http://cmusphinx.sourceforge.net/doc/pocketsphinx/
  For example, what does a return value of '1' from ps_set_kws mean?
  And are there any limits to the bytes you send in to ps_process_raw? Do they always need to be the same, etc.?
  
  I was looking at the Android example, but the source where this happens is not accessible.
  
  Can you help some more with this?
  
- Toine db - 2014-08-05
  
  Hello Nickolay,
  
  Can you help me with the following questions I have?
  (or do you have source code as an example? The Android code is closed where it gets interesting)
  
  what does a return value of '1' from ps_set_kws mean?
  
  are there any limits to the bytes you send in to ps_process_raw?
  
  and do they always need to be the same?
  
  Hope to hear from you,
  
  and example source code would be fantastic if possible
  

Nickolay V. Shmyrev - 2014-08-05

what does a return value of '1' from ps_set_kws mean?

An error occurred. You can see the details in the log. To store the log in the filesystem, add -logfn to the decoder configuration.

are there any limits to the bytes you send in to ps_process_raw?

No

and do they always need to be the same?

No

- Toine db - 2014-08-06
  
  Tnx for the feedback.
  
  But when I add -logfn (like below) I get the error "cannot redirect log output".
  
  config = cmd_ln_init(NULL, ps_args(), TRUE,
  "-hmm", hmmPath,
  "-lm", lmPath,
  "-dict", dictPath,
  "-mmap", "no",
  "-logfn", "",
  NULL);
  
  Am I missing some parameters?
  
  PS: I already used the following code that SOMETIMES produced an error log, but not always.
  const wchar_t *wLogPath = Windows::Storage::ApplicationData::Current->LocalFolder->Path->Data();
  wcstombs(cpath, wLogPath, 1024);
  char *logPath = concat(cpath, "\\err.log");
  err_set_logfile(logPath);
  
  Hope to hear from you
  
  And thanks again for your support
  

Nickolay V. Shmyrev - 2014-08-06

"-logfn", "",

There should be a filename here.

PS: I already used the following code that SOMETIMES produced an error log, but not always.

This is an alternative way. To make sure the log remains after the application exits, add an fflush(stderr) call to the function in err.c that prints the message.

- Toine db - 2014-08-14
  
  Tnx, I now always get a log, but there are still problems.
  
  The log just stops at 7 or 8 KB... https://onedrive.live.com/redir?resid=53DF68CA92747BA6%2139023
  And the fflush(stderr) doesn't seem to do anything.
  
  I wait 10 seconds before closing the app or debug session, but the error log still isn't as complete as I expect.
  I'm expecting to find something from "ps_set_kws(ps, Cname, Ckeyphrase);"
  (I found the error itself, but for development purposes I really need a log)
  
  Do you have any ideas?
  
  PS: I put fflush(stderr) at the end of err_msg_system() and also in a separate method to call directly from my code.
  

Nickolay V. Shmyrev - 2014-08-14

PS: I put fflush(stderr) at the end of err_msg_system() and also in a separate method to call directly from my code.

Not just err_msg_system but also err_msg. Or add fflush(fp) in err_logfp_cb.
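
The idea behind flushing inside the log callback can be shown in isolation. This is a minimal sketch, not the sphinxbase code: `log_line_flushed` is a hypothetical helper that writes one message to a `FILE *` and flushes it right away, so the text reaches the file even if the process is killed before the stream is closed.

```c
#include <assert.h>
#include <stdio.h>

/* Write one log line to fp and flush it immediately.
 * Returns 0 on success and -1 if writing or flushing fails. */
int log_line_flushed(FILE *fp, const char *msg)
{
    assert(fp != NULL && msg != NULL);

    if (fputs(msg, fp) == EOF)
        return -1;
    if (fputc('\n', fp) == EOF)
        return -1;

    /* Without this call the text can sit in the stdio buffer and be lost
     * when the application is terminated abruptly. */
    return fflush(fp) == 0 ? 0 : -1;
}
```

Flushing after every message costs some performance, which is usually acceptable while debugging but may be worth removing afterwards.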

- Toine db - 2014-08-15
  
  Tnx, that works now.
  
  Some follow-up questions about loading things:
  
  To load Grammar... is:
  
  int result = ps_set_jsgf_file(ps, Cname, CcompleteFilePath);
  
  the same as
  
  fsg_model_t *pNewFSGModel = jsgf_read_file(CcompleteFilePath, ps_get_logmath(ps), 6.5);
  int result = ps_set_fsg(ps, Cname, pNewFSGModel);
  
  ?
  
  And is
  
  ps_set_lm_file(ps, Cname, CcompleteFilePath);
  
  the way to load Language models like weather.dmp from the Android demo?
  
  PS: I'm trying to make the Android demo on Windows Phone
  

Nickolay V. Shmyrev - 2014-08-16

the same as

yes

the way to load Language models like weather.dmp from the Android demo?

yes

- Toine db - 2014-08-18
  
  Tnx Nickolay,
  
  The solution for the Windows Phone tutorial is making progress.
  
  Loading models, phrases etc and setting search type is working.
  
  Next (and last) is handling/processing incoming voice data.
  
  I'll keep you informed
  
  - Nickolay V. Shmyrev - 2014-08-20
    
    Cool, I'd be glad to try it. Let me know if you need some help.
    
    - Toine db - 2014-08-23
      
      Hey Nickolay,
      
      I'm trying to process the bytes that are recorded from the microphone, and was hoping you could help me set up some simple real-time detection.
      
      int SpeechRecognizer::RegisterAudioBytes(const Platform::Array<uint8>^ audioBytes)
      {
          // source: http://blog.csdn.net/zouxy09/article/details/7978108
          // Holds up to 4096 samples (8192 bytes); larger inputs would overflow it
          int16 audioBuffer[4096];
          int32 k, ts, rem;
          char const *hyp;
          char const *uttid;
          char word[256];
          // Number of int16 samples (two bytes per sample)
          k = audioBytes->Length / 2;
          // Convert the byte array (Array<uint8>) to int16 samples, low byte first
          for (size_t i = 0; i + 1 < audioBytes->Length; i += 2)
          {
              audioBuffer[i / 2] = audioBytes[i] + ((int16)audioBytes[i + 1] << 8);
          }
          // Process the samples
          int result = ps_process_raw(ps, audioBuffer, k, TRUE, FALSE);
          return result;
      }
      
      I began with the above, to process a byte array coming from a C# WP project. I'm not 100% sure that the conversion to int16[] is OK, or whether I'm using ps_process_raw the right way...
      
      I have looked at the TRUE and FALSE arguments in ps_process_raw, but I'm not sure what they are for or how to use them...
      
      Can you help me? (PS: the output of ps_process_raw is always 0, and the input of the method is randomly filled arrays of 1280 and 960 bytes)
      

Nickolay V. Shmyrev - 2014-08-23

I began with the above, to process a byte array coming from a C# WP project. I'm not 100% sure that the conversion to int16[] is OK, or whether I'm using ps_process_raw the right way...

Looks mostly ok

int result = ps_process_raw(ps, audioBuffer, k, TRUE, FALSE);

It should be FALSE, FALSE

Can you help me? (PS: the output of ps_process_raw is always 0, and the input of the method is randomly filled arrays of 1280 and 960 bytes)

This is because of TRUE (the no_search argument in ps_process_raw). It must be FALSE; then ps_process_raw will return the number of frames processed.
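
For reference, the byte-to-sample conversion discussed above can be written as a standalone C function. This is a sketch under stated assumptions: the input is little-endian 16-bit PCM, and `pcm_bytes_to_samples` is a hypothetical helper name, not a PocketSphinx function. Unlike the quoted snippet, it caps the output at the buffer size and ignores a trailing odd byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert little-endian 16-bit PCM bytes into int16 samples.
 * Writes at most max_samples samples and returns the number written.
 * A trailing odd byte is ignored. */
size_t pcm_bytes_to_samples(const uint8_t *bytes, size_t n_bytes,
                            int16_t *samples, size_t max_samples)
{
    assert(bytes != NULL && samples != NULL);

    size_t n = n_bytes / 2;
    if (n > max_samples)
        n = max_samples;

    for (size_t i = 0; i < n; i++) {
        /* Low byte first, then high byte. */
        unsigned u = (unsigned)bytes[2 * i] | ((unsigned)bytes[2 * i + 1] << 8);
        /* Map 0x8000..0xFFFF to the negative range without relying on
         * implementation-defined narrowing. */
        samples[i] = (int16_t)(u >= 0x8000u ? (int)u - 0x10000 : (int)u);
    }
    return n;
}
```

The returned count is the number of samples, so it is the value to pass as the sample count in ps_process_raw(ps, samples, n, FALSE, FALSE).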

- Toine db - 2014-08-24
  
  OK, good to know.
  
  But now the big question: how do I detect words/phrases?
  
  1: Do I need to call a method after each ps_process_raw, or do I need to call it once every X bytes, or...?
  2: What will the result be? Just a word or sentence, a collection of possible outcomes, a collection of all outcomes with percentages...?
  
  PS: I suppose there isn't an easy event to hook up to :-)
  An event is really what I want in the end, and I think I need to make it myself.
  (I already placed an event in my code to raise; now I only need the mechanism)
  

Nickolay V. Shmyrev - 2014-08-24

1: Do I need to call a method after each ps_process_raw, or do I need to call it once every X bytes, or...?

I wrote you the pseudocode above. After each ps_process_raw you call ps_get_hyp, and if the result matches the keyword you can proceed with further steps; otherwise you process the next chunk of raw data.

2: What will the result be? Just a word or sentence, a collection of possible outcomes, a collection of all outcomes with percentages...?

In keyword spotting mode the result is a string containing the keyword.

An event is really what I want in the end, and I think I need to make it myself.

Yes, it's up to you to design which events the component will emit.

- Toine db - 2014-08-26
  
  Tnx for the response; I will continue on this on Friday.
  
  In keyword spotting mode the result is a string containing the keyword.
  
  But before that, is there a place with an overview of the different modes?
  Names, results etc....
  
  Last edit: Toine db 2014-08-29
  
- Toine db - 2014-08-29
  
  Hi Nickolay,
  
  First recognition finally works! "The digits"
  
  But more and more questions come up about how the system/modes work, and where I can find more info about the different modes.
  
  For example;
  
  When searching for digits, each recognized digit gets added and added and added to the result of ps_get_hyp(). Is there a way to reset that?
  
  and/or
  
  (also for ps_get_hyp) And I see a score as a result too? Or an ID? What can I do with that? Is there a hidden list with possible outcomes, maybe?
  
  and/or
  
  Maybe most important for the tutorial: I started with recognizing digits because I don't get any result when I say "Oh Mighty Computer". Could be my Dutch dialect, or that I don't set the search the right way... but nothing comes back from ps_get_hyp.
  
  For the last 'oh mighty computer' problem you can see my project at:
  https://onedrive.live.com/redir?resid=53DF68CA92747BA6%2142806
  
  Hope you can help me again.
  
  PS: It was really great to see the digit thing already work!
  
  Last edit: Toine db 2014-08-29
  
  - Nickolay V. Shmyrev - 2014-08-29
    
    First recognition finally works! "The digits"
    
    Great, congratulations
    
    When searching for digits, each recognized digit gets added and added and added to the result of ps_get_hyp(). Is there a way to reset that?
    
    You can stop the search (ps_end_utt) and start it again (ps_start_utt) when silence occurs, that is, when ps_is_speech becomes false. I wrote you the pseudocode above.
    
    (also for ps_get_hyp) And I see a score as a result too? Or an ID? What can I do with that? Is there a hidden list with possible outcomes, maybe?
    
    Score and outid are artifacts; they are not really useful. You can ignore them.
    
    Maybe most important for the tutorial: I started with recognizing digits because I don't get any result when I say "Oh Mighty Computer". Could be my Dutch dialect, or that I don't set the search the right way... but nothing comes back from ps_get_hyp.
    
    You need to set the keyword spotting threshold in the config on initialization: "-kws_threshold 1e-40".
    
    - Toine db - 2014-08-30
      
      Tnx, all your tips worked great.
      
      I can detect "oh mighty computer" and the digits :-)
      
      To complete your pseudocode: what do you mean by ps_set_search("grammar")?
      What does that correspond to in the Android example, which I took the models from?
      