Creating Tutorial for PocketSphinx on Windows Phone

Help
Toine db
2014-07-25
2014-09-29
  • Toine db

    Toine db - 2014-07-25

    Hi Nickolay,

    You asked me: "I hope you can help us to make us a demo like android one."
    Of course, that's the least I could do after you helped me get things working.

    But I have some questions, because I can't run the Android version of Sphinx myself.

    • What does the demo do? (functionality/scenario)
    • What Sphinx source do you want me to use in the demo? (I have only used trunk, and I don't know whether there are any stable versions.)

    Hope to hear from you so I can start writing, or continue researching the demo functionality.

    Toine de Boer

     
  • Nickolay V. Shmyrev

    What does the demo do? (functionality/scenario)

    On start it displays "say oh mighty computer to activate" and listens for the keyword "oh mighty computer". Once the keyword occurs, it switches to grammar mode, recognizes digits from 0 to 10, and displays them on the screen. Then it switches back to keyword search mode.

    What Sphinx source do you want me to use in the demo? (I have only used trunk, and I don't know whether there are any stable versions.)

    Trunk of course.

     
    • Toine db

      Toine db - 2014-07-26

      OK, tnx for the scenario.

      I will start with getting just the basics to work, the "oh mighty computer" part.

      Then switching between grammar and search mode, but I don't have a clue what that is...

      So I will be back, if you have any tips please don't hesitate....

       
    • Toine db

      Toine db - 2014-07-26

      Already my first question:

      How do I feed in a continuous stream of incoming microphone data?

      I'm looking at the following, but I don't think it is the way to do that: http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx#decoding_audio_data_from_memory

      Can you give me some direction?

      I'm recording sounds:
      -16 bits
      -1 Channel
      -Samplerate 16000

      PS: in case you want to look at the code I'm testing with ...
      https://onedrive.live.com/redir?resid=53DF68CA92747BA6%2132783 (27-7: added Native recorder)

       
      Last edit: Toine db 2014-07-27
    • Toine db

      Toine db - 2014-07-30

      Can you (or someone else) give me a hint about which methods I need to use in PocketSphinx to get your functionality to work?

      • recognize "oh mighty computer"
      • switches to grammar mode
      • recognizes digits from 0 to 10
      • switch back to ??? mode

      I have a constant stream of byte arrays from a 16-bit, 1-channel, 16 kHz source.
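
For reference, the stream format described here (16-bit samples, 1 channel, 16000 Hz) fixes how many samples and how much time each byte buffer represents. A minimal sketch of that arithmetic in C follows; the function names and the example buffer sizes are illustrative assumptions and not part of PocketSphinx.

```c
#include <assert.h>
#include <stddef.h>

/* Audio format stated in the thread: 16-bit samples, 1 channel, 16000 Hz. */
enum {
    SAMPLE_RATE_HZ   = 16000,
    BYTES_PER_SAMPLE = 2,
    CHANNELS         = 1
};

/* Number of 16-bit samples contained in a buffer of byte_count bytes. */
size_t samples_in_buffer(size_t byte_count)
{
    /* A buffer must hold whole samples; an odd size points to a framing bug. */
    assert(byte_count % (BYTES_PER_SAMPLE * CHANNELS) == 0);
    return byte_count / (BYTES_PER_SAMPLE * CHANNELS);
}

/* Duration of a buffer in whole milliseconds at the stated sample rate. */
size_t buffer_duration_ms(size_t byte_count)
{
    return samples_in_buffer(byte_count) * 1000 / SAMPLE_RATE_HZ;
}
```

With this format a 1280-byte buffer holds 640 samples (40 ms of audio), and one second of audio is 32000 bytes.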

       
  • Nickolay V. Shmyrev

    ps_init();
    
    ps_set_kws("kws", keyword);
    ps_set_jsgf("grammar", grammar);
    
    // Keyword spotting: wait until the keyword is heard
    ps_set_search("kws");
    ps_start_utt();
    
    while (true) {
          read_data(data);
          ps_process_raw(data);
          if (ps_get_hyp().equals(keyword)) {
               ps_end_utt();
               break;
          }
    }
    
    // Grammar mode: recognize until silence follows
    ps_set_search("grammar");
    ps_start_utt();
    while (true) {
          read_data(data);
          ps_process_raw(data);
          if (!ps_in_speech()) {
               ps_end_utt();
               update_result(ps_get_hyp());
               break;
          }
    }
    
    // Back to keyword spotting
    ps_set_search("kws");
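
The control flow of the pseudocode above can be sketched as a small state machine that is independent of the decoder. This is only an illustration: the type and function names are hypothetical, and the two input flags stand for whatever the real code derives from ps_get_hyp and from the speech/silence state after each chunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical modes mirroring the "kws" and "grammar" searches. */
typedef enum { MODE_KWS, MODE_GRAMMAR } search_mode;

/* What the caller should do after feeding one chunk to the decoder. */
typedef enum {
    ACTION_CONTINUE,                 /* keep feeding audio            */
    ACTION_SWITCH_TO_GRAMMAR,        /* end utterance, use "grammar"  */
    ACTION_REPORT_AND_SWITCH_TO_KWS  /* show result, go back to "kws" */
} demo_action;

typedef struct {
    search_mode mode; /* which search is currently active */
} demo_state;

void demo_init(demo_state *s)
{
    assert(s != NULL);
    s->mode = MODE_KWS;
}

/*
 * keyword_detected: non-zero when the current hypothesis equals the keyword.
 * utterance_ended:  non-zero when silence followed speech in grammar mode.
 */
demo_action demo_step(demo_state *s, int keyword_detected, int utterance_ended)
{
    assert(s != NULL);
    if (s->mode == MODE_KWS) {
        if (keyword_detected) {
            s->mode = MODE_GRAMMAR;
            return ACTION_SWITCH_TO_GRAMMAR;
        }
        return ACTION_CONTINUE;
    }
    /* Grammar mode: the keyword flag is ignored here. */
    if (utterance_ended) {
        s->mode = MODE_KWS;
        return ACTION_REPORT_AND_SWITCH_TO_KWS;
    }
    return ACTION_CONTINUE;
}
```

Keeping the mode logic separate from the decoder calls makes it possible to test the switching behaviour without audio input.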
    
     
    • Toine db

      Toine db - 2014-08-01

      Tnx, but this is still going to take a while because I need to discover each method separately.

      I'm not used to this kind of API documentation: http://cmusphinx.sourceforge.net/doc/pocketsphinx/
      For example, what does a return value of '1' from ps_set_kws mean?
      And are there any limits on the bytes you send to ps_process_raw? Do the buffers always need to be the same size, etc.?

      I was looking at the Android example, but the source where this happens is not accessible.

      Can you help some more with this?

       
    • Toine db

      Toine db - 2014-08-05

      Hello Nickolay,

      Can you help me with the following questions I have?
      (or do you have source code as an example? The Android code is closed where it gets interesting)

      • what does a return value of '1' from ps_set_kws mean?
      • are there any limits on the bytes you send to ps_process_raw?
        • and do the buffers always need to be the same size?

      Hope to hear from you,

      and an example source code would be fantastic if possible

       
  • Nickolay V. Shmyrev

    what does a return value of '1' from ps_set_kws mean?

    An error occurred. You can see the details in the log. To store the log in the filesystem, add -logfn to the decoder configuration.

    are there any limits on the bytes you send to ps_process_raw?

    No

    and do the buffers always need to be the same size?

    No

     
    • Toine db

      Toine db - 2014-08-06

      Tnx for the feedback.

      But when I add -logfn (like below) I get the error "cannot redirect log output".

      config = cmd_ln_init(NULL, ps_args(), TRUE,
      "-hmm", hmmPath,
      "-lm", lmPath,
      "-dict", dictPath,
      "-mmap", "no",
      "-logfn", "",
      NULL);

      Am I missing some parameters?

      PS: I already used the following code, which SOMETIMES produced an error log, but not always.
      const wchar_t *wLogPath = Windows::Storage::ApplicationData::Current->LocalFolder->Path->Data();
      wcstombs(cpath, wLogPath, 1024);
      char *logPath = concat(cpath, "\\err.log");
      err_set_logfile(logPath);

      Hope to hear from you

      And thanks again for your support

       
  • Nickolay V. Shmyrev

    "-logfn", "",

    There should be a filename here.
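
As an illustration of producing such a filename, the sketch below joins a folder and a file name into one bounded buffer. It is an assumption-laden example: build_log_path is a hypothetical helper, and the folder string would come from wherever the application stores its data (the local folder in the code quoted in this thread).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Join folder and file_name with a backslash into out.
 * Returns 0 on success, -1 if the result did not fit into out_size bytes.
 */
int build_log_path(char *out, size_t out_size,
                   const char *folder, const char *file_name)
{
    assert(out != NULL && folder != NULL && file_name != NULL);
    /* Note the escaped backslash: "\\" in C source is one '\' character. */
    int n = snprintf(out, out_size, "%s\\%s", folder, file_name);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

The resulting string is the kind of value that could be passed after "-logfn" instead of the empty string.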

    PS: I already used the following code, which SOMETIMES produced an error log, but not always.

    This is an alternative way. To make sure the log remains after the application exits, add an fflush(stderr) call to the function in err.c that prints the message.

     
    • Toine db

      Toine db - 2014-08-14

      Tnx, I managed to always get a log, but there are still problems.

      The log just stops at 7/8 kb.... https://onedrive.live.com/redir?resid=53DF68CA92747BA6%2139023
      And the fflush(stderr) doesn't seem to do anything.

      I wait 10 seconds before closing the app or debug session, but the error log still isn't as complete as I expect.
      I'm expecting to find something from "ps_set_kws(ps, Cname, Ckeyphrase);"
      (I found the error itself, but for development purposes I really need a log)

      Do you have any ideas?

      PS: I put fflush(stderr) at the end of err_msg_system() and also in a separate method that I call directly from my code.

       
  • Nickolay V. Shmyrev

    PS: I put fflush(stderr) at the end of err_msg_system() and also in a separate method that I call directly from my code.

    Not just err_msg_system but also err_msg. Or add fflush(fp) in err_logfp_cb.
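
The idea of flushing after every message can be sketched outside of sphinxbase as follows. This is not the code from err.c; log_line_flushed is a hypothetical stand-in that shows the pattern of writing one message and flushing the same stream right away, so the text reaches the file even if the process is terminated afterwards.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Write one formatted message to fp and flush it immediately.
 * Returns the number of characters written, or -1 on error.
 */
int log_line_flushed(FILE *fp, const char *fmt, ...)
{
    assert(fp != NULL && fmt != NULL);
    va_list args;
    va_start(args, fmt);
    int written = vfprintf(fp, fmt, args);
    va_end(args);
    if (written < 0)
        return -1;
    /* Flush the stream that was written to, not a different one. */
    return fflush(fp) == 0 ? written : -1;
}
```

The important detail is that the flushed stream is the one the message went to: flushing stderr has no effect on a log that was redirected to a separate file handle.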

     
    • Toine db

      Toine db - 2014-08-15

      Tnx, that works now.

      Some follow-up questions about loading things:

      To load Grammar... is:

      int result = ps_set_jsgf_file(ps, Cname, CcompleteFilePath);

      the same as

      fsg_model_t *pNewFSGModel = jsgf_read_file(CcompleteFilePath, ps_get_logmath(ps), 6.5);
      int result = ps_set_fsg(ps, Cname, pNewFSGModel);

      ?

      And is

      ps_set_lm_file(ps, Cname, CcompleteFilePath);

      the way to load Language models like weather.dmp from the Android demo?

      PS: I'm trying to make the Android demo on Windows Phone

       
  • Nickolay V. Shmyrev

    the same as

    yes

    the way to load Language models like weather.dmp from the Android demo?

    yes

     
    • Toine db

      Toine db - 2014-08-18

      Tnx Nickolay,

      The solution for the Windows Phone tutorial is making progress.

      Loading models, phrases, etc. and setting the search type is working.

      Next (and last) is handling/processing the incoming voice data.

      I'll keep you informed

       
      • Nickolay V. Shmyrev

        Cool, I'd be glad to try it. Let me know if you need some help.

         
        • Toine db

          Toine db - 2014-08-23

          Hi Nickolay,

          I'm trying to process the bytes that are recorded from the microphone, and was hoping you could help me set up some simple real-time detection.

          int SpeechRecognizer::RegisterAudioBytes(const Platform::Array<uint8>^ audioBytes)
          {
              // source: http://blog.csdn.net/zouxy09/article/details/7978108
              int16 audioBuffer[4096];
              int32 k, ts, rem;
              char const *hyp;
              char const *uttid;
              char word[256];
              // Length for Int16[]
              k = audioBytes->Length / 2;
              // Convert ByteArray Array<uint8> to Int16[]
              for (size_t i = 0; i < audioBytes->Length; i += 2)
              {
                  audioBuffer[i / 2] = audioBytes[i] + ((int16)audioBytes[i + 1] << 8);
              }
              // Process bytes
              int result = ps_process_raw(ps, audioBuffer, k, TRUE, FALSE);
              return result;
          }

          I began with the above, to process a byte array coming from a C# WP project. I'm not 100% sure that the conversion to Int16[] is OK and whether I'm using ps_process_raw the right way.....

          I have looked at the TRUE and FALSE arguments of ps_process_raw, but I'm not sure what they are for and how to use them....

          Can you help me? (PS: the output of ps_process_raw is always 0, and the input of the method is a randomly filled array of 1280 or 960 bytes)
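
The byte-to-sample conversion discussed here can be isolated and checked on its own. The following is a minimal sketch in plain C, assuming the byte stream is little-endian 16-bit PCM (low byte first, as the conversion in the post implies); pcm_bytes_to_int16 is a hypothetical helper name, and unlike a fixed 4096-element buffer it never writes past the capacity it is given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Convert little-endian 16-bit PCM bytes into int16_t samples.
 * Returns the number of samples written, never more than out_capacity.
 * A trailing odd byte is ignored.
 */
size_t pcm_bytes_to_int16(const uint8_t *bytes, size_t byte_count,
                          int16_t *out, size_t out_capacity)
{
    assert(bytes != NULL && out != NULL);
    size_t samples = byte_count / 2;
    if (samples > out_capacity)
        samples = out_capacity; /* avoid overrunning the output buffer */
    for (size_t i = 0; i < samples; i++) {
        /* Low byte first, then high byte; assemble unsigned, then convert. */
        uint16_t u = (uint16_t)(bytes[2 * i] | ((uint16_t)bytes[2 * i + 1] << 8));
        out[i] = (int16_t)u;
    }
    return samples;
}
```

The returned sample count is the value that would be passed as the number of samples to the decoder, rather than the byte count.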

           
  • Nickolay V. Shmyrev

    I began with the above, to process a byte array coming from a C# WP project. I'm not 100% sure that the conversion to Int16[] is OK and whether I'm using ps_process_raw the right way.....

    Looks mostly ok

    int result = ps_process_raw(ps, audioBuffer, k, TRUE, FALSE);

    It should be FALSE, FALSE

    Can you help me? (PS: the output of ps_process_raw is always 0, and the input of the method is a randomly filled array of 1280 or 960 bytes)

    This is because of TRUE (the no_search argument in ps_process_raw). It must be FALSE; then ps_process_raw will return the number of frames processed.

     
    • Toine db

      Toine db - 2014-08-24

      OK, good to know.

      But now the big question: how do I detect words/phrases?

      1: Do I need to call a method after each ps_process_raw, or do I need to call it once every X bytes, or....
      2: What will the result be? Just a word or sentence, a collection of possible outcomes, a collection of all outcomes with percentages ??..

      PS: I suppose there isn't an easy event to hook up to :-)
      An event is really what I want in the end, and I think I need to make it myself.
      (I already placed an event in my code to raise; now only the mechanism is missing)

       
  • Nickolay V. Shmyrev

    1: Do I need to call a method after each ps_process_raw, or do I need to call it once every X bytes, or....

    I wrote you pseudocode above. After each ps_process_raw you call ps_get_hyp, and if the result matches the keyword you can proceed with further steps; otherwise you process the next chunk of raw data.

    2: What will the result be? Just a word or sentence, a collection of possible outcomes, a collection of all outcomes with percentages ??..

    In keyword spotting mode the result is a string containing the keyword.

    An event is really what I want in the end, and I think I need to make it myself.

    Yes, it's up to you to design which events the component will emit.
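
One possible shape for such an event, sketched in plain C with a callback: after every processed chunk the caller passes the current hypothesis to a poll function, which fires a handler when the hypothesis equals the keyword. All names here (keyword_watcher, keyword_watcher_poll, count_hits) are hypothetical, and the hypothesis string is assumed to come from ps_get_hyp, which may yield no result yet.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event handler type: receives the recognized text. */
typedef void (*keyword_handler)(const char *text, void *user_data);

typedef struct {
    const char     *keyword;    /* phrase to wait for                  */
    keyword_handler on_keyword; /* called when the phrase is detected  */
    void           *user_data;  /* passed through to the handler       */
} keyword_watcher;

/*
 * Call after each processed chunk with the current hypothesis (NULL if none).
 * Fires the handler and returns 1 when the hypothesis equals the keyword.
 */
int keyword_watcher_poll(const keyword_watcher *w, const char *hyp)
{
    assert(w != NULL && w->keyword != NULL);
    if (hyp == NULL || strcmp(hyp, w->keyword) != 0)
        return 0;
    if (w->on_keyword != NULL)
        w->on_keyword(hyp, w->user_data);
    return 1;
}

/* Example handler: counts how often the keyword was detected. */
void count_hits(const char *text, void *user_data)
{
    (void)text;
    ++*(int *)user_data;
}
```

In a Windows Phone component the handler would typically raise the public event instead of incrementing a counter.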

     
    • Toine db

      Toine db - 2014-08-26

      Tnx for the response; on Friday I will continue with this.

      In keyword spotting mode the result is a string containing keyword.

      But before that, is there a location with some overview of the different Modes?
      Names, results etc....

       
      Last edit: Toine db 2014-08-29
    • Toine db

      Toine db - 2014-08-29

      Hi Nickolay,

      First recognition finally works! "The digits"

      But more and more questions come up about how the system and its modes work, and where I can find more info on the different modes.

      For example:

      When searching for digits, each recognized digit gets appended to the result of ps_get_hyp(), over and over. Is there a way to reset that?

      and/or

      (also for ps_get_hyp) I also see a score as a result, or an ID? What can I do with that? Is there maybe a hidden list of possible outcomes?

      and/or

      Maybe most important for the tutorial: I started with recognizing digits because I don't get any result when I say "Oh Mighty Computer". It could be my Dutch accent, or that I don't set the search the right way, but I get nothing from ps_get_hyp.

      For the last 'oh mighty computer' problem you can see my project at:
      https://onedrive.live.com/redir?resid=53DF68CA92747BA6%2142806

      Hope you can help me again.

      PS: It was really great to see the digit thing already work!

       
      Last edit: Toine db 2014-08-29
      • Nickolay V. Shmyrev

        First recognition finally works! "The digits"

        Great, congratulations

        When searching for digits, each recognized digit gets appended to the result of ps_get_hyp(), over and over. Is there a way to reset that?

        You can stop the search (ps_end_utt) and start it again (ps_start_utt) when silence occurs (ps_is_speech becomes false). I wrote you the pseudocode above.
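
Detecting the moment when silence occurs amounts to watching for a speech-to-silence transition in the per-chunk speech flag. A minimal sketch, assuming the flag is obtained from the decoder after each chunk (the exact function name depends on the PocketSphinx version); endpoint_detector and its functions are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int was_in_speech; /* speech flag seen for the previous chunk */
} endpoint_detector;

void endpoint_init(endpoint_detector *d)
{
    assert(d != NULL);
    d->was_in_speech = 0;
}

/*
 * Feed the speech flag for the latest chunk.
 * Returns 1 only on the chunk where speech turns into silence, which is
 * the point to end the utterance, read the hypothesis and start a new one.
 */
int endpoint_update(endpoint_detector *d, int in_speech)
{
    assert(d != NULL);
    int ended = d->was_in_speech && !in_speech;
    d->was_in_speech = (in_speech != 0);
    return ended;
}
```

Tracking the previous flag avoids ending an utterance during the initial silence before the user has said anything.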

        (also for ps_get_hyp) I also see a score as a result, or an ID? What can I do with that? Is there maybe a hidden list of possible outcomes?

        Score and uttid are artifacts; they are not really useful. You can ignore them.

        Maybe most important for the tutorial: I started with recognizing digits because I don't get any result when I say "Oh Mighty Computer". It could be my Dutch accent, or that I don't set the search the right way, but I get nothing from ps_get_hyp.

        You need to set the keyword spotting threshold in the config on initialization: "-kws_threshold 1e-40".

         
        • Toine db

          Toine db - 2014-08-30

          Tnx, all your tips worked great.

          I can now detect "oh mighty computer" and the digits :-)

          To complete your pseudocode, what do you mean by ps_set_search("grammar"); ???
          What does that correspond to in the Android example, where I took the models from?

           