Using Sphinx for Hey You, Pikachu! Emulation

Daniel Eck
2015-04-16
2015-05-06
  • Daniel Eck

    Daniel Eck - 2015-04-16

    Forum thread with extra details

    I'm planning on using Sphinx for an upcoming HLE attempt at emulating the N64 game Hey You, Pikachu!. Basically, the game has a list of 640 phrases that can be used in-game. However, many of them are duplicates for varying pronunciations, so the compressed list (duplicates removed) comes out to 459 phrases, 151 of which are Pokemon names.

    The issue with emulating this is that the original game came with a mic and a Voice Recognition Unit (VRU) that no one has figured out how to emulate. Rather than trying to emulate the circuitry, I decided that it'd be simpler to just use modern voice recognition technology and map each recognized voice command to the output the original VRU would have produced for that command.

    What I would like to know is which functionality of CMU Sphinx (Pocketsphinx, really) would be ideal for this kind of work. Is it keyword recognition, or something more complex?

     
  • Daniel Eck

    Daniel Eck - 2015-04-16

    Wow! Thanks for the quick response!

    I noticed in the tutorial it says "live audio input is somewhat platform-specific." Is there a tutorial which goes into cross-platform support for live audio, or perhaps a library that abstracts it away from the OS? Ideally I'm looking for something that can handle cross-platform mic abstraction as well, assuming such a thing exists.

    One more question: Will the library already detect non-English words, specifically Pokemon names? If not, how do I go about adding them in? Would using a simple English web service for building the DB work in my case?

    Very helpful, very fast so far, really appreciate it :D

     

    Last edit: Daniel Eck 2015-04-16
    • Nickolay V. Shmyrev

      Dear Daniel

      There is no cross-platform library, since the platforms we support are very different. For example, there is no single way to record audio on Android, iOS and Windows Phone. Some libraries, like PortAudio, claim to be cross-platform; you can use them if needed.

      You can use the web service to create an initial dictionary for Pokemon names. Still, it is recommended to review the dictionary manually afterward, because the automatic pronunciations might have errors.

       
  • Daniel Eck

    Daniel Eck - 2015-05-05

    So I've put together a library using the web service and the commands listed there, and it works quite well with pocketsphinx_continuous, but it left me with a few questions:

    1. How can I reduce the size of the language model? I only need it to recognize specific keywords, so I don't see a reason to need to include the entire model. Currently it's at 24 meg or so, and I'd like to try to get this whole package to under a meg if possible.

    2. How can I stop it from trying to find multiple phrases with each input? For example, if I say two words, one after the other, it outputs them as two words, even if a phrase with them together doesn't exist. I'd prefer it to simply wait until input is over and pick whatever single phrase seemed to be the closest to what was said, like the original game does.

    3. How do I go about improving the dictionary manually? Pikachu and some other Pokemon names are being troublesome, they work sometimes, but Pikachu in particular should be more easily detected.

    4. Finally, how can I embed the pocketsphinx_continuous.exe's functionality into a program, the tutorial went into doing a program that read from a file, how do I call the built-in mic reading capability and turn it on/off?

    Thank you so much, this was actually a lot easier than I expected it to be, you've been very helpful.

     

    Last edit: Daniel Eck 2015-05-05
    • Nickolay V. Shmyrev

      How can I reduce the size of the language model? I only need it to recognize specific keywords, so I don't see a reason to need to include the entire model. Currently it's at 24 meg or so, and I'd like to try to get this whole package to under a meg if possible.

      To search for keyphrases, use keyphrase spotting mode; then you do not need a language model at all.

      How can I stop it from trying to find multiple phrases with each input? For example, if I say two words, one after the other, it outputs them as two words, even if a phrase with them together doesn't exist. I'd prefer it to simply wait until input is over and pick whatever single phrase seemed to be the closest to what was said, like the original game does.

      In keyphrase spotting mode there should be no such issue.

      How do I go about improving the dictionary manually? Pikachu and some other Pokemon names are being troublesome, they work sometimes, but Pikachu in particular should be more easily detected.

      Open the dictionary with a text editor and add the entries you need.
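As a rough sketch of what such hand-written entries could look like: the dictionary is a plain text file with one pronunciation per line, a word followed by its phones, and alternate pronunciations of the same word are marked `WORD(2)`, `WORD(3)` and so on. The helper name `format_entries` is hypothetical, and the phone sequences for PIKACHU below are guesses for illustration, not taken from the game or from any existing dictionary. The words are uppercased on the assumption that the dictionary is the lmtool-generated one.

```python
def format_entries(word, pronunciations):
    """Return CMUdict-style lines for a word and its alternate pronunciations."""
    lines = []
    for i, phones in enumerate(pronunciations, start=1):
        # The first variant uses the bare word; later ones get a (2), (3), ... suffix.
        key = word.upper() if i == 1 else f"{word.upper()}({i})"
        lines.append(f"{key} {' '.join(phones)}")
    return lines


# Assumed pronunciations, for illustration only; review them by ear.
PIKACHU = [
    ["P", "IY", "K", "AH", "CH", "UW"],
    ["P", "IH", "K", "AH", "CH", "UW"],
]

if __name__ == "__main__":
    for line in format_entries("pikachu", PIKACHU):
        print(line)
```

The printed lines can be pasted into the dictionary file; adding a second variant is one way to cover a pronunciation the recognizer keeps missing.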

      Finally, how can I embed the pocketsphinx_continuous.exe's functionality into a program, the tutorial went into doing a program that read from a file, how do I call the built-in mic reading capability and turn it on/off?

      Open the pocketsphinx_continuous source code (continuous.c) and see what is going on there.

       

      Last edit: Nickolay V. Shmyrev 2015-05-05
  • Daniel Eck

    Daniel Eck - 2015-05-05

    Thanks a lot! I'm just having issues implementing keyphrase spotting mode, I modified my phrase list to look like this:

    ~~~~~~~~~~~~
    lets check it out /1e-1/
    sleep tight /1e-1/
    see you tomorrow /1e-1/
    see you in the morning /1e-1/
    ~~~~~~~~~~~~

    and tried putting it in with

    pocketsphinx_continuous.exe -inmic yes -kws phrases.txt
    

    It said that no acoustic model was specified, so I passed in -hmm pointing to the model, and then it gave a whole bunch of errors like '"kws_search.c", line 165: The word 'ZAPDOS' is missing in the dictionary' (and the same for other words).

    I've tried passing in the arguments to the dictionary and lm file I had before, but I get the same error.

     

    Last edit: Daniel Eck 2015-05-05
  • Daniel Eck

    Daniel Eck - 2015-05-05

    Yeah, I've got the latest version, and I used that exact command and I'm still getting a whole bunch of "The word is missing in the dictionary" errors.

    I've attached a picture of the exact command I used and the folder layout, I really appreciate the help, sorry to be getting so stuck.

     
    • Nickolay V. Shmyrev

      Sorry, but you are using some small hyp.dic and not cmudict-en-us.dict as I wrote.

      If your hyp.dic was created with lmtool, please note that words are case-sensitive, so the words in your phrases.txt must be uppercase, like in the dictionary. You also seem to be using other words in your phrases, not the ones you posted above. Please also note that phrases should not contain punctuation.
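A minimal sketch of how those two checks (case and punctuation) could be automated before running the recognizer. All function names here are hypothetical, and keeping apostrophes is an assumption made because dictionaries often contain entries such as LET'S. Apply it to the phrase text only, before appending a threshold such as `/1e-1/`.

```python
import re


def load_dictionary_words(lines):
    """Collect the words from dictionary lines, dropping (2), (3) variant markers."""
    words = set()
    for line in lines:
        parts = line.split()
        if parts:
            words.add(re.sub(r"\(\d+\)$", "", parts[0]))
    return words


def normalize_phrase(phrase):
    """Uppercase a phrase and replace punctuation (except apostrophes) with spaces."""
    cleaned = re.sub(r"[^A-Za-z0-9' ]+", " ", phrase)
    return " ".join(cleaned.upper().split())


def missing_words(phrases, dictionary_words):
    """Return the sorted phrase words that are absent from the dictionary."""
    needed = {w for p in phrases for w in normalize_phrase(p).split()}
    return sorted(needed - dictionary_words)


if __name__ == "__main__":
    # Tiny made-up dictionary excerpt, for illustration only.
    dictionary = load_dictionary_words(["SLEEP S L IY P", "TIGHT T AY T"])
    print(missing_words(["Sleep tight!", "See you tomorrow"], dictionary))
```

Any word this reports would trigger the "missing in the dictionary" error, so it either needs a dictionary entry or has to be removed from the phrase list.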

       
  • Daniel Eck

    Daniel Eck - 2015-05-05

    Oh yeah, I had tried both dictionaries, but the case sensitivity was the issue! Thanks a ton! I'll take out the punctuation as well.

    It does appear to still be listening for multiple phrases together (#2) and outputting them all at once, is there like a flag or something to eliminate that?

    Also, now that I have it using keyphrase spotting with the dictionary I generated, the detection quality is a lot worse (even though I was using the same .dic file before). Any reason why that would be? I've actually run them side by side and it's clear the quality has gotten worse.

     

    Last edit: Daniel Eck 2015-05-05
    • Nickolay V. Shmyrev

      Keyphrase detection is controlled by the threshold. If there are false alarms, you can raise the threshold; if phrases are being missed or detections are not stable, you lower the threshold.

      Long phrases are hard to detect.

      You can provide audio data and phrase lists in order to get help on accuracy.

       
      • Nickolay V. Shmyrev

        Also, for 640 phrases a language model indeed makes more sense; you can probably just use the small language model you created from your phrases.

         
        • Nickolay V. Shmyrev

          It is not really possible to force recognition of a single phrase in this case; the decoder just recognizes what was said. You have to postprocess the output to decide what to do with multiple phrases.
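One possible shape for that postprocessing step, as a sketch only: take the recognizer's text output and pick the single closest entry from the game's phrase list, or nothing if no entry is similar enough. This uses character-level similarity from Python's standard `difflib` module, which is a swapped-in heuristic and not a Pocketsphinx feature or what the original VRU does. The function name `closest_phrase` and the 0.6 cutoff are assumptions.

```python
import difflib


def closest_phrase(hypothesis, known_phrases, cutoff=0.6):
    """Return the known phrase most similar to the recognizer output, or None."""
    candidates = [p.upper() for p in known_phrases]
    matches = difflib.get_close_matches(
        hypothesis.upper(), candidates, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


if __name__ == "__main__":
    phrases = ["SLEEP TIGHT", "SEE YOU TOMORROW", "SEE YOU IN THE MORNING"]
    # The recognizer returned two phrases run together; keep only the best one.
    print(closest_phrase("see you tomorrow sleep", phrases))
```

Raising the cutoff makes the function reject more inputs outright, which may be preferable if the game should ignore speech that matches no command.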

           
  • Daniel Eck

    Daniel Eck - 2015-05-06

    Yeah, and it actually comes out to only 459 phrases with repeats for different pronunciations removed. I'll probably need to look into the thresholds to fix my accuracy issues.

    So I noticed that I'm still needing to use the -hmm flag to link to the 2MB mdef file in en-us, is there any way this can be reduced or is it required?

    Also, thanks a ton again, your help has been amazing, I did not expect to have an essentially working solution this fast.

     
    • Nickolay V. Shmyrev

      So I noticed that I'm still needing to use the -hmm flag to link to the 2MB mdef file in en-us, is there any way this can be reduced or is it required?

      The acoustic model is required; it describes the sounds of the language.

       
