
kinect and the pi

  • faceless105

    faceless105 - 2016-10-13

    Hello, I'm sorry if this seems like a low-level question, but I'm new to Sphinx and having some issues, hopefully minor ones, that I just haven't been able to figure out.

    I've been playing around with a robotics project and had a cool idea to give it some speech-to-text capability. With that, I could ask it questions and it could answer. It's all baby steps. I wanted to get pocketsphinx working, then use the Python wrappers, then integrate it into my program. I tried following the instructions here: http://cmusphinx.sourceforge.net/wiki/raspberrypi - but the documentation is pretty brief. Everything seems to work, but honestly, I just don't know if I'm doing it all correctly.

    The first issue I'm hitting is the step "cat /proc/asound/cards, check your device is at 0". How do I know if my device is at zero? Is it just the first item in the list? How do I verify that the right thing is listed?

    When I run their final command, pocketsphinx_continuous -inmic yes, it does seem to work... well, it doesn't generate any errors. It's definitely waiting for me to do something... but what!? If I start talking, will it transcribe it? Are only certain predefined words recognized? The instruction "run it, it should work" is so vague that it's impossible to tell whether it's failing.

    Lastly, this is where I might be missing the mark altogether... I'm using an Xbox 360 Kinect as the mic. When I run "lsusb" I see everything register correctly, but how do I know whether Sphinx is using the mic, or even whether the mic has registered at all, since I don't see it when I run "cat /proc/asound/cards"? This is all new territory for me, but I'd really love any help or advice. I feel like I'm pretty close; I just need to figure out the missing pieces, and that's where I'm getting really stuck. Thanks!

     
    • Nickolay V. Shmyrev

      Using a Kinect is not trivial; our tutorial is for a simple microphone. You need a special driver and source-localization software. For drivers you can use libfreenect:

      http://blog.bitcollectors.com/adam/2016/01/kinect-support-for-raspberry-pi-using-libfreenect/
      https://openkinect.org/wiki/FAQ

      However, this gives you just four raw microphone streams without localization. To localize sources you need dedicated software like HARK; check:

      http://hark.jp/wiki.cgi?page=HARK%2DKINECT

      With HARK you can track sound sources with the Kinect and pass them to pocketsphinx. However, I'm not sure whether hark-kinect works on the Raspberry Pi; it is likely to be quite resource-intensive.

      As for reading /proc/asound/cards, you need to familiarize yourself a bit with Linux audio drivers and ALSA. If an entry is missing from /proc/asound/cards, the device is not properly supported.
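      Not part of pocketsphinx, just a small stdlib sketch: if you want to check the card index from a script rather than by eye, you can parse /proc/asound/cards yourself. The sample text mirrors a typical USB-mic entry; on the Pi you would read the real file instead.

```python
# Sketch (not a Sphinx tool): parse /proc/asound/cards-style text and report
# each card's index and name, so you can confirm your USB mic really is card 0.
import re

def parse_asound_cards(text):
    """Return (index, name) pairs from /proc/asound/cards-style text."""
    cards = []
    for line in text.splitlines():
        # Card lines look like: " 0 [Device         ]: USB-Audio - ..."
        m = re.match(r"\s*(\d+)\s+\[[^\]]*\]:\s*(.*)", line)
        if m:
            cards.append((int(m.group(1)), m.group(2).strip()))
    return cards

# Demo on sample text; on the Pi use open("/proc/asound/cards").read() instead.
sample = (
    " 0 [Device         ]: USB-Audio - C-Media USB Audio Device\n"
    "                      C-Media USB Audio Device at usb-bcm2708_usb-1.3, full speed\n"
)
print(parse_asound_cards(sample))  # [(0, 'USB-Audio - C-Media USB Audio Device')]
```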

       

      Last edit: Nickolay V. Shmyrev 2016-10-13
  • faceless105

    faceless105 - 2016-10-14

    I appreciate the feedback. I think for the time being I'll start with a simple USB mic rather than the Kinect.

    I'm still curious whether you could expand on some of this. I should also mention that I'm running Ubuntu MATE on my Pi, in case that makes a difference.

    The first question: when I run cat /proc/asound/cards, I see two entries (I'm not able to grab the list from my Pi right now, otherwise I'd provide it). How do I know whether I'm seeing the right things? I have no reason to doubt that I am, but it still feels worth asking. For example, should I see something comparable to how my mic is labeled when I run lsusb? I'm sure this reflects my deeper lack of understanding of Linux.

    The other question is about running the final command, "pocketsphinx_continuous -inmic yes". This seemed to run, and it was definitely holding and waiting. My question is what I should see if it's working. If I start talking, will it attempt to spit it all right back out? Will it only recognize predefined phrases? I'd just like some way to verify that it's doing what it should.

    The last question is really just an opinion I'd love to get. I'm new to this kind of programming. I've got a decade under my belt as a PHP developer, and I've dabbled in a ton of languages, but there's always been a web focus; I've done very little application programming, and even less work interacting with hardware. I am, however, determined to get some speech recognition in place, and I'm really hoping to use Sphinx because it works locally and doesn't need a Wi-Fi connection. So here's the question: I've been working on a Python program that runs my robot, and I'm curious what you'd recommend as the best way to monitor Sphinx. Should Sphinx run independently and write its output to a file that I monitor, or should I look for a Python wrapper and try interacting with it directly? I'll admit I'm not sure what might go into either of these options off the top of my head, but they're what I anticipate my options to be, and with your knowledge of Sphinx I'd really appreciate your thoughts on how to implement it.

    Thanks again for the help!

     
    • Nickolay V. Shmyrev

      For example, should I see something comparable to how my mic is labeled when I run lsusb?

      Good output looks like this:

      cat /proc/asound/cards 
       0 [Device         ]: USB-Audio - C-Media USB Audio Device
                            C-Media USB Audio Device at usb-bcm2708_usb-1.3, full speed
      
      lsusb
      Bus 001 Device 007: ID 0d8c:000e C-Media Electronics, Inc.
      

      This seemed to run and it was definitely holding and waiting.

      It is waiting for speech; your microphone input is silent.

      If I start talking, will it attempt to spit it all right back out?

      No, the Raspberry Pi is too slow for open dictation. You have to specify a grammar or a keyword list with the choices you expect.
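      For reference, a keyword list for pocketsphinx is just a plain-text file with one phrase per line followed by a detection threshold (lower thresholds fire more easily); the phrases below are hypothetical examples:

```
go forward /1e-20/
turn left /1e-20/
stop /1e-30/
```

      Saved as, say, keywords.list, it can be passed to pocketsphinx_continuous with the -kws option (pocketsphinx_continuous -inmic yes -kws keywords.list); a JSGF grammar can be passed with -jsgf instead.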

      Will it only dictate predefined phrases?

      On a desktop it will recognize any phrase.

      I'd just like something to verify that it's doing what it's doing.

      You will see messages about decoded results if it is working properly.

      I've been working on a python program that runs my robot and I'm curious what you'd recommend as the best way to monitor sphinx. Should sphinx run independantly and spit out the output to a file that I monitor or should I look for a python wrapper and try interacting with it directly?

      You need to use the Python wrapper. You can also try to set up ROS and work with that instead.
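      To sketch the wrapper approach: instead of parsing a log file, map each decoded hypothesis to a handler inside your own program. The phrases, action names, and keywords.list file below are hypothetical, and the pocketsphinx calls are shown as comments because they need the library and a microphone.

```python
# Sketch: dispatch decoded phrases to robot actions (names are hypothetical).

def make_dispatcher(commands):
    """Return a function mapping a decoded hypothesis to an action name."""
    def dispatch(hypothesis):
        # Normalize case/whitespace so "GO FORWARD" matches "go forward".
        return commands.get(hypothesis.strip().lower(), "unknown")
    return dispatch

dispatch = make_dispatcher({
    "go forward": "drive_forward",   # hypothetical robot actions
    "turn left": "turn_left",
    "stop": "halt",
})

print(dispatch("GO FORWARD"))  # drive_forward

# With the pocketsphinx Python package this could be fed from the mic,
# roughly like (untested sketch):
#
#   from pocketsphinx import LiveSpeech
#   for phrase in LiveSpeech(kws="keywords.list"):
#       action = dispatch(str(phrase))
#       ...call the matching method on your robot...
```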

       
  • faceless105

    faceless105 - 2016-10-14

    So with the lmtool I've created a short list of commands already. For the command-line example of getting it working on the Pi, how can I point it at those files?
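    For reference, pocketsphinx_continuous accepts the language model and dictionary through its -lm and -dict options, so with lmtool output (file names hypothetical) the command would look roughly like:

```
pocketsphinx_continuous -inmic yes -lm 1234.lm -dict 1234.dic
```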

    And additionally, for working with Python, I found this example, which I believe should work (https://github.com/cmusphinx/pocketsphinx/blob/master/swig/python/test/continuous_test.py). The only thing I didn't understand is where goforward.raw would come from. I found a very similar example on Stack Overflow where they swapped out that reference and used the following code to create their stream:

    import pyaudio

    # Open a 16 kHz, 16-bit, mono microphone stream -- the audio format
    # pocketsphinx expects -- in place of reading a .raw file from disk
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000,
                    input=True, frames_per_buffer=1024)
    stream.start_stream()
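    For context, the stream above stands in for the open('goforward.raw', 'rb') read in continuous_test.py; the decoding loop can stay essentially the same. A sketch, with the pocketsphinx calls commented out since they need the library installed (note that a live pyaudio stream never returns an empty buffer, so a real mic loop would use its own stop condition):

```python
# Sketch: the read loop from continuous_test.py, with the file read swapped
# for a read(n) callable such as the pyaudio stream's read method.

def read_chunks(read_fn, chunk_size=1024):
    """Yield fixed-size buffers from any read(n) callable until it is empty."""
    while True:
        buf = read_fn(chunk_size)
        if not buf:
            break
        yield buf

# for buf in read_chunks(stream.read):        # stream from the snippet above
#     decoder.process_raw(buf, False, False)  # pocketsphinx Decoder API
#     if decoder.hyp() is not None:
#         print("heard:", decoder.hyp().hypstr)

# Demo with an in-memory "file" standing in for the mic:
import io
fake = io.BytesIO(b"\x00" * 3000)
print([len(b) for b in read_chunks(fake.read)])  # [1024, 1024, 952]
```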

    I'm not sure which works best, but I'm planning to try them both. I didn't know the Pi had such a hard time processing this data, but I suppose I shouldn't be too surprised that a computer that fits in the palm of your hand can't do everything a desktop can, lol. Again, I really appreciate the help. This is really good information to have; it clears things up and gives me some direction to move forward.

     
