
Voice Command Light Switch/Dimmer

2010-12-01
2012-09-22
  • Tyler Brooks

    Tyler Brooks - 2010-12-01

    I am working on a wireless light switch/dimmer (like the ones you have in your
    house). It is based on the Freescale MC13224 CPU. This is a 32 bit ARM7 CPU
    with 128K of flash and 96K of RAM. It runs at 24MHz. It has an integrated
    2.4GHz radio and 802.15.4 stack. Nice part but it doesn't have an FPU.

    During a design discussion it was suggested that somebody look into adding
    voice command. I was elected. :)

    The number of voice commands that the switch would have to recognize seems
    small. Things like 'Lights xxx' where xxx is one of 'On, Off, Up, Down, Sleep,
    Party, Away...etc'. With a little thought, I would think the set could be
    limited to a couple dozen. The commands would have to work for various
    speakers (speaker independent) and be able to be identified within a moderate
    amount of background noise.
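
    To make the 'Lights xxx' idea concrete, here is a minimal sketch of how the
    recognizer's text output could be checked against a fixed two-word command
    set. The command list is just the examples above, and the names COMMANDS,
    KEYWORD and parse_command are made up for illustration; nothing here comes
    from pocketsphinx.

```python
# Hypothetical command set, copied from the examples in the post.
COMMANDS = {"on", "off", "up", "down", "sleep", "party", "away"}
KEYWORD = "lights"


def parse_command(utterance):
    """Return the command word if the text is 'lights <command>', else None."""
    words = utterance.lower().split()
    if len(words) == 2 and words[0] == KEYWORD and words[1] in COMMANDS:
        return words[1]
    # Anything outside the tiny grammar is rejected instead of guessed at.
    return None


if __name__ == "__main__":
    for text in ("Lights On", "lights party", "lights six", "on"):
        print(repr(text), "->", parse_command(text))
```

    Rejecting everything outside the grammar matters for a light switch, since a
    wrong guess is worse than no response.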

    I have downloaded pocketsphinx and I was starting to go through it. I have the
    demos working. The functionality of 'tidigits' is close to what I think I
    need.

    I have also contacted the Sensory guys thinking that maybe one of their RSC-4x
    chips might help.

    So, I was wondering if I could tap the experience base at this site for a
    little guidance:
    1) Would a 'cut down' version of pocketsphinx have a chance of fitting on my
    CPU? The compiled sizes of the libraries and applications seem a little large
    (compared to 96K RAM), but I assume that an experienced pocketsphinx person
    could improve on that.
    2) How much CPU horsepower is generally required? The MC13224 is a 32 bit ARM
    running at 24MHz. Is that enough computing power for a small vocabulary?
    3) If my CPU is not large enough, any suggestions? What would be the minimum
    microcontroller (non-FPU) I would have to use to run pocketsphinx on a small
    vocabulary? What speed would it need to be? How much program and data memory
    would it need?
    4) Anybody tried the Sensory chips (the ones that they used in the Furby)? How
    do they compare to the pocketsphinx technology?

     
  • Nickolay V. Shmyrev

    During a design discussion it was suggested that somebody look into adding
    voice command. I was elected. :)

    I think it's a great project

    1) Would a 'cut down' version of pocketsphinx have a chance of fitting on my
    CPU? The compiled sizes of the libraries and applications seem a little large
    (compared to 96K RAM), but I assume that an experienced pocketsphinx person
    could improve on that.

    I think it's too small for a generic HMM engine. You can probably implement a
    DTW recognizer with that amount of memory, but even a good frontend will
    require a lot. Overall, processors are getting more powerful nowadays; spend
    a few more bucks to get 10 times more power ;)
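
    A rough back-of-the-envelope budget shows why 96K is tight. Only the RAM
    size comes from the original post; the sample rate, frame rate, feature
    size, template length and command count below are assumptions picked for
    illustration, not measurements of any real engine.

```python
RAM_BYTES = 96 * 1024          # from the post: 96K of RAM

# Everything below is an assumption chosen for illustration.
SAMPLE_RATE_HZ = 16000         # audio sample rate
BYTES_PER_SAMPLE = 2           # 16-bit samples
FRAMES_PER_SECOND = 100        # one feature frame every 10 ms
COEFFS_PER_FRAME = 13          # features per frame
BYTES_PER_COEFF = 2            # 16-bit fixed-point features
TEMPLATE_SECONDS = 1.0         # length of one stored command template
NUM_COMMANDS = 24              # "a couple dozen" commands, one template each


def audio_buffer_bytes(seconds):
    """Bytes needed to hold raw audio of the given duration."""
    return int(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * seconds)


def template_bytes(seconds=TEMPLATE_SECONDS):
    """Bytes needed to hold the feature frames of one command template."""
    frames = int(FRAMES_PER_SECOND * seconds)
    return frames * COEFFS_PER_FRAME * BYTES_PER_COEFF


def budget():
    """Return the RAM split between 1 s of audio, all templates, and the rest."""
    audio = audio_buffer_bytes(1.0)
    templates = NUM_COMMANDS * template_bytes()
    return {"audio_1s": audio, "templates": templates,
            "left": RAM_BYTES - audio - templates}


if __name__ == "__main__":
    for name, size in budget().items():
        print(f"{name:>10}: {size} bytes")
```

    Under these assumptions, one second of raw audio plus the templates leave
    under 4K for everything else, before any working memory for the frontend or
    the radio stack. Keeping the templates in flash would ease this, but the
    margin stays small.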

    There are specialized solutions for such small chips, but they are carefully
    developed with the limited hardware in mind.
    For example there is an engine for a similar Fujitsu chip, but it took

    https://mcu.emea.fujitsu.com/emea_content/downloads/MICRO/fme/micros/Fujitsu_FlexRay_Solutions_-_from_systems_support_to_silicon.pdf

    2) How much CPU horsepower is generally required? The MC13224 is a 32 bit
    ARM running at 24MHz. Is that enough computing power for a small vocabulary?

    The target hardware platform for pocketsphinx was the Sharp Zaurus SL-5500
    hand-held computer. The Zaurus is typical of the previous generation of
    hand-held PCs, having a 206MHz StrongARM® processor, 64MB of SDRAM, 16MB of
    flash memory, and a quarter-VGA color LCD screen.

    3) If my CPU is not large enough, any suggestions? What would be the minimum
    microcontroller (non-FPU) I would have to use to run pocketsphinx on a small
    vocabulary? What speed would it need to be? How much program and data memory
    would it need?

    See above. The FPU is not an issue. Pocketsphinx can work on a fixed-point
    processor.
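
    For anyone unfamiliar with fixed-point math, here is a minimal sketch of the
    general idea using the common Q15 format: fractions are stored as scaled
    integers, so only integer multiplies and shifts are needed. This is not
    pocketsphinx's actual internal format, and the helper names are made up for
    illustration.

```python
Q = 15                      # Q15: 1 sign bit, 15 fractional bits
ONE = 1 << Q                # integer that represents 1.0


def to_q15(x):
    """Convert a float to a Q15 integer, saturating at the 16-bit limits."""
    v = int(round(x * ONE))
    return max(-ONE, min(ONE - 1, v))


def from_q15(v):
    """Convert a Q15 integer back to a float (for inspection only)."""
    return v / ONE


def q15_mul(a, b):
    """Multiply two Q15 values with integer operations only, with rounding."""
    return (a * b + (1 << (Q - 1))) >> Q


def q15_dot(xs, ys):
    """Dot product using a wide accumulator, scaled back to Q15 at the end."""
    acc = 0
    for a, b in zip(xs, ys):
        acc += a * b
    return (acc + (1 << (Q - 1))) >> Q


if __name__ == "__main__":
    half = to_q15(0.5)
    print(half, q15_mul(half, half), from_q15(q15_mul(half, half)))
```

    On a real microcontroller the same operations would be done in C with 16-bit
    operands and a 32-bit accumulator; Python is used here only to show the
    arithmetic.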

    4) Anybody tried the Sensory chips (the ones that they used in the Furby)?
    How do they compare to the pocketsphinx technology?

    Sorry, no idea.

     
  • Tyler Brooks

    Tyler Brooks - 2010-12-01

    Thanks for the feedback nshmyrev.

    So, a quick search/read on the web tells me that DTW has largely been replaced
    by HMM because HMM does a better job. However, it is your guess that my
    processor and memory are too small for HMM.

    Just one more question...
    1) I really like the MC13224 (price is right. radio is easy to use. does a
    bang-up job of running my dimmer) but it appears to be about 10x too small for
    HMM (even for a small vocabulary I guess). However, there is a chance it could
    do DTW. My question is, would you ship a product like mine with a DTW
    recognizer? In other words, would it work in most situations or should I step
    up to a processor that can handle HMM? The thing has got to work or I will
    just get a bunch of returned dimmers...

     
  • Antoine Raux

    Antoine Raux - 2010-12-01

    Another important question is that of distant vs close-talk speech. Is the
    goal (as one would expect) to allow the user to speak from anywhere in the
    room and control the lights? In that case, do you plan to use only a single
    microphone on the dimmer? That's not gonna work very well, even for small
    microphones. Typically, people use microphone arrays for that.
    My recommendation would be to write an iPhone/Android app to wirelessly
    control the dimmer. Then you can do ASR on the phone ;)

     
  • Tyler Brooks

    Tyler Brooks - 2010-12-02

    Doh!
    Thanks anchan77. That is a good idea. In the back of our heads was always to
    write an iPhone/Android app that let us control an entire house full of these
    things. Of course, it makes sense to just add voice command to the program.
    Thanks.

    Although, it is a little disappointing as well. Wouldn't it be neat if you
    could just go to Home Depot and pick up some light switches that responded to
    voice command? No iPhone/Android required.

    Also, smart phones go to sleep. They aren't on all the time. Once you have it
    in your hand, you might as well poke a button as opposed to speaking a
    command. Hmm...

    You say people use arrays of microphones for room coverage. By this, do you
    mean that you place microphones all over the room? So, in a normal sized room
    (say 20' x 15'), how many microphones would you require to do a decent job? Or
    is it just one high quality microphone?

     
  • Nickolay V. Shmyrev

    My question is, would you ship a product like mine with a DTW recognizer?

    DTW is just a slightly different approach, and overall it's not really bad.
    It was widely and successfully used. Moreover, if you add the functionality
    to record user samples and recognize them later, that could solve many
    issues.
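
    A minimal sketch of that idea: store one recorded feature sequence per
    command and pick the stored template with the smallest DTW cost. The feature
    vectors here are toy numbers, and frame_distance, dtw_distance, recognize
    and reject_above are illustrative names, not part of any library. A real
    device would compute features from audio first.

```python
def frame_distance(a, b):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dtw_distance(seq_a, seq_b):
    """Classic DTW cost between two sequences, keeping only two rows in memory."""
    inf = float("inf")
    prev = [inf] * (len(seq_b) + 1)
    prev[0] = 0.0
    for a in seq_a:
        curr = [inf] * (len(seq_b) + 1)
        for j, b in enumerate(seq_b, start=1):
            cost = frame_distance(a, b)
            # Extend the cheapest of the three allowed predecessor cells.
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return prev[-1]


def recognize(utterance, templates, reject_above=None):
    """Return the label of the closest template, or None if none is close enough."""
    best_label, best_cost = None, float("inf")
    for label, template in templates.items():
        # Normalize by length so long templates are not penalized.
        norm = max(1, len(utterance) + len(template))
        cost = dtw_distance(utterance, template) / norm
        if cost < best_cost:
            best_label, best_cost = label, cost
    if reject_above is not None and best_cost > reject_above:
        return None
    return best_label


if __name__ == "__main__":
    # Toy one-dimensional "recordings" of two commands.
    templates = {"on": [[0], [5], [5], [0]], "off": [[0], [9], [1], [9], [0]]}
    slow_on = [[0], [5], [5], [5], [5], [0]]   # same shape, spoken more slowly
    print(recognize(slow_on, templates))
```

    The two-row table keeps memory proportional to the template length rather
    than to the full cost matrix, which is the kind of saving a small chip
    needs.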

    how many microphones would you require to do a decent job? Or is it just one
    high quality microphone?

    The issue here is not covering the room but fighting reverberation (echo).
    Most systems are trained on close-distance microphones, where there is no
    reverberation. In room recordings, echo significantly corrupts the spectrum
    and lowers HMM accuracy. The problem is also that the corruption depends on
    the position in the room. In research systems you need to collect data from
    all microphones and take room geometry into account to be able to clean up
    the speech. You can test this effect with pocketsphinx on your computer.

     
  • Antoine Raux

    Antoine Raux - 2010-12-02

    I agree with Nickolay: the number and position of microphones, as well as
    their effect on performance, are all empirical questions... I don't have any hands-on
    experience with distant speech recognition so I can't really give you any
    insight. It could be that with very limited vocabulary (as in your case) and a
    small room that is not too reverberant (e.g. no glass walls, etc), you can
    still get away with one microphone and reasonable accuracy but that's not
    sure. You could have several dimmers for the same light, in which case you
    could have an algorithm pick the ASR result from the one closest to the
    speaker (using power or SNR). But then the switches need to talk to each other
    (what happens if two switches each hear a different command?). That might be
    more complex than what you want for light switches... As Nick suggests, a good
    first step might be to put a laptop at different places in different rooms
    and see how well it does.
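
    To illustrate the "pick the closest switch" idea, here is a minimal sketch
    that assumes each switch reports what it heard along with an SNR estimate.
    The Report fields, the arbitrate function and the 10 dB floor are all
    hypothetical; how the reports travel over the radio is left out.

```python
from collections import namedtuple

# Hypothetical report that each switch would send after hearing something.
Report = namedtuple("Report", ["switch_id", "command", "snr_db"])


def arbitrate(reports, min_snr_db=10.0):
    """Pick the command from the switch that heard the speaker best.

    Returns None when no switch produced a result above the SNR floor.
    """
    usable = [r for r in reports
              if r.command is not None and r.snr_db >= min_snr_db]
    if not usable:
        return None
    # Conflicting commands are resolved in favor of the highest SNR.
    best = max(usable, key=lambda r: r.snr_db)
    return best.command


if __name__ == "__main__":
    heard = [
        Report("hall", "off", 12.0),
        Report("kitchen", "on", 21.5),
        Report("porch", None, 4.0),     # heard noise, recognized nothing
    ]
    print(arbitrate(heard))
```

    This answers the "two switches hear different commands" case in the simplest
    possible way: the one with the better signal wins, and weak results are
    ignored.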

     
  • Tyler Brooks

    Tyler Brooks - 2010-12-03

    Thank you nshmyrev and anchan77.
    I took your advice and walked around my house with my laptop and tidigits. I
    was using a Logitech QuickCAM Pro 9000 as a microphone (I use it for Skype and
    GTalk with no problems). I also set it up in several rooms and walked around.

    I could get it to work in most locations but 'distant speech' was a problem.
    I have trained myself to speak clearly to get a better response (although I
    can't seem to get 'six' to work very often). Overall, I would say that if I
    was slowly/loudly talking directly into the microphone, it worked in just
    about any room. If I was wandering around the room, however, accuracy was
    pretty poor (or zero).

    My wife had a lot more trouble. I think that was because she hadn't taken the
    time to train herself on the device and she wasn't speaking very
    loudly/clearly.

    It does appear to me that putting voice command into a light switch is beyond
    the cost/performance curve right now. Also, as anchan77 points out, voice
    command in a light switch is probably misplaced.

    Instead, I think the world needs a 'voice pod'. Something you set around the
    house that can understand a limited number of voice commands and that has a
    radio in it to transmit those commands to a device (or server of some sort).
    The time for this device is pretty soon. Smart energy issues are starting to
    make home device manufacturers seriously consider networking their devices.
    When that happens, a market will form for command and control software. That
    software will run on a server. That server will need input devices (something
    more convenient than just a web page).

    Voice processing is outside my expertise... but if there are some EE/CS
    students reading this blog and looking for something interesting to do...
    <fill_in_your_own_plan_here>. The endpoint of this project would be a device
    you set around the house that understood your commands. Then take that device
    to <fill_in_your_favorite_gadget_manufacturer_here> and ask them to buy you so
    you can continue to advance this work.

     
