
Detect whether any words are spoken in a short recording using PocketSphinx on Android

Help
Tim Hunt
2016-08-19
2016-08-25
  • Tim Hunt

    Tim Hunt - 2016-08-19

    Hi,

    I want to make recordings of bird song, but automatically ignore any recordings that have also accidentally captured people talking. I'm doing this on an Android phone and thought PocketSphinx might be a good way to achieve this. If you agree that this sounds reasonable, are you able to help me (point me in the right direction) to configure pocketsphinx-android-demo to do this?

    I've started to modify the code so that I have a button that calls switchSearch() when pressed, but then I don't know how to configure this method to listen for any (or a list of most common) words.

    Thanks in advance for any assistance you can give.

    P.S. This is part of a project to help save the native birds of New Zealand: https://cacophony.org.nz/

    Cheers

    Tim

     
    • Nickolay V. Shmyrev

      This is not really a speech recognition problem, so pocketsphinx will not work out of the box.

      To separate speech from bird singing you might want to build an HMM-GMM voice activity detector. You can read about the algorithms in detail here:
      http://static.googleusercontent.com/media/research.google.com/ru//pubs/archive/40362.pdf

      In pocketsphinx you need to train an acoustic model with three single-phoneme words: speech, silence, and bird singing. Each GMM should have many mixtures, for example 128. You can download any speech database to represent speech, and you can use your clean bird singing recordings to represent bird singing and other noises. Then you can recognize incoming audio with a simple grammar of three variants, which will give you segments with speech, birds, and other noises. It might require a slight modification of the pocketsphinx code, since pocketsphinx HMMs have three states and you need a single-state HMM.
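
As a rough illustration of the idea above (one GMM per class, each frame assigned to the best-scoring class), here is a minimal Python sketch. It is not pocketsphinx code and uses no pocketsphinx API. The 2-D feature vectors, the hand-picked `MODELS` parameters, and all function names are assumptions made for illustration; a real system would train the mixtures (many more of them, e.g. the 128 suggested above) on actual acoustic features.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_gmm(x, components):
    """Log-likelihood of x under a GMM given as (weight, mean, var) tuples."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in components]
    top = max(logs)
    # Log-sum-exp keeps the sum numerically stable.
    return top + math.log(sum(math.exp(value - top) for value in logs))


def classify_frame(x, models):
    """Return the class label whose GMM gives the frame the highest score."""
    return max(models, key=lambda label: log_gmm(x, models[label]))


# Hypothetical toy parameters; real ones would come from training data.
MODELS = {
    "speech": [(0.5, [1.0, 1.0], [0.5, 0.5]), (0.5, [2.0, 1.5], [0.5, 0.5])],
    "silence": [(1.0, [0.0, 0.0], [0.2, 0.2])],
    "bird": [(0.6, [5.0, 4.0], [0.7, 0.7]), (0.4, [6.0, 5.0], [0.7, 0.7])],
}

frames = [[0.1, -0.1], [1.4, 1.2], [5.5, 4.4]]
labels = [classify_frame(frame, MODELS) for frame in frames]
print(labels)
```

Each class here plays the role of one single-state model; the sketch scores frames independently and leaves out the HMM transitions and the grammar that would smooth the labels into segments.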

       
  • Tim Hunt

    Tim Hunt - 2016-08-19

    Thanks Nickolay for your quick and knowledgeable/informative response. Looks like I have a lot to learn - will keep me busy for a while!

    BUT - I don't need to separate the speech from birdsong - just identify if a recording has speech in it (any speech) and then just ignore that recording. Can that be done?

    cheers
    Tim

     
    • Nickolay V. Shmyrev

      Identification of something is always a separation of that something from alternatives.
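
To make that concrete: deciding "this recording contains speech" still means labelling each stretch of audio as speech or something else, and only then applying a rule to the labels. A minimal Python sketch of such a rule follows, assuming per-frame labels are already available from some classifier. The function names and the `min_run` threshold (30 frames, roughly 0.3 s at an assumed 10 ms frame step) are hypothetical.

```python
def longest_run(labels, target):
    """Length of the longest unbroken run of `target` in a label sequence."""
    best = current = 0
    for label in labels:
        current = current + 1 if label == target else 0
        best = max(best, current)
    return best


def contains_speech(labels, min_run=30):
    """Flag a recording when speech persists for at least `min_run` frames."""
    return longest_run(labels, "speech") >= min_run


# A brief misclassified blip is ignored; sustained speech rejects the recording.
mostly_birds = ["bird"] * 100 + ["speech"] * 5 + ["bird"] * 50
with_talking = ["bird"] * 20 + ["speech"] * 40 + ["silence"] * 10
print(contains_speech(mostly_birds), contains_speech(with_talking))
```

Requiring a sustained run rather than a single speech frame is one way to tolerate occasional misclassified frames; the right threshold would have to be tuned on real recordings.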

       
  • Tim Hunt

    Tim Hunt - 2016-08-25

    I tried OK Google on my phone - it recognised my talking, but ignored bird song. So do you think Google has already done something similar to what you suggest above? I don't want to use OK Google as I'll be using old phones with very limited data connectivity.

     
    • Nickolay V. Shmyrev

      So do you think Google has already done something similar to what you suggest above?

      Google built a similar classifier to distinguish between the "ok google" phrase and everything else. You can read the details here:

      http://static.googleusercontent.com/media/research.google.com/ru//pubs/archive/42537.pdf

       
  • Tim Hunt

    Tim Hunt - 2016-08-25

    Thanks Nickolay (I'm very impressed that you have the time/will to answer all these questions) -

    Even once you go past the OK Google phrase, i.e. when it enters the listening stage, it is able to distinguish between 'general' speech (I tried many different phrases) and the recorded bird song. I was hoping that pocketsphinx could also do this 'out of the box', as I don't think I have the skill 'to build an HMM-GMM voice activity detector' as you suggest - I'll ponder some more. Thanks again.

     

    Last edit: Tim Hunt 2016-08-27
