Hello,

I'm creating my own voice-activated internet radio for the Raspberry Pi.
I have a basic grammar and pocketsphinx working excellently.
I'm using a modified version of pocketsphinx_continuous.c to check for
keywords and perform actions.
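To give an idea of what I mean, the check is roughly this kind of pattern, stripped down, with placeholder keywords and actions rather than my real grammar:

/* Rough shape of the keyword check -- simplified; the keyword strings and
 * actions are placeholders, not my actual grammar. */
#include <stdio.h>
#include <string.h>

/* hyp is the hypothesis string returned by ps_get_hyp() in the decode loop */
static void handle_hypothesis(const char *hyp)
{
    if (hyp == NULL)
        return;                                  /* nothing recognised */

    if (strstr(hyp, "PLAY RADIO") != NULL)
        printf("action: start the stream\n");    /* e.g. launch the player */
    else if (strstr(hyp, "STOP RADIO") != NULL)
        printf("action: stop the stream\n");
    else if (strstr(hyp, "VOLUME UP") != NULL)
        printf("action: raise the volume\n");
    /* anything else from the grammar is ignored */
}

int main(void)
{
    handle_hypothesis("PLAY RADIO");             /* quick stand-alone test */
    return 0;
}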
Accuracy is great and I have no problems; however, when music (radio) is
playing in the background (from the device itself), pocketsphinx listens to
the music and attempts to decode/understand it.
Is there a way to prevent this, or to make pocketsphinx ignore the music?
(Maybe phase-invert the output and mix it into the input to cancel out the
music? I'm not sure how to do that, though.)
Any ideas would be greatly appreciated.
Hey jakenix_1,

That is not a simple problem, mainly because the acoustic model (the
statistical model which captures the acoustic characteristics of speech) was
not trained on audio that contains music. It is also not easy to 'tweak' the
acoustic model.
So I would say you might want to make sure that your system runs in an
environment which is not too noisy.
The Grand Janitor
Hi jakenix_1,

If the acoustic model is not trained on data containing music, you'll have to
ensure that the speech input reaching your decoding engine is clean enough
(has as little music noise as possible). The following tricks could be used:
1. Multi-mic approach: a system with two or more mics that tries to cancel the
common-mode music noise before it reaches the decoding system (ref: Springer
Handbook of Speech Processing by Benesty, Sondhi and Huang).
2. A simpler approach: configure the system so that it decodes on cue. For
example, instead of always being in listening mode, the system starts looking
for keywords only when prompted by another input, such as a key press. This
avoids false triggering and improves the user experience (see the sketch
below).
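To make the second option concrete, here is a minimal push-to-talk sketch
built around the usual pocketsphinx decode calls. It is untested and only a
sketch: the model/grammar/dictionary paths, the audio device and the fixed
~3-second recording window are placeholders, and note that pocketsphinx 0.8
takes an extra utterance-id argument in ps_start_utt()/ps_get_hyp() compared
to the newer API used here.

/* Push-to-talk sketch: decode only after the user presses Enter. */
#include <stdio.h>
#include <pocketsphinx.h>
#include <sphinxbase/ad.h>

int main(void)
{
    cmd_ln_t *config = cmd_ln_init(NULL, ps_args(), TRUE,
        "-hmm",  "/usr/local/share/pocketsphinx/model/en-us/en-us", /* placeholder */
        "-jsgf", "radio.gram",                                      /* your grammar */
        "-dict", "radio.dic",                                       /* your dictionary */
        NULL);
    ps_decoder_t *ps = ps_init(config);
    ad_rec_t *ad = ad_open_dev(NULL, 16000);   /* default capture device, 16 kHz */
    int16 buf[2048];
    int32 n, score;
    const char *hyp;

    if (ps == NULL || ad == NULL) {
        fprintf(stderr, "failed to initialise decoder or audio device\n");
        return 1;
    }

    for (;;) {
        printf("Press Enter, then speak a command...\n");
        getchar();                             /* the "cue": a key press */

        ad_start_rec(ad);
        ps_start_utt(ps);
        /* record a short fixed window (~3 s at 16 kHz, 2048-sample reads) */
        for (int i = 0; i < 24; i++) {
            n = ad_read(ad, buf, 2048);
            if (n > 0)
                ps_process_raw(ps, buf, n, FALSE, FALSE);
        }
        ps_end_utt(ps);
        ad_stop_rec(ad);

        hyp = ps_get_hyp(ps, &score);
        if (hyp != NULL)
            printf("heard: %s\n", hyp);        /* hand off to the keyword/action check */
    }

    /* not reached; Ctrl-C to quit */
    ad_close(ad);
    ps_free(ps);
    cmd_ln_free_r(config);
    return 0;
}

The same idea should work with a GPIO button on the Pi in place of getchar(),
so the device never decodes while the radio stream is the only thing it hears.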