I've been experimenting with PocketSphinx to recognize the various calls of a single bird species. I am currently training on only six calls.
Following advice in the tutorials and in the forums, I have gotten some good recognition results, even with only a small amount of training data (~10 minutes).
My problem is that while PocketSphinx appears to distinguish between the bird's calls fairly well, it still generates hypotheses for non-bird sounds.
Any pointers for cutting down on the spurious hypotheses? I am running with -fillprob = 0.1, which helps a bit, but I am still getting lots of unwanted hypotheses.
Do I need more training data? More filler sound units? Currently I'm only using SIL and ++NOISE++.
Thanks,
-Pete
If this can work with birdsong, then what about telephone ring tones? They seem very similar. Also, is there a write-up on how to create "specialized fillers" like this that I can look at?
I think the more types of specialized fillers you create, the better. That will help the decoder distinguish them from the bird calls as well.
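As a rough illustration of what adding specialized fillers could look like, here is a minimal Python sketch that builds the lines of a filler dictionary. It assumes the usual SphinxTrain layout of one filler word followed by its phone on each line. The WIND and TRAFFIC classes and the `filler_dict_lines` helper are hypothetical names made up for this example; only SIL and ++NOISE++ come from the thread. Each new filler phone would presumably also have to be listed in the phone set and marked in the transcripts of the training audio.

```python
def filler_dict_lines(extra_fillers):
    """Return filler-dictionary lines: the standard silence entries plus
    one ++NAME++ word mapped to a +NAME+ phone per extra filler class."""
    # Silence entries as they usually appear in a SphinxTrain filler dictionary.
    lines = ["<s> SIL", "</s> SIL", "<sil> SIL"]
    for name in extra_fillers:
        tag = name.strip().upper()
        # Filler word on the left, its dedicated phone on the right.
        lines.append(f"++{tag}++ +{tag}+")
    return lines


# Hypothetical filler classes; only NOISE is mentioned in the thread.
fillers = ["NOISE", "WIND", "TRAFFIC"]
print("\n".join(filler_dict_lines(fillers)))
```

The output can be saved as the filler dictionary of the training setup; which sound classes are worth modeling depends on what actually occurs in the recordings.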
Sure, you can just train them as described in the SphinxTrain tutorial. You'll need a database for training, though, such as VoxForge.