Hi all,
I would like to add limited vocabulary speech recognition to a larger project. For my purposes I need to classify a speech sample but it is not necessary to decode it to phonemes or a word. As I do not know much about speech recognition, my initial approach would be clustering MFCC features, but I need a speech corpus of single words with multiple speakers for each word, which I have so far been unable to find.
If anyone can provide information on a good technique for my purpose or a freely-available corpus, I would be grateful.
Thanks.
This is called voice activity detection (VAD).
You can download a database for training here: http://www.openslr.org/17/
You can read a paper about it here: https://arxiv.org/pdf/1510.08484v1.pdf
Thanks for the prompt reply.
I do not need to classify speech versus non-speech; I need to classify the same word when it is spoken by different speakers.
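To make the goal concrete, here is a rough sketch of the clustering idea from the original post: average each utterance's MFCC frames into one vector, build one centroid per word from several speakers, and assign a new sample to the nearest centroid. It assumes the MFCC frames have already been computed by some external tool; the function names and the toy 2-dimensional data are invented for illustration only. Averaging over time also throws away the order of the frames, so this is a starting point and not a tested recipe.

```python
# Sketch only: assumes each utterance is already a list of MFCC frames
# (one list of floats per frame) produced by some external feature extractor.
from collections import defaultdict
from math import dist


def utterance_vector(frames):
    """Average the frames of one utterance into a single fixed-length vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def train_centroids(labelled_utterances):
    """Build one centroid per word from (word, frames) pairs across speakers."""
    grouped = defaultdict(list)
    for word, frames in labelled_utterances:
        grouped[word].append(utterance_vector(frames))
    return {
        word: [sum(col) / len(vectors) for col in zip(*vectors)]
        for word, vectors in grouped.items()
    }


def classify(frames, centroids):
    """Return the word whose centroid is closest (Euclidean) to the utterance."""
    vec = utterance_vector(frames)
    return min(centroids, key=lambda word: dist(vec, centroids[word]))


# Toy 2-dimensional "MFCC" frames, invented purely for illustration.
training = [
    ("yes", [[1.0, 0.0], [1.2, 0.2]]),   # speaker A
    ("yes", [[0.8, 0.1], [1.0, 0.1]]),   # speaker B
    ("no", [[-1.0, 2.0], [-0.8, 2.2]]),  # speaker A
    ("no", [[-1.2, 1.8], [-1.0, 2.0]]),  # speaker B
]
centroids = train_centroids(training)
print(classify([[0.9, 0.0], [1.1, 0.3]], centroids))
```

With real data each frame would have more coefficients (13 is a common choice), and every word would need recordings from many speakers.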
This is called keyword spotting. You can use pocketsphinx for that:
http://cmusphinx.sourceforge.net/wiki/tutoriallm#keyword_lists
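As a rough illustration of the keyword list described in that tutorial, the sketch below writes a file with one phrase per line followed by a detection threshold between slashes. The phrases, the thresholds, and the file name `keyphrase.list` are placeholders chosen for this example; thresholds normally have to be tuned per phrase, and each word is assumed to be present in the pronunciation dictionary you use.

```python
# Sketch only: phrases and thresholds are placeholders and need tuning.
from pathlib import Path

# Hypothetical vocabulary: phrase -> detection threshold.
keywords = {
    "forward": 1e-20,
    "turn left": 1e-30,
    "stop": 1e-20,
}


def format_keyword_list(entries):
    """Render one 'phrase /threshold/' line per entry."""
    return "".join(
        f"{phrase} /{threshold:.0e}/\n" for phrase, threshold in entries.items()
    )


keyword_text = format_keyword_list(keywords)
Path("keyphrase.list").write_text(keyword_text)
print(keyword_text, end="")
```

If I remember the tutorial correctly, the resulting file can then be passed to the command-line tool with something like `pocketsphinx_continuous -inmic yes -kws keyphrase.list`; please check the linked page for the exact options in your version.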