Hello all,
I am working on a project that listens in on English telephone conversations and picks up about 10 key phrases (e.g. "Good Morning") using the keyword spotting feature.
I am new to speech recognition toolkits and PocketSphinx, and would greatly appreciate some help with the following questions:
• Based on the above description, is this something the current version of PocketSphinx is able to accomplish?
• I understand that telephony data is recorded at 8 kHz. Is there a ready-made acoustic model I can use?
• What accuracy can realistically be obtained with the above parameters, i.e. about 10 keyphrases of 3-4 syllables each on call-centre-quality audio?
Thank you very much in advance.
Based on the above description, is this something the current version of PocketSphinx is able to accomplish?
Yes
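For anyone setting this up, here is a rough sketch of preparing the list of keyphrases. It assumes the keyword list format I recall from the CMUSphinx documentation, one phrase per line followed by a detection threshold between slashes; the helper name `write_keyphrase_list` and the default threshold of 1e-20 are hypothetical placeholders, and each threshold would need tuning on your own recordings.

```python
from pathlib import Path


def write_keyphrase_list(phrases, path, threshold="1e-20"):
    """Write one 'phrase /threshold/' line per keyphrase and return the lines.

    The threshold is a placeholder (an assumption, not a recommended value):
    it has to be tuned per keyphrase on real call recordings.
    """
    # Lowercase and trim each phrase so it matches typical dictionary entries.
    lines = [f"{phrase.strip().lower()} /{threshold}/" for phrase in phrases]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

Every word in the phrases must be in the pronunciation dictionary. The resulting file is what I would expect to pass to the decoder's keyword list option, together with the 8 kHz model and a matching sample rate setting.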
I understand that telephony data is recorded at 8 kHz. Is there a ready-made acoustic model I can use?
Yes, one is available in the downloads section.
What accuracy can realistically be obtained with the above parameters, i.e. about 10 keyphrases of 3-4 syllables each on call-centre-quality audio?
Keyword spotting is evaluated not in terms of accuracy but in terms of false alarm rate and detection rate, usually summarized as a DET curve. You can expect about an 80% detection rate at 5 false alarms per hour for keyphrases of 5 syllables. You can work out the exact numbers for your own data yourself.
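To work out those numbers on your own data, something like the following minimal sketch could be used. It assumes you have hand-labelled reference times for each keyphrase and a list of detections from the decoder, both as (phrase, time in seconds) pairs; the function name `score_keyword_spotting` and the 0.5 second matching tolerance are my own assumptions, not part of PocketSphinx.

```python
def score_keyword_spotting(reference, detections, audio_seconds, tolerance=0.5):
    """Return (detection_rate, false_alarms_per_hour).

    reference, detections: lists of (phrase, time_in_seconds) tuples.
    A detection is a hit if it names the same phrase as a not-yet-matched
    reference entry and lies within `tolerance` seconds of it.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")

    unmatched = list(reference)
    hits = 0
    false_alarms = 0
    for phrase, time in sorted(detections, key=lambda d: d[1]):
        match = next(
            (ref for ref in unmatched
             if ref[0] == phrase and abs(ref[1] - time) <= tolerance),
            None,
        )
        if match is not None:
            unmatched.remove(match)  # each reference entry can be hit only once
            hits += 1
        else:
            false_alarms += 1

    detection_rate = hits / len(reference) if reference else 0.0
    false_alarms_per_hour = false_alarms / (audio_seconds / 3600.0)
    return detection_rate, false_alarms_per_hour
```

Running the decoder at several threshold values and scoring each run this way gives the points of a DET-style curve, from which you can pick the operating point that suits your application.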