CMU Sphinx / Forums / Help: specific word detection from Telephonic audio

Radhika Patel - 2016-02-04

I have built a new app for detecting a word using pocketsphinx, and I need to adjust the dictionary for it. I created a text file with my keywords and uploaded it to http://www.speech.cs.cmu.edu/tools/lmtool-new.html. I also created the dictionary and converted the .lm file to a .dmp file using the command "pocketsphinx_continuous -inmic yes -lm 8521.lm -dict 8521.dic". Then I used the hmm.en-us-semi model and set my threshold as 'HELLO /1e-1/'.

It easily detects a voice spoken directly into the device, but it cannot detect a voice coming from a telephone. So I switched to the 8 kHz model, as per the solution given in the forum, but got errors about the mdef file and the dmp file. I followed these steps several times and got the same error each time: "ERROR: "acmod.c",Folder '/path/hmm/en-us-8khz' does not contain acoustic model definition 'mdef' and error-: new_Decoder returned -1".

Can anybody help me fix this error and create a dictionary for telephonic word detection? Are there any other changes required?

Last edit: Radhika Patel 2016-02-04

- Nickolay V. Shmyrev - 2016-02-04
  
  It is better to use
  
  http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/cmusphinx-en-us-ptm-8khz-5.2.tar.gz/download
  
  "Can anybody help me fix this error and create a dictionary for telephonic word detection? Are there any other changes required?"
  
  It is easy to fix: just make sure that the model is properly placed in the folder and that you correctly specify the path to it. No other changes are required.
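
The `mdef` error quoted in the question says the decoder looked in the given folder and did not find the model definition file there. A minimal sketch of a pre-flight check follows, assuming only what the error message states: a file named `mdef` must sit directly inside the model folder. The class and method names are hypothetical and not part of pocketsphinx.

```java
import java.io.File;

// Hypothetical pre-flight check, not part of pocketsphinx.
class ModelDirCheck {
    /** True if modelDir is a directory that directly contains a file named "mdef". */
    static boolean hasMdef(File modelDir) {
        return modelDir.isDirectory() && new File(modelDir, "mdef").isFile();
    }

    public static void main(String[] args) {
        // Folder to check: first argument, or the current directory by default.
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(dir + (hasMdef(dir) ? " contains mdef" : " is missing mdef"));
    }
}
```

If the check fails for the folder passed to the recognizer but passes for one of its subfolders, the configured path is probably one directory level off.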
  
Radhika Patel - 2016-02-05

Thank you for your quick response...

But now I am facing another problem: how do I list the language model and dictionary in 'assets.list'? Can you please provide the syntax or a screenshot? Currently I have written 'models/hmm/en-us-8khz/README' this way, but I get the error 'E/error-: sync/models/hmm/en-us-8khz/README.md5'. I don't have any idea how to set up the files in assets. One other thing: in the gram file, how should I specify the threshold value for a particular word?

Please reply as soon as possible.

- Nickolay V. Shmyrev - 2016-02-05
  
  http://cmusphinx.sourceforge.net/wiki/tutorialandroid#including_resource_files
  
Radhika Patel - 2016-02-05

It's working, thank you :)

Can you please tell me how to set a threshold value to get high accuracy in word detection? When there is background noise, random words made up from the dictionary are decoded.

David - 2016-02-05

Hi Radhika,

Find your answer here

Radhika Patel - 2016-02-08

Thank you, it works in 80% of cases. When there is background noise, random words made up from the dictionary are decoded.

Can you please tell me which model is best for telephonic audio detection in CMU Sphinx?

David - 2016-02-09

The best model for telephony is mentioned in Post 2. To address accuracy issues you need to send your config and the audio that you are trying to decode.

Radhika Patel - 2016-02-15

Thank you David

Below is my digit.gram file:
HELLO /1-e5/
HI /1-e1/
I HELP YOU /1-e20/
MY NAME IS /1-e20/
GOOD MORNING /1-e20/
GOOD AFTERNOON /1-e35/
GOOD EVENING /1-e25/

All the phrases work fine except Good Afternoon, which is not detected. Can you please tell me the threshold value for Good Afternoon?

- Nickolay V. Shmyrev - 2016-02-15
  
  The format for float numbers is 1e-35, not 1-e35. No wonder your gram file does not work.
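
A malformed threshold such as `1-e35` is easy to miss by eye. As a minimal sketch, a keyword list can be checked before it is handed to the decoder, assuming each line has the form `PHRASE /threshold/` as in the file above. The class and method names are hypothetical, not part of pocketsphinx, and Java's float parser is used here only as a stand-in for whatever parsing the decoder does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of pocketsphinx: checks keyword-list lines
// of the assumed form "PHRASE /threshold/" before they reach the decoder.
class KeywordListCheck {
    // A phrase, whitespace, then a threshold between two slashes at end of line.
    private static final Pattern LINE = Pattern.compile("^(.+?)\\s+/([^/]+)/\\s*$");

    /** True if the line has a phrase and a threshold that parses as a finite positive float. */
    static boolean isValidLine(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return false;
        }
        try {
            double threshold = Double.parseDouble(m.group(2).trim());
            return threshold > 0.0 && !Double.isInfinite(threshold);
        } catch (NumberFormatException e) {
            return false; // e.g. "1-e35" is not a float literal
        }
    }

    /** Returns the lines that fail the check, in their original order. */
    static List<String> invalidLines(List<String> lines) {
        List<String> bad = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty() && !isValidLine(line)) {
                bad.add(line);
            }
        }
        return bad;
    }
}
```

Run against the list in the question, `invalidLines` returns every line, because none of the `1-eN` values parses as a float; with `1e-35` in place of `1-e35` the same line passes.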
  
Radhika Patel - 2016-02-16

Thank you Nickolay, it's working.

Can I tune my threshold value at runtime? For example, if I set the threshold "HELLO /1-e5/" in my gram file, can I increase or decrease it at runtime according to user input? Is it possible?

- Nickolay V. Shmyrev - 2016-02-16
  
  1e-5, not 1-e5. At runtime you can remove the current search and add a new one.
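
One way to act on this advice is to regenerate the keyword list whenever the user changes a threshold and then register it again. The sketch below covers only the file-generation part in plain Java. The class name is hypothetical, thresholds are assumed to be powers of ten (`1eN`) as in this thread, and the recognizer calls in the comments are assumptions to verify against your pocketsphinx-android version.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of pocketsphinx: regenerates a keyword list
// so that a keyword search can be registered again with new thresholds.
class KeywordListWriter {
    /** Renders one "PHRASE /1eN/" line per entry, where N is the stored exponent. */
    static String render(Map<String, Integer> exponents) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : exponents.entrySet()) {
            sb.append(e.getKey()).append(" /1e").append(e.getValue()).append("/\n");
        }
        return sb.toString();
    }

    /** Writes the rendered keyword list to the target file, replacing its contents. */
    static void write(File target, Map<String, Integer> exponents) throws IOException {
        try (FileWriter out = new FileWriter(target)) {
            out.write(render(exponents));
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> exponents = new LinkedHashMap<>();
        exponents.put("HELLO", -5);
        exponents.put("GOOD AFTERNOON", -30);
        exponents.put("HELLO", -7); // the user adjusted the threshold for HELLO

        File updated = File.createTempFile("digits", ".gram");
        write(updated, exponents);

        // On Android, assuming the SpeechRecognizer used in this thread:
        //   recognizer.stop();
        //   recognizer.addKeywordSearch(DIGITS_SEARCH, updated);   // call shown in this thread
        //   recognizer.startListening(DIGITS_SEARCH);
        // Whether re-adding under the same name replaces the old search is an
        // assumption; check it against your library version.
        System.out.print(render(exponents));
    }
}
```

Storing the exponent rather than the float keeps the rendered value in the `1e-5` form that the decoder is known to accept in this thread.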
  
Radhika Patel - 2016-02-16

HELLO /1e-5/
HI /1e-1/
I HELP YOU /1e-20/
MY NAME IS /1e-20/
GOOD MORNING /1e-20/
GOOD AFTERNOON /1e-30/
GOOD EVENING /1e-25/

How do I tune these 7 keywords?

- Nickolay V. Shmyrev - 2016-02-16
  
  Tutorial says:
  
  Threshold must be tuned to balance between false alarms and missed detections, the best way to tune threshold is to use a prerecorded audio file.
  
  http://cmusphinx.sourceforge.net/wiki/tutoriallm
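
To make the trade-off concrete, the sketch below scores one decoding run against a hand-labelled recording. It assumes you already have two lists of times in seconds: where the keyword was actually spoken, and where the decoder reported it at a given threshold. The class, the tolerance window, and the sample numbers are all hypothetical; nothing here calls pocketsphinx.

```java
// Hypothetical scoring helper, not part of pocketsphinx.
class DetectionScore {
    final int hits;
    final int misses;
    final int falseAlarms;

    DetectionScore(int hits, int misses, int falseAlarms) {
        this.hits = hits;
        this.misses = misses;
        this.falseAlarms = falseAlarms;
    }

    /**
     * Matches each reference time to at most one unused detection that lies
     * within toleranceSec of it. Unmatched references count as misses,
     * unmatched detections as false alarms.
     */
    static DetectionScore score(double[] reference, double[] detected, double toleranceSec) {
        boolean[] used = new boolean[detected.length];
        int hits = 0;
        for (double ref : reference) {
            for (int i = 0; i < detected.length; i++) {
                if (!used[i] && Math.abs(detected[i] - ref) <= toleranceSec) {
                    used[i] = true;
                    hits++;
                    break;
                }
            }
        }
        return new DetectionScore(hits, reference.length - hits, detected.length - hits);
    }

    public static void main(String[] args) {
        // Made-up times in seconds, for illustration only.
        double[] reference = {1.0, 5.0, 9.0};
        double[] detected = {1.1, 3.0, 9.2};
        DetectionScore s = score(reference, detected, 0.5);
        System.out.println("hits=" + s.hits + " misses=" + s.misses + " falseAlarms=" + s.falseAlarms);
    }
}
```

With the sample numbers this prints `hits=2 misses=1 falseAlarms=1`. Repeating the run for several candidate thresholds on the same recording shows where misses start to rise as false alarms fall.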
  
Radhika Patel - 2016-02-17

Thank you Nickolay

I read the suggested tutorial. Now I have a question about building the language model: I have the word "hello" in my .dic file, and there are 2 pronunciation options available for it:
HELLO HH AH L OW
HELLO(2) HH EH L OW
If I want to add a 3rd pronunciation for hello, is it possible to add it to the .dic file? Is the .dic file editable by us or not?

Another question: which library is better for telephony use, 'en-us-ptm' or 'cmusphinx-en-us-ptm-8khz-5.2'? I got the same result with both, so can you tell me which one is best?

Can anybody help me as soon as possible?

Last edit: Radhika Patel 2016-02-18

- Nickolay V. Shmyrev - 2016-02-19
  
  "Mainly .dic file is editable by us or not?"
  
  It is editable.
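
Since the .dic file is plain text, an extra pronunciation is one more line. The sketch below works out the entry name for the next variant of a word, assuming the numbering pattern visible in the question (`HELLO`, `HELLO(2)`) simply continues. The class is hypothetical and not part of pocketsphinx; the phone sequence for the new line is left to the caller and must use phones that the acoustic model defines.

```java
import java.util.List;

// Hypothetical helper, not part of pocketsphinx.
class DictVariants {
    /**
     * Returns the entry name for the next pronunciation of word:
     * the bare word if it has no entry yet, otherwise WORD(n+1)
     * where n is the number of existing entries for that word.
     */
    static String nextVariantHead(List<String> dictLines, String word) {
        int count = 0;
        for (String line : dictLines) {
            // The entry name is the first whitespace-separated token on the line.
            String head = line.trim().split("\\s+", 2)[0];
            boolean isVariant = head.startsWith(word + "(") && head.endsWith(")");
            if (head.equals(word) || isVariant) {
                count++;
            }
        }
        return count == 0 ? word : word + "(" + (count + 1) + ")";
    }
}
```

For the two lines in the question, `nextVariantHead(lines, "HELLO")` returns `HELLO(3)`, so the new line would start with `HELLO(3)` followed by the chosen phones.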
  
  "And another question is which library is better for telephony use 'en-us-ptm' or 'cmusphinx-en-us-ptm-8khz-5.2' ??"
  
  For telephony you need to use 8khz.
  
  Last edit: Nickolay V. Shmyrev 2016-02-19
  
Radhika Patel - 2016-02-22

Thank you,

I am using the cmusphinx-en-us-ptm-8khz-5.2 library and my thresholds are:
HELLO /1e-5/
HI /1e-1/
I HELP YOU /1e-20/
MY NAME IS /1e-20/
WELCOME /1e-15/
GOOD MORNING /1e-25/
GOOD AFTERNOON /1e-30/
GOOD EVENING /1e-28/
This works fine with every voice except Google Translate's voice. Can you please suggest a solution for this?

- Nickolay V. Shmyrev - 2016-02-22
  
  You need to provide data files, command line, logs, results in order to get help on accuracy issues.
  
Radhika Patel - 2016-02-23

I have attached my custom dictionary and gram file, along with my recognizer setup:

private void setupRecognizer(File assetsDir) throws IOException {
    Log.e("setupRecognizer", "setupRecognizer");
    // Configure the decoder with the 8 kHz acoustic model and the custom dictionary.
    recognizer = defaultSetup()
            .setAcousticModel(new File(assetsDir, "cmusphinx-en-us-ptm-8khz-5.2"))
            .setDictionary(new File(assetsDir, "6199.dic"))
            .setBoolean("-allphone_ci", true)
            .getRecognizer();
    recognizer.addListener(this);
    // Register the keyword list (phrases with thresholds) as a keyword search.
    File digitsGrammar = new File(assetsDir, "digits.gram");
    recognizer.addKeywordSearch(DIGITS_SEARCH, digitsGrammar);
}

If I increase my threshold value, it also increases false alarms. This works fine with every voice except Google Translate's voice, so can you please suggest a solution for this?

Last edit: Radhika Patel 2016-02-23

6199.dic

digits.gram

- Nickolay V. Shmyrev - 2016-02-23
  
  You need to share the audio file
  
Radhika Patel - 2016-02-23

I have attached a video file showing how I test the app.

In testing the app, my voice is picked up for most of the key phrases, yet when I got my client to speak into it, it did not pick up any of the key phrases. Can you please tell me why this happens and what the solution is?

Last edit: Radhika Patel 2016-02-23

IMG_3651.MOV

- Nickolay V. Shmyrev - 2016-02-23
  
  You need to provide audio file recordings to let me reproduce your problems.
  
Radhika Patel - 2016-02-26

This is my audio recording. Please give me your suggestions as soon as possible.

Last edit: Radhika Patel 2016-02-29

welcome-test.mp3

- Radhika Patel - 2016-03-08
  
  Hi Nickolay, did you get a chance to reproduce our problem?
  