CMU Sphinx / Forums / Help: How to config "sphinx-train.cfg" for command control acoustic model?

rezaee - 2018-05-13

I am trying to create a single speaker command control acoustic model..
I have about 100 little sentences like what is your name , how are you, that I repeated each sentence 30 times, and put them in a structure like this:

==sentence_one 1.wav 2.wav . 30.wav ==sentence_two 1.wav 2.wav . 30.wav

I think I have about 30-45mins of .wav voices totally.
I used SRILM to creat language model, then trained acoustic model using sphinx train, but my accuracy is low!

I tried to change senonses from 200 to 2000 but couldn't help! Also tried to use 8 and 16 for DENSITIES but didn't help too! So, what is the best configuration for this purpose? Should I record more sounds and repeat my sentences more than 30 times(up to how many times?)?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Q3Varnam - 2018-05-14

In my opinion, you don't need a language model created by SRILM - can you change it to grammar based - it is much more simpler as you don't need to worry about word weightages. I am sure you have sufficient data for a 99% accruacy as I have achieved such an accuracy with a data similar to yours.

Are you measuring your accuracy by running sphinxtrain -s decode? This will use a set of audio files for decoding, while decoding try to use the same audio files used for training and provide the grammar file which can recognise your training sentences.

If you get poor recognition or misalignment then you can debug the grammar , cmninit, beam width in that order.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- rezaee - 2018-05-14
  
  Thank you!
  But may you explain more about how can I build and use a grammer with pocketsphinx?
  
  Can I use with C++ code? How?
  
  Last edit: rezaee 2018-05-14
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Q3Varnam - 2018-05-14

grammar is just a plain text file, read here
https://cmusphinx.github.io/wiki/tutoriallm/#building-a-grammar

then you use the grammar file in your config uncomment the grammar model for decoding stage

# This variables, used by the decoder, have to be user defined, and # may affect the decoder output #$DEC_CFG_LANGUAGEMODEL = "$CFG_BASE_DIR/etc/${CFG_DB_NAME}.lm"; # Or can be JSGF or FSG too, used if uncommented $DEC_CFG_GRAMMAR = "$CFG_BASE_DIR/etc/${CFG_DB_NAME}.jsgf"; # $DEC_CFG_FSG = "$CFG_BASE_DIR/etc/${CFG_DB_NAME}.fsg";
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Thanks again, but how to change this command in my C++ code:

                     config = cmd_ln_init(NULL, ps_args(), TRUE,
                 "-hmm", "/home/m/robot/model_parameters   /robot.cd_cont_2000",
                     "-lm","/home/m/robot/etc/robot.lm",
                     "-dict", "/home/m/robot/etc/robot.dic",
                     NULL);

Q3Varnam - 2018-05-14

"-lm","/home/m/robot/etc/robot.lm"
replace with
-lm with -jsgf and /home/m/robot/etc/yourgrammar.jsgf

run pocketsphinx without options to see all the parameters it takes

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- rezaee - 2018-05-14
  
  May you upload a grammar folder if you have! I am confused about the sructure. I don't know should I write all my possible sentences that I have in a single .jsgf file? Or I must create many files for each sentence and import all in one? How should I address them in < > to import?
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Q3Varnam - 2018-05-14

Read about JSGF, single file or multiple files are all for de-cluttering, why don't you start with a single sentence and proceed systematically.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- rezaee - 2018-05-14
  
  OK! but I don't know how to use import<> command. In the .jsgf documentation it uses java like addressing <com.example.x> but I don't know how should I do that in Linux with C++? </com.example.x>
  
  I think: import </home/m/robot/grammer/x> ?
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- rezaee - 2018-05-14
  
  Is this a right grammar file?
  
  #JSGF V1.0; grammar introduction; <hello> = hello | hi; //isn't problem same name of hello? <feeling> = how are you; <name> = what is your name; <old> = how old are you; <origin> = where are you from; <live> = where is your home; <food> = what is your favorite food; <love> = do you love me | do you like (red | blue | green); <education> = are you educated | what grade are you in;
  
  These are some of my recorded voices for my command&control system.
  
  Last edit: rezaee 2018-05-14
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - rezaee - 2018-05-14
    
    I tried the above .jsfg file but when I wanted to test it got this error message:
    
    ERROR: "pocketsphinx.c", line 680: No public rules found in /home/m/myrobot3/robot/etc/robot.jsgf
    
    But I don't know which rule should be publiced? I have only one .jsfg file like above with 5 times more sentences!
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

rezaee - 2018-05-14

I tried to add public in front of all the rules but I get terrible results with this error messages:

ERROR: "fsg_search.c", line 940: Final result does not match the grammar in frame xxx

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Q3Varnam - 2018-05-14

you just need to add one more line, add a bracket () -> this means the word is mandatory for the
[]-> optional word

<hello> = (hello | hi); //isn't problem same name of hello?
<feeling> = how are you;
<name> = what is your name;
<old> = how old are you;
<origin> = where are you from;
<live> = where is your home;
<food> = what is your favorite food;
<love> = do you love me | do you like (red | blue | green);
<education> = are you educated | what grade are you in;</education></love></food></live></origin></old></name></feeling></hello>

public = (<hello>)|(<feeling>)|(<name>);</name></feeling></hello>

This will recognise if either Hello/Hi - only one of them being said
or
how are you
or
what is your name
If you want to recognise all of those sentences just add the OR | and append the tag to the command.

you first try it with pocketsphinx and then go to your C++ code

pocketsphinx_continuous -hmm ./model_parameters -jsgf ./grammar.jsgf -infile ./audiousedfortraining.wav -dict ./dictionary.dic -backtrace yes -cmninit 60,3,1 -beam 1e-80 -pbeam 1e-60 -lw 10 -wip 0.9

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- rezaee - 2018-05-15
  
  I attached my .jsgf file. it gives me :
  
  SENTENCE ERROR: 100.0% (10/10) WORD ERROR RATE: 100.0% (22/22)
  
  After training and
  
  ERROR: "fsg_search.c", line 940: Final result does not match the grammar in frame 97
  
  In run-time!
  
  Last edit: rezaee 2018-05-15
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Q3Varnam - 2018-05-15

Now you check the logs under the result directory you will find the xyz.align file - look in to what is matching and what is not matching. It looks like you don't have the _test.transcription /test.fileids mapped correctly. From here on you are on your own - you have to re-read the documentation and create the models correctly.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- rezaee - 2018-05-15
  
  Okay, thank you so much!
  Did you see my attachment? was that true?
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Q3Varnam - 2018-05-15

the grammar file is good, make sure your test_transcription/fileids are having the same sentences as your grammar file

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- rezaee - 2018-05-15
  
  I have these sentences in my transcription file for example:
  
  <s> become blue <\s> <s> become red <\s>
  
  But I wrote them like this in my grammar .jsgf file:
  
  <colors> = (red | blue); <action> = become <colors> public <commands> = <action>
  
  Is this okay or the problem is here?
  
  Last edit: rezaee 2018-05-15
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

rezaee - 2018-05-15

This is robot.align file but the result is strange!

SALAM (test-1)
*** (test-1)
Words: 1 Correct: 0 Errors: 1 Percent correct = 0.00% Error = 100.00% Accuracy = 0.00%
Insertions: 0 Deletions: 1 Substitutions: 0
HALET CHETORE (test-2)
*** *** (test-2)
Words: 2 Correct: 0 Errors: 2 Percent correct = 0.00% Error = 100.00% Accuracy = 0.00%
Insertions: 0 Deletions: 2 Substitutions: 0
ESMET CHIE (test-3)
*** *** (test-3)
Words: 2 Correct: 0 Errors: 2 Percent correct = 0.00% Error = 100.00% Accuracy = 0.00%
Insertions: 0 Deletions: 2 Substitutions: 0
CHAND SALETE (test-4)
*** *** (test-4)
Words: 2 Correct: 0 Errors: 2 Percent correct = 0.00% Error = 100.00% Accuracy = 0.00%
Insertions: 0 Deletions: 2 Substitutions: 0
AHLE KOJAEI (test-5)
*** *** (test-5)
Words: 2 Correct: 0 Errors: 2 Percent correct = 0.00% Error = 100.00% Accuracy = 0.00%
Insertions: 0 Deletions: 2 Substitutions: 0
KHUNATUN KOJAS (test-6)
*** *** (test-6)
Words: 2 Correct: 0 Errors: 2 Percent correct = 0.00% Error = 100.00% Accuracy = 0.00%
Insertions: 0 Deletions: 2 Substitutions: 0
CHE GHAZAEI DUS DARI (test-7)
*** *** *** *** (test-7)
Words: 4 Correct: 0 Errors: 4 Percent correct = 0.00% Error = 100.00% Accuracy = 0.00%
Insertions: 0 Deletions: 4 Substitutions: 0
MANO DUS DARI (test-8)
*** *** *** (test-8)
Words: 3 Correct: 0 Errors: 3 Percent correct = 0.00% Error = 100.00% Accuracy = 0.00%
Insertions: 0 Deletions: 3 Substitutions: 0
SAVAD DARI (test-9)
*** *** (test-9)
Words: 2 Correct: 0 Errors: 2 Percent correct = 0.00% Error = 100.00% Accuracy = 0.00%
Insertions: 0 Deletions: 2 Substitutions: 0
CLAS CHANDOMI (test-10)
*** *** (test-10)
Words: 2 Correct: 0 Errors: 2 Percent correct = 0.00% Error = 100.00% Accuracy = 0.00%
Insertions: 0 Deletions: 2 Substitutions: 0
TOTAL Words: 22 Correct: 0 Errors: 22
TOTAL Percent correct = 0.00% Error = 100.00% Accuracy = 0.00%
TOTAL Insertions: 0 Deletions: 22 Substitutions: 0

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

rezaee - 2018-05-15

I have these sentences in my transcription file for example:

<s> become blue <\s> <s> become red <\s>

But I wrote them like this in my grammar .jsgf file:

<colors> = (red | blue); <action> = become <colors> public <commands> = <action>

Is this okay or the problem is here?

Last edit: rezaee 2018-05-15
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

rezaee - 2018-05-15

Great @Q3Varnam , finally I solved the problem!

It was because of my missing to run this command

sphinxtrain -t robot setup

Before

sphinxtrain run

Thank you again!
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

How to config "sphinx-train.cfg" for command control acoustic model?

Speech Recognition Toolkit

Forums

Help

How to config "sphinx-train.cfg" for command control acoustic model? document.SUBSCRIPTION_OPTIONS = { "thing": "topic", "subscribed": false, "url": "subscribe", "icon": { "css": "fa fa-envelope-o" } };

How to config "sphinx-train.cfg" for command control acoustic model?