I am trying to create a single-speaker command-and-control acoustic model.
I have about 100 short sentences like "what is your name", "how are you". I repeated each sentence 30 times and put them in a structure like this:
I think I have about 30-45 minutes of .wav recordings in total.
I used SRILM to create a language model, then trained the acoustic model with SphinxTrain, but my accuracy is low!
I tried changing the number of senones from 200 to 2000, but it didn't help. I also tried 8 and 16 for DENSITIES, but that didn't help either. So, what is the best configuration for this purpose? Should I record more audio and repeat my sentences more than 30 times (up to how many times)?
In my opinion, you don't need a language model created by SRILM. Can you change it to grammar-based? It is much simpler, as you don't need to worry about word weightings. I am sure you have sufficient data for 99% accuracy, as I have achieved such accuracy with data similar to yours.
Are you measuring your accuracy by running sphinxtrain -s decode? This will use a set of audio files for decoding. While decoding, try to use the same audio files used for training, and provide a grammar file that can recognise your training sentences.
If you get poor recognition or misalignment, then you can debug the grammar, cmninit, and beam width, in that order.
Then use the grammar file: in your config, uncomment the grammar model for the decoding stage.
# These variables, used by the decoder, have to be user-defined, and
# may affect the decoder output
#$DEC_CFG_LANGUAGEMODEL = "$CFG_BASE_DIR/etc/${CFG_DB_NAME}.lm";
# Or can be JSGF or FSG too, used if uncommented
$DEC_CFG_GRAMMAR = "$CFG_BASE_DIR/etc/${CFG_DB_NAME}.jsgf";
# $DEC_CFG_FSG = "$CFG_BASE_DIR/etc/${CFG_DB_NAME}.fsg";
Could you upload a grammar folder if you have one? I am confused about the structure. Should I write all my possible sentences in a single .jsgf file? Or must I create a file for each sentence and import them all into one? How should I address them in < > to import?
OK, but I don't know how to use the import<> statement. The .jsgf documentation uses Java-like addressing (<com.example.x>), but I don't know how I should do that on Linux with C++.
I think: import </home/m/robot/grammer/x> ?
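For what it's worth, JSGF import does not take a Unix path: per the JSGF specification, a grammar is addressed by the name in its grammar declaration, and import pulls in a public rule as <grammarName.ruleName>. A sketch with hypothetical file and rule names (how the decoder locates the imported file on disk is implementation-specific, so check the pocketsphinx documentation):

```
// --- greetings.jsgf (hypothetical file) ---
#JSGF V1.0;
grammar greetings;
public <hello> = hello | hi;

// --- main.jsgf (hypothetical file) ---
#JSGF V1.0;
grammar main;
import <greetings.hello>;
public <command> = <hello> how are you;
```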
You just need to add one more line. In the grammar, a bracket () means the enclosed words are mandatory, and [] marks an optional word:
<hello> = (hello | hi); // isn't it a problem that the rule has the same name, hello?
<feeling> = how are you;
<name> = what is your name;
<old> = how old are you;
<origin> = where are you from;
<live> = where is your home;
<food> = what is your favorite food;
<love> = do you love me | do you like (red | blue | green);
<education> = are you educated | what grade are you in;
public <command> = <hello> | <feeling> | <name>;
This will recognise either hello/hi (only one of them being said),
or
how are you
or
what is your name.
If you want it to recognise all of those sentences, just add them with the OR operator | to the public command rule.
First try it with pocketsphinx, then move on to your C++ code.
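For reference, a complete .jsgf file also needs the JSGF header line and a grammar declaration, and at least one rule must be marked public so the decoder has an entry point. A minimal sketch combining a few of the rules above (the grammar name commands is an arbitrary choice):

```
#JSGF V1.0;
grammar commands;

<hello> = hello | hi;
<feeling> = how are you;
<name> = what is your name;

public <command> = <hello> | <feeling> | <name>;
```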
Now check the logs under the result directory; you will find the xyz.align file. Look into what is matching and what is not matching. It looks like you don't have the _test.transcription / test.fileids mapped correctly. From here on you are on your own: you have to re-read the documentation and create the models correctly.
Thank you!
But could you explain more about how I can build and use a grammar with pocketsphinx?
Can I use it from C++ code? How?
A grammar is just a plain text file; read here:
https://cmusphinx.github.io/wiki/tutoriallm/#building-a-grammar
Thanks again, but how do I change this option in my C++ code:
"-lm","/home/m/robot/etc/robot.lm"
Replace -lm with -jsgf, and /home/m/robot/etc/robot.lm with /home/m/robot/etc/yourgrammar.jsgf.
Run pocketsphinx without options to see all the parameters it takes.
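To do the same from C++: the pocketsphinx C API takes the same option names as the command line, so wherever the code passed "-lm" and the .lm path to cmd_ln_init(), pass "-jsgf" and the grammar path instead. A minimal sketch; the -hmm and -dict paths are assumptions based on this thread, not known values:

```cpp
#include <pocketsphinx.h>

int main() {
    // Build a configuration equivalent to the command-line flags;
    // "-jsgf" replaces the old "-lm" language-model option.
    cmd_ln_t *config = cmd_ln_init(NULL, ps_args(), TRUE,
        "-hmm",  "/home/m/robot/model_parameters",      // assumed path
        "-jsgf", "/home/m/robot/etc/yourgrammar.jsgf",
        "-dict", "/home/m/robot/etc/robot.dic",          // assumed path
        NULL);
    if (config == NULL)
        return 1;

    // Initialise the decoder with the grammar-based configuration.
    ps_decoder_t *ps = ps_init(config);
    if (ps == NULL)
        return 1;

    /* ... feed audio with ps_start_utt()/ps_process_raw()/ps_end_utt()
       and read the result with ps_get_hyp(), exactly as with an
       n-gram language model ... */

    ps_free(ps);
    cmd_ln_free_r(config);
    return 0;
}
```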
Read about JSGF. Single file or multiple files are just for de-cluttering; why don't you start with a single sentence and proceed systematically?
Is this a correct grammar file?
These are some of my recorded voices for my command-and-control system.
I tried the above .jsgf file, but when I wanted to test it I got this error message:
But I don't know which rule should be made public. I have only one .jsgf file like the one above, with five times more sentences! I tried to add public in front of all the rules, but I got terrible results with these error messages:
pocketsphinx_continuous -hmm ./model_parameters -jsgf ./grammar.jsgf -infile ./audiousedfortraining.wav -dict ./dictionary.dic -backtrace yes -cmninit 60,3,1 -beam 1e-80 -pbeam 1e-60 -lw 10 -wip 0.9
I attached my .jsgf file. It gives me:
Okay, thank you so much!
Did you see my attachment? Was it correct?
The grammar file is good. Make sure your test_transcription/fileids contain the same sentences as your grammar file.
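Concretely, the two files must line up one-to-one: each line of the .fileids file is an audio path without the .wav extension, and the matching line of the .transcription file is the spoken sentence followed by that file id in parentheses. A sketch assuming the database name robot and hypothetical file names:

```
--- robot_test.fileids ---
wav/sentence_001
wav/sentence_002

--- robot_test.transcription ---
<s> what is your name </s> (sentence_001)
<s> how are you </s> (sentence_002)
```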
I have these sentences in my transcription file, for example:
But I wrote them like this in my grammar .jsgf file:
Is this okay, or is the problem here?
This is the robot.align file, but the result is strange!
Great @Q3Varnam, I finally solved the problem!
It was because I had missed running this command beforehand.
Thank you again!