Hello everyone,
I am working on a project where I have to integrate the speech functionality of Pocketsphinx into an Android application. Specifically, I have to integrate the phoneme recognition functionality provided by Pocketsphinx, which should be able to recognize phonemes in French. For example, the speech recognizer should be able to recognize syllables (like "de", "re", "se", etc.), consonants (like "m", "f", "g", etc.), double consonants (like "kl", "ks", "gr", etc.) and vowels (like "a", "o", "e", etc.).
Right now, I have integrated Pocketsphinx to recognize the phonemes mentioned above, but the results are really bad. For example, when I pronounce "o", the recognized result is sometimes "SIL ff ei au" (even though I did not pronounce "f" or "e" at all); in other cases, something I did not pronounce appears at the beginning. The phones that appear at the beginning are not always the same (sometimes I get "ll", "uu", etc.); they change depending on the environment in which I run the test. Sometimes I do get the sound I pronounced (e.g. for "a", I get "SIL aa SIL"), but this happens very rarely.
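The outputs quoted above are space-separated phone strings in which SIL marks silence. Below is a minimal sketch for stripping the silence markers before comparing a result with the sound that was actually pronounced. Treating SIL as the only filler label is an assumption; any other filler labels in your own output would need to be added to the hypothetical FILLERS set.

```python
# Minimal sketch: drop silence/filler labels from a phone hypothesis string.
# Assumption: "SIL" is the only filler label; extend FILLERS (hypothetical
# name) with any other filler labels that appear in your decoder output.
FILLERS = frozenset({"SIL"})


def strip_fillers(hypothesis: str, fillers: frozenset = FILLERS) -> list[str]:
    """Split a space-separated phone hypothesis and drop filler tokens."""
    return [phone for phone in hypothesis.split() if phone not in fillers]


if __name__ == "__main__":
    for hyp in ("SIL ff ei au", "SIL aa SIL"):
        print(f"{hyp!r} -> {strip_fillers(hyp)}")
```

With silence removed, the "a" example leaves a single phone, while the "o" example still leaves three phones for one pronounced sound.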
So, could you please help me and let me know what the problem could be, and suggest how to solve it?
Thank you very much!
Leutrim
You need to collect a test set to investigate decoding accuracy as described in our tutorial:
http://cmusphinx.sourceforge.net/wiki/tutorialtuning
Hello Nickolay,
Thank you very much for your fast response.
I will try this tutorial and see what happens.
Best,
Leutrim
Hello Nickolay,
I just noticed that I did not have the "assets.xml" file at all, and that the following code had not been added to the build.gradle file:
ant.importBuild 'assets.xml'
preBuild.dependsOn(list, checksum)
clean.dependsOn(clean_assets)
Could this be a problem, since this is how the files needed for recognition are accessed? But then I wonder, how is it possible that I get a recognition result at all?
Thank you!
Leutrim
If some file is missing, the demo simply will not start.
Hello Nickolay,
I would like to ask something about the "test database setup", just to make sure I understand it correctly.
So, do I have to create an audio file for each sound that is supposed to be recognized (e.g. one audio file for "la", another one for "de", and so on)? Then I have to create "test.fileids". Afterwards, I have to create the "test.transcription" file, which should have the following form:
la (arctic_01)
de (arctic_02)
and so on. Then I should put the audio files in a folder named "wav", and to run this on Android, I just need to change the decoder parameters?
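The two control files described above can be generated with a short script. This is a minimal sketch that assumes (hypothetically) that the recordings are named after their file ids, as in test_ke.wav, and that the reference tokens for each file are typed in by hand, listing every token as many times as it is spoken in the recording. It follows the "text (fileid)" line format shown above; the helper names are illustrative, not part of any CMUSphinx tool.

```python
# Minimal sketch: build test.fileids and test.transcription for a folder of
# recordings. Helper names and the wav/ layout are illustrative assumptions.
from pathlib import Path


def build_control_files(fileids: list[str], reference: dict[str, str]) -> tuple[str, str]:
    """Return (fileids_text, transcription_text) for the given file ids.

    reference maps each file id to the exact tokens spoken in that file,
    e.g. "ke ke" if "ke" is pronounced twice.
    """
    missing = [f for f in fileids if f not in reference]
    if missing:
        raise ValueError(f"no reference transcription for: {missing}")
    # test.fileids: one id per line, relative to -cepdir, without extension
    fileids_text = "".join(f"{f}\n" for f in fileids)
    # test.transcription: "<tokens> (<file id>)" per line, in the same order
    transcription_text = "".join(f"{reference[f]} ({f})\n" for f in fileids)
    return fileids_text, transcription_text


def write_test_database(wav_dir: Path, reference: dict[str, str], out_dir: Path) -> None:
    """Write both control files for every .wav file found in wav_dir."""
    fileids = sorted(p.stem for p in wav_dir.glob("*.wav"))
    fileids_text, transcription_text = build_control_files(fileids, reference)
    (out_dir / "test.fileids").write_text(fileids_text)
    (out_dir / "test.transcription").write_text(transcription_text)
```

For example, write_test_database(Path("wav"), {"test_ke": "ke ke", "test_la": "la"}, Path(".")) writes one line per recording. Since the scoring compares the reference with the decoder output token by token, the reference is presumably most useful when written in the same units the decoder emits.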
Could you please let me know if all of this is correct?
Thank you very much in advance!
Leutrim
You run the test on the desktop, not on Android.
Hello Nickolay,
I set up the test database, and the results are very poor. As I told you, the speech recognizer should be able to recognize syllables (like "de", "re", "se", etc.), consonants (like "m", "f", "g", etc.), double consonants (like "kl", "ks", "gr", etc.) and vowels (like "a", "o", "e", etc.). However, it does not recognize them as it is supposed to. I still get phonemes in the result that were not pronounced at all.
Could you please suggest possible ways to increase the accuracy? Or let me know whether Pocketsphinx is able to recognize something like a single vowel or a single syllable at all?
Thank you very much in advance!
Leutrim
Sure, as soon as you provide the required data files.
Hello Nickolay,
Thank you very much for your reply.
Here you have the test.fileids, test.transcriptions and the .wav files.
Thanks!
Leutrim
Ok, and what model do you use?
I am using the French acoustic and language models provided at the following link:
https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/French%20Language%20Model/
You can also find them attached here. I am using the language model designed for phoneme recognition in French.
Your reference transcription does not match the audio content; for example, in test_ke.wav you say "ke" twice, but in the reference transcription it is listed only once.
What exact arguments do you pass to the decoding command, and what error rate do you see?
Yes, you are right, my reference transcription does not match the audio content. Without changing the reference transcription yet, I get the following results:
TOTAL Words: 47 Correct: 9 Errors: 96
TOTAL Percent correct = 19.15% Error = 204.26% Accuracy = -104.26%
TOTAL Insertions: 58 Deletions: 0 Substitution: 38
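The TOTAL figures above are related by the usual word error rate definitions, which is why the error rate can exceed 100%: insertions count as errors but are not bounded by the number of reference words. Below is a minimal sketch that reproduces the figures above from the raw counts, assuming accuracy is reported as 100% minus the error rate (which matches the numbers shown). It illustrates the arithmetic only and is not the scoring script itself.

```python
# Minimal sketch of the relations between the TOTAL counts (not the actual
# scoring script): errors = I + D + S, correct = N - D - S,
# error % = errors / N, accuracy % = 100 - error %.


def summarize(n_words: int, insertions: int, deletions: int, substitutions: int) -> dict[str, float]:
    errors = insertions + deletions + substitutions
    correct = n_words - deletions - substitutions
    error_pct = 100.0 * errors / n_words
    return {
        "errors": errors,
        "correct": correct,
        "percent_correct": round(100.0 * correct / n_words, 2),
        "error_pct": round(error_pct, 2),
        "accuracy_pct": round(100.0 - error_pct, 2),
    }


if __name__ == "__main__":
    # Counts from the TOTAL printout above: 47 words, 58 I, 0 D, 38 S
    print(summarize(47, insertions=58, deletions=0, substitutions=38))
    # -> {'errors': 96, 'correct': 9, 'percent_correct': 19.15, 'error_pct': 204.26, 'accuracy_pct': -104.26}
```

With 58 insertions against only 47 reference words, insertions alone amount to more than 100% error, so they are the largest contributor in this run.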
The decoder arguments I am using are the ones suggested on the CMUSphinx website [http://cmusphinx.sourceforge.net/wiki/tutorialtuning]:
pocketsphinx_batch \
-adcin yes \
-cepdir wav \
-cepext .wav \
-ctl test.fileids \
-lm <your.lm, for example en-us.lm.dmp from pocketsphinx> \
-dict <your.dic, for example cmudict-en-us.dict from pocketsphinx> \
-hmm <your_hmm, for example en-us> \
-hyp test.hyp
However, I have run every single .wav file to see the recognized result, and I never get the right output. To run a single .wav file, I used the following command (also suggested on the CMUSphinx website):
pocketsphinx_continuous -infile test/data/wav/test_ke.wav \
-hmm model/french/french \
-allphone model/french/fr-phone.lm.dmp -backtrace yes \
-beam 1e-20 -pbeam 1e-20 -lw 2.0
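To repeat the single-file check above for every recording, the same flags can be generated once per file id. Below is a minimal sketch. The helper name is hypothetical and the default paths are the placeholders from the command above. The commands are only assembled here; running them (for example with subprocess.run) assumes pocketsphinx_continuous is on the PATH.

```python
# Minimal sketch: one pocketsphinx_continuous allphone command per file id,
# reusing the flags from the single-file command above. Paths are placeholders.
import shlex
from collections.abc import Iterable


def continuous_commands(fileid_lines: Iterable[str],
                        wav_dir: str = "test/data/wav",
                        hmm: str = "model/french/french",
                        phone_lm: str = "model/french/fr-phone.lm.dmp") -> list[list[str]]:
    commands = []
    for line in fileid_lines:
        fileid = line.strip()
        if not fileid:  # skip blank lines
            continue
        commands.append([
            "pocketsphinx_continuous",
            "-infile", f"{wav_dir}/{fileid}.wav",
            "-hmm", hmm,
            "-allphone", phone_lm,
            "-backtrace", "yes",
            "-beam", "1e-20",
            "-pbeam", "1e-20",
            "-lw", "2.0",
        ])
    return commands


if __name__ == "__main__":
    for cmd in continuous_commands(["test_ke", "test_la"]):
        print(shlex.join(cmd))  # or: subprocess.run(cmd, check=True)
```

In practice the file ids would come from Path("test.fileids").read_text().splitlines(), so each recording in the test set is decoded with identical settings.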
What exact pocketsphinx_batch command do you run?
Provide an updated reference file.
This is the command I am using, with the path to each required file specified:
pocketsphinx_batch.exe -adcin yes -cepdir wav -cepext .wav -ctl /path/to/test.fileids -lm /path/to/fr-phone.lm.dmp -dict /path/to/fr-dict.dict -hmm /path/to/french/french -hyp test.hyp
You should have used -allphone instead of -lm in batch mode, just as in continuous mode, together with all the other recommended arguments.
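Following that advice, the batch command replaces -lm with -allphone and adds the phoneme-recognition flags used with pocketsphinx_continuous above. Below is a minimal sketch that only assembles the argument list. Keeping -dict from the original command and the placeholder model paths are assumptions carried over from the earlier posts, and the helper name is hypothetical.

```python
# Minimal sketch: pocketsphinx_batch arguments for allphone (phoneme) decoding.
# -allphone takes the phonetic LM in place of -lm; paths are placeholders.
import shlex


def allphone_batch_args(hmm: str, phone_lm: str, dict_path: str,
                        ctl: str = "test.fileids", hyp: str = "test.hyp",
                        binary: str = "pocketsphinx_batch") -> list[str]:
    return [
        binary,
        "-adcin", "yes",        # input files are audio, not cepstra
        "-cepdir", "wav",       # directory containing the recordings
        "-cepext", ".wav",      # extension appended to each file id
        "-ctl", ctl,
        "-allphone", phone_lm,  # instead of -lm
        "-dict", dict_path,
        "-hmm", hmm,
        "-backtrace", "yes",
        "-beam", "1e-20",
        "-pbeam", "1e-20",
        "-lw", "2.0",
        "-hyp", hyp,
    ]


if __name__ == "__main__":
    args = allphone_batch_args("/path/to/french/french",
                               "/path/to/fr-phone.lm.dmp",
                               "/path/to/fr-dict.dict")
    print(shlex.join(args))  # run with subprocess.run(args, check=True)
```

Building the arguments as a list rather than one string avoids quoting problems with paths that contain spaces.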
I'm still waiting for the updated reference file.
Yes, I did not realize that (since I need phoneme recognition, I need the -allphone argument).
You can find attached the updated reference file.
Thank you!
Ok, so what are your results with allphone?
I just finished the test. I also added all the arguments used with pocketsphinx_continuous for phoneme recognition. The results are the following:
TOTAL Words: 112 Correct: 52 Errors: 370
TOTAL Percent correct = 46% Error = 330.36% Accuracy = -230.36%
TOTAL Insertions: 310 Deletions: 0 Substitutions: 60
You can also find attached a screenshot of pocketsphinx_batch.exe running with the changed arguments.
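The totals do not show which recordings produce the 310 insertions, so it can help to count errors per utterance. Below is a minimal sketch of a generic edit-distance alignment between a reference and a hypothesis token list. It is written for illustration and is not the alignment script from the tutorial, so tie-breaking (and therefore the split between error types) may occasionally differ from that script's output. The "aa" example comes from the first post; the inserted "ll" hypothesis is made up.

```python
# Minimal sketch: count substitutions, deletions and insertions between a
# reference and a hypothesis with a Levenshtein alignment (illustrative only).


def align_counts(ref: list[str], hyp: list[str]) -> tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) for a minimum-cost alignment."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = (total, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    cost = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, k = cost[i - 1][j - 1]
            diag = (t, s, d, k) if ref[i - 1] == hyp[j - 1] else (t + 1, s + 1, d, k)
            t, s, d, k = cost[i - 1][j]
            dele = (t + 1, s, d + 1, k)
            t, s, d, k = cost[i][j - 1]
            ins = (t + 1, s, d, k + 1)
            cost[i][j] = min(diag, dele, ins)  # lowest total error first
    _, subs, dels, inss = cost[n][m]
    return subs, dels, inss


if __name__ == "__main__":
    print(align_counts(["aa"], ["aa"]))        # (0, 0, 0)
    print(align_counts(["aa"], ["ll", "aa"]))  # (0, 0, 1)
```

Running this over each line of the reference and hypothesis files (after dropping SIL) shows which recordings contribute most of the insertions.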
Hello Nickolay,
Could you please let me know what I should do? As you can see, my results are really bad.
Thank you very much!
Leutrim
Be patient, it will take some time for me to look into your issues.
Thank you very much for your help!
Looking forward to hearing from you.
Leutrim
Dear Nickolay,
I hope you are doing fine.
I am writing to you again because I have a deadline for submitting my solution. Could you please let me know whether you have found a way to solve my problem?
Thank you very much in advance!
Best,
Leutrim
Hello Nickolay,
Please let me know if you need any further information.
Thank you!
Leutrim