I am trying to implement Transcriber.java as given in the CMUSphinx tutorial. Since the accuracy of the recognized words was low, I tried adapting the existing acoustic model.
I have executed the following steps as given in the tutorial "Adapting the default acoustic model":
1. Creating an adaptation corpus
2. Generating acoustic feature files (sphinx_fe)
3. Converting the sendump and mdef files (pocketsphinx_mdef_convert)
4. Accumulating observation counts (bw - continuous model)
5. Creating a transformation with MLLR (mllr_solve)
The files I have used in this process are attached as "ForumQuery.zip".
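The five steps above correspond to the command sequence in the adaptation tutorial; the following is only a sketch of that sequence, assuming a continuous model as in step 4, and with file names such as arctic20.fileids and the en-us model directory as illustrative placeholders rather than the poster's actual values:

```shell
# Sketch of the adaptation pipeline; paths and corpus names are illustrative.
# 2. Generate acoustic feature files from the recorded wavs
sphinx_fe -argfile en-us/feat.params -samprate 16000 -c arctic20.fileids \
    -di . -do . -ei wav -eo mfc -mswav yes

# 3. Convert the binary mdef to text form
pocketsphinx_mdef_convert -text en-us/mdef en-us/mdef.txt

# 4. Accumulate observation counts (continuous model: -ts2cbfn .cont.)
bw -hmmdir en-us -moddeffn en-us/mdef.txt -ts2cbfn .cont. \
    -feat 1s_c_d_dd -cmn current -agc none \
    -dictfn cmudict-en-us.dict -ctlfn arctic20.fileids \
    -lsnfn arctic20.transcription -accumdir .

# 5. Estimate the MLLR transformation from the accumulated counts
mllr_solve -meanfn en-us/means -varfn en-us/variances \
    -outmllrfn mllr_matrix -accumdir .
```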
I then ran the word_align.pl script provided under sphinxtrain\scripts\decode; its output was:
author of the danger TRAIL philip STEELS etc (arctic_0001)
author of the danger TRAIL, philip STEELS, etc (arctic_0001)
Words: 8 Correct: 6 Errors: 2 Percent correct = 75.00% Error = 25.00% Accuracy = 75.00%
Insertions: 0 Deletions: 0 Substitutions: 2
i'm playing a single hand in what looks like a losing game (arctic_0011)
i'm playing a single hand in what looks like a losing game (arctic_0011)
Words: 12 Correct: 12 Errors: 0 Percent correct = 100.00% Error = 0.00% Accuracy = 100.00%
Insertions: 0 Deletions: 0 Substitutions: 0
a combination of canadian capital quickly organized and petitioned for the same privileges (arctic_0024)
a combination of canadian capital quickly organized and petitioned for the same privileges (arctic_0024)
Words: 13 Correct: 13 Errors: 0 Percent correct = 100.00% Error = 0.00% Accuracy = 100.00%
Insertions: 0 Deletions: 0 Substitutions: 0
i was about to do this when cooler judgment prevailed *** (arctic_0026)
i was about to do this when cooler judgment prevailed AND (arctic_0026)
Words: 10 Correct: 10 Errors: 1 Percent correct = 100.00% Error = 10.00% Accuracy = 90.00%
Insertions: 1 Deletions: 0 Substitutions: 0
TOTAL Words: 43 Correct: 41 Errors: 3
TOTAL Percent correct = 95.35% Error = 6.98% Accuracy = 93.02%
TOTAL Insertions: 1 Deletions: 0 Substitutions: 2
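For context on the numbers above: word_align scores percent correct as correct/reference-words and accuracy as (reference - substitutions - deletions - insertions)/reference, which is why arctic_0026 can be 100% correct yet only 90% accurate (the insertion counts against accuracy but not against percent correct). A small self-contained sketch of that arithmetic, using the totals above as the example:

```java
public class WerScore {
    // word_align.pl-style scoring, relative to the reference word count N:
    //   percent correct = correct / N
    //   accuracy        = (N - substitutions - deletions - insertions) / N
    static double percentCorrect(int correct, int refWords) {
        return 100.0 * correct / refWords;
    }

    static double accuracy(int refWords, int subs, int dels, int ins) {
        return 100.0 * (refWords - subs - dels - ins) / refWords;
    }

    public static void main(String[] args) {
        // Totals from the alignment above: 43 reference words, 41 correct,
        // 2 substitutions, 0 deletions, 1 insertion.
        System.out.printf("Percent correct = %.2f%%%n", percentCorrect(41, 43)); // 95.35
        System.out.printf("Accuracy = %.2f%%%n", accuracy(43, 2, 0, 1));         // 93.02
    }
}
```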
Later I tried to test the adapted model before using it in the java program, as per the tutorial http://cmusphinx.sourceforge.net/wiki/tutorialtuning. Following are the steps:
Created the test adaptation corpus: 4 wav files, test.Transcription, test.fileids, and generated .dic and .lm files using the lmtool.
Executed the test using TranscriberWithMLLR.java, which uses
recognizer.loadTransform("mllr_matrixCont", 1);
to use the created MLLR matrix (continuous model).
The files I have used in this test process are attached as "ForumPost2.zip".
Here is my question.
In the output.txt attached, the text marked as "Actual" is the correct one, and the java program recognizes it with 5 errors. My end goal is the java program only; the adaptation I have done is just an intermediate step. Still, the error rate is very high. What should I do to improve the accuracy of recognition in the java program?
Thanks.
Last edit: Balaji 2016-04-18
You can try to transform the model with the mllr_transform binary from sphinxtrain first, before loading it in sphinx4.
Thanks Nickolay. I ran mllr_transform using this command line:
And later changed the acoustic model path in my java program as
This has given me an improvement in recognition accuracy in the range of 70% up to 100% for some of the test data. I will try to improve the lower value as well. By the way, I am not able to include variances in the mllr_transform execution. Does mllr_transform have an impact only on the means?
Can you guide me to a tutorial which explains MLLR and the importance of the means, variances, and counts calculated in this procedure?
Thank You very much.
https://www.ee.columbia.edu/~dpwe/papers/Gales97-mllr.pdf
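In outline, the MLLR scheme described in the paper above works as follows: the observation counts accumulated by bw are used to estimate a single affine transform that is applied to every Gaussian mean of the model. A sketch of the equations, with the paper as the authoritative source:

```latex
% MLLR re-estimates each Gaussian mean \mu_m with a shared affine transform
\hat{\mu}_m = A\,\mu_m + b
% equivalently, with W = [\,b \;\; A\,] and extended mean \xi_m = [1,\ \mu_m^\top]^\top,
\hat{\mu}_m = W\,\xi_m
% Variance adaptation is a separate linear transform,
\hat{\Sigma}_m = H\,\Sigma_m\,H^\top
% mllr_solve estimates only W from the bw counts, which is consistent
% with mllr_transform affecting only the means file.
```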
Hello Nickolay,
A couple of times, the bw.exe in the Sphinxtrain\bin\release folder has been deleted by my antivirus software. I am using "Norton Security with Backup". Any known issues? Kindly let me know.
Thanks.
Balaji.
I am not aware of such. What does your antivirus say in the logs, why does it delete this file?
The message was
The log is attached herewith.
Ok, and what is the md5 sum of the file in question? What is the version of sphinxtrain you are trying? In the latest version there is no bin/release/bw.exe; there are bin/release/x86/bw.exe and bin/release/x64/bw.exe.
sphinxtrain - the latest version, dated 23rd Feb 2016, creates 2 folders, Win32 and x64, right? However, due to errors like "LAPACK not available", I am using the previous version:
Name: sphinxtrain-5prealpha-win32.zip
Size: 5102 KB
Number of files: 772
If required, I will attach that zip file here.
I don't think this version is relevant, but thank you for information.
I am working on improving the speech recognition accuracy with 2 sample voices. I'm following the tutorial "Adapting the default acoustic model".
A part of my transcriber.java is here (the configuration information):
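For reference, a minimal sphinx4 configuration of this kind, with the MLLR transform loaded, might look like the following sketch. This is a hypothetical configuration fragment, not the poster's actual code: the model path and the loadTransform call follow the sphinx4 adaptation tutorial, while the dictionary, language-model, and matrix file names are assumptions.

```java
import java.io.FileInputStream;
import java.io.InputStream;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;

public class TranscriberSketch {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        // Paths below are placeholders, not the poster's actual values.
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath("AllSentences.dic");
        configuration.setLanguageModelPath("AllSentences.lm");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
        // Load the MLLR transform produced by mllr_solve (1 regression class):
        recognizer.loadTransform("mllr_matrixCont", 1);

        // args[0]: a 16 kHz, 16-bit, mono wav file
        InputStream stream = new FileInputStream(args[0]);
        recognizer.startRecognition(stream);
        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            System.out.println(result.getHypothesis());
        }
        recognizer.stopRecognition();
    }
}
```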
The files AllSentences.dic and AllSentences.lm were created using the online lmtool. The source file for this contains each statement twice, because I have 2 speakers and their voices were recorded for the same sentences. I have attached 2 sets of sample voice files herewith.
Now, I see a great difference in the recognition accuracy between the two. The attached observations.txt shows the difference. Obviously, the second speaker's voice is a little different (I assume the first one is normal). However, I am thinking that by providing sample recordings for each of the statements, I am training the recognizer to understand both. Is this correct?
Kindly advise where I should improve the algorithm. Should I go back to modifying the wav file itself, by some preprocessing, cleaning, etc.? Or can I use Sphinx4 to achieve the goal?
Thank you.
Balaji.
It is not quite clear how you perform the adaptation: the amount of data, etc.
This is what I did:
Step1. Created a small corpus of sentences (AllSentences.txt attached herewith).
Step2. The 2 speakers separately spoke and recorded the sentences.
Step3. Converted the voice files to wav with the following parameters:
bit resolution: 16 bit
sampling rate: 16000 Hz
audio channels: mono
PCM format: PCM signed 16-bit little-endian
Step4. The first speaker has more data recorded: he read out all the statements in the text file, and adaptation was done as per the tutorial. Following is the sequence of programs executed:
If required, I will give the above with parameters. As they are working fine, and I have communicated them earlier in this forum, I am not repeating them here. Hope this is okay.
Step5. I have also attached the MLLR matrix that was created (mllr_matrixCont). The result of this speaker's adaptation has been quite satisfactory.
Now the problem starts:
Step6. The second speaker could speak only a few sentences (lines 27 to 38 of the text file), and the process was repeated as in Step4 above. The results were given in my previous post.
My aim is to improve the recognition accuracy for the second speaker. How can I do this?
Thank you.
Last edit: Balaji 2016-11-17
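As an aside on Step3 above: the required wav properties (16000 Hz, 16 bit, mono, signed little-endian PCM) can be verified programmatically with the JDK's javax.sound.sampled API before feeding files to the recognizer. A small sketch; the class and file names are illustrative:

```java
import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;

public class WavCheck {
    // True when the format matches what the tutorial asks for:
    // 16000 Hz, 16 bit, mono, signed PCM, little-endian.
    static boolean isSphinxReady(AudioFormat f) {
        return f.getSampleRate() == 16000f
                && f.getSampleSizeInBits() == 16
                && f.getChannels() == 1
                && f.getEncoding().equals(AudioFormat.Encoding.PCM_SIGNED)
                && !f.isBigEndian();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java WavCheck <file.wav>");
            return;
        }
        AudioFormat f = AudioSystem.getAudioInputStream(new File(args[0])).getFormat();
        System.out.println(f + " -> " + (isSphinxReady(f) ? "OK" : "needs conversion"));
    }
}
```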
Adapt with a larger amount of sentences from the second speaker.
I also want to understand the contents of the folder cmusphinx-en-us-5.2. This is called the "acoustic model", right? How are the files inside this folder created, and what is their significance? Can I get some reference material regarding this?
Thank you.
Last edit: Balaji 2016-11-18
http://cmusphinx.sourceforge.net/wiki/acousticmodelformat