CMU Sphinx / Forums / Help: How to improve the recognition accuracy using mllr

Balaji - 2016-04-18

I am trying to implement Trascriber.java as given in the CMUSphinx tutorial. When the accuracy of the recognized words was low, I tried "Adapting existing acoustic model".

I have executed the following steps as given in the tutorial "Adapting the default acoustic model":
1. Creating an adaptation corpus
2. Generating acoustic feature files (sphinx_fe)
3. Converting the sendump and mdef files (pocketsphinx_mdef_convert)
4. Accumulating observation counts (bw - continuous model)
5. Creating transformation with MLLR (mllr_solve)
The files I have used in this process are attached as "ForumQuery.zip".

Later I have tried to test the adapted model before using it in java program, as per the tutorial http://cmusphinx.sourceforge.net/wiki/tutorialtuning Following are the steps:

Creating the test adaptation corpus - 4 wav files, test.Transcription, test.fileids and generated .dic and .lm using the lmtool.

Executed the test using

pocketsphinx_batch -adcin yes -cepdir TestAdapt -cepext .wav -ctl TestAdapt\test.fileids \ -lm en-us.lm -dict cmudict-en-us.dict -hmm cmusphinx-en-us-5.2 -hyp test.hyp

Ran the word_align.pl provided under sphinxtrain\scripts\decode. The output of which is

author of the danger TRAIL philip STEELS etc (arctic_0001) author of the danger TRAIL, philip STEELS, etc (arctic_0001) Words: 8 Correct: 6 Errors: 2 Percent correct = 75.00% Error = 25.00% Accuracy = 75.00% Insertions: 0 Deletions: 0 Substitutions: 2 i'm playing a single hand in what looks like a losing game (arctic_0011) i'm playing a single hand in what looks like a losing game (arctic_0011) Words: 12 Correct: 12 Errors: 0 Percent correct = 100.00% Error = 0.00% Accuracy = 100.00% Insertions: 0 Deletions: 0 Substitutions: 0 a combination of canadian capital quickly organized and petitioned for the same privileges (arctic_0024) a combination of canadian capital quickly organized and petitioned for the same privileges (arctic_0024) Words: 13 Correct: 13 Errors: 0 Percent correct = 100.00% Error = 0.00% Accuracy = 100.00% Insertions: 0 Deletions: 0 Substitutions: 0 i was about to do this when cooler judgment prevailed *** (arctic_0026) i was about to do this when cooler judgment prevailed AND (arctic_0026) Words: 10 Correct: 10 Errors: 1 Percent correct = 100.00% Error = 10.00% Accuracy = 90.00% Insertions: 1 Deletions: 0 Substitutions: 0 TOTAL Words: 43 Correct: 41 Errors: 3 TOTAL Percent correct = 95.35% Error = 6.98% Accuracy = 93.02% TOTAL Insertions: 1 Deletions: 0 Substitutions: 2

Then executed the java program TranscriberWithMLLR.java which uses

recognizer.loadTransform("mllr_matrixCont", 1);

to use the created mllr matrix (using continuous mode).

The files I have used in this test process are attached as "ForumPost2.zip"

Here is my question.
In the ouput.txt attached, the text marked as "Actual" is the correct one and it is recognized with 5 errors by the java program. My aim is the java program only and the adaptation I have done is only an intermediate step. Still the error rate is very high. What should I do to improve the accuracy of recognition in the java program?

Thanks.

Last edit: Balaji 2016-04-18

ForumPost2.zip

ForumQuery.zip
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Nickolay V. Shmyrev - 2016-04-19
  
  You can try to transform the model before loading in sphinx4 with mllr_transform binary from sphinxtrain first.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Balaji - 2016-04-19

Thanks Nickolay. I ran mllr_transform using this command line:

mllr_transform.exe -inmeanfn cmusphinx-en-us-5.2\means -varfn cmusphinx-en-us-5.2\variances -mllrmat mllr_matrixCont -outmeanfn cmusphinx-en-us-5.2-ADAPT\means

And later changed the acoustic model path in my java program as

configuration.setAcousticModelPath("file:cmusphinx-en-us-5.2-ADAPT");

This has given me improvement in recognition accuracy in the range of 70% upto 100% for some of the test data. Will try to improve the lower value also. Btw, I am not able to include variances in mllr_transform execution. Does mllr_transform have impact only on means?

Can you guide me to a tutorial which explains MLLR, the importance of the means, variances and the counts calculated in this procedure.
Thank You very much.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Nickolay V. Shmyrev - 2016-04-19
  
  Can you guide me to a tutorial which explains MLLR, the importance of the means, variances and the counts calculated in this procedure.
  
  https://www.ee.columbia.edu/~dpwe/papers/Gales97-mllr.pdf
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Balaji - 2016-05-04

Hello Nickolay,

A couple of times, the bw.exe in the Sphinxtrain\bin\release folder is deleted by my anti virus software in my system. I am using "Norton Security with Backup" software. Any known issues? Kindly let me know.
Thanks.

Balaji.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Nickolay V. Shmyrev - 2016-05-05
  
  I am not aware of such. What does your antivirus say in the logs, why does it delete this file?
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Balaji - 2016-05-06

The message was

A program was behaving suspiciously on your computer.
This program was removed.

The log is attached herewith.

CMU bw LOG.txt

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Nickolay V. Shmyrev - 2016-05-06
  
  Ok, and what is the md5 sum of the file in question? What is the version of sphinxtrain you are trying? In latest version there is no bin/release/bw.exe, there are bin/release/x86/bw.exe and bin/release/x64/bw.exe
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Balaji - 2016-05-09

sphinxtrain - the latest version dated 23rd Feb 2016 creates 2 folders : Win32 and x64. Right? However, due to errors like "LPACK not available", I am using one previous version :

Name : sphinxtrain-5prealpha-win32.zip
Size : 5102 KB
Number of files : 772

If required, I will attached that zip file here.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Nickolay V. Shmyrev - 2016-05-10
  
  I don't think this version is relevant, but thank you for information.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Balaji - 2016-11-10

I am working on improving the speech recognition accuracy with 2 sample voices. I'm following the tutorial "Adapting the default acoustic model".
A part of my transcriber.java is here (the configuration information):

Configuration configuration = new Configuration(); configuration.setAcousticModelPath("file:cmusphinx-en-us-5.2-adapt"); configuration.setDictionaryPath("file:AllSentences.dic"); configuration.setLanguageModelPath("file:AllSentences.lm");

The files AllSentences.dic and AllSentences.lm were created using the online lmTool. The source file for this contains the same statements twice, because, my speakers are 2 and their voices are recorded for the same sentences. I have attached 2 sets of sample voice files herewith.

Now, I see a great difference in the recognition accuracy between the 2. The observations.txt shows the difference. As an obvious fact, the second speaker's voice is a little different (I assume first one is normal). However, I am thinking by providing sample recordings for each of the statements, I am training the recognizer to understand both. Is this correct?

Kindly advise as to where should I improve the algorithm? Should I go back to the principles of changing the wav file itself, by some pre processing, cleaning, etc. Or, can I using Sphinx4 to achieve the goal.

Thank you.

Balaji.

Nancy01.wav

Nancy03.wav

Observations.txt

TNancy01.wav

TNancy03.wav
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Nickolay V. Shmyrev - 2016-11-10
  
  It is not quite clear how do you perform the adaptation - amount of data, etc.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Balaji - 2016-11-17

This is what I did:

Step1. Created a small corpus of sentences (AllSentences.txt attached herewith).
Step2. The 2 speakers separately spoke the sentences and recorded.
Step3. Converted the voice files to wav with the following parameters:

bit resolution : 16 bit
sampling rate : 16000 Hz
audio channels : mono
PCM format : PCM signed 16-bit little-endian

Step4. The first speaker has more data recorded - read out all the statements in the text file and adaptation done as per the tutorial. Following is the sequence of programs executed:

sphinx_fe pocketsphinx_mdef_convert bw mllr_solve copy cmusphinx-en-us-5.2 cmusphinx-en-us-5.2-adapt map_adapt mllr_transform

If required, I will give the above with parameters. As they are working fine and I have earlier communicated the same in this forum, I am not repeating them. Hope this is okay.

Step5. I have attached the mllr matrix created also herewith (mllr_matrixCont). The result of this speaker's adaptation has been quite satisfactory.

Now the problem starts:
Step6. The second speaker could speak only a few sentences (line no.27 to 38 only in the text file) and the process is repeated as in step4 above. The results were given in my previous post.

My aim is to improve the recognition accuracy for the second speaker. How to do?
Thank you.

Last edit: Balaji 2016-11-17

AllSentences.txt

mllr_matrixCont
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Nickolay V. Shmyrev - 2016-11-24
  
  Adapt with the large amount of sentences of the second speaker.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Balaji - 2016-11-18

I also want to understand the contents of the folder cmusphinx-en-us-5.2. This is called the "acoustic model" right? The files inside this folder - how are they created? What is the significance, etc. Can I get some reference material regarding this?

Thank you.

Last edit: Balaji 2016-11-18

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Nickolay V. Shmyrev - 2016-11-24
  
  http://cmusphinx.sourceforge.net/wiki/acousticmodelformat
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

How to improve the recognition accuracy using mllr

Speech Recognition Toolkit

Forums

Help

How to improve the recognition accuracy using mllr document.SUBSCRIPTION_OPTIONS = { "thing": "topic", "subscribed": false, "url": "subscribe", "icon": { "css": "fa fa-envelope-o" } };

How to improve the recognition accuracy using mllr