
How to improve the recognition accuracy using MLLR

Help
Balaji
2016-04-18
2016-11-24
  • Balaji

    Balaji - 2016-04-18

    I am trying to implement Transcriber.java as given in the CMUSphinx tutorial. Because the accuracy of the recognized words was low, I tried "Adapting existing acoustic model".

    I have executed the following steps as given in the tutorial "Adapting the default acoustic model":
    1. Creating an adaptation corpus
    2. Generating acoustic feature files (sphinx_fe)
    3. Converting the sendump and mdef files (pocketsphinx_mdef_convert)
    4. Accumulating observation counts (bw - continuous model)
    5. Creating transformation with MLLR (mllr_solve)
    The files I have used in this process are attached as "ForumQuery.zip".

    Next, following the tutorial http://cmusphinx.sourceforge.net/wiki/tutorialtuning I tested the adapted model before using it in the Java program. These are the steps:

    • Created the test corpus: 4 wav files, test.Transcription and test.fileids, plus the .dic and .lm files generated with lmtool.

    • Executed the test using

    pocketsphinx_batch -adcin yes -cepdir TestAdapt -cepext .wav -ctl TestAdapt\test.fileids \
    -lm en-us.lm -dict cmudict-en-us.dict -hmm cmusphinx-en-us-5.2 -hyp test.hyp
    
    • Ran word_align.pl, provided under sphinxtrain\scripts\decode. Its output is:
    author of  the danger TRAIL  philip STEELS  etc  (arctic_0001)
    author of  the danger TRAIL, philip STEELS, etc  (arctic_0001)
    Words: 8 Correct: 6 Errors: 2 Percent correct = 75.00% Error = 25.00% Accuracy = 75.00%
    Insertions: 0 Deletions: 0 Substitutions: 2
    i'm playing a   single hand in  what looks like a   losing game  (arctic_0011)
    i'm playing a   single hand in  what looks like a   losing game  (arctic_0011)
    Words: 12 Correct: 12 Errors: 0 Percent correct = 100.00% Error = 0.00% Accuracy = 100.00%
    Insertions: 0 Deletions: 0 Substitutions: 0
    a   combination of  canadian capital quickly organized and petitioned for the same privileges  (arctic_0024)
    a   combination of  canadian capital quickly organized and petitioned for the same privileges  (arctic_0024)
    Words: 13 Correct: 13 Errors: 0 Percent correct = 100.00% Error = 0.00% Accuracy = 100.00%
    Insertions: 0 Deletions: 0 Substitutions: 0
    i   was about to  do  this when cooler judgment prevailed ***  (arctic_0026)
    i   was about to  do  this when cooler judgment prevailed AND  (arctic_0026)
    Words: 10 Correct: 10 Errors: 1 Percent correct = 100.00% Error = 10.00% Accuracy = 90.00%
    Insertions: 1 Deletions: 0 Substitutions: 0
    TOTAL Words: 43 Correct: 41 Errors: 3
    TOTAL Percent correct = 95.35% Error = 6.98% Accuracy = 93.02%
    TOTAL Insertions: 1 Deletions: 0 Substitutions: 2
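
For readers checking these figures: the percentages in the word_align.pl output above appear to follow from the counts as Percent correct = Correct / Words, Error = (Insertions + Deletions + Substitutions) / Words, and Accuracy = 100% - Error. The following is a minimal Java sketch of that arithmetic, using a plain edit-distance alignment. It is not the word_align.pl implementation, and the names WordAlignSketch, align and summary are made up for illustration.

```java
import java.util.Locale;

class WordAlignSketch {
    /** Returns {correct, substitutions, insertions, deletions} for hyp aligned against ref. */
    static int[] align(String[] ref, String[] hyp) {
        int n = ref.length, m = hyp.length;
        int[][] cost = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) cost[i][0] = i; // only deletions
        for (int j = 0; j <= m; j++) cost[0][j] = j; // only insertions
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = cost[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                cost[i][j] = Math.min(diag, Math.min(cost[i - 1][j] + 1, cost[i][j - 1] + 1));
            }
        }
        // Walk back through the table to count each kind of edit.
        // Ties are broken in favour of match/substitution, then insertion.
        int correct = 0, sub = 0, ins = 0, del = 0;
        int i = n, j = m;
        while (i > 0 || j > 0) {
            boolean same = i > 0 && j > 0 && ref[i - 1].equals(hyp[j - 1]);
            if (i > 0 && j > 0 && cost[i][j] == cost[i - 1][j - 1] + (same ? 0 : 1)) {
                if (same) correct++; else sub++;
                i--; j--;
            } else if (j > 0 && cost[i][j] == cost[i][j - 1] + 1) {
                ins++; j--;
            } else {
                del++; i--;
            }
        }
        return new int[] {correct, sub, ins, del};
    }

    /** Formats the counts in the same style as the summary lines quoted above. */
    static String summary(int[] c) {
        int words = c[0] + c[1] + c[3];  // reference words = correct + substituted + deleted
        int errors = c[1] + c[2] + c[3]; // substitutions + insertions + deletions
        double pctCorrect = 100.0 * c[0] / words;
        double pctError = 100.0 * errors / words;
        return String.format(Locale.ROOT,
            "Words: %d Correct: %d Errors: %d Percent correct = %.2f%% Error = %.2f%% Accuracy = %.2f%%",
            words, c[0], errors, pctCorrect, pctError, 100.0 - pctError);
    }

    public static void main(String[] args) {
        String ref = "i was about to do this when cooler judgment prevailed";
        String hyp = "i was about to do this when cooler judgment prevailed and";
        // Prints: Words: 10 Correct: 10 Errors: 1 Percent correct = 100.00% Error = 10.00% Accuracy = 90.00%
        System.out.println(summary(align(ref.split(" "), hyp.split(" "))));
    }
}
```

This explains why arctic_0026 can show 100% correct and only 90% accuracy at the same time: the inserted word counts as an error even though every reference word was recognized.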
    
    • Then executed the Java program TranscriberWithMLLR.java, which calls
            recognizer.loadTransform("mllr_matrixCont",  1);
    

    to load the MLLR matrix created earlier (using the continuous model).

    The files I have used in this test process are attached as "ForumPost2.zip"

    Here is my question.
    In the attached ouput.txt, the text marked "Actual" is the correct transcript, and the Java program recognizes it with 5 errors. My real goal is the Java program; the adaptation I have done is only an intermediate step. Even so, the error rate is still very high. What should I do to improve recognition accuracy in the Java program?

    Thanks.

     

    Last edit: Balaji 2016-04-18
    • Nickolay V. Shmyrev

      You can first try transforming the model with the mllr_transform binary from sphinxtrain, then load the transformed model in sphinx4.

       
  • Balaji

    Balaji - 2016-04-19

    Thanks Nickolay. I ran mllr_transform using this command line:

    mllr_transform.exe -inmeanfn cmusphinx-en-us-5.2\means -varfn cmusphinx-en-us-5.2\variances -mllrmat mllr_matrixCont -outmeanfn cmusphinx-en-us-5.2-ADAPT\means
    

    Then I changed the acoustic model path in my Java program to:

    configuration.setAcousticModelPath("file:cmusphinx-en-us-5.2-ADAPT");
    

    This improved recognition accuracy to between 70% and 100% on some of the test data. I will try to improve the lower end as well. By the way, I was not able to include variances when running mllr_transform. Does mllr_transform affect only the means?

    Can you guide me to a tutorial which explains MLLR and the importance of the means, variances and counts calculated in this procedure?
    Thank You very much.

     
    • Nickolay V. Shmyrev

      Can you guide me to a tutorial which explains MLLR and the importance of the means, variances and counts calculated in this procedure?

      https://www.ee.columbia.edu/~dpwe/papers/Gales97-mllr.pdf
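
As a rough illustration of what the linked paper covers: in mean-only MLLR, each Gaussian mean vector mu in a regression class is replaced by an affine transform mu' = A * mu + b, and the variances are left as they are. That is consistent with the mllr_transform command earlier in the thread, which writes only an -outmeanfn file. The Java sketch below applies such a transform to a single mean vector. The matrix and vector values are toy numbers, the name MllrMeanSketch is made up, and nothing here reads the actual mllr_matrixCont file format.

```java
import java.util.Arrays;

class MllrMeanSketch {
    /** Applies mu' = A * mu + b to one Gaussian mean vector (mean-only MLLR). */
    static double[] transformMean(double[][] a, double[] b, double[] mu) {
        int d = mu.length;
        double[] out = new double[d];
        for (int r = 0; r < d; r++) {
            double sum = b[r];                 // bias term for this dimension
            for (int c = 0; c < d; c++) {
                sum += a[r][c] * mu[c];        // row r of A times the old mean
            }
            out[r] = sum;
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy 2-dimensional example; real acoustic models use far more dimensions.
        double[][] a = {{2.0, 0.0}, {0.0, 1.0}};
        double[] b = {1.0, -1.0};
        double[] mu = {3.0, 4.0};
        System.out.println(Arrays.toString(transformMean(a, b, mu))); // prints [7.0, 3.0]
    }
}
```

Because a single (A, b) pair is shared by many Gaussians, the transform can be estimated from a small amount of adaptation data, which is why MLLR is usually suggested when only a few recordings per speaker are available.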

       
  • Balaji

    Balaji - 2016-05-04

    Hello Nickolay,

    A couple of times, bw.exe in the Sphinxtrain\bin\release folder has been deleted by the antivirus software on my system. I am using "Norton Security with Backup". Are there any known issues? Kindly let me know.
    Thanks.

    Balaji.

     
    • Nickolay V. Shmyrev

      I am not aware of any. What does your antivirus say in the logs about why it deletes this file?

       
  • Balaji

    Balaji - 2016-05-06

    The message was

    A program was behaving suspiciously on your computer.
    This program was removed.

    The log is attached herewith.

     
    • Nickolay V. Shmyrev

      Ok, and what is the md5 sum of the file in question? Which version of sphinxtrain are you using? In the latest version there is no bin/release/bw.exe; there are bin/release/x86/bw.exe and bin/release/x64/bw.exe.

       
  • Balaji

    Balaji - 2016-05-09

    The latest sphinxtrain version, dated 23rd Feb 2016, creates 2 folders: Win32 and x64. Right? However, due to errors like "LPACK not available", I am using an earlier version:

    Name : sphinxtrain-5prealpha-win32.zip
    Size : 5102 KB
    Number of files : 772

    If required, I will attach that zip file here.

     
    • Nickolay V. Shmyrev

      I don't think this version is relevant, but thank you for the information.

       
  • Balaji

    Balaji - 2016-11-10

    I am working on improving speech recognition accuracy for 2 sample voices. I'm following the tutorial "Adapting the default acoustic model".
    Here is part of my transcriber.java (the configuration):

            Configuration configuration = new Configuration();
            configuration.setAcousticModelPath("file:cmusphinx-en-us-5.2-adapt");
            configuration.setDictionaryPath("file:AllSentences.dic");
            configuration.setLanguageModelPath("file:AllSentences.lm");
    

    The files AllSentences.dic and AllSentences.lm were created using the online lmtool. The source file for them contains each sentence twice, because I have 2 speakers and both recorded the same sentences. I have attached 2 sets of sample voice files herewith.

    Now I see a large difference in recognition accuracy between the 2 speakers; observations.txt shows the difference. The second speaker's voice is clearly a little different (I assume the first one is normal). However, my thinking is that by providing sample recordings of each sentence from both speakers, I am training the recognizer to understand both. Is this correct?

    Kindly advise where I should improve the process. Should I go back to changing the wav files themselves, with some preprocessing, cleaning, etc.? Or can I achieve the goal using Sphinx4?

    Thank you.

    Balaji.

     
    • Nickolay V. Shmyrev

      It is not quite clear how you perform the adaptation: the amount of data, etc.

       
  • Balaji

    Balaji - 2016-11-17

    This is what I did:

    Step1. Created a small corpus of sentences (AllSentences.txt attached herewith).
    Step2. The 2 speakers each read the sentences aloud and were recorded separately.
    Step3. Converted the voice files to wav with the following parameters:


    bit resolution : 16 bit
    sampling rate : 16000 Hz
    audio channels : mono
    PCM format : PCM signed 16-bit little-endian
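
A quick way to confirm that every recording really has the parameters listed above is to read each file's header with the standard javax.sound.sampled API. The sketch below is only an illustration: the class name WavFormatCheckSketch and the method matchesListedFormat are made up, and the file paths are whatever is passed on the command line.

```java
import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

class WavFormatCheckSketch {
    /** True if the format is 16 kHz, 16-bit, mono, signed little-endian PCM. */
    static boolean matchesListedFormat(AudioFormat f) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
            && f.getSampleRate() == 16000f
            && f.getSampleSizeInBits() == 16
            && f.getChannels() == 1
            && !f.isBigEndian();
    }

    public static void main(String[] args) throws Exception {
        // Each argument is the path of a wav file to inspect.
        for (String path : args) {
            try (AudioInputStream in = AudioSystem.getAudioInputStream(new File(path))) {
                AudioFormat f = in.getFormat();
                System.out.println(path + ": " + f
                    + (matchesListedFormat(f) ? "  OK" : "  MISMATCH"));
            }
        }
    }
}
```

Running it over both speakers' folders would show whether any file was saved at a different sampling rate or in stereo, which is worth ruling out before comparing accuracy between the speakers.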


    Step4. The first speaker has more data recorded: this speaker read out all the sentences in the text file, and adaptation was done as per the tutorial. The following programs were executed, in this order:

    sphinx_fe
    pocketsphinx_mdef_convert
    bw
    mllr_solve
    copy cmusphinx-en-us-5.2 cmusphinx-en-us-5.2-adapt
    map_adapt
    mllr_transform
    

    If required, I will give the above commands with their parameters. As they are working fine and I have posted them earlier in this forum, I am not repeating them here. Hope this is okay.

    Step5. I have also attached the resulting MLLR matrix (mllr_matrixCont). The results of this speaker's adaptation have been quite satisfactory.

    Now the problem starts:
    Step6. The second speaker could record only a few sentences (lines 27 to 38 of the text file), and the process was repeated as in Step4 above. The results were given in my previous post.

    My aim is to improve the recognition accuracy for the second speaker. How can I do this?
    Thank you.

     

    Last edit: Balaji 2016-11-17
    • Nickolay V. Shmyrev

      Adapt with a larger number of sentences from the second speaker.

       
  • Balaji

    Balaji - 2016-11-18

    I also want to understand the contents of the folder cmusphinx-en-us-5.2. This is called the "acoustic model", right? How are the files inside this folder created, and what is the significance of each? Can I get some reference material on this?

    Thank you.

     

    Last edit: Balaji 2016-11-18
