
Training not consistently improving accuracy

Help · Matt · created 2012-07-02, last updated 2012-09-22
  • Matt

    Matt - 2012-07-02

    Hi,

    I'm working on a project where we take small clips, use them to train the
    default acoustic model, and test the resulting accuracy. What we want to
    show is that as the number of clips increases, the overall accuracy tends
    to increase. So we train on 1 clip, then 2 clips, then 3, etc., testing
    accuracy at each stage. Both the training and testing clips are from the
    same audio segment (a lecture from MIT OpenCourseWare). The problem is that
    after testing, some clips increase accuracy, some decrease it, and even
    when we train with only the "good" clips, we get wildly different results.
    So my question is, firstly, is this claim feasible? Should we be getting
    steady increases in accuracy as the training corpus grows? Secondly, if it
    is feasible, are we doing anything wrong in the training/testing process?

    Here's the (mostly) complete data we are using for training/testing. It
    includes 29 example training clips, as well as a simple Java program that
    trains a model automatically based on the input. The testing file is
    "ELECtestN.wav".

    https://www.dropbox.com/sh/0l3ua164hoqg0cb/O26fIS8KPB

    Thank you for any help.

     
  • Matt

    Matt - 2012-07-02

    Typo: when I say "training" I mean "adapting." We are using the default WSJ 8k
    model and adapting that.

     
  • Nickolay V. Shmyrev

    The problem is, after testing, some clips increase accuracy, some decrease
    it, and even when we train with the "good" clips, we get wildly different
    results.

    These short chunks of audio are called utterances, not clips.

    So my question is, firstly, is this claim feasible? Should we be getting
    steady increases in accuracy as the training corpus increases?

    For MAP adaptation, a reasonable improvement starts at about 5 minutes of
    adaptation audio (far more than you have tried) and keeps growing up to
    around 20 hours. The data you are using is too small for MAP adaptation.

    You can try MLLR adaptation, but for that you need a continuous model.

    There is also an issue with your utterances: they MUST have a small period
    of silence (about 0.2 s) at the boundaries. You have cut too close to the
    speech.
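For batch-padding the utterance files, here is a minimal sketch using Python's stdlib wave module (the function name is mine; it assumes the clips are plain PCM WAV files):

```python
import wave

def pad_with_silence(src, dst, pad_s=0.2):
    """Prepend and append pad_s seconds of digital silence to a PCM WAV
    file, so the utterance is not cut flush against the speech."""
    with wave.open(src, "rb") as w:
        params = w.getparams()
        frames = w.readframes(w.getnframes())
    # One frame = sampwidth * nchannels bytes; zero bytes are silence in PCM.
    pad = b"\x00" * (int(pad_s * params.framerate)
                     * params.sampwidth * params.nchannels)
    with wave.open(dst, "wb") as w:
        w.setparams(params)
        w.writeframes(pad + frames + pad)
```

Run it over each utterance before feeding it to the adaptation tools.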

     
  • Matt

    Matt - 2012-07-03

    Thank you very much for your help. We have about 30 minutes of adaptation
    audio to work with in total. It seems the only issue now is to correctly
    segment that audio into utterances that can be used by the adaptation program.
    How would you suggest doing this? Is manually segmenting the only feasible
    way?

     
  • Matt

    Matt - 2012-07-17

    Okay, we have been able to use the long audio aligner to segment audio, but we
    still haven't been getting consistent improvement from adaptation. Here are
    the utterances, from 10 minutes of audio:

    https://www.dropbox.com/sh/hmqsij2195jdpck/4XxmSbk0LU

    There was also a 6-minute test clip for testing accuracy (from the same
    chemistry lecture). Do you think there is something wrong with these clips, or
    should the adaptation be working?

     
  • Nickolay V. Shmyrev

    There was also a 6-minute test clip for testing accuracy (from the same
    chemistry lecture). Do you think there is something wrong with these clips, or
    should the adaptation be working?

    Sorry, since you didn't provide information about your experiments, neither
    the data used for testing nor the exact decoder configuration, it's hard to
    give you a detailed answer.

     
  • Matt

    Matt - 2012-07-23

    The test clip we are using is called "sschemTEST.wav" and is in the "testing"
    folder, with the accompanying transcription:

    https://www.dropbox.com/sh/hmqsij2195jdpck/4XxmSbk0LU

    This is decoded with pocketsphinx_continuous, using the adapted hubwsj 8k
    acoustic model, the giga64k LM, and the cmu07a dictionary.
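For reference, that decoding step would look roughly like the sketch below. The file paths are placeholders standing in for the thread's adapted model, LM, and dictionary; -hmm, -lm, -dict, and -infile are standard pocketsphinx_continuous options.

```python
import shutil
import subprocess

# Placeholder paths standing in for the files named in this thread.
cmd = [
    "pocketsphinx_continuous",
    "-hmm", "hubwsj_8k_adapted",   # adapted acoustic model directory
    "-lm", "giga64k.lm",           # language model
    "-dict", "cmu07a.dic",         # pronunciation dictionary
    "-infile", "sschemTEST.wav",   # test recording to decode
]

# Only invoke the decoder if it is actually installed; otherwise just
# show the command line that would be run.
if shutil.which("pocketsphinx_continuous"):
    subprocess.run(cmd, check=True)
else:
    print(" ".join(cmd))
```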

     
  • Nickolay V. Shmyrev

    And what is the WER before and after adaptation? How do you calculate it?

     
  • Matt

    Matt - 2012-08-02

    I used the word_align.pl script to calculate WER. It started off at about 45%
    and was around 44% by the end of adaptation, but along the way, it went as low
    as 39% and as high as 55%.
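For what it's worth, the WER that word_align.pl reports is the word-level edit distance between reference and hypothesis, divided by the reference length. A minimal re-implementation sketch (the function name is mine):

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level Levenshtein distance between the
    reference and hypothesis transcripts, divided by reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edits needed to turn r[:i] into h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + (r[i - 1] != h[j - 1]),  # substitution
                d[i - 1][j] + 1,                           # deletion
                d[i][j - 1] + 1,                           # insertion
            )
    return d[len(r)][len(h)] / len(r)
```

For example, wer("a b c", "a x c") counts one substitution out of three reference words, about 0.33.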

     
  • Nickolay V. Shmyrev

    I used the word_align.pl script to calculate WER. It started off at about
    45% and was around 44% by the end of adaptation, but along the way, it went as
    low as 39% and as high as 55%.

    Sorry, it's not clear what "along the way" means. Are you doing something
    during adaptation that is not described in the tutorial? There is nothing
    about "along the way" in the tutorial. Let me repeat: since you didn't
    provide information about your experiments, neither the data used for
    testing and adaptation (including all required files) nor the exact decoder
    configuration and command line, it's hard to give you a detailed answer.

     
