
Acoustic adaptation options by version

fweber, 2010-02-20 (last updated 2012-09-22)
  • fweber

    fweber - 2010-02-20

    I am looking into an adaptation study for a program that builds models in
    Sphinx 3 and decodes with Sphinx 2. I've looked through the documentation, run
    the tutorials, and examined the scripts in SphinxTrain. If I have understood
    correctly, these are the current options:

    Sphinx 2: VTLN, CMN, MAP
    Sphinx 3: VTLN, CMN, MAP, LDA, MLLT, and a single global MLLR transform
    Sphinx 4: as S2

    I'd hoped there would be more out of the box for Sphinx 3, like full MLLR with
    multiple classes built from the data, SAT, or discriminative training. Any
    model-space transforms could be applied directly to the model means/variances,
    and thus should be transparent to Sphinx 2 decoding. In principle, so could LDA
    and other feature transforms, by altering the input MFCC files to "fool" the
    Sphinx 2 decoder.
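    To illustrate why model-space adaptation stays decoder-transparent, here is a
    minimal sketch (plain Python with toy data, not SphinxTrain code) of applying a
    single global MLLR transform, mu' = A*mu + b, to a set of Gaussian means. The
    decoder only ever sees the updated means, never the transform itself:

```python
# Apply a global MLLR transform to Gaussian means: mu' = A @ mu + b.
# The transform is folded into the model parameters, so any decoder
# that reads the model files (e.g. Sphinx 2) can use them unchanged.

def apply_mllr(means, A, b):
    """Transform each mean vector with mu' = A*mu + b (pure Python)."""
    adapted = []
    for mu in means:
        new_mu = [
            sum(A[i][j] * mu[j] for j in range(len(mu))) + b[i]
            for i in range(len(mu))
        ]
        adapted.append(new_mu)
    return adapted

# Toy 2-D example: identity rotation plus a bias shift.
means = [[1.0, 2.0], [3.0, 4.0]]
A = [[1.0, 0.0], [0.0, 1.0]]
b = [0.5, -0.5]
print(apply_mllr(means, A, b))  # [[1.5, 1.5], [3.5, 3.5]]
```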

    However, I have about 2 months to work out this aspect of the project, thus I
    don't have the bandwidth to do a lot of implementation of the standard
    techniques.

    Could the forum please confirm that I have the list of options right? If so,
    can anyone suggest a source (current/former student, a research group) of an
    implementation of MLLR (with multiple classes built by regression tree) that
    has been well debugged?

    What is the long-term goal for SphinxTrain? Is there a plan to expand the
    acoustic adaptation options?

    Thanks for your help,
    Fred

  • Nickolay V. Shmyrev

    However, I have about 2 months to work out this aspect of the project, thus
    I don't have the bandwidth to do a lot of
    implementation of the standard techniques.

    Great! That would be much appreciated, I'm sure.

    It would be nice to get this in pocketsphinx/sphinx4 instead of sphinx2:
    http://cmusphinx.sourceforge.net/versions/

    Could the forum please confirm that I have the list of options right?

    Well, I don't think that list is properly organized. First of all, you need to
    split adaptation methods into purely offline adaptation (MAP), mostly online
    adaptation (MLLR), and mixed online/offline adaptation (VTLN). The offline part
    should be supported in SphinxTrain, the online part in the decoder of your
    choice (pocketsphinx or sphinx4).
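    To make the "purely offline" case concrete, here is a minimal sketch (toy data,
    not the SphinxTrain implementation) of the standard MAP mean update for one
    Gaussian, where tau is the usual prior-count relaxation weight and gamma_t are
    the posterior occupation counts collected from the adaptation data:

```python
def map_update_mean(prior_mean, obs, gammas, tau=10.0):
    """MAP re-estimate of one Gaussian mean from posterior-weighted data:
    mu_map = (tau * mu0 + sum_t gamma_t * o_t) / (tau + sum_t gamma_t)
    """
    dim = len(prior_mean)
    gamma_sum = sum(gammas)
    # Posterior-weighted sum of the observation vectors, per dimension.
    weighted = [sum(g * o[d] for g, o in zip(gammas, obs)) for d in range(dim)]
    return [(tau * prior_mean[d] + weighted[d]) / (tau + gamma_sum)
            for d in range(dim)]

# Toy example: two observations at [1, 1], equal weight, prior at the origin.
prior = [0.0, 0.0]
obs = [[1.0, 1.0], [1.0, 1.0]]
gammas = [1.0, 1.0]
print(map_update_mean(prior, obs, gammas, tau=2.0))  # [0.5, 0.5]
```

    With little adaptation data the estimate stays near the prior mean; with a lot
    of data it approaches the maximum-likelihood mean, which is why MAP needs a
    full offline pass over the data.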

    MLLR, for example, requires an offline implementation for regression class
    tree building (hard to implement in SphinxTrain) and an online part (easy in
    pocketsphinx), with probably runtime online adaptation as well (hard to
    implement in pocketsphinx). It's probably better to discuss features one by
    one than all of them as a whole.

    If so, can anyone suggest a source (current/former student, a research
    group) of an implementation of MLLR (with multiple classes built by regression
    tree) that has been well debugged?

    HTK?

    What is the long-term goal for SphinxTrain? Is there a plan to expand the
    acoustic adaptation options?

    There are plans, but no schedule.

  • fweber

    fweber - 2010-02-23

    Thanks, Nickolay.

    I meant that I don't have time now to add things to the codebase. You're
    right, if it's to be done it should be in one of the actively developed
    versions.

    I'm also getting feedback from some other sources, so let me collect the
    responses and repost here when I have more specific questions.

  • Nickolay V. Shmyrev

    Hm, my eyes see what I want, not what is written ;)

    Anyway, I hope MLLR + speaker classification will land in sphinx4 and probably
    pocketsphinx soon. As for discriminative training, you probably want to visit
    the CMUSphinx workshop, where there will be a nice talk on this topic:

    http://cmusphinx.sourceforge.net/2010/01/important-workshop-date-change/

    so, things will change quickly

