I am looking into an adaptation study for a program that builds models in
Sphinx 3 and decodes with Sphinx 2. I've looked through the documentation, run
the tutorials, and examined the scripts in SphinxTrain. If I have understood
correctly, these are the current options:
Sphinx 2: VTLN, CMN, MAP
Sphinx 3: VTLN, CMN, MAP, LDA, MLLT, and a single global MLLR transform
Sphinx 4: same as Sphinx 2
I'd hoped there would be more out of the box for Sphinx 3, like full MLLR with
multiple classes built from the data, SAT, and discriminative training. Any
model-space transforms could be applied directly to the model means/variances,
and so should be transparent to Sphinx 2 decoding. In principle, so could LDA
and other feature transforms, by altering the input MFCC files to "fool" the
Sphinx 2 decoder. However, I have about 2 months to work out this aspect of the
project, so I don't have the bandwidth to do a lot of implementation of the
standard techniques.
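To make the model-space idea concrete, here is a minimal sketch in Python of applying an MLLR-style affine transform (mu' = A*mu + b) to Gaussian means offline, so that the decoder only ever sees the adapted means. The function name, the toy values, and the in-memory list-of-lists layout are assumptions for illustration; the sketch does not read or write any Sphinx model file format.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def apply_mean_transform(means: List[Vector], A: Matrix, b: Vector) -> List[Vector]:
    """Return A*mu + b for every Gaussian mean mu (hypothetical helper)."""
    adapted = []
    for mu in means:
        adapted.append([
            # One output dimension: dot product of a row of A with mu, plus bias.
            sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)
        ])
    return adapted


# Toy 2-dimensional example; real models would use the full feature dimension.
means = [[1.0, 2.0], [0.0, -1.0]]
A = [[1.0, 0.5], [0.0, 2.0]]
b = [0.1, -0.2]
adapted_means = apply_mean_transform(means, A, b)
```

Because the transform is baked into the means before decoding, nothing in the decoder has to know that adaptation took place.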
Could the forum please confirm that I have the list of options right? If so,
can anyone suggest a source (current/former student, a research group) of an
implementation of MLLR (with multiple classes built by regression tree) that
has been well debugged?
What is the long-term goal for SphinxTrain? Is there a plan to expand the
acoustic adaptation options?
Thanks for your help,
Fred
However, I have about 2 months to work out this aspect of the project, thus
I don't have the bandwidth to do a lot of
implementation of the standard techniques.
Could the forum please confirm that I have the list of options right?
Well, I don't think it's properly organized. First of all, you need to split
adaptation into purely offline adaptation (MAP), mostly online adaptation
(MLLR), and mixed online-offline adaptation (VTLN). The offline part should be
supported in SphinxTrain, the online part in the decoder of your choice
(pocketsphinx or sphinx4). MLLR, for example, requires an offline
implementation of regression class tree building (hard to implement in
SphinxTrain) and an online part (easy in pocketsphinx), probably with runtime
online adaptation (hard to implement in pocketsphinx). It's probably better to
discuss features one by one than all of them as a whole.
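As a rough illustration of why the "apply" half is the easy part: once each Gaussian has been assigned to a regression class (the offline tree building, which is the hard part and is not shown here), applying multi-class MLLR is a lookup followed by an affine map. This is a sketch under assumptions; the names `class_of`, `transforms`, and `adapt_means_by_class` are hypothetical and do not correspond to any SphinxTrain or pocketsphinx API.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Transform = Tuple[Matrix, Vector]  # (A, b) for mu' = A*mu + b


def affine(A: Matrix, b: Vector, mu: Vector) -> Vector:
    """Apply a single affine transform to one mean vector."""
    return [sum(a * m for a, m in zip(row, mu)) + b_i for row, b_i in zip(A, b)]


def adapt_means_by_class(
    means: List[Vector],
    class_of: List[int],
    transforms: Dict[int, Transform],
) -> List[Vector]:
    """Transform each mean with the (A, b) of its regression class."""
    adapted = []
    for idx, mu in enumerate(means):
        A, b = transforms[class_of[idx]]
        adapted.append(affine(A, b, mu))
    return adapted


# Toy setup: three Gaussians, two regression classes (assumed, not estimated).
means = [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
class_of = [0, 1, 0]
transforms = {
    0: ([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.0]),  # shift only
    1: ([[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0]),  # scale only
}
adapted = adapt_means_by_class(means, class_of, transforms)
```

Estimating the transforms and building the class tree from adaptation data is where the real implementation effort goes; the sketch deliberately leaves both out.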
If so, can anyone suggest a source (current/former student, a research
group) of an implementation of MLLR (with multiple classes built by regression
tree) that has been well debugged?
HTK?
What is the long-term goal for SphinxTrain? Is there a plan to expand the
acoustic adaptation options?
There are plans, but no schedule.
I meant that I don't have time now to add things to the codebase. You're
right, if it's to be done it should be in one of the actively developed
versions.
I'm also getting feedback from some other sources, so let me collect the
responses and repost here when I have more specific questions.
Great! That would be much appreciated, I'm sure.
It would be nice to get this in pocketsphinx/sphinx4 instead of sphinx2:
http://cmusphinx.sourceforge.net/versions/
Thanks, Nikolai.
Hm, my eyes see what I want, not what is written ;)
Anyway, I hope MLLR + speaker classification will land in sphinx4 and probably
pocketsphinx soon. As for discriminative training, you probably want to visit
the CMUSphinx workshop, where there will be a nice talk on this topic:
http://cmusphinx.sourceforge.net/2010/01/important-workshop-date-change/
So things will change quickly.