Hi,

I almost know the answer to my question (namely no), but I nevertheless wanted
to make sure I'm not missing something.
Is there a way to get Sphinx 3 and SphinxTrain to train and decode using
arbitrary custom features instead of features derived from the MFC
coefficients? I would like to "abuse" CMU Sphinx to train and decode based on
a feature set which is extracted from the sound files by my own routine. What
I have in mind is something like HTK's ability to cope with any user-defined
features, as long as they come in the HTK feature file format.
Thanks for your advice!

There is no problem doing that. You need to convert your custom features to
the CMUSphinx format.

Great! Wow, I'm glad I asked... :-)
Well, what I don't understand is why I need to tell a bunch of routines
(init_gau, norm, bw, sphinx3_decode) things like the number of cepstral coeffs
(-ceplen) again, after I already passed them to wave2feat to extract the base
features (when using MFCCs as features). Is this just for factorizing the
full number of features into number of vectors times number of features per
vector? What will I have to set for -ceplen if I use any number of custom
features, and what influence will -agc and -cmn (for sphinx3_decode for
instance) have on them? Or, in general, what parameters need to be set to make
the Sphinx modules understand that they should use my custom features and not
modify/transform them further, as they perhaps would do when the default MFC
coeffs were used? Can I just set -feat 1s_c (instead of 1s_c_d_dd for example)
and that's it?
As you can see, I am confused, but full of hope. :-)

You don't need to do that (pass -ceplen to each tool); you can just specify
the new feature vector length in the config file.
The Sphinx feature file format doesn't store the number of vectors, only the
total length of the data.
-ceplen should be the length of your feature vector.
-agc is disabled by default. You can disable CMN in the config file if needed.
It should work.

I use neither the config files nor the provided Perl wrapper scripts (for
various reasons). So my case is a bit trickier here.
I know that the Sphinx feature file format only stores the total number of
features. That's why I'm asking. How do the training tools know how to
calculate delta and delta-delta coeffs, if e.g. 1s_c_d_dd is set? They will
have to know how the total number of features factorizes into number of
vectors and number of features per vector, because the deltas are calculated
for each vector separately, if I get that right. Imagine there are 60 features
in a feature file. Are these 10 vectors with 6 cepstral coeffs each, or 5
vectors with 12 coeffs, or ...? That's not clear. This is the only reason I
can think of why sphinx3_decode (among others) needs the -ceplen parameter.
But when using custom features, there is no such thing as cepstral
coefficients. Do you get my point? When the -ceplen option is dropped, the
(wrong) default value is used and e.g. sphinx3_decode complains:
FATAL_ERROR: "kbcore.c", line 633: Feature streamlen(39) != mgau streamlen(60)
But "ceplen should be the length of your feature vector" doesn't seem to be
true: I'm using feature files with 520 features for instance, and ceplen
(based on online documentation) specifies the number of coefficients in one
base cepstral vector, which I MUST set to 20 for sphinx3_decode to work in
this case. That yields 26 base vectors. But what if I wanted to use 521
features?

Hmmmm, wait a second, I think I got it:
HMMs need a time-series of feature vectors (26 in my example above). The
minimum number of them is 9, according to bw/main.c:
if (n_frame < 9) {
    E_WARN("utt %s too short\n", corpus_utt());
    ...
}
So, this means that when using custom features, they must come as a
time-series of length >= 9. Is that correct? So, I can't train/decode with 521
arbitrary features, but I must use at least 9 vectors with any number of
coefficients?
exactly
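To make the layout discussed above concrete, here is a minimal Python sketch of writing custom per-frame vectors to a feature file and splitting the stored values back into frames. The thread only establishes that the file stores the total number of values and that -ceplen supplies the per-frame length. The exact header used here (a 4-byte integer count followed by 32-bit floats, little-endian) and the helper names write_feat and read_feat are assumptions for illustration, so check them against what your Sphinx build actually reads.

```python
import os
import struct
import tempfile


def write_feat(path, frames):
    """Write equal-length float vectors as: int32 value count, then float32 values.

    The header layout and byte order are assumptions (see the text above).
    """
    if not frames:
        raise ValueError("need at least one frame")
    veclen = len(frames[0])
    if any(len(f) != veclen for f in frames):
        raise ValueError("all frames must have the same length")
    flat = [x for f in frames for x in f]  # frame after frame
    with open(path, "wb") as fh:
        fh.write(struct.pack("<i", len(flat)))  # total number of values only
        fh.write(struct.pack("<%df" % len(flat), *flat))


def read_feat(path, veclen):
    """Read the file back and split it into frames of `veclen` values each.

    `veclen` plays the role of -ceplen: the file itself does not store it.
    """
    with open(path, "rb") as fh:
        (total,) = struct.unpack("<i", fh.read(4))
        flat = struct.unpack("<%df" % total, fh.read(4 * total))
    if total % veclen != 0:
        raise ValueError("%d values do not split into frames of %d" % (total, veclen))
    return [list(flat[i:i + veclen]) for i in range(0, total, veclen)]


if __name__ == "__main__":
    # 26 frames of 20 custom values each, i.e. 520 values in total.
    frames = [[float(t * 20 + i) for i in range(20)] for t in range(26)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "utt.mfc")
        write_feat(path, frames)
        print(len(read_feat(path, 20)), "frames read back")
```

With 520 values and a vector length of 20 this gives 26 frames. A total such as 521 is not a multiple of the vector length and cannot be split into whole frames, which is why read_feat rejects it.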
OK, I see. Then all of the above makes sense.
One last thing, though: What's the motive behind rejecting frame counts
(numbers of vectors) smaller than 9? Why 9? Could I just remove that
restricting check (n_frame < 9) in the code and happily work with fewer, or
would that compromise the power of the underlying HMMs?

It's just a heuristic; you can remove it.
Alright, thank you for answering.
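For later readers: the numbers in the error message quoted above are consistent with the feature type multiplying the base vector length, since 39 = 3 x 13 and 60 = 3 x 20 under 1s_c_d_dd. The Python sketch below illustrates that arithmetic and a simple per-frame delta computation, which is why the tools need to know where one frame ends and the next begins. The delta window of 2 frames, the clamping at the utterance edges, and the function names are assumptions chosen for illustration, not taken from the Sphinx sources.

```python
def stream_length(ceplen, feat_type):
    """Length of one model input vector for a base vector of `ceplen` values."""
    # Assumed multipliers: 1s_c keeps the base vector as it is, 1s_c_d_dd
    # appends deltas and delta-deltas of the same length.
    multipliers = {"1s_c": 1, "1s_c_d_dd": 3}
    return ceplen * multipliers[feat_type]


def add_deltas(frames, window=2):
    """Append delta and delta-delta values to every frame.

    Deltas are differences between neighbouring frames, so the input must
    already be split into frames of known length.
    """
    n = len(frames)

    def clamp(t):
        # Repeat the first/last frame at the edges (an assumption).
        return min(max(t, 0), n - 1)

    deltas = [
        [a - b for a, b in zip(frames[clamp(t + window)], frames[clamp(t - window)])]
        for t in range(n)
    ]
    ddeltas = [
        [a - b for a, b in zip(deltas[clamp(t + 1)], deltas[clamp(t - 1)])]
        for t in range(n)
    ]
    return [c + d + dd for c, d, dd in zip(frames, deltas, ddeltas)]


if __name__ == "__main__":
    print(stream_length(13, "1s_c_d_dd"))  # 39
    print(stream_length(20, "1s_c_d_dd"))  # 60
    print(stream_length(520, "1s_c"))      # 520
```

If the custom features should reach the models unchanged, the first multiplier is the relevant one: the model vector length then equals the value given for -ceplen.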