I have implemented speaker adaptation (MAP in Sphinx-4) and I get a 2 to 3 % increase in accuracy.
To get better results from speaker adaptation, what should the adaptation data look like? Is it sufficient if it covers all the phones in the model?
Kindly suggest a document on speaker adaptation (the way it is implemented in Sphinx).
The answer follows from the question: the data should be very good :)
No, covering all the phones by itself is not enough.
Which adaptation exactly?
Thanks for your reply. MAP and MLLR.
"Speaker Adaptation Based on MAP Estimation of HMM Parameters", Chin-Hui Lee
and Jean-Luc Gauvain, Proceedings
of ICASSP 1993, p. II-558
C.J. Leggetter and P.C. Woodland, “Maximum likelihood linear regression for
speaker adaptation of the parameters of continuous density hidden Markov
models,” Computer Speech andLanguage, vol. 9, pp. 171–185, 1995..
Thank you. After following C.J. Leggetter and P.C. Woodland, "Maximum likelihood linear regression for speaker adaptation of the parameters of continuous density hidden Markov models," Computer Speech and Language, vol. 9, pp. 171–185, 1995, I see that the adaptation performance varies with the size of the adaptation data and the regression classes.
I would like to know how to input the regression classes. Is it done through the cb2mllrfn argument of mllr_solve? If so, how do I generate the regression classes?
Right now there is no way to generate regression classes automatically. You have to do it manually or write a program that does it yourself.
Thank you. Is it possible to use a binary regression class tree generated with HTK, or is it possible to generate regression classes using mk_mllr_class?
Not directly, you need to convert between formats.
This program is useless for that because it just converts the mapping from text to binary form. It was meant to build the classes, but it can't do that right now.
Kindly suggest an algorithm for automatically generating regression classes that suits Sphinx well.
http://www-speech.sri.com/papers/mandal_icslp06_clustered.ps.gz
www.iis.sinica.edu.tw/papers/whm/2644-F.pdf
If I develop a program for automatic construction of the regression class tree (as discussed above), is it possible to integrate it with mllr_solve -cb2mllr? And what is the input file format for the cb2mllrfn argument?
Yes.
It's just an integer array that holds a class id for every senone. You can find code to write and read such an array in s3ts2cb_io.c in SphinxTrain.
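To make the shape of that mapping concrete, here is a minimal Python sketch (not SphinxTrain code); the senone count, the toy assignment rule and the plain-text dump are only assumptions for illustration, and the actual on-disk format used by mllr_solve should be taken from the reader/writer in s3ts2cb_io.c.

import numpy as np

# Hypothetical example: 5 senones mapped to 2 MLLR regression classes.
# Index = senone id, value = regression class id -- the "integer array
# with a class id for every senone" described above.
n_senones = 5
cb2mllr = np.zeros(n_senones, dtype=np.int32)

# Toy assignment purely for illustration: first two senones in class 0,
# the rest in class 1.
cb2mllr[2:] = 1

# Plain-text dump for inspection only; the real format should mirror
# what s3ts2cb_io.c reads and writes.
np.savetxt("cb2mllr.txt", cb2mllr, fmt="%d")
print(cb2mllr)  # [0 0 1 1 1]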
I would like to implement the regression class tree. Do I need to program mk_mllr_class from scratch? Kindly help me: where do I start?
Yes, you need to implement it from scratch.
Start by writing the function main :)
Ok. What is the -nmap argument of mk_mllr_class needed for? And how do I feed the regression tree classes generated by the clustering method into mllr_solve?
There is no sense in asking "what is the need of this letter in the template". This code is an unfinished template; it doesn't do anything. Look, you need to write a program that reads some data, calculates the result and writes it into a file. Writing into a file is the easiest part of it. First of all, get the mapping; its conversion to the required format is trivial. If you post the program that builds the mapping, we'll help you with the rest.
When a speaker who was not in the training data is adapted using MLLR, his accuracy increases by 4 to 5 %, but when a speaker who was in the training data is adapted, his accuracy decreases by 3 to 4 %. Kindly tell me what the reason could be.
Also, kindly explain the difference between the regression tree discussed above and the decision tree generated during training.
I am trying to implement an automatic regression class tree for MLLR (following www.iis.sinica.edu.tw/papers/whm/2644-F.pdf). I have the following doubts; I may be wrong, so kindly correct me.
1. Here we are clustering the mean file. My mean file contains 8 Gaussians for each senone. Is it enough to consider only the 8th Gaussian component of each senone for clustering?
2. While applying BIC, what does #(M), the number of parameters, mean? By searching on Google I found #(M) = d + 0.5·d·(d+1), where d is the dimensionality of the mean vector (39). Is that correct?
No, just one component is certainly wrong. You need to assign each Gaussian to a regression class. In HTK (see the HTK Book) all Gaussians are clustered separately, i.e. the 1st Gaussian of a senone could go into regression class 1 while the 2nd Gaussian of the same senone could go into a different regression class, say 15. In CMUSphinx the regression classes are counted per senone, so such detailed clustering would require some additional work to extend the senone-to-transform mapping into a mixture-to-transform mapping.
Alternatively, you can cluster a 1-mixture model to get the senone-to-transform mapping; the 8-Gaussian model can then just do the recognition.
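As a rough illustration of that alternative (deriving a senone-to-class mapping from a 1-mixture model), here is a small Python/NumPy sketch that clusters per-senone mean vectors into a fixed number of regression classes with plain k-means. It is only a flat stand-in, not the tree-based method from the paper; the random data, the number of classes and the text output are assumptions made for the example.

import numpy as np

def cluster_senone_means(means, n_classes=4, n_iter=20, seed=0):
    """Flat k-means over per-senone mean vectors.
    means: (n_senones, dim) array of 1-mixture Gaussian means.
    Returns an int array with one regression class id per senone."""
    rng = np.random.default_rng(seed)
    # Start from randomly chosen senone means as centroids.
    centroids = means[rng.choice(len(means), size=n_classes, replace=False)]
    for _ in range(n_iter):
        # Assign every senone mean to the nearest centroid (Euclidean distance).
        dists = np.linalg.norm(means[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Re-estimate centroids; keep the old one if a class ends up empty.
        for k in range(n_classes):
            if np.any(labels == k):
                centroids[k] = means[labels == k].mean(axis=0)
    return labels

if __name__ == "__main__":
    # Stand-in data: 200 senones with 39-dimensional means; real values would
    # come from the means file of a 1-mixture SphinxTrain model.
    means = np.random.default_rng(1).normal(size=(200, 39))
    cb2mllr = cluster_senone_means(means, n_classes=4)
    np.savetxt("cb2mllr.txt", cb2mllr, fmt="%d")  # one class id per senone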
M is the number of elements you store. M is 78 = 2 * 39 for a 1-mixture GMM and 156 = 4 * 39 for a 2-mixture GMM if you are using diagonal variances during BIC. If you are using full-covariance GMMs, it is d + 0.5 * d * (d+1) for 1 mixture and twice that for 2 mixtures.
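To spell that count out, here is a tiny helper of my own (it reproduces the diagonal-case numbers quoted above and, like the count above, ignores mixture weights):

def n_parameters(dim, n_mix, full_covariance=False):
    """Number of stored parameters #(M) for a GMM with n_mix components:
    each component has `dim` mean values plus either `dim` diagonal
    variances or dim*(dim+1)/2 full-covariance entries."""
    per_mix = dim + (dim * (dim + 1) // 2 if full_covariance else dim)
    return n_mix * per_mix

print(n_parameters(39, 1))        # 78  = 2 * 39, diagonal, 1 mixture
print(n_parameters(39, 2))        # 156 = 4 * 39, diagonal, 2 mixtures
print(n_parameters(39, 1, True))  # 819 = 39 + 39*40/2, full covariance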
Thanks a lot.
Inputs: mean file (1-mixture model), diagonal-variance Gaussians.
Then
BIC(M, X) = log p(X | Θ) − #(M) · log n
where
X = the mean file
n = 39
M = the clusters
log p(X | Θ) = the sum of the log likelihoods of the senones
#(M) = 2 * 39
Is that right?
Assuming two clusters modeled separately by Gaussians N(μ1, Σ1) and N(μ2, Σ2), with sample sizes N1 and N2, the BIC for merging the clusters is given as
BIC = (N1 + N2) log|Σ| − N1 log|Σ1| − N2 log|Σ2| − no_of_parameters
How do I find Σ, the new variance of the merged cluster?
The same way you find Σ1 and Σ2. You just take the pooled data set, get the mean (via EM or directly), and compute the variance as the average of (data − mean)^2.
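Putting the last two posts together, here is a small Python/NumPy sketch of that merge score for diagonal covariances: it pools the two clusters' data, re-estimates the mean and variance of the merged cluster directly, and evaluates the BIC expression quoted above (for a diagonal matrix, log|Σ| is just the sum of the log variances). The random stand-in data and the simple parameter count are assumptions; constant factors and an extra penalty weight vary between formulations in the literature.

import numpy as np

def diag_stats(x):
    """Mean and diagonal variance of the rows of x (single Gaussian, no EM needed)."""
    mean = x.mean(axis=0)
    var = ((x - mean) ** 2).mean(axis=0)  # variance = average of (data - mean)^2
    return mean, var

def logdet_diag(var):
    """log|Sigma| for a diagonal covariance = sum of the log variances."""
    return np.sum(np.log(var))

def merge_bic(x1, x2, n_params):
    """Merge score quoted above:
    (N1+N2) log|S| - N1 log|S1| - N2 log|S2| - n_params,
    with S the diagonal covariance of the pooled data."""
    n1, n2 = len(x1), len(x2)
    _, v1 = diag_stats(x1)
    _, v2 = diag_stats(x2)
    _, v = diag_stats(np.vstack([x1, x2]))  # merged cluster
    return (n1 + n2) * logdet_diag(v) - n1 * logdet_diag(v1) - n2 * logdet_diag(v2) - n_params

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 39
    a = rng.normal(0.0, 1.0, size=(120, dim))  # stand-in cluster 1
    b = rng.normal(0.5, 1.0, size=(80, dim))   # stand-in cluster 2
    # One diagonal Gaussian per cluster: dim means + dim variances.
    print(merge_bic(a, b, n_params=2 * dim))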