I have implemented speaker adaptation (MAP in Sphinx-4) and I get a 2 to 3 % increase in accuracy.
To get better results from speaker adaptation, what should the adaptation data look like? Is it sufficient if it covers all the phones in the model?
Kindly suggest a document on speaker adaptation (the way it is implemented in Sphinx).
The answer follows from the question: the data should be very good :)
No, covering all the phones by itself is not enough.
Which adaptation exactly?
Thanks for your reply. MAP and MLLR.
"Speaker Adaptation Based on MAP Estimation of HMM Parameters", Chin-Hui Lee
and Jean-Luc Gauvain, Proceedings
of ICASSP 1993, p. II-558
C.J. Leggetter and P.C. Woodland, “Maximum likelihood linear regression for
speaker adaptation of the parameters of continuous density hidden Markov
models,” Computer Speech andLanguage, vol. 9, pp. 171–185, 1995..
Thank you. After following C.J. Leggetter and P.C. Woodland, "Maximum likelihood linear regression for speaker adaptation of the parameters of continuous density hidden Markov models," Computer Speech and Language, vol. 9, pp. 171–185, 1995, I see that the adaptation performance varies with the size of the adaptation data and the regression classes.
I would like to know how to input the regression classes. Is it done through the cb2mllrfn argument of mllr_solve? If so, how do I generate the regression classes?
Right now there is no way to generate regression classes automatically. You have to do it manually or write a program that does it yourself.
Thank you. Is it possible to use a binary regression class tree generated with HTK, or is it possible to generate regression classes using mk_mllr_class?
Not directly, you need to convert between formats.
This program is useless for that because it just converts the mapping from text to binary form. It was meant to build the classes, but it can't do that right now.
Kindly suggest an algorithm for automatically generating regression classes that suits Sphinx well.
http://www-speech.sri.com/papers/mandal_icslp06_clustered.ps.gz
www.iis.sinica.edu.tw/papers/whm/2644-F.pdf
If I develop a program for automatic construction of the regression class tree (as discussed above), is it possible to integrate it with mllr_solve -cb2mllr? And what is the input file format for the cb2mllrfn argument?
Yes.
It's just an integer array that holds a class id for every senone. You can find code to write and read such an array in s3ts2cb_io.c in SphinxTrain.
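To make the shape of that mapping concrete, here is a minimal Python sketch (not SphinxTrain code); the senone count, the toy assignment rule and the plain-text dump are only assumptions for illustration, and the actual on-disk format used by mllr_solve should be taken from the reader/writer in s3ts2cb_io.c.

import numpy as np

# Hypothetical example: 5 senones mapped to 2 MLLR regression classes.
# Index = senone id, value = regression class id -- the "integer array
# with a class id for every senone" described above.
n_senones = 5
cb2mllr = np.zeros(n_senones, dtype=np.int32)

# Toy assignment purely for illustration: first two senones in class 0,
# the rest in class 1.
cb2mllr[2:] = 1

# Plain-text dump for inspection only; the real format should mirror
# what s3ts2cb_io.c reads and writes.
np.savetxt("cb2mllr.txt", cb2mllr, fmt="%d")
print(cb2mllr)  # [0 0 1 1 1]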
I would like to implement the regression class tree. Do I need to program mk_mllr_class from scratch? Kindly help me: where do I start?
Yes, you need to implement it from scratch.
Start by writing the function main :)
Ok. What is the -nmap argument of mk_mllr_class needed for? And how do I feed the regression tree classes generated by the clustering method into mllr_solve?
There is no sense in asking "what is the need of this letter in the template". This code is an unfinished template; it doesn't do anything. Look, you need to write a program that reads some data, calculates the result and writes it into a file. Writing into a file is the easiest part of it. First of all, get the mapping; its conversion to the required format is trivial. If you post the program that builds the mapping, we'll help you with the rest.
When a speaker who was not in the training data is adapted using MLLR, his accuracy increases by 4 to 5 %, but when a speaker who was in the training data is adapted, his accuracy decreases by 3 to 4 %. Kindly tell me what the reason could be.
Also, kindly explain the difference between the regression tree discussed above and the decision tree generated during training.
I am trying to implement an automatic regression class tree for MLLR (following www.iis.sinica.edu.tw/papers/whm/2644-F.pdf). I have the following doubts; I may be wrong, so kindly correct me.
1. Here we are clustering the mean file. My mean file contains 8 Gaussians for each senone. Is it enough to consider only the 8th Gaussian component of each senone for clustering?
2. While applying BIC, what does #(M), the number of parameters, mean? By searching on Google I found #(M) = d + 0.5·d·(d+1), where d is the dimensionality of the mean vector (39). Is that correct?
No, just one component is certainly wrong. You need to assign each Gaussian to a regression class. In HTK (see the HTK Book) all Gaussians are clustered separately, i.e. the 1st Gaussian of a senone could go into regression class 1 while the 2nd Gaussian of the same senone could go into a different regression class, say 15. In CMUSphinx the regression classes are counted per senone, so such detailed clustering would require some additional work to extend the senone-to-transform mapping into a mixture-to-transform mapping.
Alternatively, you can cluster a 1-mixture model to get the senone-to-transform mapping; the 8-Gaussian model can then just do the recognition.
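As a rough illustration of that alternative (deriving a senone-to-class mapping from a 1-mixture model), here is a small Python/NumPy sketch that clusters per-senone mean vectors into a fixed number of regression classes with plain k-means. It is only a flat stand-in, not the tree-based method from the paper; the random data, the number of classes and the text output are assumptions made for the example.

import numpy as np

def cluster_senone_means(means, n_classes=4, n_iter=20, seed=0):
    """Flat k-means over per-senone mean vectors.
    means: (n_senones, dim) array of 1-mixture Gaussian means.
    Returns an int array with one regression class id per senone."""
    rng = np.random.default_rng(seed)
    # Start from randomly chosen senone means as centroids.
    centroids = means[rng.choice(len(means), size=n_classes, replace=False)]
    for _ in range(n_iter):
        # Assign every senone mean to the nearest centroid (Euclidean distance).
        dists = np.linalg.norm(means[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Re-estimate centroids; keep the old one if a class ends up empty.
        for k in range(n_classes):
            if np.any(labels == k):
                centroids[k] = means[labels == k].mean(axis=0)
    return labels

if __name__ == "__main__":
    # Stand-in data: 200 senones with 39-dimensional means; real values would
    # come from the means file of a 1-mixture SphinxTrain model.
    means = np.random.default_rng(1).normal(size=(200, 39))
    cb2mllr = cluster_senone_means(means, n_classes=4)
    np.savetxt("cb2mllr.txt", cb2mllr, fmt="%d")  # one class id per senone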
M is the number of elements you store. M is 78 = 2 * 39 for a 1-mixture GMM and 156 = 4 * 39 for a 2-mixture GMM if you are using diagonal variances during BIC. If you are using full-covariance GMMs, it is d + 0.5 * d * (d+1) for 1 mixture and twice that for 2 mixtures.
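To spell that count out, here is a tiny helper of my own (it reproduces the diagonal-case numbers quoted above and, like the count above, ignores mixture weights):

def n_parameters(dim, n_mix, full_covariance=False):
    """Number of stored parameters #(M) for a GMM with n_mix components:
    each component has `dim` mean values plus either `dim` diagonal
    variances or dim*(dim+1)/2 full-covariance entries."""
    per_mix = dim + (dim * (dim + 1) // 2 if full_covariance else dim)
    return n_mix * per_mix

print(n_parameters(39, 1))        # 78  = 2 * 39, diagonal, 1 mixture
print(n_parameters(39, 2))        # 156 = 4 * 39, diagonal, 2 mixtures
print(n_parameters(39, 1, True))  # 819 = 39 + 39*40/2, full covariance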
Thanks a lot.
Inputs: mean file (1-mixture model), diagonal-variance Gaussians.
Then
BIC(M, X) = log p(X | Θ) − #(M) · log n
where
X = the mean file
n = 39
M = the clusters
log p(X | Θ) = the sum of the log likelihoods of the senones
#(M) = 2 * 39
Is that right?
Assuming two clusters modeled separately by Gaussians N(μ1, Σ1) and N(μ2, Σ2), with sample sizes N1 and N2, the BIC for merging the clusters is given as
BIC = (N1 + N2) log|Σ| − N1 log|Σ1| − N2 log|Σ2| − no_of_parameters
How do I find Σ, the new variance of the merged cluster?
The same way you find Σ1 and Σ2. You just take the pooled data set, get the mean (via EM or directly), and compute the variance as the average of (data − mean)^2.
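Putting the last two posts together, here is a small Python/NumPy sketch of that merge score for diagonal covariances: it pools the two clusters' data, re-estimates the mean and variance of the merged cluster directly, and evaluates the BIC expression quoted above (for a diagonal matrix, log|Σ| is just the sum of the log variances). The random stand-in data and the simple parameter count are assumptions; constant factors and an extra penalty weight vary between formulations in the literature.

import numpy as np

def diag_stats(x):
    """Mean and diagonal variance of the rows of x (single Gaussian, no EM needed)."""
    mean = x.mean(axis=0)
    var = ((x - mean) ** 2).mean(axis=0)  # variance = average of (data - mean)^2
    return mean, var

def logdet_diag(var):
    """log|Sigma| for a diagonal covariance = sum of the log variances."""
    return np.sum(np.log(var))

def merge_bic(x1, x2, n_params):
    """Merge score quoted above:
    (N1+N2) log|S| - N1 log|S1| - N2 log|S2| - n_params,
    with S the diagonal covariance of the pooled data."""
    n1, n2 = len(x1), len(x2)
    _, v1 = diag_stats(x1)
    _, v2 = diag_stats(x2)
    _, v = diag_stats(np.vstack([x1, x2]))  # merged cluster
    return (n1 + n2) * logdet_diag(v) - n1 * logdet_diag(v1) - n2 * logdet_diag(v2) - n_params

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 39
    a = rng.normal(0.0, 1.0, size=(120, dim))  # stand-in cluster 1
    b = rng.normal(0.5, 1.0, size=(80, dim))   # stand-in cluster 2
    # One diagonal Gaussian per cluster: dim means + dim variances.
    print(merge_bic(a, b, n_params=2 * dim))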