Hi all,
When I use Kaldi to do GMM online decoding, I find that recordings
made on an ASUS notebook give low accuracy, while recordings made on
a Lenovo notebook give high accuracy.
You can see the waveform and spectrum here:
http://i.imgur.com/Cviw03T.png
The high-frequency part of the ASUS notebook's spectrum is attenuated.
My questions are:
1. How should I handle different devices with different background
noise and attenuated spectra? Is the solution to collect speech data
from different devices to train the acoustic model?
2. Does high or low amplitude affect the accuracy? Should I do
amplitude or frequency normalization during feature extraction, or
does CMVN already handle that normalization?
Last edit: gary 2015-06-25
Yes, it's probably necessary to either add data from different
devices, or simulate it somehow.
The amplitude affects the accuracy for online-nnet2 models (and
online-gmm models) but not for other models. We are trying to do
volume perturbation in online-nnet2 training to make the trained
models more robust to varying volume in the future, but the models on
kaldi-asr.org mostly don't have this yet.
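As a rough illustration of what volume perturbation means (a
hypothetical sketch, not Kaldi's actual perturbation script; the
gain range is an arbitrary choice):

    import numpy as np

    def volume_perturb(samples, low=0.125, high=2.0, rng=None):
        # Scale the waveform (a float numpy array) by a random gain
        # so training sees the same utterance at different volumes.
        # The [low, high] gain range is an illustrative assumption.
        rng = rng or np.random.default_rng()
        return samples * rng.uniform(low, high)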
Online normalization is hard, and it might not always interact well
with decoding, because the factor you multiply by will change as you
go; but you might want to make sure your signals are at least roughly
in the right range.
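For example, a minimal offline sketch of per-utterance level
normalization before feature extraction (the target RMS of 0.1 is an
arbitrary assumption):

    import numpy as np

    def normalize_rms(samples, target_rms=0.1, eps=1e-8):
        # Rescale an utterance (float array, roughly in [-1, 1]) so
        # its RMS matches a fixed target. This is offline: the scale
        # factor depends on the whole signal, which is exactly what
        # makes the online case awkward.
        rms = np.sqrt(np.mean(samples ** 2))
        return samples * (target_rms / (rms + eps))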
Dan
Hi Dan,
Thanks for your quick reply.
Do you think the paper below could solve the spectrum attenuation
problem?
"Improving Wideband Speech Recognition Using Mixed-Bandwidth Training
Data in CD-DNN-HMM"
http://research.microsoft.com/apps/pubs/default.aspx?id=179159
Thank you.
That paper seems to be addressing a different problem, namely how to
make use of narrowband speech while training a wideband system, and
they did not really try to do any data simulation. The kind of thing
I had in mind is to put the data through a linear filter and then add
some kind of noise.
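For example, a rough sketch of that kind of simulation (the filter
order, cutoff frequency, and noise level are illustrative choices,
not tuned values):

    import numpy as np
    from scipy.signal import butter, lfilter

    def simulate_channel(samples, sample_rate=16000, cutoff_hz=4000,
                         noise_rms=0.005, rng=None):
        # Mimic a microphone that attenuates high frequencies:
        # low-pass the signal with a Butterworth filter, then add
        # white Gaussian noise on top.
        rng = rng or np.random.default_rng()
        b, a = butter(4, cutoff_hz / (sample_rate / 2.0), btype="low")
        filtered = lfilter(b, a, samples)
        return filtered + rng.normal(0.0, noise_rms, len(samples))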
Dan