I am trying to train models in another language. Unfortunately, the training
and testing data is a bit poor. Nevertheless, I want to tune the system as
much as I can. Which parameters can I fiddle around with? Is there a procedure
for the tuning or is it mostly trial and error?
Also, can we model different phones differently, say shorter phones with
fewer states and longer ones with more?
Any help is appreciated. Thanks :D.
By poor, I mean there is ambient noise and quite a few fillers which I'm sure
are heavily weighing down the system's ability to recognise accurately.
Needless to say, the test data is also of a similar nature :|. Some other
people are working on getting better data. In the meantime, I'm not aiming for
perfect, only for an increase in accuracy that perhaps I can implement once I
get the good data.
"Is there a procedure for the tuning or is it mostly trial and error?"
"Yes"
I'm assuming by this you meant there is a procedure. Where can I read up on
this? And also about writing phone tree questions?
What about trying to optimize values of wip, lw, uw and beams? Do they affect
the accuracy significantly?
"By poor, I mean there is ambient noise and quite a few fillers which I'm
sure are heavily weighing down the system's ability to recognise accurately.
Needless to say, the test data is also of a similar nature :|. Some other
people are working on getting better data. In the meantime, I'm not aiming for
perfect, only for an increase in accuracy that perhaps I can implement once I
get the good data."
In my experience it's usually more helpful to identify the source of the
problem than to just tune parameters. I don't believe the fillers are an issue,
but if you have ambient noise it's far more useful to look for ways to remove
it than to optimize the number of senones. Tuning the senone count can give you
about a 2% improvement over a wild guess. Fixing the actual issue (not
necessarily noise; it might be too tight a decoding beam or a bad language
model) can give you a much bigger improvement.
"I'm assuming by this you meant there is a procedure. Where can I read up on
this? And also about writing phone tree questions?"
By "Yes" I meant it's mostly trial and error. Basically you need to try all
reasonable values and select the ones that work best.
Feature extraction parameters, for example, are something to try first; after
that there are other things to vary.
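A minimal sketch of that trial-and-error loop, assuming you can wrap one train-and-decode run in a function that returns an error rate (lower is better). The names `grid_search` and `fake_wer` and the value ranges are made up for illustration; they are not part of SphinxTrain or the decoder.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Try every combination in `grid`; return (best_score, best_params, results).

    `evaluate` is a caller-supplied function that maps a dict of parameter
    values to an error rate, where lower is better.
    """
    names = sorted(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((evaluate(params), params))
    best_score, best_params = min(results, key=lambda item: item[0])
    return best_score, best_params, results


def fake_wer(params):
    # Hypothetical stand-in for "train a model, decode the test set, return
    # the error rate". Replace it with a real run; this formula is invented.
    return abs(params["senones"] - 2000) / 10000 + abs(params["mixtures"] - 8) / 100


grid = {"senones": [1000, 2000, 4000], "mixtures": [4, 8, 16]}
best_score, best_params, results = grid_search(fake_wer, grid)
print(best_params, best_score)
```

Keeping the full `results` list, not only the winner, makes it easier to see whether a parameter matters at all before spending more training runs on it.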
"What about trying to optimize values of wip, lw, uw and beams? Do they
affect the accuracy significantly?"
Those are decoder settings, not training parameters. Yes, they can have a
significant effect. You can find an optimization guide on the wiki:
http://cmusphinx.sourceforge.net/wiki/decodertuning
Overall, I suggest you share your accuracy results first, just to check that
nothing in your setup is completely wrong: database type, vocabulary size and
accuracy.
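If you need a quick number to report, one common way to measure accuracy is word error rate: the word-level edit distance between the reference transcript and the decoder output, divided by the number of reference words. The sketch below is a plain implementation of that definition under the assumption that both transcripts are whitespace-separated text; `word_error_rate` is an illustrative name, not a Sphinx tool.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(
                prev[j] + 1,         # deletion: reference word is missing
                cur[j - 1] + 1,      # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, this value can exceed 1.0 on very noisy output, which is worth knowing when comparing runs.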
Hey,
"I am trying to train models in another language. Unfortunately, the training
and testing data is a bit poor."
What do you mean by poor?
"Nevertheless, I want to tune the system as much as I can."
The perfect is the enemy of the good.
"Which parameters can I fiddle around with?"
During training: the number of senones, the number of mixtures and maybe the
phoneset. You also need to write phone tree questions manually.
"Is there a procedure for the tuning or is it mostly trial and error?"
Yes.
"Also, can we model different phones differently? Model shorter phones with
less states and longer ones with more?"
No, you can't do that. There isn't much need to, either. In most cases it's
not helpful.
Cool, thanks a lot :).
The decoder tuning guide on the wiki is at
http://cmusphinx.sourceforge.net/wiki/decodertuning
It leaves some topics uncovered, though. There is also some interesting
ongoing research on this subject, see
http://www.isca-speech.org/archive/interspeech_2010/i10_1497.html