I have some doubts about training.
1. The Sphinx-3 FAQ (http://www.speech.cs.cmu.edu/sphinxman/FAQ.html) mentions some rule-of-thumb figures for setting the number of senones. My training set contains 200 sentences (~1 hour of data) from each of 15 speakers. So, should the amount of training data used for setting the number of SENONES be counted as 1 hour or 15 hours?
2. If I create a model for a command-and-control application, is there any need for composite triphones? How are these composite triphones trained if they do not appear in the training transcript?
Should the amount of training data used for setting the number of senones be counted as 1 hour or 15 hours?
The total amount of your training data is 15 hours, so you should choose the number of senones for 15 hours. 4000 would be a good guess.
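The arithmetic behind that answer can be made explicit; a minimal sketch (the 4000 figure is the guess from this reply, not a computed value):

```python
# Each of 15 speakers contributes ~200 sentences (~1 hour of audio),
# so the training set size is counted across all speakers.
speakers = 15
hours_per_speaker = 1.0

total_hours = speakers * hours_per_speaker
print(total_hours)  # 15.0 -> size the senone count for 15 hours of data

# 4000 senones is the rule-of-thumb guess given above for this much data.
n_senones = 4000
```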
If I create a model for a command-and-control application, is there any need for composite triphones? How are these composite triphones trained if they do not appear in the training transcript?
It's not clear which composite triphones you are asking about. My suggestion: if you don't know how to train such "composite triphones", don't train them.
I am confused about senones. I think a senone is a sub-phonetic unit.
If that is the case, why is the number of senones different for 1 hour and for 15 hours (same training data, so the same phonetic sentences)?
2. From the dictionary-to-triphones .c file:
\brief Building triphones for a dictionary.
This is one of the more complicated parts of a cross-word
triphone model decoder. The first and last phones of each word
get their left and right contexts, respectively, from other
words. For single-phone words, both its contexts are from other
words, simultaneously. As these words are not known beforehand,
life gets complicated. In this implementation, when we do not
wish to distinguish between distinct contexts, we use a COMPOSITE
triphone (a bit like BBN's fast-match implementation), by
clubbing together all possible contexts
I am confused about senones. I think a senone is a sub-phonetic unit.
If that is the case, why is the number of senones different for 1 hour and for 15 hours (same training data, so the same phonetic sentences)?
No, they aren't. A senone, like a triphone, is just a collection of probabilistic models for matching a specific phone in a specific context. They are different from phones or diphones, which correspond to actual audio chunks. The amount of context in your 15 hours of recordings is enough to train 4000 senones. Even if the phonetic content is the same across speakers, the amount of contexts in 1 hour is enough. The situation would of course be different if you had 1000 recordings, each 1 minute long, of the same small sentence being read; then the number of contexts to train would be far smaller.
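That counting argument can be illustrated with a toy sketch (hypothetical phone strings, not from this dataset): repeating the same material for more speakers adds audio, but no new contexts.

```python
def triphones(phones):
    """Set of distinct (left, base, right) contexts in a phone sequence."""
    return {(phones[i - 1], phones[i], phones[i + 1])
            for i in range(1, len(phones) - 1)}

# Hypothetical phone string for one recorded sentence.
sentence = ["SIL", "a", "b", "a", "k", "SIL"]

one_speaker = triphones(sentence)
# 15 speakers reading the exact same sentence: the union of their
# contexts is no larger than a single speaker's.
fifteen_speakers = set().union(*(triphones(sentence) for _ in range(15)))

assert one_speaker == fifteen_speakers  # more audio, no new contexts
```

Varied text grows the context inventory; many copies of one small sentence do not.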
2. From the dictionary-to-triphones .c file:
\brief Building triphones for a dictionary.
This is one of the more complicated parts of a cross-word
triphone model decoder. The first and last phones of each word
get their left and right contexts, respectively, from other
words. For single-phone words, both its contexts are from other
words, simultaneously. As these words are not known beforehand,
life gets complicated. In this implementation, when we do not
wish to distinguish between distinct contexts, we use a COMPOSITE
triphone (a bit like BBN's fast-match implementation), by
clubbing together all possible contexts
Those composite senones are internals of sphinx3's large-vocabulary decoding, used to optimize speed at word boundaries, where most lextree expansion happens. You can read about lextrees in an ASR textbook if you are interested, but such composite senones aren't visible to the user, and you shouldn't care about them.
The amount of context in your 15 hours of recordings is enough to train 4000 senones. Even if the phonetic content is the same across speakers, the amount of contexts in 1 hour is enough.
Then you are suggesting that I use 4000 senones...
but such composite senones aren't visible to the user, and you shouldn't care about them.
I see such composite triphones in my MDEF file (default training settings). Will they affect the performance of my system (a command-and-control app)?
No, you don't see them. The model definition file lists the known triphones and the senone sequences for them. It seems you have some issues with terminology. Sorry, I don't understand your question here.
Since my mdef file is huge, I am pasting a few lines:
a SIL v b n/a 2 390 626 682 N
a SIL y b n/a 2 393 629 767 N
a SIL yy b n/a 2 393 629 767 N
a SIL z b n/a 2 393 629 729 N
a a dd b n/a 2 360 485 797 N
a a h b n/a 2 350 485 838 N
a a j b n/a 2 360 485 797 N
a a k b n/a 2 360 503 777 N
a a l b n/a 2 353 502 685 N
a a m b n/a 2 360 485 667 N
a a n b n/a 2 360 554 667 N
a a n' b n/a 2 361 514 753 N
a a n1 b n/a 2 353 502 685 N
a a ng' b n/a 2 353 502 685 N
a a ng'ng' b n/a 2 353 502 685 N
a a nj' b n/a 2 353 502 685 N
These triphones (marked in bold) are not in my training dictionary...
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Triphones are taken from the transcription of the training prompts, not from
the dictionary. All triphones above are present in your prompts.
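For what it's worth, each triphone line in the mdef body follows a fixed column layout: base phone, left context, right context, word position, attribute, transition-matrix index, the tied-state (senone) IDs, and a terminating `N`. A small parsing sketch under that assumption:

```python
def parse_mdef_line(line):
    """Split one mdef triphone line into named fields (layout assumed above)."""
    fields = line.split()
    return {
        "base": fields[0],       # central phone
        "left": fields[1],       # left context
        "right": fields[2],      # right context
        "position": fields[3],   # b/i/e/s: begin/internal/end/single-word
        "attrib": fields[4],     # attribute, usually n/a for triphones
        "tmat": int(fields[5]),  # transition-matrix index
        "senones": [int(s) for s in fields[6:-1]],  # tied-state IDs
    }

entry = parse_mdef_line("a SIL v b n/a 2 390 626 682 N")
print(entry["senones"])  # [390, 626, 682]
```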
I checked with the dict2tri executable, which generates triphones from the dictionary. With the default option (-btwtri yes: compute the between-word triphone set), it lists all 10742 triphones that are seen in the MDEF file.
They aren't trained. First of all, in the untied stage they will simply be ignored. Later, in the cd stage, when states are tied, they will get the same senone sequence as word-internal triphones, and that tied senone sequence, like

A I CH e n/a 0 137 227 255 N
A I CH i n/a 0 137 227 255 N

will be trained from word-internal material. If there is no word-internal material either, you will get a warning in the norm log at stage 50:

if (wt_var_ < 0) {
    ...
    E_ERROR("Variance (mgau= %u, feat= %u, "
            "density=%u, component=%u) is less then 0. "
            "Most probably the number of senones is "
            "too high for such a small training "
            "database. Use smaller $CFG_N_TIED_STATES.\n",
    ...
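The tying described above can be sketched as grouping triphones by their senone sequence, so that training material for any member of a group updates the same shared states (senone IDs taken from the two example lines; everything else is a toy illustration):

```python
from collections import defaultdict

# Triphone -> tied senone sequence, as listed in the mdef (toy subset).
mdef = {
    ("A", "I", "CH", "e"): (137, 227, 255),  # cross-word variant
    ("A", "I", "CH", "i"): (137, 227, 255),  # word-internal variant
}

# Group triphones that share a senone sequence: observations for any
# member of a group train the same senones.
tied = defaultdict(list)
for triphone, senones in mdef.items():
    tied[senones].append(triphone)

print(tied[(137, 227, 255)])
# Both triphones share one senone sequence, so word-internal data
# effectively trains the cross-word entry too.
```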
I think:
1. Even if there is no word-internal triphone, it is tied with other similar triphones.
2. Then the maximum number of senones will be 3 × the number of triphones listed by dict2tri.exe.
3. Is it possible to train only the word-internal triphones and reduce the number of senones (for my command-and-control application, to increase speed and accuracy)?
Is it possible to train only the word-internal triphones and reduce the number of senones (for my command-and-control application, to increase speed and accuracy)?
Did you try changing the script to run dict2tri with -btwtri no?
Anyway, I think there is a much more effective way to reduce the number of senones: the N_TIED_STATES setting in sphinx_train.cfg. Why not set it properly and get the number of senones you want that way? I think that if cross-word triphones are not in the training transcription, the model will not have separate senones for them. Moreover, they will not be considered by the decoder if your grammar doesn't have self-loops.
If you have only a limited number of word-internal triphones, set the number of tied states so that only they end up in the final model. Yes, the documentation doesn't cover this in detail; we'll update it accordingly to explain this.
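In sphinx_train.cfg that is a single setting (the value 740 here is only illustrative; pick one that matches your word-internal triphone inventory):

```perl
# sphinx_train.cfg -- number of tied states (senones) in the final model
$CFG_N_TIED_STATES = 740;
```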
Did you try changing the script to run dict2tri with -btwtri no?
I tried my best, but I couldn't find where dict2tri is called...
(When I removed dict2tri from the training bin folder, training still worked...)
The header of my final mdef file:

0.3
80 n_base
10742 n_tri
43288 n_state_map
740 n_tied_state
240 n_tied_ci_state
80 n_tied_tmat

Note: my transcript contains only words, no sentences.
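As a sanity check, those header figures are internally consistent, assuming the usual topology of 3 emitting states plus 1 non-emitting final state per HMM:

```python
# Figures copied from the mdef header above.
n_base          = 80      # context-independent phones
n_tri           = 10742   # triphones
n_state_map     = 43288
n_tied_ci_state = 240

# Each HMM contributes 3 emitting states + 1 non-emitting state
# to the state map.
assert (n_base + n_tri) * 4 == n_state_map  # 10822 * 4 = 43288

# Each CI phone keeps 3 tied states of its own.
assert n_base * 3 == n_tied_ci_state        # 80 * 3 = 240
```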
Great, now we have found the truth, as well as the proper name for the triphones :)
Any other questions?
Hello,
Sorry for the confusion. I've checked the source, so let me try to state everything as it is.
Correct me if I'm wrong.
Sorry for the late response...
Everything said above is correct...
I think that in the final mdef, all the triphones from the dictionary are listed and clustered with the trained triphones.