Hi, I have some questions about the two feature vector representations 1s_c_d_dd and s2_4x. Could someone explain the fourth stream in the s2_4x feature vector representation more clearly? I also read (https://sourceforge.net/p/cmusphinx/discussion/speech-recognition/thread/d674f4a5/?limit=25) that 1s_c_d_dd consists of three streams that are combined into a one-stream feature vector. Is that right? Finally, does the value of the parameter $CFG_VECTOR_LENGTH in the cfg file affect the 1s_c_d_dd feature vector representation?
s2_4x consists of the following features:
12 cepstral coefficients from c1 to c12, 24 deltas, c0 value + delta c0 + double-delta c0, 12 double deltas.
These are modelled as 4 GMM streams, so each of the 4 groups has its own codebook. The fourth stream is the 12 double deltas.
Plain 1s_c_d_dd is a single stream.
1s_c_d_dd with the -svspec modifier is 3 streams; -svspec 0-12/13-25/26-38 does the split.
Last edit: Nickolay V. Shmyrev 2016-06-16
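For illustration, the stream layouts described above can be checked with a few lines of Python. This is only a sketch: `parse_svspec` and `split_streams` are hypothetical helpers, not part of SphinxTrain, and they handle only the simple `start-end/start-end` form of the spec. The even 13/13/13 spec used here is an assumption, chosen so that cepstra, deltas and double deltas each get their own stream.

```python
def parse_svspec(spec):
    """Turn a spec such as '0-12/13-25/26-38' into one index list per stream.

    Hypothetical helper for illustration; only inclusive 'start-end'
    ranges separated by '/' are handled.
    """
    streams = []
    for part in spec.split("/"):
        start, end = (int(x) for x in part.split("-"))
        streams.append(list(range(start, end + 1)))
    return streams


def split_streams(vector, spec):
    """Split a single-stream feature vector into sub-vectors per the spec."""
    return [[vector[i] for i in idx] for idx in parse_svspec(spec)]


# s2_4x stream sizes as listed above: cepstra, deltas, c0 terms, double deltas
S2_4X_STREAM_SIZES = [12, 24, 3, 12]

if __name__ == "__main__":
    frame = list(range(39))  # stand-in for one 39-dimensional 1s_c_d_dd frame
    parts = split_streams(frame, "0-12/13-25/26-38")
    print([len(p) for p in parts])   # [13, 13, 13]
    print(sum(S2_4X_STREAM_SIZES))   # 51
```

Running the same helper on any other spec is a quick way to see how many dimensions each stream receives before training.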
Thank you! Would you recommend experimenting with s2_4x for continuous models, or is 1s_c_d_dd preferred?
Also, the last question from my previous post was not answered. Does the value of $CFG_VECTOR_LENGTH (which is set to 13 by default) affect the 1s_c_d_dd feature vector representation?
No, multistream models only make sense for semi-continuous and PTM models, and even then 3 streams are better than 4.
1s_c_d_dd is preferred for continuous models.
Yes, the value of $CFG_VECTOR_LENGTH affects the 1s_c_d_dd features.
Last edit: Nickolay V. Shmyrev 2016-06-17
OK, so should I expect better results from setting this parameter to 39 with the 1s_c_d_dd feature representation?
The default value of 13 results in 39 features being scored in the end (13 base, then deltas and delta-deltas). If you change it to 39, that would be too much, with a total feature size of 117. It would also require you to change the nfilt value, which is often just 25.
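The arithmetic behind these numbers can be sketched as follows. The function name is made up for illustration; it only assumes, as stated above, that 1s_c_d_dd appends deltas and delta-deltas of the same length as the base vector.

```python
def total_feature_size(vector_length):
    """Base cepstra plus deltas plus delta-deltas, each vector_length long."""
    return 3 * vector_length


if __name__ == "__main__":
    for n in (13, 39):
        # 13 gives 39 features, 39 gives 117
        print(n, total_feature_size(n))
```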