CMU Sphinx / Forums / Help: Feature vector representation

Andreas Ravndal - 2016-06-16

Hi, I have some questions about the two feature vector representations 1s_c_d_dd and s2_4x. Could someone explain more clearly the fourth stream in the s2_4x feature vector representation? I also read (https://sourceforge.net/p/cmusphinx/discussion/speech-recognition/thread/d674f4a5/?limit=25) that 1s_c_d_dd consists of three streams but is combined into a one-stream feature vector. Is that right? Finally, does the value of the parameter $CFG_VECTOR_LENGTH in the cfg file affect the 1s_c_d_dd feature vector representation?

- Nickolay V. Shmyrev - 2016-06-16
  
  Could someone explain more clearly the fourth stream in the s2_4x feature vector representation.
  
  s2_4x consists of the following features:
  
  12 cepstral coefficients from c1 to c12, 24 deltas, the c0 value + delta c0 + double-delta c0, and 12 double deltas
  
  Those are modelled as 4 GMM streams, so each of the 4 groups has its own codebook.
  
  that 1s_c_d_dd consists of three streams,
  
  Plain 1s_c_d_dd is a single stream.
  
  1s_c_d_dd with the -svspec modifier is 3 streams. -svspec 0-12/13-25/26-38 does the split.
  
  Last edit: Nickolay V. Shmyrev 2016-06-16
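
To make the stream layout concrete, here is a minimal Python sketch of how a stream specification partitions one feature vector. It is not Sphinx code: `parse_svspec` and `split_streams` are hypothetical helper names, the parser only handles simple `start-end` ranges separated by `/`, and the spec `0-12/13-25/26-38` (three streams of 13 values each) is assumed as the usual choice for a 39-dimensional 1s_c_d_dd vector. The s2_4x sizes are taken from the reply above.

```python
def parse_svspec(spec):
    """Parse a spec such as '0-12/13-25/26-38' into inclusive (start, end) pairs.

    Simplified on purpose: only plain 'start-end' ranges separated by '/'.
    """
    ranges = []
    for part in spec.split("/"):
        start, end = part.split("-")
        ranges.append((int(start), int(end)))
    return ranges


def split_streams(vector, spec):
    """Split one feature vector into sub-vectors, one per stream."""
    return [vector[start:end + 1] for start, end in parse_svspec(spec)]


# Stand-in 39-dimensional 1s_c_d_dd frame: index i simply holds the value i.
frame = list(range(39))
streams = split_streams(frame, "0-12/13-25/26-38")
stream_sizes = [len(s) for s in streams]

# s2_4x stream sizes as listed in the reply above (labels are illustrative).
S2_4X_STREAMS = {"cepstra": 12, "deltas": 24, "c0_terms": 3, "double_deltas": 12}
s2_4x_total = sum(S2_4X_STREAMS.values())

print(stream_sizes, s2_4x_total)
```

Without a stream specification, the same 39 values are scored as one stream, which matches the statement that plain 1s_c_d_dd is a single stream.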
  

Andreas Ravndal - 2016-06-17

Thank you! Would you recommend experimenting with s2_4x for continuous models? Or is 1s_c_d_dd preferred?
Also, my last question from my previous post was not answered. Does the value of $CFG_VECTOR_LENGTH (which is set to 13 by default) affect the 1s_c_d_dd feature vector representation?

- Nickolay V. Shmyrev - 2016-06-17
  
  Thank you! would you recommend experimenting using s2_4x for continuous models?
  
  No, multistream models only make sense for semi-continuous and PTM models. And even then, 3 streams are better than 4.
  
  Or is 1s_c_d_dd preferred?
  
  1s_c_d_dd is preferred for continuous.
  
  Does the value of $CFG_VECTOR_LENGTH (which is set to 13 by default) affect the 1s_c_d_dd feature vector representation?
  
  Yes.
  
  Last edit: Nickolay V. Shmyrev 2016-06-17
  
  - Andreas Ravndal - 2016-06-17
    
    OK, so should I expect better results if I set this parameter to 39 with the 1s_c_d_dd feature representation?
    
    - Nickolay V. Shmyrev - 2016-06-17
      
      The default value of 13 results in 39 features being scored in the end (13 base features, plus 13 deltas and 13 delta-deltas). If you change it to 39, the total feature size becomes 117, which is too much. It would also require you to change the nfilt value, which is often just 25.
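
The arithmetic in this reply can be checked with a small Python sketch. The function names are hypothetical and not part of SphinxTrain. The check against nfilt rests on the assumption that the cepstra are computed from nfilt filterbank outputs, so there cannot be more cepstral coefficients than filters.

```python
def scored_feature_count(vector_length, n_derivatives=2):
    """Total 1s_c_d_dd size: base features plus deltas and delta-deltas."""
    return vector_length * (1 + n_derivatives)


def fits_filterbank(vector_length, nfilt=25):
    """Assumed constraint: no more cepstral coefficients than filters."""
    return vector_length <= nfilt


default_total = scored_feature_count(13)   # 13 base + 13 deltas + 13 delta-deltas
enlarged_total = scored_feature_count(39)  # what a vector length of 39 would give

print(default_total, enlarged_total, fits_filterbank(13), fits_filterbank(39))
```

With the default of 13 the scored vector has 39 values. Setting the parameter to 39 would triple that to 117 and, under the assumption above, would not fit a 25-filter front end.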
      
