
Feature vector representation

Help
2016-06-16
2016-06-17
  • Andreas Ravndal

    Andreas Ravndal - 2016-06-16

    Hi, I have some questions about the two feature vector representations 1s_c_d_dd and s2_4x. Could someone explain the fourth stream in the s2_4x feature vector representation more clearly? I also read (https://sourceforge.net/p/cmusphinx/discussion/speech-recognition/thread/d674f4a5/?limit=25) that 1s_c_d_dd consists of three streams but is combined into a one-stream feature vector. Is that correct? Finally, does the value of the parameter $CFG_VECTOR_LENGTH in the cfg file affect the 1s_c_d_dd feature vector representation?

     
    • Nickolay V. Shmyrev

      Could someone explain more clearly the fourth stream in the s2_4x feature vector representation.

      s2_4x consists of the following features:

      12 cepstral coefficients (c1 to c12), 24 deltas, the c0 value + delta c0 + double-delta c0, and 12 double deltas.

      Those are modelled as 4 GMM streams, so each of the 4 groups has its own codebook.
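The stream sizes listed above can be tallied with a short Python sketch. This only does the bookkeeping; the stream names and the order in which the streams are laid out are assumptions made for illustration, not identifiers or a memory layout taken from Sphinx.

```python
# Stream widths for s2_4x as listed in the reply above.
# The labels and the concatenation order are illustrative assumptions.
S2_4X_STREAMS = [
    ("cepstra c1..c12", 12),
    ("deltas", 24),
    ("power: c0, delta c0, double-delta c0", 3),
    ("double deltas", 12),
]


def stream_slices(streams):
    """Return (name, start, end) index ranges, end exclusive, for consecutive streams."""
    out = []
    start = 0
    for name, width in streams:
        out.append((name, start, start + width))
        start += width
    return out


layout = stream_slices(S2_4X_STREAMS)
total_dim = layout[-1][2]  # end index of the last stream

for name, start, end in layout:
    print(f"{name}: dims {start}..{end - 1} ({end - start} values)")
print("total:", total_dim)
```

Under these assumptions the four streams add up to 51 values per frame, and the third stream holds only the three c0-related values.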

      that 1s_c_d_dd consists of three streams,

      Plain 1s_c_d_dd is a single stream.

      1s_c_d_dd with the -svspec modifier is 3 streams; -svspec 0-12/13-25/26-38 does the split.
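To make the idea of a subvector split concrete, here is a minimal Python sketch that cuts one 39-value frame into streams from a range string. It assumes an even 13/13/13 split written as `0-12/13-25/26-38` and handles only the simple `first-last/first-last` form; `parse_svspec` and `split_streams` are hypothetical helper names, not Sphinx functions.

```python
def parse_svspec(spec):
    """Parse a spec like '0-12/13-25/26-38' into inclusive (first, last) pairs.

    Only the plain 'first-last' form separated by '/' is handled here.
    """
    streams = []
    for part in spec.split("/"):
        first, last = part.split("-")
        streams.append((int(first), int(last)))
    return streams


def split_streams(vector, spec):
    """Split one feature vector into sub-vectors, one per stream."""
    return [vector[first:last + 1] for first, last in parse_svspec(spec)]


# A dummy 39-value frame: 13 cepstra, 13 deltas, 13 delta-deltas.
frame = list(range(39))
streams = split_streams(frame, "0-12/13-25/26-38")
widths = [len(s) for s in streams]
print(widths)
```

Each resulting sub-vector would then be modelled by its own stream, while plain 1s_c_d_dd keeps all 39 values in one stream.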

       

      Last edit: Nickolay V. Shmyrev 2016-06-16
  • Andreas Ravndal

    Andreas Ravndal - 2016-06-17

    Thank you! Would you recommend experimenting with s2_4x for continuous models, or is 1s_c_d_dd preferred?
    Also, the last question from my previous post was not answered: does the value of $CFG_VECTOR_LENGTH (which is set to 13 by default) affect the 1s_c_d_dd feature vector representation?

     
    • Nickolay V. Shmyrev

      Thank you! Would you recommend experimenting with s2_4x for continuous models?

      No, multistream models only make sense for semi-continuous and PTM models. And even then, 3 streams are better than 4.

      Or is 1s_c_d_dd preferred?

      1s_c_d_dd is preferred for continuous.

      Does the value of $CFG_VECTOR_LENGTH (which is set to 13 by default) affect the 1s_c_d_dd feature vector representation?

      Yes.

       

      Last edit: Nickolay V. Shmyrev 2016-06-17
      • Andreas Ravndal

        Andreas Ravndal - 2016-06-17

        OK, so should I then expect better results from setting this parameter to 39 with the 1s_c_d_dd feature representation?

         
        • Nickolay V. Shmyrev

          The default value of 13 results in 39 features being scored in the end (13 base coefficients, then their deltas and delta-deltas). If you change it to 39, that would be too much: the total feature size becomes 117. It would also require you to change the nfilt value, which is often just 25.
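The arithmetic in this reply can be checked with a small Python sketch. The helper names are hypothetical, and the filter check rests on the assumption that the number of cepstral coefficients cannot exceed the number of filterbank channels (nfilt) they are computed from.

```python
def feature_dim_1s_c_d_dd(vector_length):
    """Total 1s_c_d_dd size: base coefficients + deltas + delta-deltas."""
    return 3 * vector_length


def needs_more_filters(vector_length, nfilt):
    """True if the base vector asks for more coefficients than there are filters.

    Assumption: cepstra come from a transform of nfilt filterbank outputs,
    so at most nfilt coefficients are available.
    """
    return vector_length > nfilt


default_dim = feature_dim_1s_c_d_dd(13)
enlarged_dim = feature_dim_1s_c_d_dd(39)
print(default_dim, enlarged_dim)

# With nfilt = 25, a base vector of 39 would not fit; the default 13 does.
print(needs_more_filters(13, 25), needs_more_filters(39, 25))
```

So the default setting already yields the 39-dimensional vector, and raising the base length to 39 triples it again instead of selecting a different representation.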

           
