chunking effect in alignment

number31
2015-01-05
2015-01-06
  • number31

    number31 - 2015-01-05

    Suppose I have a sentence:
    Well let me see yes that's it.

    The actual spoken speech sounds like this:
    Well (short pause) let me see (short pause) yes that's it.

    I used Karel's nnet setup to do the alignment. The result is that the whole sentence was aligned to the final spoken stretch (where "yes that's it" is spoken), while the earlier audio was labelled as NSN.

    I will call this a chunking effect, because the phones tend to be chunked together within a single spoken period.

    Is this normal, or is it due to a parameter setting?

     
    • Daniel Povey

      Daniel Povey - 2015-01-05

      Sounds like an alignment error. It could be a search error, fixable by
      a higher beam, or simply a modeling error. However, it's unusual for
      NSN to appear in the alignment if there was no such marking in the
      transcript, because it's not normally an optional silence that the
      lexicon would allow between words (normally only SIL is allowed).
      Dan

       
  • number31

    number31 - 2015-01-05

    I thought both SIL and NSN are generated by Kaldi, no? I am using the TedLium scripts.

    In the transcript the first phones are supposed to be w eh l..., but the alignment output gives NSN_S before w eh l... There is some rain noise in the background, so I assumed NSN_S stands for this kind of noise. If it is not generated by Kaldi, though, that would be very strange.

     
  • number31

    number31 - 2015-01-05

    I guess the main reason is that the silence model is poor, so it cannot add SIL between the words of a sentence, and hence all the words stick together. I am thinking of some possible solutions but am not sure which is best:

    [a] add a symbol <sp> between every word;

    [b] boost silence;

    [c] add some "silent sentences" in the transcript.

    What do you think?

     
    • Daniel Povey

      Daniel Povey - 2015-01-05

      I think the issue is likely that you have OOVs in your transcript,
      which are getting turned into <UNK>, which has a pronunciation of NSN
      in the Tedlium setup, and that is eating up a bunch of speech. [Note:
      the use of NSN to model unknown words is slightly against the meaning
      of NSN, which I intended to mean non-spoken noise, but it doesn't
      really matter; it's just a cosmetic issue.]

      Likely you made some kind of scripting error and your "text" file has
      some kind of garbage before the "well" that doesn't appear in
      words.txt.

      Dan
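
The check suggested above, looking for tokens in the "text" file that do not appear in words.txt, can be sketched in Python as follows. This is only an illustration, assuming the usual Kaldi layouts: each line of text is an utterance ID followed by the words, and each line of words.txt is a word followed by its integer ID. The function names and the paths in the usage note are hypothetical and are not part of the Kaldi scripts.

```python
import sys
from collections import Counter


def load_vocab(lines):
    """Collect the first field of every non-empty words.txt line ("<word> <id>")."""
    vocab = set()
    for line in lines:
        fields = line.split()
        if fields:
            vocab.add(fields[0])
    return vocab


def find_oovs(lines, vocab):
    """Map utterance ID -> list of tokens that are missing from vocab.

    Assumes each line looks like "<utt-id> <word> <word> ...".
    Utterances without any missing token are left out of the result.
    """
    oovs = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        utt_id, words = fields[0], fields[1:]
        missing = [w for w in words if w not in vocab]
        if missing:
            oovs[utt_id] = missing
    return oovs


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python find_oovs.py <text-file> <words.txt>
    text_path, words_path = sys.argv[1], sys.argv[2]
    with open(words_path, encoding="utf-8") as f:
        vocab = load_vocab(f)
    with open(text_path, encoding="utf-8") as f:
        oovs = find_oovs(f, vocab)
    # Print the most frequent missing tokens first, then the affected utterances.
    counts = Counter(w for ws in oovs.values() for w in ws)
    for word, n in counts.most_common():
        print(f"{n}\t{word}")
    for utt_id, ws in oovs.items():
        print(f"{utt_id}: {' '.join(ws)}")
```

It could be run, for example, on data/train/text and data/lang/words.txt (paths assumed here; adjust to your setup). Any token it reports is one that would be mapped to <UNK> and, in this setup, aligned as NSN.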

       
  • number31

    number31 - 2015-01-06

    In the alignment output, there are two NSN_S phones between EVERY pair of sentences. In the transcript (the STM file that I feed in), I have already removed all punctuation marks, so I cannot see where the NSN_S comes from (unless Kaldi automatically inserts them between sentences?). Which files in data/ can I check?