Suppose I have a sentence:
Well let me see yes that's it.
The actual spoken speech sounds like this:
Well (short pause) let me see (short pause) yes that's it.
I used Karel's nnet to do the alignment, and the result is that the whole sentence was aligned to the final part ("yes that's it") of the actual spoken period, while the earlier period was assigned to NSN.
Let us call it a chunking effect, because the phones tend to be chunked together within one spoken period.
Just wondering: is this normal, or is it due to a parameter setting?
Sounds like an alignment error - could be a search error, fixable by a
higher beam, or simply a modeling error. However, it's unusual for
NSN to appear in the alignment if there was no such marking in the
transcript, because it's not normally an optional silence that the
lexicon would allow between words (normally only SIL is allowed).
Dan
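If it is a search error, re-running alignment with a wider beam may fix it. A minimal sketch, assuming the standard Kaldi alignment scripts (the directory names below are example paths; the nnet alignment script in Karel's setup, steps/nnet/align.sh, accepts the same beam options):

```shell
# Re-run alignment with a wider beam than the usual defaults
# (beam=10, retry-beam=40). All paths below are examples; substitute
# your own data, lang, and model directories.
steps/align_si.sh --nj 8 \
  --beam 20 --retry-beam 80 \
  data/train data/lang exp/tri3 exp/tri3_ali_beam20
```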
I thought both SIL and NSN are generated by Kaldi, no? I am using the TedLium scripts.
In the transcript the first phones are supposed to be w eh l..., but the alignment output gives NSN_S before w eh l... There is some rain sound in the background, so I had supposed NSN_S means that kind of noise, but if it is not generated by Kaldi then it would be very strange.
I guess the main reason is that the model for silence is poor, so it cannot add SIL between the words of a sentence, hence all the words stick together. I am thinking of some possible solutions but am not sure which is best:
[a] add a symbol <sp> between every word;
[b] boost silence;
[c] add some "silent sentences" to the transcript.
What do you think?
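For option [b], boosting silence is already exposed as a flag on the standard alignment scripts, so it may be the easiest to try. A sketch (directory names are example paths):

```shell
# --boost-silence scales up the acoustic likelihood of the silence
# phones during alignment, making it easier for SIL to be inserted
# between words.
steps/align_si.sh --boost-silence 1.5 --nj 8 \
  data/train data/lang exp/tri3 exp/tri3_ali_boostsil
```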
I think the issue is likely that you have OOVs in your transcript,
which are getting turned into <UNK>, which has a pronunciation of NSN
in the Tedlium setup, which is eating up a bunch of speech. [Note:
using NSN to model unknown words is slightly against the meaning of
NSN, which I intended to mean non-spoken noise, but it doesn't really
matter; it's just a cosmetic issue.]
Likely you made some kind of scripting error and your "text" file has
some kind of garbage before the "well" that doesn't appear in
words.txt.
Dan
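A quick way to check for OOVs is to compare the transcript vocabulary against words.txt. A self-contained sketch with made-up file contents (point the same pipeline at your own data/train/text and data/lang/words.txt):

```shell
# Build a tiny example: a Kaldi-style "text" file (utterance id followed
# by the words) and a words.txt symbol table (word plus integer id).
# "that's" is deliberately left out of words.txt to play the role of an OOV.
mkdir -p /tmp/oov_demo && cd /tmp/oov_demo
printf "utt1 well let me see yes that's it\n" > text
printf "well 1\nlet 2\nme 3\nsee 4\nyes 5\nit 6\n" > words.txt

# Words that appear in the transcript...
cut -d' ' -f2- text | tr ' ' '\n' | sort -u > transcript_vocab
# ...versus words the symbol table knows about.
awk '{print $1}' words.txt | sort -u > known_vocab
# Anything printed here would be mapped to <UNK> (and hence NSN) at alignment time.
comm -23 transcript_vocab known_vocab   # -> that's
```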
In the alignment output, there are 2 NSN_S between EVERY sentence. In the transcript (the STM file that I feed in), I have already removed all punctuation marks, so I cannot see where the NSN_S comes from (unless Kaldi automatically inserts them between sentences?). Which files in data/ can I check?
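A few places to look, assuming the standard data/lang layout that the Tedlium recipe produces (the exact OOV symbol, <unk> or <UNK>, depends on the recipe; adjust paths to your setup):

```shell
cat data/lang/oov.txt                 # the word that OOVs are mapped to
grep NSN data/lang/phones.txt         # confirm NSN (and variants like NSN_S) are in the phone set
grep NSN data/local/dict/lexicon.txt  # which words have an NSN pronunciation
grep -c '<unk>' data/train/text       # whether the OOV symbol leaked into the transcript itself
```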