CMU Sphinx / Forums / Help: Improving the boundaries of forced alignment

Diwakar.G - 2017-01-14

The boundaries obtained from forced alignment deviate too much. In the SIL part, most of the phones are identified. Is there any way to improve the boundaries? I am using various distance measures to compare feature vectors, but this doesn't work. Can you please suggest something so that I can improve the results?
I also have another question: what is the difference between an array mic and a head mic? Initially I trained models with head-mic data, but there was a large variation in the boundaries. When I then used the array-mic database, the results were better. Why is there this variation?
Please help me. Thank you.

- Arseniy Gorin - 2017-01-17
  
  I am using various distance measure to compare feature vectors but this doesn't work
  
  It is not quite clear what you are trying to achieve. If you need alignment, it is better to use HMMs and the algorithm of sphinx3_align. You can improve alignments with better acoustic models. There is no need to implement a distance measure for feature vectors; this just makes no sense.
  
  Initially I trained models with head Mic, but there is a large variation in boundary. Then I use database from array mic will give better results.
  
  Again, it is not clear what you are asking. If the microphone quality is bad (or too different from the one used in training the acoustic model), you can get bad alignments.
  

Diwakar.G - 2017-01-17

The database I am working with (the Torgo database) consists of speech from dysarthric and control speakers, recorded with two kinds of microphones: a head mic and an array mic. Initially I trained the models on the control speakers' speech recorded with the head mic and used those models for forced alignment of the dysarthric speech, which resulted in erroneous phone and word boundaries. I repeated the same experiment using the control speakers' speech recorded with the array mic instead, which resulted in smaller deviations in the phoneme and word boundaries. I am unable to explain the discrepancy between the two results.
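
To make the comparison between the head-mic and array-mic models concrete, boundary deviation can be measured against a reference segmentation. The following is only a minimal sketch: the function names, the `(label, end_time_ms)` data layout, and the 20 ms tolerance are assumptions for illustration and are not part of sphinx3_align or the Torgo database tools.

```python
from statistics import mean


def boundary_deviations(reference, aligned):
    """Absolute boundary differences in milliseconds.

    Both arguments are lists of (label, end_time_ms) pairs and must
    contain the same label sequence (hypothetical layout).
    """
    if [label for label, _ in reference] != [label for label, _ in aligned]:
        raise ValueError("label sequences differ")
    return [abs(r - a) for (_, r), (_, a) in zip(reference, aligned)]


def summarize(reference, aligned, tolerance_ms=20):
    """Mean absolute deviation and fraction of boundaries within tolerance."""
    devs = boundary_deviations(reference, aligned)
    within = sum(d <= tolerance_ms for d in devs) / len(devs)
    return {"mean_abs_dev_ms": mean(devs), "within_tolerance": within}


# Made-up example values, not taken from any real alignment.
reference = [("Y", 100), ("EH", 250), ("T", 400)]
aligned = [("Y", 120), ("EH", 300), ("T", 400)]
summary = summarize(reference, aligned)
```

Running the same summary on the alignments produced by each acoustic model gives two numbers that can be compared directly.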

- Arseniy Gorin - 2017-01-17
  
  Aligning dysarthric speech with models trained on clean speech is not easy. I have not worked with this data set, but generally speakers with dysarthria produce sounds of longer duration that are also heavily corrupted.
  
  There is also a lot of mismatch between distant and close microphone recording features, so your outcomes are not surprising - this seems like an open research topic.
  
  I think both sources of variability can be partly handled by model adaptation.
  
  I'd suggest reviewing some papers that work on your data set ( https://scholar.google.com/scholar?cites=15287134871273488524 ), like this one maybe ( http://ieeexplore.ieee.org/document/5947460/ )
  

Diwakar.G - 2017-01-17

Thanks for your response. As you said, there are a lot of deviations in the boundaries. Using a knowledge-based approach, I have fine-tuned the boundaries to some extent. The major remaining issue is the repetition of words. For the utterance "yet he still thinks as swiftly as ever", the speaker actually uttered something like "ye yet he still thinks as swiftly as ever". Sphinx3_align is able to align "ye" to some extent, but the word "yet" is then aligned with some phone. I don't know how to address this issue. Please help me.

- Arseniy Gorin - 2017-01-17
  
  My opinion is that your problem requires a careful research project involving acoustic model tuning/adaptation. It is quite hard to give ready-to-use advice.
  But I am almost sure your acoustic model should at least be adapted on dysarthric speech. You will also need to revise the lexicon to deal with things like word truncation (such as "yet" -> "ye").
  
  Last edit: Arseniy Gorin 2017-01-17
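
One way to picture the lexicon revision suggested above is to generate extra dictionary entries for truncated pronunciations. This is only a sketch under assumptions: the `_cut<n>` labels and the helper names are hypothetical, the pronunciation of "yet" is given for illustration, and the output uses the plain "word, then space-separated phones" layout of a pronunciation dictionary. The transcript would have to use the new labels wherever a partial word was actually spoken.

```python
def truncated_entries(word, phones, min_phones=2):
    """Return (label, phones) pairs for each proper prefix of a pronunciation.

    The "<word>_cut<n>" label is a hypothetical naming scheme.
    """
    entries = []
    for n in range(min_phones, len(phones)):
        label = f"{word}_cut{n}"
        entries.append((label, phones[:n]))
    return entries


def format_dict_lines(entries):
    """Format entries as 'LABEL PH1 PH2 ...' dictionary lines."""
    return [f"{label} {' '.join(phones)}" for label, phones in entries]


# Illustrative lexicon with a single word.
lexicon = {"yet": ["Y", "EH", "T"]}
lines = format_dict_lines(truncated_entries("yet", lexicon["yet"]))
for line in lines:
    print(line)
```

For the repeated utterance in this thread, the transcript would then start with the truncated label followed by the full word, so that the aligner has a separate unit for the partial "ye".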
  

Diwakar.G - 2017-01-17

Ok sir, thanks for your suggestion
