I am working on repetition analysis in uttered speech. It requires the comparison of different words in a sentence to detect the homogeneous segments which are repeated. Each word consists of a different number of samples, resulting in different dimensions. For the comparison I need to encode all the words into vectors with the same number of samples. The encoding scheme should retain the spectro-temporal properties of each uttered word, such that the comparison yields a lower distance for similar words and a larger distance for different words. I approached this problem with polynomial curve fitting, which yielded erroneous results. Please suggest some other encoding technique. Thank you.
Not sure which curve you are fitting... How does your approach handle words of different lengths? You should look towards hidden Markov models. Another option is dynamic time warping (DTW), where you could obtain a reference as an averaged, length-normalized MFCC sequence of your training samples.
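To illustrate the DTW suggestion: a minimal numpy-only sketch of comparing two variable-length feature sequences without forcing them into a fixed dimension. It assumes per-frame feature vectors (e.g. 13-dim MFCCs) have already been extracted elsewhere; the random arrays here are stand-ins for real MFCC matrices, not actual speech features.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences.

    a: (n, d) array, b: (m, d) array -- e.g. per-frame MFCC vectors
    of two uttered words with different lengths n and m.
    Returns the accumulated frame-distance along the optimal alignment.
    """
    n, m = len(a), len(b)
    # Pairwise Euclidean distances between every frame of a and b
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # insertion
                acc[i, j - 1],      # deletion
                acc[i - 1, j - 1],  # match
            )
    return acc[n, m]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    word = rng.standard_normal((40, 13))              # 40 frames, 13-dim features
    same = word[np.linspace(0, 39, 55).astype(int)]   # time-stretched copy (55 frames)
    other = rng.standard_normal((50, 13))             # unrelated "word"
    # The time-stretched repetition should score a much lower distance
    print(dtw_distance(word, same), dtw_distance(word, other))
```

Because DTW aligns frames elastically, a repeated word spoken at a different speed still scores a low distance, which is exactly the property the fixed-length polynomial encoding could not provide.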