Hi, I need to split a recording of a word into its phonemes in order to
compare it to another word. I know using a phoneme based dictionary and
pocketsphinx to treat each phoneme like an individual word is not really
accurate enough.
What I have tried so far is to extract a feature vector from the word using
an fe_t object and fe_process_frames, and then compare that to a vector
extracted in the same way from a short section of an individual phoneme. I am
using the flags "-logspec = TRUE", "-nfilt = 20" and "-lowerf = 80"; do these
sound sensible? This doesn't really work either, however, as close matches to
the phoneme are found all over the word, not just in the correct location.
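A minimal sketch of that sliding comparison, in plain Python (the function and variable names are hypothetical, and the frames are assumed to be per-frame feature vectors such as the 20 log-spectral values per frame the flags above would produce):

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def slide_template(word_frames, phone_frames):
    """Slide the phoneme template over the word, scoring each offset.

    Returns a list of (offset, average frame distance); low scores are
    candidate match positions.  With raw per-frame distances many offsets
    can score similarly, which is consistent with matches turning up all
    over the word rather than only at the true location.
    """
    n = len(phone_frames)
    scores = []
    for start in range(len(word_frames) - n + 1):
        total = sum(frame_dist(word_frames[start + i], phone_frames[i])
                    for i in range(n))
        scores.append((start, total / n))
    return scores
```

This is only a sketch of the comparison as described, not sphinxbase's own API; the real fe_process_frames output would be fed in as `word_frames` and `phone_frames`.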
The next method that I am trying is to take the word's feature vector and
search through it for points of change, i.e. frames where the vector changes
considerably.
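That change-point idea can be sketched as follows (again a hypothetical helper, assuming the frames are lists of per-frame feature values):

```python
import math

def change_points(frames, threshold):
    """Flag frame indices where the feature vector jumps sharply.

    Computes the Euclidean distance between each pair of consecutive
    frames and returns the indices where it exceeds `threshold` --
    crude candidate phoneme boundaries.
    """
    boundaries = []
    for i in range(1, len(frames)):
        d = math.sqrt(sum((a - b) ** 2
                          for a, b in zip(frames[i - 1], frames[i])))
        if d > threshold:
            boundaries.append(i)
    return boundaries
```

In practice the threshold would need tuning, since consecutive frames within a single phoneme also vary.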
What I'm really asking for is some advice. Do either of these options sound
sensible? How is this sort of thing usually done? Are there any other
sphinxbase objects which would make this task simpler?
Any pointers or help would be much appreciated, thanks.
> I am using the flags "-logspec = TRUE", "-nfilt = 20" and "-lowerf = 80", do
> these sound sensible? This doesn't really work either however as close matches
> to the phoneme are found all over the word, not just in the correct location.
No, it doesn't sound sensible. In particular, there is no reason to use
logspec; the default cepstral (MFCC) features are the usual choice.
> What I'm really asking for is some advice. Do either of these options sound
> sensible? How is this sort of thing usually done? Are there any other
> sphinxbase objects which would make this task simpler?
The advice is to study the theory first and only then design the application.
You need to describe what kind of application you have to build in order to
get advice on the algorithm. Details of the implementation for some common
types are listed in the FAQ.