Hi,
I found a method in Sphinx 4, result.getBestPronunciationResult(), which lists the phonemes that were recognized (from the final result). Is it possible to retrieve the list of all matching phonemes? I am interested in the intermediate state rather than the final result: a list of all possible phonemes (optionally, with a score assigned to each).
Thanks
Sriram
Hi Sriram,
Yes, that's possible. Here's how.
import java.util.List;
import edu.cmu.sphinx.decoder.search.Token;

System.out.println("final result: " + result.getBestPronunciationResult() + "\n");
// Iterate over all result tokens, not only the best one.
List nBest = result.getResultTokens();
for (int i = 0; i < nBest.size(); i++) {
    Token nBestToken = (Token) nBest.get(i);
    // Words with their pronunciation units, e.g. with[W,IH,TH]
    System.out.println("partial result: " + nBestToken.getWordUnitPath());
    System.out.println(" word path: " + nBestToken.getWordPath(false, true) + "\n");
    System.out.println(" score: " + nBestToken.getScore() + "\n");
}
Check out the "Token" documentation to find out what other methods you can call.
http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/decoder/search/Token.html
Cheers,
Andre
Hi Andre,
Thanks for the reply. If I am not wrong, your code displays all possible word combinations that were evaluated. Do you think it is possible to list the phonemes that were used to generate these words?
Thanks for your help.
Sriram
The phonemes are also listed. The output string looks like this:
with[W,IH,TH] the[DH,AH] press[P,R,EH,S]
You just need to parse it.
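In case it helps, here is a minimal sketch of that parsing step. It assumes the output always has the shape shown above (a word, then its units in square brackets, separated by commas). The class and method names (PronunciationParser, WordUnits, parse) are made up for this example and are not part of Sphinx.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Sphinx: splits a string such as
// "with[W,IH,TH] the[DH,AH]" into words and their phoneme lists.
public class PronunciationParser {

    /** One word together with the phonemes listed for it. */
    public static final class WordUnits {
        public final String word;
        public final List<String> phonemes;

        WordUnits(String word, List<String> phonemes) {
            this.word = word;
            this.phonemes = phonemes;
        }
    }

    // A run of non-space characters (the word), then '[', the units, then ']'.
    private static final Pattern ENTRY =
            Pattern.compile("(\\S+?)\\[([^\\]]*)\\]");

    public static List<WordUnits> parse(String path) {
        List<WordUnits> out = new ArrayList<WordUnits>();
        Matcher m = ENTRY.matcher(path);
        while (m.find()) {
            String units = m.group(2).trim();
            // An empty bracket pair yields an empty phoneme list.
            List<String> phonemes = units.isEmpty()
                    ? new ArrayList<String>()
                    : Arrays.asList(units.split("\\s*,\\s*"));
            out.add(new WordUnits(m.group(1), phonemes));
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "with[W,IH,TH] the[DH,AH] press[P,R,EH,S]";
        for (WordUnits w : parse(sample)) {
            System.out.println(w.word + " -> " + w.phonemes);
        }
    }
}
```

You would pass it the string you get from nBestToken.getWordUnitPath(). A list of entries is used rather than a map because the same word can appear more than once in a path.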
Hi Andre,
Yeah, I saw those, but my understanding of the system is probably incorrect. Does Sphinx first create a single list of phonemes and then fit words to it? Or does it create multiple lists of phonemes, score them, and pick the best before proceeding to fit the words?
Thanks
Sriram
Sphinx doesn't really have the concept of an initial list of phonemes. It builds the list of partial results as it walks through the graph, considering mainly the acoustic model scores and the language model scores. In other words, the vocabulary provided to the language model is what really drives what will become the final best result.
So the bottom line is that the phonemes exposed by the function I gave you earlier are not really what the acoustic model has recognized. Instead, they're the phonetic representation of the word tokens that received the highest scores and almost became the best result.
Without a language model helping drive the process, chances are your results would be very inaccurate.
Long story short, your second statement is more in line with what really happens.
Cheers,
Andre
Thanks for the explanation. I'll try to use the list of phonemes and proceed.
Thanks for your time and help.
Sriram