I am using PocketSphinx for Unity to extract phonemes (with timing) from audio clips. Using "-allphone", I can do this decently, but I'd like better accuracy.
Since I have the associated text, I can recognize the clips in keyphrase mode. But unlike "-allphone" mode, keyphrase mode gives me no SegmentList from which to get phoneme times. Is there some way to leverage the text I have when extracting phonemes?
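The usual name for using a known transcript this way is forced alignment. With "-allphone", the decoder is free to choose any phone sequence. In forced alignment, you expand the text into its phone sequence (for example, through the pronunciation dictionary) and search only for the best time boundaries. The sketch below shows that core Viterbi step in plain Python, under some assumptions:

- It is not PocketSphinx's API.
- `force_align` and its `frame_scores` input (a hypothetical per-frame log-likelihood for each phone) are stand-ins for what an acoustic model would provide.
- The default of 100 frames per second assumes a 10 ms frame shift.

```python
import math
from typing import Dict, List, Tuple


def force_align(
    phones: List[str],
    frame_scores: List[Dict[str, float]],
    frame_rate: int = 100,  # assumed: 10 ms frame shift
) -> List[Tuple[str, float, float]]:
    """Hypothetical Viterbi forced alignment of a known phone sequence.

    Each phone must cover at least one contiguous run of frames, in order.
    frame_scores[t][p] is an assumed log-likelihood of phone p at frame t.
    Returns (phone, start_sec, end_sec) for each phone in the sequence.
    """
    T, N = len(frame_scores), len(phones)
    if N == 0 or T < N:
        raise ValueError("need at least one phone and one frame per phone")

    neg = -math.inf
    # dp[t][i]: best total score for frames 0..t with frame t inside phone i
    dp = [[neg] * N for _ in range(T)]
    # entered[t][i]: True if the best path starts phone i at frame t
    entered = [[False] * N for _ in range(T)]
    dp[0][0] = frame_scores[0][phones[0]]

    for t in range(1, T):
        for i in range(N):
            stay = dp[t - 1][i]  # frame t continues phone i
            advance = dp[t - 1][i - 1] if i > 0 else neg  # frame t starts phone i
            best = max(stay, advance)
            if best == neg:
                continue  # phone i cannot be active at frame t
            dp[t][i] = best + frame_scores[t][phones[i]]
            entered[t][i] = advance > stay

    # Backtrack from the last phone at the last frame to recover start frames.
    starts = [0] * N
    i = N - 1
    for t in range(T - 1, 0, -1):
        if entered[t][i]:
            starts[i] = t
            i -= 1

    ends = starts[1:] + [T]
    return [(p, s / frame_rate, e / frame_rate) for p, s, e in zip(phones, starts, ends)]


if __name__ == "__main__":
    # Toy scores: frames 0-1 favour "a", frames 2-4 favour "b".
    scores = [{"a": 0.0, "b": -5.0}] * 2 + [{"a": -5.0, "b": 0.0}] * 3
    print(force_align(["a", "b"], scores))  # [('a', 0.0, 0.02), ('b', 0.02, 0.05)]
```

Because the transcript fixes which phones occur and in what order, the search only decides where each boundary falls. This constraint is what can make the timings more reliable than free phone recognition.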
Did you find any solution for this?
I'm having the same problem.