Hello all,
I am running the phoneme recognition script Nikolay Shmyrev posted on Stack Overflow:
https://stackoverflow.com/questions/30705028/convert-sound-to-list-of-phonemes-in-python
It works fine on my audio file. Now, the wiki for pocketsphinx phoneme recognition says I can access individual timings for each phoneme through the API: I was wondering if I could also access them through the decoder in Python? And optionally, is there a way to get a SAMPA transcription instead of the default one, or should I convert the default one manually?
Many thanks for your replies in advance.
Yes, sure.
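For reference, the Python bindings expose the same segment information as the C API. A minimal sketch, assuming a decoder already configured for allphone decoding as in the Stack Overflow script, and the default frame rate of 100 frames per second (the pocketsphinx `-frate` default; adjust `frate` if you changed it):

```python
# Sketch: per-phoneme timings from the pocketsphinx Python decoder.
# In allphone mode, decoder.seg() yields one segment per phoneme;
# each segment carries .word (the phoneme label), .start_frame and
# .end_frame, both measured in acoustic frames.

def frames_to_seconds(frame, frate=100):
    """Convert a frame index to seconds (100 frames/s = 10 ms frames)."""
    return frame / frate

def phoneme_segments(decoder, frate=100):
    """Yield (phoneme, start_sec, end_sec) for the last decoded utterance."""
    for seg in decoder.seg():
        yield (seg.word,
               frames_to_seconds(seg.start_frame, frate),
               frames_to_seconds(seg.end_frame, frate))
```

Usage would be along the lines of `for ph, start, end in phoneme_segments(decoder): print(ph, start, end)` after processing the audio.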
You have to convert manually.
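A manual conversion can be a simple lookup table from the CMU/ARPAbet symbols the decoder emits to SAMPA. The table below is partial and illustrative (US English correspondences); verify each entry against a full ARPAbet-to-SAMPA chart before relying on it:

```python
# Partial, illustrative ARPAbet -> SAMPA mapping (US English).
# Unknown symbols fall through lowercased so nothing is silently lost.
ARPABET_TO_SAMPA = {
    "AA": "A:", "AE": "{", "AH": "V", "AO": "O:", "AW": "aU", "AY": "aI",
    "B": "b", "CH": "tS", "D": "d", "DH": "D", "EH": "E", "ER": "3`",
    "EY": "eI", "F": "f", "G": "g", "HH": "h", "IH": "I", "IY": "i:",
    "JH": "dZ", "K": "k", "L": "l", "M": "m", "N": "n", "NG": "N",
    "OW": "oU", "OY": "OI", "P": "p", "R": "r\\", "S": "s", "SH": "S",
    "T": "t", "TH": "T", "UH": "U", "UW": "u:", "V": "v", "W": "w",
    "Y": "j", "Z": "z", "ZH": "Z", "SIL": "",
}

def to_sampa(phones):
    """Map a list of ARPAbet phones to SAMPA, dropping stress digits."""
    out = []
    for p in phones:
        p = p.rstrip("012")  # strip stress markers like AH0, EY1
        out.append(ARPABET_TO_SAMPA.get(p, p.lower()))
    return out
```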
Thank you very much for the quick reply.
I was also wondering if there is a preferred file format and size for audio? I'm getting incoherent results on a 5 MB .wav file, whereas the goforward.raw file works like a charm.
The audio file has to be PCM WAV, 16 kHz, 16-bit, mono. If you want help with the accuracy, share the file.
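Incoherent results on an arbitrary .wav are often just a format mismatch (e.g. 44.1 kHz stereo). A quick standard-library check against the 16 kHz / 16-bit / mono requirement stated above:

```python
# Check a .wav file against what pocketsphinx expects before decoding.
import wave

def wav_format_problems(path_or_file):
    """Return a list of human-readable format problems (empty = OK)."""
    problems = []
    with wave.open(path_or_file, "rb") as w:
        if w.getnchannels() != 1:
            problems.append("not mono (%d channels)" % w.getnchannels())
        if w.getsampwidth() != 2:
            problems.append("not 16-bit (%d bytes/sample)" % w.getsampwidth())
        if w.getframerate() != 16000:
            problems.append("not 16 kHz (%d Hz)" % w.getframerate())
    return problems
```

Anything reported here can be fixed by resampling, e.g. with sox or ffmpeg, before feeding the file to the decoder.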
Here it is, in the attachments.
The audio is UK English, not US English; it includes noise and 8 kHz telephony speech. Phonetic recognition of this would be problematic without an accurate model. Even with the best neural-network models, phonetic recognition is not very accurate, because it is hard to recognize very loosely defined phonemes in continuous speech.
If you want to deal with audio like this, you had better recognize it with a UK large-vocabulary model and then convert the words to phonemes.
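The words-to-phonemes step can be a lookup against the pronunciation dictionary shipped with the model, whose lines have the shape `word PH PH ...`, with alternates marked like `word(2)`. A minimal sketch; the `<unk>` fallback for out-of-dictionary words is a placeholder of my own:

```python
# Expand recognized words to phonemes via a CMU-style .dict file.

def load_dict(lines):
    """Build {word: [phones]} from dictionary lines, keeping the first
    pronunciation and dropping "(2)"-style variant suffixes."""
    pron = {}
    for line in lines:
        parts = line.split()
        if not parts or parts[0].startswith(";;"):  # skip blanks/comments
            continue
        word = parts[0].split("(")[0]
        pron.setdefault(word, parts[1:])
    return pron

def words_to_phones(words, pron):
    """Concatenate the phoneme sequences of a word hypothesis."""
    phones = []
    for w in words:
        phones.extend(pron.get(w.lower(), ["<unk>"]))
    return phones
```

Usage would be something like `pron = load_dict(open(path_to_model_dict))` followed by `words_to_phones(hypothesis.split(), pron)`.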
You're right, it's not US English: it's Irish English. What is strange is that a colleague using pocketsphinx was able to get great results transcribing European Spanish with the Mexican Spanish model (also on a file recorded from a radio broadcast, using a similar script).
I have a word-level transcription of the audio file, and I am aware of the CMU lmtool: would creating a dictionary and a language model from it be a good idea?
Thank you again for your help.
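For reference, the CMU lmtool takes a plain sentence corpus, one utterance per line, and returns a matching dictionary and language model. A minimal sketch of normalising an existing word transcription into that shape; the exact normalisation rules here (strip punctuation, uppercase) are my own assumption, so check them against what lmtool actually accepts:

```python
# Turn a raw word transcription into a one-utterance-per-line corpus
# suitable for uploading to the CMU lmtool.
import re

def normalize_line(line):
    """Keep letters, digits, apostrophes and spaces; uppercase the rest."""
    line = re.sub(r"[^A-Za-z0-9' ]+", " ", line)
    return " ".join(line.split()).upper()

def make_corpus(transcript_lines):
    """Normalize every non-empty line of the transcription."""
    return [normalize_line(l) for l in transcript_lines if l.strip()]
```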