Hello all,
I am running the phoneme recognition script Nikolay Shmyrev posted on Stack Overflow:
https://stackoverflow.com/questions/30705028/convert-sound-to-list-of-phonemes-in-python
It works fine on my audio file. Now, the wiki for pocketsphinx phoneme recognition says I can access individual timings for each phoneme through the API: I was wondering if I could also access them through the decoder in Python? And optionally, is there a way to get a SAMPA transcription instead of the default one, or should I convert the default one manually?
Many thanks for your replies in advance.
Yes, sure.
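For reference, the Python bindings expose the same segment information as the C API. A minimal sketch, assuming a decoder already configured for allphone decoding as in the Stack Overflow script, and the default frame rate of 100 frames per second (the pocketsphinx `-frate` default; adjust `frate` if you changed it):

```python
# Sketch: per-phoneme timings from the pocketsphinx Python decoder.
# In allphone mode, decoder.seg() yields one segment per phoneme;
# each segment carries .word (the phoneme label), .start_frame and
# .end_frame, both measured in acoustic frames.

def frames_to_seconds(frame, frate=100):
    """Convert a frame index to seconds (100 frames/s = 10 ms frames)."""
    return frame / frate

def phoneme_segments(decoder, frate=100):
    """Yield (phoneme, start_sec, end_sec) for the last decoded utterance."""
    for seg in decoder.seg():
        yield (seg.word,
               frames_to_seconds(seg.start_frame, frate),
               frames_to_seconds(seg.end_frame, frate))
```

Usage would be along the lines of `for ph, start, end in phoneme_segments(decoder): print(ph, start, end)` after processing the audio.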
You have to convert manually.
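A manual conversion can be a simple lookup table from the CMU/ARPAbet symbols the decoder emits to SAMPA. The table below is partial and illustrative (US English correspondences); verify each entry against a full ARPAbet-to-SAMPA chart before relying on it:

```python
# Partial, illustrative ARPAbet -> SAMPA mapping (US English).
# Unknown symbols fall through lowercased so nothing is silently lost.
ARPABET_TO_SAMPA = {
    "AA": "A:", "AE": "{", "AH": "V", "AO": "O:", "AW": "aU", "AY": "aI",
    "B": "b", "CH": "tS", "D": "d", "DH": "D", "EH": "E", "ER": "3`",
    "EY": "eI", "F": "f", "G": "g", "HH": "h", "IH": "I", "IY": "i:",
    "JH": "dZ", "K": "k", "L": "l", "M": "m", "N": "n", "NG": "N",
    "OW": "oU", "OY": "OI", "P": "p", "R": "r\\", "S": "s", "SH": "S",
    "T": "t", "TH": "T", "UH": "U", "UW": "u:", "V": "v", "W": "w",
    "Y": "j", "Z": "z", "ZH": "Z", "SIL": "",
}

def to_sampa(phones):
    """Map a list of ARPAbet phones to SAMPA, dropping stress digits."""
    out = []
    for p in phones:
        p = p.rstrip("012")  # strip stress markers like AH0, EY1
        out.append(ARPABET_TO_SAMPA.get(p, p.lower()))
    return out
```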
Thank you very much for the quick reply.
I was also wondering if there is a preferred file format and size for audio? I'm getting incoherent results on a 5 MB .wav file, whereas the goforward.raw file works like a charm.
The audio file has to be PCM WAV, 16 kHz, 16-bit, mono. If you want help with the accuracy, share the file.
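Incoherent results on an arbitrary .wav are often just a format mismatch (e.g. 44.1 kHz stereo). A quick standard-library check against the 16 kHz / 16-bit / mono requirement stated above:

```python
# Check a .wav file against what pocketsphinx expects before decoding.
import wave

def wav_format_problems(path_or_file):
    """Return a list of human-readable format problems (empty = OK)."""
    problems = []
    with wave.open(path_or_file, "rb") as w:
        if w.getnchannels() != 1:
            problems.append("not mono (%d channels)" % w.getnchannels())
        if w.getsampwidth() != 2:
            problems.append("not 16-bit (%d bytes/sample)" % w.getsampwidth())
        if w.getframerate() != 16000:
            problems.append("not 16 kHz (%d Hz)" % w.getframerate())
    return problems
```

Anything reported here can be fixed by resampling, e.g. with sox or ffmpeg, before feeding the file to the decoder.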
Here it is, in the attachments.
The audio is UK English, not US English; it includes noise and 8 kHz telephony speech. Phonetic recognition of this would be problematic without an accurate model. Even with the best neural-network models, phonetic recognition is not very accurate, because it is hard to recognize very loosely defined phonemes in continuous speech.
If you want to deal with audio like this, you had better recognize it with a UK large-vocabulary model and then convert the words to phonemes.
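The words-to-phonemes step can be a lookup against the pronunciation dictionary shipped with the model, whose lines have the shape `word PH PH ...`, with alternates marked like `word(2)`. A minimal sketch; the `<unk>` fallback for out-of-dictionary words is a placeholder of my own:

```python
# Expand recognized words to phonemes via a CMU-style .dict file.

def load_dict(lines):
    """Build {word: [phones]} from dictionary lines, keeping the first
    pronunciation and dropping "(2)"-style variant suffixes."""
    pron = {}
    for line in lines:
        parts = line.split()
        if not parts or parts[0].startswith(";;"):  # skip blanks/comments
            continue
        word = parts[0].split("(")[0]
        pron.setdefault(word, parts[1:])
    return pron

def words_to_phones(words, pron):
    """Concatenate the phoneme sequences of a word hypothesis."""
    phones = []
    for w in words:
        phones.extend(pron.get(w.lower(), ["<unk>"]))
    return phones
```

Usage would be something like `pron = load_dict(open(path_to_model_dict))` followed by `words_to_phones(hypothesis.split(), pron)`.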
You're right, it's not US English: it's Irish English. What is strange is that a colleague using pocketsphinx was able to get great results transcribing European Spanish with the Mexican Spanish model (also on a file recorded from a radio broadcast, using a similar script).
I have a word-level transcription of the audio file, and I am aware of the CMU lmtool: would creating a dictionary and a language model from it be a good idea?
Thank you again for your help.
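For reference, the CMU lmtool takes a plain sentence corpus, one utterance per line, and returns a matching dictionary and language model. A minimal sketch of normalising an existing word transcription into that shape; the exact normalisation rules here (strip punctuation, uppercase) are my own assumption, so check them against what lmtool actually accepts:

```python
# Turn a raw word transcription into a one-utterance-per-line corpus
# suitable for uploading to the CMU lmtool.
import re

def normalize_line(line):
    """Keep letters, digits, apostrophes and spaces; uppercase the rest."""
    line = re.sub(r"[^A-Za-z0-9' ]+", " ", line)
    return " ".join(line.split()).upper()

def make_corpus(transcript_lines):
    """Normalize every non-empty line of the transcription."""
    return [normalize_line(l) for l in transcript_lines if l.strip()]
```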