Help with pocketsphinx phoneme recognition through Python

Help
2020-01-17
2020-01-19
  • Fintan Herlihy

    Fintan Herlihy - 2020-01-17

    Hello all,

    I am running the phoneme recognition script Nickolay Shmyrev posted on
    Stack Overflow:
    https://stackoverflow.com/questions/30705028/convert-sound-to-list-of-phonemes-in-python

    It works fine on my audio file. The wiki page on pocketsphinx phoneme
    recognition says I can access the individual timing of each phoneme through
    the API. I was wondering if I could also access it through the decoder in
    Python? And optionally, is there a way to get a SAMPA transcription instead
    of the default one, or should I convert the default one manually?

    Many thanks for your replies in advance.

     
    • Nickolay V. Shmyrev

      I was wondering if I could also access it through the decoder in
      Python?

      Yes, sure

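      hypothesis = decoder.hyp()  # best hypothesis; the string is hypothesis.hypstr
      for seg in decoder.seg():   # one segment per phoneme: name and times (in frames)
          print(seg.word, seg.start_frame, seg.end_frame)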
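The times reported by decoder.seg() are frame indices, not seconds. A minimal sketch of the conversion, assuming the default pocketsphinx frame rate of 100 frames per second (the -frate option) and treating the end frame as inclusive. The helper name segments_to_times is hypothetical and takes plain tuples, so it can be tried without a decoder:

```python
FRAMES_PER_SECOND = 100  # assumption: default -frate, not overridden in the config


def segments_to_times(segments, frate=FRAMES_PER_SECOND):
    """Convert (name, start_frame, end_frame) tuples to (name, start_s, end_s)."""
    timed = []
    for name, start_frame, end_frame in segments:
        # The end frame is treated as inclusive here, so the segment is taken
        # to end where the following frame would begin.
        timed.append((name, start_frame / frate, (end_frame + 1) / frate))
    return timed


# With a real decoder the input could be built like this (not executed here):
#   segments = [(s.word, s.start_frame, s.end_frame) for s in decoder.seg()]
example = [("SIL", 0, 9), ("G", 10, 14)]
for name, start_s, end_s in segments_to_times(example):
    print(f"{name}\t{start_s:.2f}\t{end_s:.2f}")
```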
      

      And optionally, is there a way to get a SAMPA transcription instead
      of the default one, or should I convert the default one manually?

      You have to convert manually
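A manual conversion can be as simple as a lookup table. The sketch below assumes the decoder outputs the CMU (ARPABET-style) phone names of the US English model. The table itself is an illustrative assumption, not something shipped with pocketsphinx: SAMPA vowel conventions differ between sources and accents, so it should be checked against the SAMPA chart you need. The function name to_sampa is hypothetical.

```python
# Illustrative CMU phone set -> SAMPA table (assumption: verify vowels before use).
CMU_TO_SAMPA = {
    "AA": "A", "AE": "{", "AH": "V", "AO": "O", "AW": "aU", "AY": "aI",
    "B": "b", "CH": "tS", "D": "d", "DH": "D", "EH": "E", "ER": "3`",
    "EY": "eI", "F": "f", "G": "g", "HH": "h", "IH": "I", "IY": "i",
    "JH": "dZ", "K": "k", "L": "l", "M": "m", "N": "n", "NG": "N",
    "OW": "oU", "OY": "OI", "P": "p", "R": "r", "S": "s", "SH": "S",
    "T": "t", "TH": "T", "UH": "U", "UW": "u", "V": "v", "W": "w",
    "Y": "j", "Z": "z", "ZH": "Z",
}


def to_sampa(phones):
    """Map a list of CMU phone names to SAMPA, dropping silence and fillers."""
    result = []
    for phone in phones:
        # Skip silence and filler tokens such as SIL or +NSN+.
        if phone == "SIL" or phone.startswith(("+", "<")):
            continue
        # Unknown symbols are passed through unchanged so they stay visible.
        result.append(CMU_TO_SAMPA.get(phone, phone))
    return result


print(" ".join(to_sampa("SIL G OW F AO R W ER D SIL".split())))
```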

       
  • Fintan Herlihy

    Fintan Herlihy - 2020-01-18

    Thank you very much for the quick reply.
    I was also wondering if there is a preferred file format and size for audio? I'm getting incoherent results on a 5 MB .wav file, whereas the goforward.raw file works like a charm.

     
    • Nickolay V. Shmyrev

      The audio file has to be PCM WAV, 16 kHz, 16-bit, mono. If you want help with the accuracy, share the file.
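The format requirement above can be checked before decoding. A minimal sketch using Python's standard wave module, which only reads uncompressed PCM WAV. The function name check_wav_format and the wording of the messages are hypothetical:

```python
import wave


def check_wav_format(source):
    """Return a list of problems; an empty list means 16 kHz, 16-bit, mono PCM.

    `source` may be a file path or a binary file-like object.
    """
    problems = []
    try:
        with wave.open(source, "rb") as wav:
            if wav.getframerate() != 16000:
                problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000")
            if wav.getsampwidth() != 2:
                problems.append(f"sample width is {8 * wav.getsampwidth()} bits, expected 16")
            if wav.getnchannels() != 1:
                problems.append(f"{wav.getnchannels()} channels, expected 1 (mono)")
    except (wave.Error, EOFError) as exc:
        # Raised for compressed or otherwise unreadable WAV data.
        problems.append(f"not a readable PCM WAV file: {exc}")
    return problems
```

Calling check_wav_format("myfile.wav") (a placeholder path) and printing the result shows whether the file needs to be resampled or downmixed first.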

       
  • Fintan Herlihy

    Fintan Herlihy - 2020-01-18

    Here it is in attachments

     
    • Nickolay V. Shmyrev

      The audio is UK English, not US English, and it contains noise and 8 kHz telephony speech. Phonetic recognition of this would be problematic without an accurate model. Even with the best neural network models, phonetic recognition is not very accurate, because it is hard to recognize very loosely defined phonemes in continuous speech.

      If you want to deal with audio like this, you'd better recognize it with a UK large-vocabulary model and then convert the words to phonemes.
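The second step, converting recognized words to phonemes, amounts to a pronunciation dictionary lookup. A minimal sketch, assuming a dictionary in the CMU format (one "word PH1 PH2 ..." entry per line, alternates marked as word(2)). The sample entries are hand-written for illustration and the function names are hypothetical; a real run would read the dictionary file that matches the model.

```python
import re

# Hand-written sample entries in CMU dictionary format (illustration only).
SAMPLE_DICT = """\
go G OW
forward F AO R W ER D
ten T EH N
meters M IY T ER Z
the DH AH
the(2) DH IY
""".splitlines()


def parse_pronunciation_dict(lines):
    """Build {word: [phones]}, keeping the first pronunciation of each word."""
    lexicon = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # Strip the alternate-pronunciation marker, e.g. "the(2)" -> "the".
        word = re.sub(r"\(\d+\)$", "", parts[0]).lower()
        lexicon.setdefault(word, parts[1:])
    return lexicon


def words_to_phonemes(text, lexicon):
    """Replace each word by its phones; unknown words are marked, not guessed."""
    phones = []
    for word in text.lower().split():
        phones.extend(lexicon.get(word, [f"<unk:{word}>"]))
    return phones


lexicon = parse_pronunciation_dict(SAMPLE_DICT)
print(" ".join(words_to_phonemes("go forward ten meters", lexicon)))
```

Words missing from the dictionary are flagged rather than dropped, so gaps in coverage stay visible in the output.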

       
  • Fintan Herlihy

    Fintan Herlihy - 2020-01-19

    It's not US English, you're right; it's Irish English. What is strange is that a colleague using pocketsphinx got great results transcribing European Spanish with the Mexican Spanish model (also on a file recorded from a radio broadcast, and using a similar script).

    I have a word-level transcription of the audio file and am aware of the CMU lmtool: would creating a dictionary and a language model from it be a good idea?

    Thank you again for your help.

     
