Hello,
I have been using PocketSphinx to do phoneme recognition as documented here: http://cmusphinx.sourceforge.net/wiki/phonemerecognition
By adding the -time argument, I can get the timing of each phoneme, which I use to segment the source file into small chunks.
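Roughly, the slicing I have in mind looks like the Python sketch below. It assumes each timing line is "PHONE start_sec end_sec" (adjust for whatever the -time output actually prints), and the file names are placeholders:

```python
# Rough sketch: cut a WAV file into per-phoneme chunks using a timings file.
# Assumes each timing line looks like "PHONE start_sec end_sec"; extra columns
# are ignored and non-timing lines are skipped. File names are placeholders.
import wave

def cut_phoneme_chunks(wav_path, timing_path, out_prefix="chunk"):
    with wave.open(wav_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        bytes_per_frame = src.getsampwidth() * src.getnchannels()
        audio = src.readframes(src.getnframes())

    with open(timing_path) as f:
        for i, line in enumerate(f):
            parts = line.split()
            if len(parts) < 3:
                continue
            try:
                phone, start, end = parts[0], float(parts[1]), float(parts[2])
            except ValueError:
                continue  # header or other non-timing line
            lo = int(start * rate) * bytes_per_frame
            hi = int(end * rate) * bytes_per_frame
            with wave.open("%s_%04d_%s.wav" % (out_prefix, i, phone), "wb") as dst:
                dst.setparams(params)
                dst.writeframes(audio[lo:hi])

cut_phoneme_chunks("utterance.wav", "timings.txt")
```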
The software has been easy to set up and use; however, as mentioned on that page, the accuracy of running phoneme recognition directly is not very good. My goal is to build a collection of short sound files sorted by phoneme.
If I have a transcription for the source audio, is there some way to do "alignment" on the audio to get better segmentations? The audio files are only 5-20 seconds long.
There is a mention in another thread that you can get alignment in PocketSphinx by simply using the transcription as the grammar: https://sourceforge.net/p/cmusphinx/discussion/help/thread/dd998add/
How would I build such a language model for allphone/phoneme recognition? From what I understand, the allphone search only takes an ngram model. Should I replace every word in my transcription with its phonemes and feed that to SRILM's ngram-count to use as the language model?
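To make that concrete, the Python sketch below is the kind of thing I had in mind: look each word up in a CMUdict-style dictionary, strip the stress digits, and write one phoneme "sentence" per line as training text for ngram-count. The file names and dictionary format are assumptions on my part:

```python
# Sketch: expand a word transcription into phoneme "sentences" for ngram-count.
# Assumes a CMUdict-style dictionary where each line is "WORD PH1 PH2 ...";
# stress digits are stripped and out-of-vocabulary words are simply skipped.
import re

def load_lexicon(path):
    lexicon = {}
    with open(path, encoding="latin-1") as f:
        for line in f:
            if line.startswith(";;;") or not line.strip():
                continue  # cmudict comment or blank line
            word, *phones = line.split()
            word = re.sub(r"\(\d+\)$", "", word)  # drop alternate-pronunciation markers
            lexicon.setdefault(word.lower(), [re.sub(r"\d", "", p) for p in phones])
    return lexicon

def to_phones(text, lexicon):
    phones = []
    for word in re.findall(r"[a-z']+", text.lower()):
        phones.extend(lexicon.get(word, []))
    return " ".join(phones)

lexicon = load_lexicon("cmudict-en-us.dict")
with open("transcriptions.txt") as fin, open("phoneme_corpus.txt", "w") as fout:
    for line in fin:
        fout.write(to_phones(line, lexicon) + "\n")

# then, roughly:  ngram-count -text phoneme_corpus.txt -order 3 -lm phoneme.lm
```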
Or is there some better way to get phoneme timings through alignment in PocketSphinx?
Thank you very much!
Also, I think this page has a typo near the bottom: "feed this text file into strilm" should read "srilm":
http://cmusphinx.sourceforge.net/wiki/phonemerecognition
You can still use a grammar with a dictionary of single-phone words.
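For example, something along these lines (a rough sketch using the Python bindings; the model path, file names, and phone sequence are placeholders, and the grammar is just the expected phone string spelled out as single-phone words):

```python
# Sketch: align audio against a known phone sequence with a JSGF grammar whose
# "words" are single phones. Paths, file names, and the phone sequence are
# placeholders; use whatever acoustic model you have installed.
from pocketsphinx.pocketsphinx import Decoder

# phone.dict -- every "word" is a phone that maps to itself, e.g.:
#   HH HH
#   AH AH
#   L  L
#   OW OW
#
# phones.jsgf -- the transcription expanded into phones:
#   #JSGF V1.0;
#   grammar phones;
#   public <utt> = HH AH L OW;

config = Decoder.default_config()
config.set_string('-hmm', '/path/to/en-us')   # acoustic model directory (placeholder)
config.set_string('-dict', 'phone.dict')
config.set_string('-jsgf', 'phones.jsgf')
decoder = Decoder(config)

decoder.start_utt()
with open('utterance.raw', 'rb') as f:        # 16 kHz, 16-bit mono raw audio
    while True:
        buf = f.read(1024)
        if not buf:
            break
        decoder.process_raw(buf, False, False)
decoder.end_utt()

# Each segment is one phone; frames are 10 ms by default.
for seg in decoder.seg():
    print(seg.word, seg.start_frame / 100.0, seg.end_frame / 100.0)
```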
There is also the ps_alignment API; you can find an example of it in the tests. But it requires a very exact match between the reference phoneme string and the actual audio content.
Thank you for the notice, fixed.