Using pocketsphinx for phoneme recognition

Help
2015-09-13
2015-10-13
  • Kaushik Ramachandran

    I followed the steps on http://cmusphinx.sourceforge.net/wiki/phonemerecognition to extract phonemes using CMU Sphinx. I'm trying to build a language model for Tamil, and I'm using cmuclmtk to build it. I did the following steps:

    Step 1: text2wfreq.exe < input.txt > input.wfreq

    Step 2: wfreq2vocab.exe < input.wfreq > input.vocab

    Step 3: text2idngram.exe -vocab input.vocab -idngram input.idngram < input.txt

    Step 4: idngram2lm.exe -idngram input.idngram -vocab input.vocab -arpa input.arpa

    Step 5: echo "perplexity -text input.txt" | evallm -arpa input.arpa

    The CMU wiki says to "Just replace the words with their corresponding transcription". Where should I do this? I only have the phoneme list for my language. How do I incorporate it into cmuclmtk? And are there any tools to transcribe text to phonemes for my language?

     
    • Nickolay V. Shmyrev

      You have to write your own script to replace the words in the training text with their phonemic sequences. You can use any scripting language for that - Python, Perl, etc.
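
A minimal Python sketch of such a script, assuming you have (or can build) a pronunciation lexicon with one entry per line in the form "word PH1 PH2 ...". The file names, the function names, and the lexicon layout are illustrative assumptions, not something cmuclmtk or pocketsphinx prescribes. Each word in the training text is replaced by its phoneme sequence, so the output can be fed to the cmuclmtk steps in place of input.txt.

```python
def load_lexicon(lines):
    """Parse 'word PH1 PH2 ...' lines into a dict mapping word -> phoneme list.

    The lexicon format is an assumption made for this sketch.
    """
    lexicon = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            # Keep the first pronunciation if a word is listed more than once.
            lexicon.setdefault(parts[0], parts[1:])
    return lexicon


def transcribe_line(line, lexicon):
    """Replace each word with its phonemes; raise KeyError on unknown words."""
    phones = []
    for word in line.split():
        if word not in lexicon:
            raise KeyError("no transcription for word: " + word)
        phones.extend(lexicon[word])
    return " ".join(phones)


def main(lexicon_path, text_path, out_path):
    """Convert a word-level text file into a phoneme-level text file."""
    with open(lexicon_path, encoding="utf-8") as f:
        lexicon = load_lexicon(f)
    with open(text_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(transcribe_line(line, lexicon) + "\n")
```

For example, main("tamil.dict", "input.txt", "input.phones.txt") would write the phoneme-level text (all three file names are hypothetical), and Steps 1 to 4 above would then be run on input.phones.txt. Failing on unknown words is a deliberate choice here: it points out gaps in the lexicon instead of silently dropping words from the training data.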

       
