
generating dictionary of non-English language

2016-07-15
  • Park byeong soo

    Park byeong soo - 2016-07-15

    Hello, I'm new to Sphinx.
    I have some questions about training Sphinx on a non-English language (Korean).
    (I heard that Sphinx is a language-dependent ASR system.)

    First,
    when generating the dictionary (http://cmusphinx.sourceforge.net/wiki/tutorialdict),
    I wrote the vocabulary words in the Revised Romanization of my language.
    For example, I changed "system" into "siseutem" (pronounced this way, it means "system" in my language).
    Is this the right approach to building a dictionary?

    Second (an additional question, if the approach in question 1 is right):
    when I use g2p-seq2seq to convert the vocabulary, I use the English model (a 2-layer LSTM with 256 hidden units) mentioned in the tutorial.
    Can I use this model for the conversion despite the language difference?
    If I have to build a g2p model for my own language, is there a tutorial I can refer to?

     

    Last edit: Park byeong soo 2016-07-15
  • Arseniy Gorin

    Arseniy Gorin - 2016-07-15

    Correct me if I am wrong, but the grapheme-to-phoneme rules of romanized Korean are not that close to the English ones. Therefore, using the English model will likely produce wrong pronunciations.

    Basically, you will need an initial training vocabulary, and then you can train your own system by following this document.

    If no linguistic resources are available, you may try using the graphemes (characters) of the romanized alphabet instead of phonemes in the ASR system. Of course, accuracy degrades in this case compared to using a decent dictionary.
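
    The grapheme-based fallback can be sketched roughly as follows. This is only an illustration, not something taken from the Sphinx tools: the helper names `grapheme_entry` and `build_grapheme_dict` are made up here, and it assumes the usual "word, then space-separated units" dictionary layout, with each romanized letter (uppercased, as an arbitrary choice) acting as one acoustic unit.

```python
def grapheme_entry(word):
    """Return one dictionary line: the word followed by its letters as units.

    Assumption: every alphabetic character of the romanized word becomes
    one unit; non-letters (hyphens, apostrophes) are dropped from the units.
    """
    units = [ch.upper() for ch in word if ch.isalpha()]
    return f"{word} {' '.join(units)}"


def build_grapheme_dict(words):
    """Build sorted, de-duplicated dictionary lines for a word list."""
    return [grapheme_entry(w) for w in sorted(set(words))]


if __name__ == "__main__":
    # "siseutem" is the romanized form of "system" from the question above.
    for line in build_grapheme_dict(["siseutem"]):
        print(line)  # siseutem S I S E U T E M
```

    With such a dictionary, the unit list used for training would simply be the set of letters that appear in it, instead of a phoneme inventory.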

     
