How can I build a language model for Persian/Farsi language?

Help
rezaee
2016-09-29
2016-10-07
  • rezaee

    rezaee - 2016-09-29

    Hi
    I want to build a simple voice recognition system (at first, it is probably better to call it a command-and-control system) and want to use pocketsphinx. Are there any existing resources, such as a dictionary, language model, acoustic model, training data set, or test data set, that I could use for adaptation?
    Or should I do everything from scratch myself?

    What is the roadmap, and is there a comprehensive guide for doing that?

     
    • Nickolay V. Shmyrev

      Most of the methods are listed in our tutorial at http://cmusphinx.sourceforge.net/wiki/tutorial; you just need to read it.

      You can use espeak to create a phonetic dictionary.

      You can train a language model from a Wikipedia dump and from movie subtitles.

      You can find transcribed podcasts, audiobooks, and radio shows to get acoustic model training data. You can also run a crowdsourced data collection.
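
A minimal sketch of the text preparation step that language model training from Wikipedia or subtitle text usually needs, assuming the toolkit wants one sentence per line wrapped in `<s>` and `</s>` markers. The function name `to_lm_sentences` and the exact character ranges are illustrative assumptions, not part of any CMUSphinx tool, and real dumps would also need markup removed first.

```python
import re

# Sentence boundaries: Latin . ! ? plus the Arabic question mark (U+061F),
# the Arabic semicolon (U+061B), and line breaks.
SENTENCE_END = re.compile(r"[.!?\u061F\u061B\n]+")

# Vowel marks are deleted outright so they do not split a word in two.
DIACRITICS = re.compile(r"[\u064B-\u065F\u0670]")

# Everything outside the Arabic-script letter ranges, the zero-width
# non-joiner (U+200C), and plain spaces becomes a space. This drops Latin
# text, digits, and punctuation such as the Arabic comma.
NOT_LETTER = re.compile(r"[^\u0621-\u064A\u066E-\u06D3\u200C ]+")


def to_lm_sentences(text):
    """Return cleaned sentences, each wrapped in <s> ... </s> (assumed format)."""
    sentences = []
    for chunk in SENTENCE_END.split(text):
        chunk = DIACRITICS.sub("", chunk)
        words = NOT_LETTER.sub(" ", chunk).split()
        if words:
            sentences.append("<s> " + " ".join(words) + " </s>")
    return sentences
```

The resulting lines can be written to a UTF-8 text file and passed to whichever language model toolkit the tutorial describes.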

       
      • rezaee

        rezaee - 2016-10-07

        How can I use espeak to build a Farsi dictionary that I can use with pocketsphinx?
        I looked at their site but couldn't find Farsi in their language list.
        Is there a tutorial about this?

         
  • rezaee

    rezaee - 2016-09-29

    The Persian language uses the Arabic alphabet (فارسی), and our text resources are in that alphabet. Is it possible to use them with the toolkits introduced in the tutorial to build a language model?

    This is a Persian page on Wikipedia: https://fa.wikipedia.org/wiki/%D8%B2%D8%A8%D8%A7%D9%86_%D9%81%D8%A7%D8%B1%D8%B3%DB%8C

    Are you saying we can build a model using many such pages written in the Persian alphabet?
    How?

     
    • Nickolay V. Shmyrev

      The Persian language uses the Arabic alphabet (فارسی), and our text resources are in that alphabet. Is it possible to use them with the toolkits introduced in the tutorial to build a language model?

      Yes, you can use it. We expect your input data to be UTF-8 encoded.

      Are you saying we can build a model using many such pages written in the Persian alphabet?

      Yes.
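
A minimal sketch of getting Persian text into the UTF-8 form mentioned above. It assumes that some source files may be in a legacy code page (`windows-1256` is only a guess at a likely one) and that the text mixes Arabic and Persian code points for yeh and kaf, which would otherwise make one word look like two. The helper names `decode_to_text` and `normalize_persian` are hypothetical.

```python
import unicodedata

# Arabic code points that often appear in place of their Persian counterparts.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064A": "\u06CC",  # Arabic yeh -> Persian yeh
    "\u0643": "\u06A9",  # Arabic kaf -> Persian kaf (keheh)
})


def decode_to_text(raw, fallback="windows-1256"):
    """Decode bytes as UTF-8; on failure, try the assumed legacy code page."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


def normalize_persian(text):
    """Apply NFC normalization, then map Arabic yeh/kaf to the Persian forms."""
    text = unicodedata.normalize("NFC", text)
    return text.translate(ARABIC_TO_PERSIAN)
```

Writing the result back with `open(path, "w", encoding="utf-8")` keeps the corpus in the encoding the reply asks for.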

       
