
Arpabet for non-English languages?

Help
Kristoffer
2018-03-07
2018-03-07
  • Kristoffer

    Kristoffer - 2018-03-07

    I understand Arpabet is well suited to English. I have some non-English dictionaries with what seem to be X-SAMPA pronunciations. Can these be used out of the box, or do I need to map the characters to Arpabet-friendly tokens, i.e. space-separated, alphabetic-only tokens?

    E.g. I have this for the words "expert" and "institut":

    ek$"spEt`
    In$stI$"t}:t
    

    Please advise!

     
    • Nickolay V. Shmyrev

      do I need to map the characters to Arpabet-friendly tokens?

      Yes, you need to map them.
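
A minimal sketch of such a mapping, assuming a greedy longest-match scan over the X-SAMPA string. The table `XSAMPA_TO_TOKEN` and the function `xsampa_to_tokens` are hypothetical names, and the table covers only the two example words, using the Arpabet-style tokens Kristoffer proposes in this thread; it is not a vetted Swedish phone set. Treating `$` as a syllable-boundary marker is also an assumption about this particular dictionary.

```python
# Hypothetical X-SAMPA -> Arpabet-style table. It only covers the two
# example words from this thread and is NOT a vetted phone set.
XSAMPA_TO_TOKEN = {
    "e": "EH", "E": "ER", "I": "IH", "}:": "UW",
    "k": "K", "s": "S", "p": "P", "n": "N",
    "t": "T", "t`": "T",  # retroflex t folded into T, as in the example
}

# Markers dropped from the output: '"' is the X-SAMPA primary-stress mark;
# "$" is ASSUMED to be this dictionary's syllable-boundary marker.
IGNORED = {"$", '"'}


def xsampa_to_tokens(pron, table=XSAMPA_TO_TOKEN):
    """Convert one X-SAMPA string to space-separated alphabetic tokens."""
    # Try longer symbols first so "t`" and "}:" win over "t".
    symbols = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(pron):
        if pron[i] in IGNORED:
            i += 1
            continue
        for sym in symbols:
            if pron.startswith(sym, i):
                out.append(table[sym])
                i += len(sym)
                break
        else:
            # Fail loudly instead of silently skipping unknown symbols.
            raise ValueError(f"unmapped symbol at position {i}: {pron[i:]!r}")
    return " ".join(out)


print(xsampa_to_tokens('ek$"spEt`'))
print(xsampa_to_tokens('In$stI$"t}:t'))
```

Raising on an unmapped symbol is a deliberate choice here: a dictionary entry with a silently dropped phone is harder to spot later than an error at conversion time.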

       
  • Kristoffer

    Kristoffer - 2018-03-07

    Can I create my own set of tokens or do they need to match CMUDict? E.g.:

    EH K S P ER T
    IH N S T IH T UW T
    

    That is, must I use the CMUDict tag wherever an equivalent exists, and may I invent my own only where there is no match?
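
To see which tokens in a mapped pronunciation reuse CMUDict symbols and which would be invented, a small check like the following may help. It is only a sketch: `classify_tokens` is a hypothetical helper, and `RT` in the usage line is a made-up token for a retroflex t. It does not settle whether invented tokens are acceptable, it only separates the two groups and enforces the space-separated, alphabetic-only form.

```python
# The 39 phones used by CMUDict (stress digits omitted).
CMUDICT_PHONES = set(
    "AA AE AH AO AW AY B CH D DH EH ER EY F G HH IH IY JH K L M N NG "
    "OW OY P R S SH T TH UH UW V W Y Z ZH".split()
)


def classify_tokens(pronunciation):
    """Split a space-separated pronunciation into (reused, custom) tokens."""
    reused, custom = [], []
    for tok in pronunciation.split():
        # Enforce the "alphabetic-only" form discussed in the thread.
        if not (tok.isascii() and tok.isalpha()):
            raise ValueError(f"token is not alphabetic-only: {tok!r}")
        if tok in CMUDICT_PHONES:
            reused.append(tok)
        else:
            custom.append(tok)
    return reused, custom


# "RT" is a hypothetical invented token, used only for illustration.
print(classify_tokens("EH K S P ER RT"))
```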

     
  • Kristoffer

    Kristoffer - 2018-03-07

    Thanks. Really appreciate your help :)

     
  • Kristoffer

    Kristoffer - 2018-03-07

    Given the resources mentioned here, what tools do I need to create something like en-us-phone.lm.bin, but for another language such as Swedish? I understand that the phone LM file is a lot smaller than the original file, but I am not sure how it was created. Is there a tutorial somewhere to get me started?

    How much work do you think would be required in terms of hours/days?

     
