
Is it necessary to use English letters in the phonetic dictionary and phone set?

pannam
2017-03-16
2017-03-20
  • pannam

    pannam - 2017-03-16

    My language doesn't use English letters, and with practice I have realised I can create a more restrictive and 'true' phone set. Although it works, I would like to ask: is this practice okay, or do I need English letters? If I use English letters, a lot of words will have the same phones but are pronounced differently.

    समूह        sa mu ha
    can be
    समूह        स मू ह
    

    and the phoneset

    sa
    mu
    ha
    
    can be
    
    स
    मू
    ह
    
     
  • pannam

    pannam - 2017-03-21

    Hi Nickolay,
    I have gone through the material. My language is Nepali, not Hindi (although they are similar). I may not have explained myself properly, so I will try again.
    The document states:

    We present a method for building an initial phoneme model for
    training an HMM in a new language using an already trained
    recognition system in a base language.
    

    Is pocketsphinx already trained for English by default?

    I believe pocketsphinx is the base for developing a speech recognition system from scratch. What I have done is separate each character (letter) of my language. The only thing I have combined is diacritics with the letters they attach to.

    Basically, what I am doing is using my own phoneme set (Nepali language) and not linking or converting it to English. I also changed the code so that it now accepts around 350 phones, which is sufficient for my whole language.

    I want to know whether it is good practice to use a phoneme set, dictionary, etc. made entirely of my own characters rather than English letters.

    Please check my attached file if I have not made myself clear.

    This is the link to the whole project

     

    Last edit: pannam 2017-03-21
    • Nickolay V. Shmyrev

      Basically, what I am doing is using my own phoneme set (Nepali language) and not linking or converting it to English.

      Your phoneset is not really a phoneset.

      I also changed the code so that it now accepts around 350 phones, which is sufficient for my whole language.

      350 is not the number of phones in your language; it's something different. There are languages with hundreds of phones, but Nepali is not one of them.
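      One plausible reason a count like 350 is not a phone count (an assumption about how the units were built, based on the earlier note that each letter was kept together with its diacritic): units such as `sa` and `mu` pair every consonant with every vowel, so their number grows as a product, while a phone set lists each consonant and each vowel once. The inventory sizes below are placeholders chosen only to show the scale of the gap, not a claimed count of Nepali sounds.

```python
def unit_counts(n_consonants: int, n_vowels: int) -> tuple[int, int]:
    """Compare consonant+vowel syllable units with a plain phone inventory."""
    cv_units = n_consonants * n_vowels  # each consonant+diacritic pair is a separate unit
    phones = n_consonants + n_vowels    # each sound listed once
    return cv_units, phones


# Placeholder inventory sizes, not real Nepali figures.
print(unit_counts(30, 12))  # (360, 42)
```

      Each extra unit also needs its own acoustic model, so the syllable-style set spreads the same training data across many more models.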

      I want to know whether it is good practice to use a phoneme set, dictionary, etc. made entirely of my own characters rather than English letters.

      No, it is a bad practice.

       
  • pannam

    pannam - 2017-03-21

    So, should I use the International Phonetic Alphabet (IPA) or ISO 15919 from this link? https://en.wikipedia.org/wiki/Help:IPA_for_Hindi_and_Urdu

     
    • Nickolay V. Shmyrev

      You need to follow the tutorial:

      If you need to find a phonetic dictionary, read Wikipedia or a book on phonetics. If you are using an existing phonetic dictionary, do not use case-sensitive variants like “e” and “E”. Instead, all your phones must be different even in case-insensitive variation. Sphinxtrain doesn't support some special characters like '*' or '/' and supports most others like “+” or “-” or “:”, but to be safe we recommend you use an alphanumeric-only phone set.

      Replace special characters in the phone-set, like colons or dashes or tildes, with something alphanumeric. For example, replace “a~” with “aa” to make it alphanumeric only. Nowadays, even cell phones have gigabytes of memory on board. There is no sense in trying to save space with cryptic special characters.
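      The two rules above (alphanumeric-only names, and no two phones that differ only by case) are easy to check before training. A minimal Python sketch follows; `check_phoneset` is a hypothetical helper, not part of Sphinxtrain, and it treats "alphanumeric" as ASCII letters and digits only. That is stricter than Python's own `str.isalnum()`, which would accept a Devanagari letter such as स.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
ALNUM = re.compile(r"[A-Za-z0-9]+")


def check_phoneset(phones: list[str]) -> list[str]:
    """Return a list of problems found in a phone list (empty if none)."""
    problems = []
    seen = {}  # lower-cased name -> first spelling seen
    for p in phones:
        if not ALNUM.fullmatch(p):
            problems.append(f"{p!r}: not alphanumeric-only")
        key = p.lower()
        if key in seen:
            if seen[key] != p:
                problems.append(f"{p!r}: same as {seen[key]!r} when case is ignored")
            else:
                problems.append(f"{p!r}: listed twice")
        else:
            seen[key] = p
    return problems


for problem in check_phoneset(["a", "A", "a~", "s", "स"]):
    print(problem)
```

      This reports three problems: `A` clashes with `a`, and both `a~` and `स` are not alphanumeric-only. Renaming `a~` to `aa`, as suggested above, clears that warning.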

       
