CMU Sphinx / Forums / Help: How to build a Persian/Farsi dictionary to using with CMUsphinx!

rezaee - 2016-10-08

Hello
Let's say we want to make a phonetic dictionary for the Farsi digits from 1 to 10. This is the output of espeak with this command: espeak -v fa -x

j'ek 1
d'o 2
s'e 3
tS'AhAR 4
p'andZ 5
S'eS 6
h'aft 7
h'aSt 8
n'oh 9
d'ah 10

Next I should map them to a phoneset that I can use in a CMUsphinx language model, so I built this mapping for them (espeak symbol on the left, my phone on the right):

j y
e e
k k
d d
o o
s s
tS ch
A aa
h h
R r
p p
a a
n n
dZ j
S sh
f f
t t

Finally I should write my dictionary like this:

یک y e k
دو d o
سه s e
چهار ch aa h aa r
پنج p a n j
شش sh e sh
هفت h a f t
هشت h a sh t
نه n o h
ده d a h
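
The conversion from espeak output to dictionary entries like these could be automated. Below is a minimal Python sketch, not an official CMUsphinx tool: the mapping table is copied from this post, while the function names and the decision to simply drop the stress marks (' and ,) and the length mark (:) are assumptions made for illustration.

```python
# Mapping from espeak phoneme mnemonics to dictionary phones, copied from the
# table in this post. Extend it when new words introduce new espeak symbols.
ESPEAK_TO_PHONE = {
    "j": "y", "e": "e", "k": "k", "d": "d", "o": "o", "s": "s",
    "tS": "ch", "A": "aa", "h": "h", "R": "r", "p": "p", "a": "a",
    "n": "n", "dZ": "j", "S": "sh", "f": "f", "t": "t",
}

# Assumption: stress marks (' and ,) and the length mark (:) are dropped.
IGNORED = "',:"


def espeak_to_phones(mnemonics):
    """Convert one espeak phoneme string, e.g. "tS'AhAR", to a list of phones."""
    text = "".join(ch for ch in mnemonics if ch not in IGNORED)
    # Try longer symbols first so "tS" is not read as "t" followed by "S".
    symbols = sorted(ESPEAK_TO_PHONE, key=len, reverse=True)
    phones = []
    pos = 0
    while pos < len(text):
        for symbol in symbols:
            if text.startswith(symbol, pos):
                phones.append(ESPEAK_TO_PHONE[symbol])
                pos += len(symbol)
                break
        else:
            # No symbol matched: the mapping table is incomplete for this word.
            raise ValueError(f"no mapping for {text[pos:]!r} in {mnemonics!r}")
    return phones


def dictionary_line(word, mnemonics):
    """Build one "word phone phone ..." dictionary line."""
    return word + " " + " ".join(espeak_to_phones(mnemonics))


print(dictionary_line("چهار", "tS'AhAR"))
print(dictionary_line("پنج", "p'andZ"))
```

An unmapped symbol raises an error instead of being skipped silently, so gaps in the table show up as soon as a new word is added.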

Ok, am I on the right track?
What's the next step?
Do I have the phonetic dictionary and phoneset file for my project? Should I go ahead and train my system with these two files to get my language model?

Last edit: rezaee 2016-10-08
- Nickolay V. Shmyrev - 2016-10-08
  
  Ok, am I on the right track?
  
  Yes
  
  What's the next step?
  
  Continue with speech data collection and training
  
  Do I have the phonetic dictionary and phoneset file for my project?
  
  You have the dictionary; the phoneset must be compiled. The phoneset should list the phones.
  
  Should I go ahead and train my system with these two files to get my language model?
  
  Yes
  
  - rezaee - 2016-10-09
    
    You have the dictionary; the phoneset must be compiled. The phoneset should list the phones.
    
    What do you mean by this? How should I compile it? What is a list of phones?
    Do you mean I cannot do my project with these 2 files?
    
    - rezaee - 2016-10-09
      
      Could you post a phoneset file here so I can see what it looks like?
      Do I need a phoneset for the next steps?
      
      - Nickolay V. Shmyrev - 2016-10-09
        
        This question is answered in the "data preparation" section of the acoustic model training tutorial.

rezaee - 2016-10-08

Another question: what should I do with these symbols: ' , :
I didn't take them into account when writing my phoneset mapping. Will this cause a problem?

Last edit: rezaee 2016-10-08

- Nickolay V. Shmyrev - 2016-10-08
  
  You need to provide the context, that is, the particular words in which symbols like , or : occur. The symbol : usually means a prolonged phone, which you can either include in the phoneset or ignore, depending on how frequently it occurs.

rezaee - 2016-10-09

Why should we map the phonemes from espeak? Is it because the mapping follows a standard that CMUsphinx knows?

- Nickolay V. Shmyrev - 2016-10-09
  
  There are no standards, but there are rules described in the tutorial: spaces between phonemes, lowercase, no punctuation in phonemes. These make it easier for software to process the input files.
  
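The rules listed in the reply above (spaces between phonemes, lowercase, no punctuation in phonemes) can be checked mechanically before training. The following Python sketch is only an illustration: the function name is hypothetical, and the assumption that a phone consists solely of lowercase ASCII letters is stricter than the tools may actually require.

```python
import re

# Assumption: a phone is one or more lowercase ASCII letters. Loosen this
# pattern if your phone names legitimately contain other characters.
PHONE_RE = re.compile(r"[a-z]+")


def check_entry(line):
    """Return a list of problems found in one "word phone phone ..." line."""
    parts = line.split()
    if len(parts) < 2:
        return ["expected a word followed by at least one phone"]
    word, phones = parts[0], parts[1:]
    return [
        f"bad phone {phone!r} in entry for {word}"
        for phone in phones
        if not PHONE_RE.fullmatch(phone)
    ]


for entry in ["یک y e k", "چهار tS aa h aa r", "شش sh'e sh"]:
    print(entry, "->", check_entry(entry) or "ok")
```

The second and third sample entries are deliberately wrong: one keeps the uppercase espeak symbol tS, the other keeps a stress mark inside a phone.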

rezaee - 2016-10-09

So the mapping file isn't important for the project; it's only a guide for ourselves when writing the dictionary. And the phoneset is the file that consists of the symbols we used to build our dictionary.

I have written the following symbols in a file with a ".phone" extension and am writing my dictionary with them. They are all of the phones that we need to write Persian words in the dictionary. So this is my phoneset file, I think?

a e o aa i u b p t s j ch h kh d z r z zh s sh s z t z gh f gh k g l m n v h y ss

Last edit: rezaee 2016-10-09
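
A phoneset file can also be derived from the dictionary itself, which keeps the two files consistent. The Python sketch below assumes what the acoustic model training tutorial appears to describe: one phone per line, matching the phones used in the dictionary, plus a silence phone named SIL. The function name and the sample entries are illustrative only.

```python
def phoneset_from_dictionary(lines):
    """Collect the unique phones used in "word phone phone ..." lines."""
    phones = set()
    for line in lines:
        parts = line.split()
        phones.update(parts[1:])  # parts[0] is the word itself
    # Assumption: the trainer also expects the silence phone SIL in the list.
    phones.add("SIL")
    return sorted(phones)


dictionary = [
    "یک y e k",
    "دو d o",
    "چهار ch aa h aa r",
]

# One phone per line, ready to be written to a ".phone" file.
print("\n".join(phoneset_from_dictionary(dictionary)))
```

Using a set also removes repeated entries, such as the phones that appear more than once in the list above.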

rezaee - 2016-10-09

What about the filler dictionary?
How should I build it?
I read the acoustic training part of the tutorial, but it wasn't enough for me!
Could you explain more, please?

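For orientation, a filler dictionary maps non-speech tokens to filler phones in the same "token phone" format as the main dictionary. The Python sketch below writes a minimal one. The three entries are an assumption based on the example in the acoustic model training tutorial and should be compared against it; the function names are hypothetical.

```python
# Non-speech tokens mapped to the silence phone. Assumption: these three
# entries mirror the minimal example in the acoustic model training tutorial.
FILLER_ENTRIES = [("<s>", "SIL"), ("</s>", "SIL"), ("<sil>", "SIL")]


def filler_lines(entries=FILLER_ENTRIES):
    """Format the entries as "token phone" lines."""
    return [f"{token} {phone}" for token, phone in entries]


def write_filler(path, entries=FILLER_ENTRIES):
    """Write the filler dictionary to the given path, one entry per line."""
    with open(path, "w", encoding="utf-8") as out:
        out.write("\n".join(filler_lines(entries)) + "\n")


print("\n".join(filler_lines()))
```

Every phone named here, SIL in this case, also has to appear in the phoneset file.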

rezaee - 2016-10-09

Another question:
I used the online language modeling service to put the sentences of my text file between <s> and </s> by downloading its ".sent" file. Is there a command for doing this offline as easily as with the online service?

- Nickolay V. Shmyrev - 2016-10-09
  
  So this is my phoneset file, I think?
  
  Yes
  
  I read the acoustic training part of the tutorial, but it wasn't enough for me!
  
  You need to ask a more detailed question then.
  
  I used the online language modeling service to put the sentences of my text file between <s> and </s> by downloading its ".sent" file. Is there a command for doing this offline as easily as with the online service?
  
  SRILM does not require you to insert <s>, it adds them automatically. Otherwise you can write a simple Python script.
  
  Last edit: Nickolay V. Shmyrev 2016-10-09
  
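The "simple Python script" mentioned in the reply above could look roughly like the sketch below. It assumes one sentence per line in a UTF-8 text file and wraps each non-empty line in <s> and </s>. The function names are hypothetical, and the script is not part of CMUsphinx or SRILM.

```python
def add_sentence_markers(lines):
    """Wrap each non-empty line in <s> ... </s>, skipping blank lines."""
    marked = []
    for line in lines:
        sentence = " ".join(line.split())  # trim and collapse whitespace
        if sentence:
            marked.append(f"<s> {sentence} </s>")
    return marked


def convert_file(src_path, dst_path):
    """Read plain sentences from src_path and write marked ones to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        marked = add_sentence_markers(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(marked) + "\n")


print("\n".join(add_sentence_markers(["یک دو سه", "", "  چهار   پنج "])))
```

To process a real file, call convert_file with the input and output paths of your choice.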

rezaee - 2016-10-14

Unfortunately I don't know Python! Is there any existing script to add <s> and </s> () after sentences?

- mehrshad - 2016-11-09
  
  hi
  plz check your mail...