Arabic language for Espeak

Developers
Marco Oros
2012-02-03
2013-06-23
1 2 > >> (Page 1 of 2)
  • Marco Oros
    2012-02-03

    Hi! I would like to develop an Arabic language for eSpeak. I also have some friends from Arab countries who I think may be able to help me. I know that a Dari language for eSpeak was developed, and that language uses the Arabic alphabet. When I found this out, I decided to develop an Arabic language for eSpeak. I am studying the Kirshenbaum IPA scheme, but there is another problem: I use Windows, not Linux, and I don't have a compiler for making modifications to praat. Are there any free compilers for C?

    I am sorry, but my English is very poor; it is my second language. Thank you, Marco Oros.

     
  • Salam,
    I am currently working on an Arabic eSpeak package.
    I have built a prototype and added some phonemes and a voice;
    you can listen to a sample at http://soundcloud.com/linuxscout/espeakarabic5
    The main issue for Arabic pronunciation is vowelization, so I have already developed an open-source Arabic text vocalizer, available at http://tahadz.com/mishkal
    I plan to connect the vocalizer to eSpeak to allow good text-to-speech conversion.

    I need help to improve the Arabic package.
    1- I need to know how to fix stress with general group rules, because in Arabic stress follows general syllable patterns such as CVVC or CV,
    so there is no particular stress for each word.
    For example, the sentence:
    السَّلَامُ عَلَيْكُمْ كَيْفَ حالُكَ يَا صَدِيقِي.
    is stressed by eSpeak like this:
         alss'al'a:m'o Aal'ajk'om kajf'a Ha:lok'a ja: s[ad'i:q'i: ||
    but it must be stressed like this:
        als'sa'la:'mo Aa'laj'kom kaj'fa Ha:'lo'ka ja: s[a'di:'qi: ||

    The current source files are in svn: http://svn.arabeyes.org/viewvc/projects/ar-espeak/

    All my thanks to the Farsi package developer.

     
  • Marco Oros
    2012-10-12

    I am sorry, but this link doesn't work.

     
  • Marco Oros
    2012-10-14

    But, I think to sound.

     
  • Marco Oros
    2012-10-19

    Thank you. But I heard a different 'h' sound. What is that? If you mean 'kh', you must use [x] in Kirshenbaum IPA.

     
  • I have launched the Arabic eSpeak project
    and its first release, which uses eSpeak phonemes and MBROLA transformation:

    http://arabic-espeak.sf.net

     
  • I tried speaking Arabic text with this, but because of the lack of vowels there are many impossible consonant clusters.  I think that the basic rules in ar_rules should produce pronounceable words, even if they don't have the correct vowels.

    The author of the Farsi voice (fa) in the development version of eSpeak has done a lot of work with its rules and exceptions to pronounce the missing vowels.  Perhaps the Arabic voice needs something similar.

    If the eSpeak Arabic voice is to be used with screen reader software, then it will need to pronounce Arabic words recognizably without the assistance of an external 'vocalizer' program to add vowel characters.  If the features of eSpeak's *_rules and *_list files are not sufficient, then perhaps we can include program code for a vocalizer function inside eSpeak?

     
  • Thanks for your reply.
    For Arabic we must use eSpeak with vocalized text, otherwise the result will be bad.
    For this purpose we have developed an open-source Arabic vocalizer (http://tahadz.com/mishkal); it can be downloaded from http://mishak.sf.net.
    I have tried joining eSpeak with mishkal, with success.

    Adding vowels through eSpeak rules is, I think, very difficult,
    because Arabic has a very complex affixation system and a lot of words. For example, "and they give them" can be written as one word (wa?At|ajnahom). We would then have to implement a morphological analyzer using eSpeak.
    As a first step, we can add a list of the most frequent words, or an extra_list.
    The mishkal code is in Python and open source; can we use it as a library?

     
  • Hi,
    I tried adding a word dictionary, and the result is better than before.
    How can I add all these words to ar_extra instead of ar_list?

     
  • eSpeak treats the file ar_extra (if it exists) as an extension of ar_list.   The *_extra  files are intended for a user's own additions.  For your purpose, you can use the file "ar_listx" which also acts as an extension of "ar_list".

    The command:  espeak --compile=ar
    will use ar_rules, ar_list, ar_listx, and ar_extra (if they exist).
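
As a rough illustration of the file set described above, the following Python sketch lists which of the four dictionary source files are present in a directory and builds the compile command. The helper names `dictionary_sources` and `compile_command` are hypothetical and are not part of eSpeak; the sketch assumes the long option is spelled `--compile=<voice name>` and that the command is run from the directory that holds the source files.

```python
from pathlib import Path

# File name suffixes in the order the post lists them.
SUFFIXES = ["rules", "list", "listx", "extra"]

def dictionary_sources(lang, source_dir):
    """Return the names of the <lang>_* source files that exist in source_dir."""
    names = [f"{lang}_{suffix}" for suffix in SUFFIXES]
    return [name for name in names if (Path(source_dir) / name).is_file()]

def compile_command(lang):
    """Build the argument list for compiling one language's dictionary."""
    return ["espeak", f"--compile={lang}"]
```

The argument list could be passed to `subprocess.run(..., cwd=source_dir)` on a machine where eSpeak is installed; the sketch itself never runs eSpeak.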

     
  • Great, it works.

     
  • Can you help me to represent this case?
       I represent "al" as a prefix ('al' is the definite article, like 'the' in English),
    and I have words in the word dictionary that start with certain letters such as 's', for example 'sala:m'.
    if i use "al" as  a prefix, it must give me  al+ sala:am => assala:m
    the first letter must be doubled.
    These letters are called sunny letters; there are 14 of them.
    How can I do this?

     
  • > if i use "al" as a prefix, it must give me al+ sala:am => assala:m the first letter must be doubled.

    Do you mean either:
    1.  The word is written as "assalam" and you want to recognize that "as" is a prefix?
    Or
    2. The word is written as "al salam" but should be pronounced as "assalam"?

    If you mean (1) then the rule would be something like:
    .group a
       _)  as (sP2   as

    and similarly for the other 'sunny letters'.  This means that if a word starts with the letters 'ass' then 'as' is a prefix and is removed to leave the remainder of the word (which starts with 's').

    Of course, that example uses Latin characters and your rule would use Arabic characters.

     
  • The word is written as "al salam" but should be pronounced as "assalam"?

    This is the case with 'al' as a prefix

     
  • OK, but I'm not sure what the problem is.  Is "al" a separate word, or is it a prefix?
    Do you need a rule like:

    .group a
       _) al (_s  %as

    This means, if the word "al" is followed by a word which starts with 's', then it is pronounced as "as" (where the 'a' is not stressed).

    If you mean that the "al" is a prefix (part of the word) then use a rule such as:

       _) al (sP2   %as

    This means that if a word starts with "als" then the 2 letters "al" are removed as a prefix and pronounced as unstressed [as].  Then the remainder of the word (starting with 's') is translated (including looking for it in the ar_list and ar_listx  files).

    Does this answer your question, or do I not understand the problem correctly?
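
The effect of the prefix rule can be imitated outside eSpeak. The following Python sketch is only an illustration on Latin transliterations, not eSpeak code: the function name `split_article` is hypothetical, the stress marker `%` is ignored, and only 's' is listed as a sunny letter because it is the only one named in this thread.

```python
def split_article(word, sun_letters="s"):
    """Split off a leading 'al' and return (prefix pronunciation, remainder).

    Before a sunny letter the 'l' is replaced by a copy of that letter,
    so 'al' + 'sala:m' is pronounced 'as' + 'sala:m'.
    """
    if len(word) > 2 and word.startswith("al"):
        first = word[2]
        prefix = "a" + first if first in sun_letters else "al"
        # The remainder is what would then be looked up and translated on its own.
        return prefix, word[2:]
    return "", word

prefix, rest = split_article("alsala:m")
print(prefix + rest)  # assala:m
```

Passing a longer `sun_letters` string would cover the remaining sunny letters with the same logic.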

     
  • Thank you for the answer, it works.
    I have another question, about word endings.
    Can a rule depend on the boundary between two words, or on the end of a sentence?
    I have seen this in the word dictionary, with the $sentence option; can I use it in a regular rule?
    For example,
         the letter 't' (named Teh marbuta in Arabic):
             t        => h   // at the end of a sentence
             t        => t   // not at the end of a sentence

     
  • I will need to make a change to the eSpeak program.

    Does the alternative pronunciation occur only at the end of a sentence, or also before other punctuation which causes a pause, such as comma, brackets, etc?

     
  • Yes,
       it also occurs before other punctuation which causes a pause, such as comma, brackets, etc.

    I need another feature like this, but between the letters of two words.
      For example:
           min albab => mina albab  // changing the vowel according to the first letter of the next word
           man mat => mam mat      // changing the consonant according to the first letter of the next word

     
  • You can test for the start of the next word in a rule, eg:

       n (_m    m

    This rule pronounces 'n' at the end of a word as [m] if the next word starts with letter 'm'.

      _) the (_A   DI

    This rule pronounces the word "the" as [DI] if the next word starts with a vowel.  This rule would be correct for English.

    You can also test for the end of the previous word, eg:

      n_) a    a:

    This rule pronounces letter 'a' at the start of a word as [a:] if the previous word ends in letter 'n'.

    None of these rules will match if the next or previous word is separated by punctuation, such as comma, brackets, quotation mark.

    So it is possible to implement your rule for "teh marbuta" as:

        t (_C    h
        t (_A    h
        t           t

    This means that letter 't' is pronounced as [h] if the next word starts with a vowel or a consonant, but is pronounced as [t] if there is no word following, or if there is punctuation.  That's OK if you need it only for this one case.  If you need many similar rules, then perhaps we need a more efficient method.

    Or you could use:
         t (_L01    h
         t                t

    where you define the L01  letter group at the start of the ar_rules file so that it contains all the letters which cause the teh marbuta to be pronounced as [h].

    .L01  a b c d e f g    etc.
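
The cross-word matching described in this post can be imitated outside eSpeak. The following Python sketch is not eSpeak code; it only mirrors two of the rules as they are written above, `n (_m  m` and `t (_L01  h` with its fallback `t  t`, on Latin transliterations. The function names and the contents of `L01` are placeholders chosen for illustration.

```python
import re

# Placeholder letter group, standing in for `.L01  a b c d e f g  etc.`
L01 = "abcdefg"

def assimilate_final_n(text):
    """Imitate `n (_m  m`: a word-final 'n' becomes 'm' only when the next
    word follows after a single space and starts with 'm'."""
    return re.sub(r"n(?= m)", "m", text)

def final_t_phoneme(next_token):
    """Imitate `t (_L01  h` with the fallback rule `t  t`.

    next_token is whatever follows the word that ends in 't': the next
    word, a punctuation mark, or None when nothing follows.
    """
    if next_token and next_token[0] in L01:
        return "h"  # the rule with the (_L01 context matched
    return "t"      # fallback: no following word, or punctuation

print(assimilate_final_n("man mat"))   # mam mat
print(assimilate_final_n("man, mat"))  # man, mat
```

The second call shows the point made above: a comma between the two words prevents the context from matching.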

     
  • I wrote:

    t (_C h
    t (_A h
    t        t

    This means that letter 't' is pronounced as [h] if the next word starts with a vowel or a consonant…

    Sorry, the A and C matchings won't work because eSpeak doesn't have a list of vowels and consonants set for Arabic characters.

    You can define your own letter group in the ar_rules file, as I suggested.  Or, if it's useful, I can set up lists of 'vowels' and 'consonants' if you tell me which Arabic characters are 'vowels' and 'consonants'.

     
  • Marco Oros
    2013-02-23

    You must create an ar_rules or ar_listx file.

     
  • Marco Oros
    2013-02-23

    So, for representing this case, I think that after Jonathan Duddington builds a new version, Arabic will be in a new test version. But you must send a link to this forum.

    Thank You.

     