
#109 Maybe Bugs: Two problems in Farsi dictionary

v1.0 (example)
open-fixed
5
2015-12-08
2013-12-22
No

Maybe Bugs: Two problems in the Farsi dictionary that I can't fix

  1. The first concerns character 0x640, the Arabic lengthening character (tatweel). Rather than being pronounced as a letter of its own, it must be removed from words before they are analyzed with the Farsi dictionary, because this character is used only to change the appearance of the writing. Please see the examples below: only the pronunciation of the first word is correct, because it does not contain this character.
    دختر
    دختـر
    دختــر
    دختـــر
    دختــــر
    دخـــــتــــر
  2. I use TTSApp or other software to read text aloud. When the text contains, for example, (any number) or (any text), eSpeak in Farsi reads the right parenthesis but does not read the left parenthesis. I think it should read neither of them, as in English, or at least both of them. These are declared in fa_list as "( paRAntezbAz " and ") paRAntezbaste:", which is correct. I changed them to "( ^_en" and ") ^_en", but this change did not fix the problem either.
    Examples for parentheses in Farsi:
    این (1) است
    این (فارسی) است
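
The examples in item 1 differ only in the number of U+0640 characters they contain. As a rough illustration of the requested behaviour (this is not eSpeak's code, and the helper name `strip_tatweel` is hypothetical), a minimal Python sketch that removes the tatweel before any dictionary lookup could look like this:

```python
TATWEEL = "\u0640"  # ARABIC TATWEEL, used only to stretch a word visually


def strip_tatweel(word: str) -> str:
    """Return the word with every U+0640 removed (hypothetical helper)."""
    return word.replace(TATWEEL, "")


# Rebuild the six examples from item 1: the plain word, then the same word
# with 1 to 4 tatweels before the last letter, then tatweels in two places.
examples = ["دخت" + TATWEEL * n + "ر" for n in range(5)]
examples.append("دخ" + TATWEEL * 5 + "ت" + TATWEEL * 4 + "ر")

# After stripping, every variant collapses to the same dictionary key.
normalized = {strip_tatweel(word) for word in examples}
```

With this kind of normalization all six spellings would reach the dictionary as the same word, which is the behaviour asked for above.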

Thanks in advance

Best Regards
Mahmood Taghavi

1 Attachments

Discussion

  • Jonathan Duddington

    1. I will add U+640 (arabic tatweel) to the list of characters to be ignored for the Farsi voice.

    2. Change the entries in fa_list for brackets (and for other punctuation characters which should not be spoken during normal reading of text). Add an underscore character before the bracket character. For example:

    _[ beRAketbAz
    _] beRAketbaste:
    _( paRAntezbAz
    _) paRAntezbaste:
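
    The edit described in point 2 can be sketched as a small text transformation. This is only an illustration in Python: the function name `prefix_punctuation_entries` is hypothetical, the entries are the ones quoted in this ticket, and the sketch assumes each entry is a single punctuation character followed by whitespace and its phoneme string.

```python
import unicodedata


def prefix_punctuation_entries(lines):
    """Add a leading underscore to entries whose word is one punctuation character.

    Hypothetical helper: entries that already start with an underscore, and
    entries for ordinary words, are returned unchanged.
    """
    result = []
    for line in lines:
        entry = line.strip()
        parts = entry.split(None, 1)
        is_punct_entry = (
            len(parts) == 2
            and len(parts[0]) == 1
            and unicodedata.category(parts[0]).startswith("P")
        )
        result.append("_" + entry if is_punct_entry else entry)
    return result


fa_list_entries = [
    "[ beRAketbAz",
    "] beRAketbaste:",
    "( paRAntezbAz",
    ") paRAntezbaste:",
]
updated = prefix_punctuation_entries(fa_list_entries)
```

    Running the function a second time on `updated` leaves the entries unchanged, since `_(` is no longer a single punctuation character.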

     
  • Jonathan Duddington

    I will add U+640 (arabic tatweel) to the list of characters to be ignored for the Farsi voice.

    Done now in eSpeak 1.47.16 at
    http://espeak.sf.net/test/latest.html

     
  • Jonathan Duddington

    • status: open --> open-fixed
     
  • Mahmood Taghavi

    Mahmood Taghavi - 2014-01-24

    Arabic tatweel (0x640);

    Dear Jonathan,
    This bug still exists in version 1.47.16.
    Please see the attached files for details.
    Note: the first pronunciation is correct and should be used for all the examples.
    Please remove Arabic tatweel (0x640) characters before looking words up in the Farsi dictionary.

    Thanks in advance

    Best Regards
    Mahmood Taghavi

     

    Last edit: Mahmood Taghavi 2014-01-24
  • Mahmood Taghavi

    Mahmood Taghavi - 2014-01-24

    I did more tests:
    This bug remains on Windows 7 SP1 x86
    but is solved on Windows XP SP2 x86.

    Best Regards
    Mahmood Taghavi

     

    Last edit: Mahmood Taghavi 2014-01-24
  • Jonathan Duddington

    I have found the cause of the problem. eSpeak checks for characters to ignore (such as U+640) only for non-alphabetic characters. Probably Windows XP and Windows 7 differ in whether they consider U+640 to be alphabetic.

    I will change eSpeak, so that the test for characters to ignore is done for all characters, both alphabetic and non-alphabetic.
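
    The cause described above can be illustrated with a short Python sketch. It is not eSpeak's implementation: the `IGNORED` set and both function names are assumptions for illustration, and Python's `str.isalpha()` stands in for the platform's alphabetic test. In Unicode, U+0640 has general category Lm (modifier letter), so Python treats it as alphabetic.

```python
import unicodedata

IGNORED = {"\u0640"}  # assumed ignore list for the voice, for illustration only


def filter_non_alpha_only(text: str) -> str:
    """Consult the ignore list only for non-alphabetic characters (old behaviour)."""
    return "".join(ch for ch in text if ch.isalpha() or ch not in IGNORED)


def filter_all_chars(text: str) -> str:
    """Consult the ignore list for every character (behaviour after the change)."""
    return "".join(ch for ch in text if ch not in IGNORED)


word = "دخت" + "\u0640" * 2 + "ر"  # one of the reported examples

category = unicodedata.category("\u0640")   # general category of the tatweel
kept = filter_non_alpha_only(word)          # tatweel survives: it counts as alphabetic
removed = filter_all_chars(word)            # tatweel is dropped
```

    On a platform whose alphabetic test returns true for U+0640, the first variant never reaches the ignore list for that character, which would match the difference seen between the two Windows versions.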

     
  • Jonathan Duddington

    Done now in eSpeak 1.47.17

     
  • Valdis Vitolins

    Valdis Vitolins - 2015-12-06

    Is this bug fixed in the eSpeak 1.48.x versions?

     
  • Mahmood Taghavi

    Mahmood Taghavi - 2015-12-07

    I tested these two problems using eSpeak 1.48.15 on Windows 7 SP1 x86.

    *The first problem is solved.

    *But the second problem still exists. Even in this version, eSpeak reads ) as "paRAntezbaste:" in the text!
    Please, as Jonathan said, add an underscore in fa_list at the beginning of the following entries:
    [ beRAketbAz
    ] beRAketbaste:
    ( paRAntezbAz
    ) paRAntezbaste:
    I asked Shadyar Khodayari to add the underscore for these entries, but he uses NVDA, and NVDA has a different mechanism for handling punctuation characters: it replaces characters with equivalent text using its own dictionary. So Shadyar probably keeps forgetting to add the underscore for non-blind users.

    *One more point: I have created some new Mbrola voices for Persian and English. The new phsource file is also included. Please add the attached files to the new version of eSpeak.

    Thanks in advance

    Cheers
    Mahmood Taghavi

     
  • Mahmood Taghavi

    Mahmood Taghavi - 2015-12-08

    Please also add (or update) the Mbrola voices from my previous attachment on GitHub.
    New Mbrola voices:
    mb-de5-fa
    mb-de6-en
    mb-de6-fa
    mb-de7-en
    mb-de7-fa
    Updated Mbrola voices:
    mb-ir1
    mb-ir2
    Link to the previous attachment:
    https://sourceforge.net/p/espeak/bugs/_discuss/thread/f09c85f3/bdfc/attachment/eSpeak.zip

     
