
#25 RFE: spellchecker should not be restricted to a single language

open
None
5
2009-02-23
2009-02-17
Moayyad Al-Sadi
No

Description of problem:
When checking a single-script document, the spellchecker needs to be told which language to use.
For example, if the document contains French and English text, it has to be told which dictionary to use: both languages use the Latin script, so the language cannot be detected from the script alone.

But if the document contains English and Arabic text, it should be able to tell which parts to check against the English dictionary and which against the Arabic one.

How reproducible:
always

Steps to Reproduce:
1. Start Pidgin or Firefox with hunspell-ar and hunspell-en installed
2. type a mixed text like "Peace be upon those who follow guidance السلام على من اتبع الهدى"

Actual results:
only one of the two messages will be checked

Expected results:
both should be checked

Additional info:
http://library.gnome.org/devel/glib/stable/glib-Unicode-Manipulation.html#GUnicodeScript
http://library.gnome.org/devel/pango/unstable/pango-Scripts-and-Languages.html

Some scripts are used by more than one language; the Arabic script, for example, is also used for Persian.

The spell check should work like this:
for word in words:
    script = get_script(word)
    n = get_number_of_installed_dicts_for_script(script)
    if n == 1: use_this_dict()
    else: use_conf_or_locale_to_guess()
    spell_check(word)

Of course, get_number_of_installed_dicts_for_script should use some sort of cache.
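As a rough illustration of the loop above, here is a minimal Python sketch. It makes several assumptions that are not part of this report: the script is approximated by the first word of each character's Unicode name (a real implementation would use something like GUnicodeScript from the links above), the INSTALLED and DEFAULT_FOR_SCRIPT tables are hypothetical stand-ins for the installed dictionaries and the configuration/locale, and the cache is simply functools.lru_cache.

```python
import unicodedata
from functools import lru_cache

# Hypothetical registry of installed dictionaries, keyed by script name.
INSTALLED = {
    "LATIN": ["en_US", "fr_FR"],
    "ARABIC": ["ar_EG"],
}
# Hypothetical stand-in for the configuration/locale fallback.
DEFAULT_FOR_SCRIPT = {"LATIN": "en_US"}


@lru_cache(maxsize=None)
def char_script(ch):
    """Approximate a character's script by the first word of its Unicode name."""
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:  # the character has no name
        return "UNKNOWN"


def get_script(word):
    """Return the script of the first alphabetic character in the word."""
    for ch in word:
        if ch.isalpha():
            return char_script(ch)
    return "UNKNOWN"


@lru_cache(maxsize=None)
def pick_dict(script):
    """Use the only installed dictionary for a script, else fall back to the default."""
    dicts = INSTALLED.get(script, [])
    if len(dicts) == 1:
        return dicts[0]
    return DEFAULT_FOR_SCRIPT.get(script)  # None if nothing is configured


def assign_dicts(text):
    """Pair every whitespace-separated word with the dictionary chosen for it."""
    return [(word, pick_dict(get_script(word))) for word in text.split()]
```

With these tables, assign_dicts("Peace السلام") pairs "Peace" with en_US (the configured default among two Latin dictionaries) and "السلام" with ar_EG (the only Arabic-script dictionary). The actual spell check of each word against the chosen dictionary is left out.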

Discussion

    • assigned_to: nobody --> nemethl
     
  • I also think that using multiple languages automatically is very convenient in spell checking. There is an API-level method for adding new dictionaries to the base dictionary, but their character encodings and affix files would be the same as those of the base dictionary. I think using multiple dictionaries is a task for the applications, because Hunspell has a simple thread-safe API without any "memory" or dictionary management functions, and some applications already have the right methods.

    The command line Hunspell can use multiple dictionaries with the following syntax:

    hunspell -d ar_EG,en_US

    The command line Hunspell can automatically select the language of the suggestions based on its "word memory" (in this case, it uses the language of the previous correct word for the suggestions).

    In OpenOffice, you can install multiple dictionaries for the same locale. Please check the "English names" and "Medical dictionary" extensions for OpenOffice.org. You can modify one of these zipped extensions by replacing its dictionary with a French dictionary, so that you get automatic French spell checking while processing English text (or you can install the English dictionary for the French locale to add default English spell checking to French text processing). There was also a useful option, "Check all languages", in OpenOffice.org 3.0 and earlier; I hope it will be implemented again.

    A possible API extension of the Hunspell library would not help Firefox much, because the bottleneck is the development of Firefox's spell checking. Please file a new issue for Firefox in its Bugzilla, too. Unfortunately, for Firefox, the only way to use Arabic and English at the same time is to make a hybrid dictionary: unmunch the English dictionary and append the result to the Arabic dic file:

    hunspell-1.2.8/src/tools/unmunch en-US.dic en-US.aff >en_words.txt
    cat en_words.txt >>arabic_used_by_firefox.dic

    I believe the ideal solution is automatic language detection in text editors and word processors using a less resource-consuming method, such as the n-gram language detection provided in OpenOffice.org by the Libtextcat library. Once the new option (detect language) is implemented, you should not need to bother with setting the document language and spelling dictionaries.
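To make the n-gram idea concrete, here is a toy Python sketch of rank-based character n-gram profiles compared with an "out-of-place" distance. It is only an illustration under stated assumptions, not Libtextcat's actual code: the function names, the trigram size, the profile length and the two one-sentence sample texts are all hypothetical, and real profiles would be trained on far larger corpora.

```python
from collections import Counter


def profile(text, n=3, top=300):
    """Rank the most frequent character n-grams of a text (rank 0 = most frequent)."""
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    # Sort by descending count, then alphabetically, so the ranking is deterministic.
    ranked = sorted(counts, key=lambda gram: (-counts[gram], gram))[:top]
    return {gram: rank for rank, gram in enumerate(ranked)}


def distance(doc, lang, top=300):
    """Sum of rank differences; n-grams missing from the language profile cost `top`."""
    return sum(abs(rank - lang[gram]) if gram in lang else top
               for gram, rank in doc.items())


def detect(text, profiles):
    """Return the name of the language profile closest to the text."""
    doc = profile(text)
    return min(profiles, key=lambda name: distance(doc, profiles[name]))


# Hypothetical, deliberately tiny training samples.
SAMPLES = {
    "en": "the spell checker should check both parts of the text",
    "fr": "le correcteur doit vérifier les deux parties du texte",
}
PROFILES = {name: profile(sample) for name, sample in SAMPLES.items()}
```

An editor could call detect() on each paragraph and pass the result to the spell checker as the dictionary to use. With samples this small the guess is unreliable for anything but very similar text.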

    Thanks for your report, László