#18 equivalence support for some symbols

closed
None
5
2008-11-13
2007-10-05
Andriy Rysin
No

It would be nice to have support for a feature that allows some characters to be treated as equivalent.

For example, Unicode recommends the apostrophe symbol (U+2019), which is supposed to be used in texts instead of the ASCII single quote (0x27). Once the spelling dictionary is switched to UTF-8, it's easy to convert the words and rules.
But there are many texts out there that still contain the old ASCII apostrophe, and asking for all of them to be changed to the new Unicode symbol may not be practical. Plus, many users don't have the new apostrophe on their keyboards and don't care much, so it would be nice if they could still spellcheck texts that use the "old" apostrophe.

Having both versions of each word in the dictionary does not help much, not just because it makes the dictionary harder to maintain, but also because the rules that deal with the apostrophe would have to be modified to handle both types as well.

The situation gets even worse if we take into account that there's a Unicode hyphen (U+2010) recommended to replace the ASCII one. And most probably there are other such duplicated symbols.

There are also other uses for such a feature. For example, Russian has the letter ё (yo), which is often written as е (ye). This is legal, but to handle that difference there are two dictionaries, ru_yo and ru_ye. The equivalence feature might help in this case as well.

OpenOffice.org (OOo) has some means of dealing with such issues, but Hunspell is also used in other projects, e.g. the next version of Firefox, so we can't rely on the surrounding environment to fix this.

The suggested format is to encode the main symbol in the words and rules, and to add a line stating that the symbol has other forms, like this:

EQUIV main_symbol altern1 {altern2 ...}

e.g.

EQUIV ’ '
EQUIV ‐ -
EQUIV ё е
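To make the proposal concrete, here is a minimal Python sketch of how EQUIV lines could be applied. It assumes each symbol is a single character and that checking simply folds every alternate form to its main symbol before the dictionary lookup. `parse_equiv`, `fold`, `accepts` and the toy word list are hypothetical. The `EQUIV ё е` line is deliberately left out: folding every е to ё character by character would also change words that are correctly spelled with е, so that case would need more than a plain character map.

```python
def parse_equiv(lines):
    """Map every alternate symbol to its main symbol (proposed EQUIV syntax)."""
    table = {}
    for line in lines:
        fields = line.split()
        # EQUIV main_symbol altern1 {altern2 ...}
        if len(fields) >= 3 and fields[0] == "EQUIV":
            main, alternates = fields[1], fields[2:]
            for alt in alternates:
                table[alt] = main
    return table


def fold(word, table):
    """Replace each alternate character with its main symbol (single characters only)."""
    return "".join(table.get(ch, ch) for ch in word)


# Hypothetical affix lines: the dictionary is written with the Unicode forms.
EQUIV_LINES = [
    "EQUIV \u2019 '",  # main: typographic apostrophe, alternate: ASCII apostrophe
    "EQUIV \u2010 -",  # main: Unicode hyphen, alternate: ASCII hyphen-minus
]
TABLE = parse_equiv(EQUIV_LINES)

# Hypothetical toy dictionary containing only the main (Unicode) forms.
DICTIONARY = {"don\u2019t", "well\u2010known"}


def accepts(word):
    """A word is correct if its folded form is in the dictionary."""
    return fold(word, TABLE) in DICTIONARY
```

With this table, `accepts("don't")` and `accepts("don\u2019t")` are both true, so the dictionary and its rules only need to mention the main symbol.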

Discussion

    • assigned_to: nobody --> nemethl
     
  • Logged In: YES
    user_id=726595
    Originator: NO

    Unfortunately, this feature is too complex to implement within Hunspell.
    I suggest using dictionary converters instead. For example, I have made a converter for the Hungarian dictionary that generates a mixed dictionary with Hungarian words and affixes both with and without diacritics, for e-mail spell checking.
    See this zipped Mozilla extension: http://downloads.sourceforge.net/magyarispell/hungarian_dictionary-1.1.3.1ekezet-fx%2Bzm%2Btb.xpi
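A rough Python sketch of the dictionary-converter idea, assuming Hunspell's `.dic` layout (an entry count on the first line, then one `word/flags` entry per line). It is not the converter in the extension above: it only adds an accent-free copy of each word and does not rewrite the affix rules, which a real converter would also have to handle. `strip_diacritics`, `mixed_entries`, `mixed_dic` and the sample flags are hypothetical.

```python
import unicodedata


def strip_diacritics(word):
    """Remove combining accents, e.g. 'kör' -> 'kor', 'ház' -> 'haz'."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", base)


def mixed_entries(entries):
    """Keep each 'word/flags' entry and add an accent-free copy with the same flags."""
    result, seen = [], set()
    for entry in entries:
        word, slash, flags = entry.partition("/")
        for variant in (word, strip_diacritics(word)):
            line = variant + slash + flags
            if line not in seen:  # skip words that had no accents to strip
                seen.add(line)
                result.append(line)
    return result


def mixed_dic(entries):
    """Render a .dic file body: the entry count, then one entry per line."""
    lines = mixed_entries(entries)
    return "\n".join([str(len(lines))] + lines) + "\n"
```

For example, `mixed_entries(["kör/A", "ház/B", "alma"])` returns `["kör/A", "kor/A", "ház/B", "haz/B", "alma"]`.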

     
    • status: open --> closed
     
  • Good news: the new ICONV/OCONV feature in Hunspell 1.2.8 is a generalized solution for this and other input/output conversions.

    ICONV 3
    ICONV ’ '
    ICONV ‐ -
    ICONV ё е

    OCONV does the same for suggestions (output conversion):

    OCONV 1
    OCONV ' ’

    Thanks for your request, László
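The ICONV/OCONV tables above can be illustrated with a simplified Python round trip. It assumes that each conversion line replaces one string with another, that the longest pattern wins when several match at the same position, and that the dictionary is stored in the forms ICONV produces (ASCII apostrophe and hyphen, е instead of ё). This is not Hunspell's C++ code; `parse_table`, `convert`, `spell`, `present` and the toy dictionary are hypothetical.

```python
def parse_table(lines, keyword):
    """Collect from -> to pairs from a 'KEYWORD count' header and 'KEYWORD from to' lines."""
    pairs = {}
    count = None
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[0] != keyword:
            continue
        if count is None:
            count = int(fields[1])        # header line, e.g. "ICONV 3"
        elif count > 0 and len(fields) >= 3:
            pairs[fields[1]] = fields[2]  # conversion line, e.g. "ICONV ’ '"
            count -= 1
    return pairs


def convert(text, table):
    """Replace patterns left to right, preferring the longest match at each position."""
    patterns = sorted(table, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for pat in patterns:
            if text.startswith(pat, i):
                out.append(table[pat])
                i += len(pat)
                break
        else:
            out.append(text[i])  # no pattern matches here: copy the character
            i += 1
    return "".join(out)


AFF = [
    "ICONV 3",
    "ICONV \u2019 '",       # typographic apostrophe -> ASCII apostrophe
    "ICONV \u2010 -",       # Unicode hyphen -> ASCII hyphen-minus
    "ICONV \u0451 \u0435",  # ё (yo) -> е (ye)
    "OCONV 1",
    "OCONV ' \u2019",       # ASCII apostrophe -> typographic apostrophe
]
ICONV = parse_table(AFF, "ICONV")
OCONV = parse_table(AFF, "OCONV")

# Hypothetical toy dictionary: "don't", "well-known" and "еще" (ye form).
DICTIONARY = {"don't", "well-known", "\u0435\u0449\u0435"}


def spell(word):
    """Accept a word if its ICONV-converted form is in the dictionary."""
    return convert(word, ICONV) in DICTIONARY


def present(suggestions):
    """Apply OCONV to suggestions before showing them to the user."""
    return [convert(s, OCONV) for s in suggestions]
```

With these definitions, `spell("don\u2019t")` and `spell("don't")` both succeed, and `present(["don't"])` shows the suggestion with the typographic apostrophe.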