#2 REQ: Add barbarisms correction

open
nobody
None
5
2005-12-20
2005-12-20
Joan Montané
No

Well, sorry for the long text :-(

* Some info about AbiWord's barbarism support. My
explanation of what a barbarism is comes from this
mail thread:
http://www.abisource.com/mailinglists/abiword-dev/02/Sep/0498.html

* What is a barbarism

A barbarism is a problem that mainly concerns
minority languages, i.e. languages that
compete, in the same territory, with a more
powerful one, called the "roof language"; for example
Welsh, Catalan, Occitan, and others.

When two languages compete in the same territory,
interferences arise, but they are not symmetric.
The roof language is weakly affected, but the
minority one can be strongly affected and can even
disappear (glottophagy). Barbarisms are one of these
interferences.

* Example:

In Catalan, "tamany" is taken from Spanish "tamaño"
and should be corrected to "mida" or "grandària", which mean
"size" in English. A spell checker without barbarism
support doesn't suggest "mida" or "grandària" when
"tamany" is checked.

* How to implement it (idea or approach)

Use the MyThesaurus code of OpenOffice.org with a
special thesaurus file whose entries are barbarisms and
whose synonyms are the correct suggestions.

Add something like this to the suggest() function
(suggestmgr.cxx):

if ((nsug < maxSug) && (nsug > -1))
    nsug = barbarisms(wlst, word, nsug);

And code the barbarisms() function:
barbarisms() must look the word up in the barbarism
'thesaurus' file.
If the word is in the 'thesaurus' file, then barbarisms()
must add the 'synonyms' of the word as suggestions to wlst.
barbarisms() must update nsug properly.
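
The lookup step described above could look roughly like the sketch below. It is only an illustration, not the attached patch and not MySpell code: the `BarbarismTable` type, the extra `maxSug` and `table` parameters, and the use of `std::vector<std::string>` for `wlst` are assumptions made to keep the example self-contained. The thesaurus file is assumed to be already loaded into memory.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory form of the barbarism 'thesaurus' file:
// each key is a barbarism, each value lists the correct suggestions.
using BarbarismTable =
    std::unordered_map<std::string, std::vector<std::string>>;

// Appends the corrections for `word` to `wlst`, skipping duplicates and
// never exceeding `maxSug` suggestions in total. Returns the updated count.
int barbarisms(std::vector<std::string>& wlst, const std::string& word,
               int nsug, int maxSug, const BarbarismTable& table) {
    auto entry = table.find(word);
    if (entry == table.end()) return nsug;  // not a known barbarism

    for (const std::string& fix : entry->second) {
        if (nsug >= maxSug) break;  // suggestion list is full
        bool seen = false;
        for (const std::string& s : wlst) {
            if (s == fix) { seen = true; break; }
        }
        if (!seen) {
            wlst.push_back(fix);
            ++nsug;
        }
    }
    return nsug;
}
```

With a table holding "tamany" -> "mida", "grandària", checking "tamany" adds both corrections to the list, while a word that is not in the table leaves the list and the count untouched. Because the table is a hash map, the lookup cost should not grow with the number of barbarisms.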

* Known problems with this approach:

- It works at word level, not sentence level. We are
just hacking a spell checker, not writing a grammar
checker, so some barbarisms can't be corrected. This
can't be solved.

- Currently, words that can be inflected have to be
coded several times (plurals, verb conjugations, etc.).
This is reported as an enhancement of MyThesaurus in OOo
(issue 19563):
http://www.openoffice.org/issues/show_bug.cgi?id=19563

Discussion

  • Joan Montané
    2005-12-30

    Logged In: YES
    user_id=1035909

    Another possibility for this feature is what I call "custom
    user suggestions".

    Example: If a user mistypes the same word again and
    again, but hunspell can't suggest the correct word, then the
    user can add the mistyped word to the barbarisms data file
    with the correct word as its suggestion.

    Another example: Imagine a frequently used, very long text (such
    as a company name, or any other text). The user can create a dummy
    wrong word (such as myword01) in the barbarisms data file and add
    the correct text as its suggestion (my real very long company
    name).

    Well, I've made a patch for the MySpell code available on the OOo
    website.
    Warning: It's a 'just working' patch; I'm only a beginner
    programmer, but I hope it will be useful.
    If you want to use it, copy mythes.cxx and mythes.hxx from the
    MyThesaurus code of OOo to the MySpell directory and then apply
    the diff.
    You must call example with 5 arguments: affix file,
    dictionary file, thesaurus index file, thesaurus data file
    and checkme file. Use it with compatible thesaurus files and
    look at the differences in the suggestions for words with
    synonyms in the thesaurus index.

    Yours, Joan Montané

     
  • Joan Montané
    2005-12-30

    Diff for MySpell original code

     
    Attachments
  • Logged In: YES
    user_id=726595

    Hi Joan,

    I also think that a grammar (sentence-level) checker would be
    the right solution. But for now, the REP table in the affix
    file works perfectly for barbarism suggestions:

    REP 2
    REP tamany mida
    REP tamany grandària

    Thanks for the feature request and patch!
    I will look at it.

    Best regards,

    Laci
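
To illustrate how a REP table turns into suggestions, here is a simplified model. It is not Hunspell's actual code: `RepTable`, `rep_suggest` and the `dict` set are hypothetical names, and the sketch assumes that a candidate is offered only if the dictionary accepts it.

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical in-memory REP table: (pattern, replacement) pairs.
using RepTable = std::vector<std::pair<std::string, std::string>>;

// Tries every REP pair at every position where its pattern occurs in
// `word` and keeps the candidates that the dictionary accepts.
std::vector<std::string> rep_suggest(
    const std::string& word, const RepTable& reps,
    const std::unordered_set<std::string>& dict) {
    std::vector<std::string> out;
    for (const auto& rep : reps) {
        const std::string& from = rep.first;
        const std::string& to = rep.second;
        if (from.empty()) continue;  // an empty pattern matches everywhere
        for (std::string::size_type pos = word.find(from);
             pos != std::string::npos; pos = word.find(from, pos + 1)) {
            std::string candidate = word;
            candidate.replace(pos, from.size(), to);
            if (dict.count(candidate) != 0 &&
                std::find(out.begin(), out.end(), candidate) == out.end()) {
                out.push_back(candidate);
            }
        }
    }
    return out;
}
```

With the two REP lines above and a dictionary containing "mida" and "grandària", checking "tamany" yields both words. Note that in this model every table entry is tried for each checked word.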

     
  • Joan Montané
    2006-01-09

    Logged In: YES
    user_id=1035909

    Hi Laci

    Thank you for considering this feature request. I hope it
    will be useful.

    Well, I agree with you, a grammar checker would be the
    perfect solution (see the known problems in my first post). But
    some spelling hack could be a good enough solution
    for software without a grammar checker. Of course, this
    feature (hack) should be optional; many languages don't
    need barbarism correction and shouldn't be affected by
    this new feature.

    Thanks for the enhanced REP table suggestion. I'm using it. But
    AbiWord currently corrects a huge number of Catalan
    barbarisms (Catalan is my language). An equivalent REP table
    is long and really slows down the spell checker.

    Yours, Joan.

     
  • Jakson
    2006-06-11

    Logged In: YES
    user_id=1327456

    Hello!

    Perhaps the use of barbarisms might be considered as part of
    a larger set of spelling decisions that users might make while
    writing.

    I think it would be good if hunspell had a flag to identify
    "rare words", like the text editor Vim (www.vim.org), which
    adopted the flag "?" for this purpose. However, I think that
    the expression "rare word" doesn't capture all the usefulness
    of this feature, and perhaps it would be better to think of
    a new name to represent the fact that this flag might be
    applied to any kind of correctly spelled word that might
    have been typed by mistake. In addition to rare words,
    slang, barbarisms, professional jargon, etc. could also
    receive this flag.

    It would probably be easier to make full use of this feature
    if word lists were built by category, so each person would
    be able to compile the .dic file that best suits his
    needs. Suppose, for example, that we have three lists:
    "general", "medicine", and "slang". For most users, the
    words in the "medicine" and "slang" lists should be marked
    with the flag "?". Medical doctors, however, would be better
    off using a dictionary which doesn't classify the words in
    the "medicine" list as "rare". Of course, medical words that
    are known and used by most people should be in the "general"
    list.

    Best regards!

    Jakson
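
The per-category compilation described above could be sketched as follows. This is only an illustration under stated assumptions: `compile_dic` is a hypothetical helper, the "/?" marker stands in for the proposed rare-word flag (hunspell does not define it), and the output is a list of lines with the word count on the first line, as in a .dic file.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Builds the lines of a .dic-style word list from per-category lists.
// Words that appear only in categories outside `familiar` receive the
// hypothetical "/?" (rare word) marker.
std::vector<std::string> compile_dic(
    const std::map<std::string, std::vector<std::string>>& lists,
    const std::set<std::string>& familiar) {
    std::map<std::string, bool> rare;  // word -> still considered rare?
    for (const auto& list : lists) {
        const bool is_rare = familiar.count(list.first) == 0;
        for (const std::string& w : list.second) {
            auto it = rare.find(w);
            if (it == rare.end()) {
                rare[w] = is_rare;
            } else {
                // A word found in any familiar list is not rare.
                it->second = it->second && is_rare;
            }
        }
    }

    std::vector<std::string> lines;
    lines.push_back(std::to_string(rare.size()));  // word-count header
    for (const auto& entry : rare) {
        lines.push_back(entry.second ? entry.first + "/?" : entry.first);
    }
    return lines;
}
```

A general user would pass only "general" as a familiar category, so words from "medicine" and "slang" come out flagged. A medical doctor would pass "general" and "medicine", and the medical terms would then be written without the flag.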