#43 More documentation on NEEDAFFIX

open
nobody
None
5
2014-08-26
2013-03-14
No

Thanks for your work on Hunspell. It's a great tool that is helping a lot of people.

I'm trying to get a good handle on how Hunspell works, for work on spellcheck dictionaries for Bantu languages in Congo.

I'm trying to understand how NEEDAFFIX works, and its behavior seems somewhat unpredictable. I'm testing a toy dictionary for Lingala, and I see different results depending on the order in which I chain the affixes together.

Take the word na-yeb-is-aka:
prefix: na
stem: yeb
suffix1: is
suffix2: aka
The only correct form (as defined for this toy dictionary) is nayebisaka.
nayeb, nayebis, yebis, yebisaka, and yeb should all be considered incorrect.
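
The six candidate forms above can be listed mechanically. Here is a minimal Python sketch, assuming the only combinations of interest are an optional prefix na- plus zero, one, or two suffixes in the fixed order -is, -aka. The names `candidate_forms` and `split_forms` are made up for illustration and are not part of Hunspell.

```python
from itertools import product

STEM = "yeb"
PREFIXES = ["", "na"]                # "" stands for "no prefix"
SUFFIX_CHAINS = ["", "is", "isaka"]  # no suffix, suffix1, suffix1 + suffix2
CORRECT = {"nayebisaka"}             # the only form the toy dictionary should accept


def candidate_forms(stem=STEM):
    """Build every prefix/suffix-chain combination around the stem."""
    return [p + stem + s for p, s in product(PREFIXES, SUFFIX_CHAINS)]


def split_forms(forms, correct=CORRECT):
    """Separate the forms expected to pass from those expected to fail."""
    good = [f for f in forms if f in correct]
    wrong = [f for f in forms if f not in correct]
    return good, wrong


if __name__ == "__main__":
    good, wrong = split_forms(candidate_forms())
    print("expected correct:  ", good)
    print("expected incorrect:", wrong)
```

The two lists can then be fed to whatever spell-check run is being tested, one word per line.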

I wanted to try to accomplish this using a NEEDAFFIX flag called NA.

There are three ways I can imagine doing this:
word->S1->S2->PF (i.e. yeb/NA -> -is/NA -> -aka/NA -> na-)
word->S1->PF->S2 (i.e. yeb/NA -> -is/NA -> na-/NA -> -aka)
word->PF->S1->S2 (i.e. yeb/NA -> na-/NA -> -is/NA -> -aka)

In each case, the dictionary word and the first two affixes along the route carry the NEEDAFFIX flag, so each of them is supposed to be valid only when combined with the next hop.

However, these are the results I get:
word->S1->S2->PF (i.e. yeb/NA -> -is/NA -> -aka/NA -> na-) (yebisaka is recognized as correctly spelled, even though I don't want it to be)
word->S1->PF->S2 (i.e. yeb/NA -> -is/NA -> na-/NA -> -aka) (nayebisaka is not recognized as correctly spelled, even though I want it to be)
word->PF->S1->S2 (i.e. yeb/NA -> na-/NA -> -is/NA -> -aka) (everything works how I want it to)

I'm attaching files for these if that helps. All the .aff files have the same header:

LANG ln_CG
SET UTF-8
FLAG long
NEEDAFFIX NA

[This doesn't work... Why not?]
(suffix1 -> suffix2 -> prefix)
lingalaSSP.dic
1
yeb/NAS1

lingalaSSP.aff
SFX S1 Y 1
SFX S1 0 is/NAS2 .
SFX S2 Y 1
SFX S2 0 aka/NAPF .
PFX PF Y 1
PFX PF 0 na .

[This doesn't work... Why not?]
(suffix1 -> prefix -> suffix2)
lingalaSPS.dic
1
yeb/NAS1

lingalaSPS.aff
SFX S1 Y 1
SFX S1 0 is/NAPF .
SFX S2 Y 1
SFX S2 0 aka .
PFX PF Y 1
PFX PF 0 na/NAS2 .

[This works... Why?]
(prefix -> suffix1 -> suffix2)
lingalaPSS.dic
1
yeb/NAPF

lingalaPSS.aff
SFX S1 Y 1
SFX S1 0 is/NAS2 .
SFX S2 Y 1
SFX S2 0 aka .
PFX PF Y 1
PFX PF 0 na/NAS1 .
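
The reading of the three layouts that this question assumes can be written down as a small Python model. This is only a sketch of the naive expectation (each affix replaces the current flag set with its own continuation flags, and a form is accepted once it no longer carries NA). It is not Hunspell's actual affix-stripping algorithm, and the names `LAYOUTS` and `expand` are hypothetical.

```python
NEEDAFFIX = "NA"

# layout name -> (flags on the dictionary word "yeb",
#                 {affix flag: (kind, affix text, continuation flags)})
LAYOUTS = {
    "SSP": ({"NA", "S1"}, {"S1": ("sfx", "is", {"NA", "S2"}),
                           "S2": ("sfx", "aka", {"NA", "PF"}),
                           "PF": ("pfx", "na", set())}),
    "SPS": ({"NA", "S1"}, {"S1": ("sfx", "is", {"NA", "PF"}),
                           "S2": ("sfx", "aka", set()),
                           "PF": ("pfx", "na", {"NA", "S2"})}),
    "PSS": ({"NA", "PF"}, {"S1": ("sfx", "is", {"NA", "S2"}),
                           "S2": ("sfx", "aka", set()),
                           "PF": ("pfx", "na", {"NA", "S1"})}),
}


def expand(stem, flags, rules):
    """Follow continuation flags naively and collect forms without NEEDAFFIX.

    Assumes the rule chains are acyclic, as they are in the toy files.
    """
    accepted = set()
    stack = [(stem, frozenset(flags))]
    while stack:
        form, current = stack.pop()
        if NEEDAFFIX not in current:
            accepted.add(form)
        for flag in current:
            if flag in rules:
                kind, text, continuation = rules[flag]
                new_form = text + form if kind == "pfx" else form + text
                stack.append((new_form, frozenset(continuation)))
    return accepted


if __name__ == "__main__":
    for name, (word_flags, rules) in LAYOUTS.items():
        print(name, sorted(expand("yeb", word_flags, rules)))
```

Under this naive model all three layouts accept exactly the same single form, which is why the differing Hunspell results are surprising.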

Are there any guidelines for how I should chain affixes together? Is NEEDAFFIX intended to work at every step along the way, up to the maximum of 3 affixes? Why does it fail in two of the examples above but work in the third?

(By the way, I know I could potentially roll up the -is and -aka suffixes into one suffix -isaka, but what I'm really trying to do is understand how the affix chaining works in Hunspell.)

Thanks for any information you can give me!

Jeremy

Discussion

  • Jeremy Brown

    Jeremy Brown - 2013-03-14

    Three toy dictionaries to test NEEDAFFIX

     
  • Eleonora

    Eleonora - 2013-03-14

    Sorry if the question is too stupid, but if the word is nayebisaka, and na, nayeb, yeb, yebis, nayebis, and yebisaka are all incorrect, then why don't you simply define nayebisaka as a dictionary word and not bother with NEEDAFFIX and the like?

     
  • Jeremy Brown

    Jeremy Brown - 2013-03-14

    It would be a good solution if all I was trying to do was recognize nayebisaka. But what I'm really trying to do is understand what the limits are, since some of the languages I'll be working on may have even more complex morphology.

    Also, it seems to me that the NEEDAFFIX flag ought to work the same regardless of which path I take between prefix, suffix1, and suffix2. Since it does not, I want to know what the underlying rules are so that I don't waste time while implementing some of these dictionaries. I also want to know whether the fact that one of these paths works the way I want is an undocumented feature or bug that might be removed in a later version of Hunspell, in which case I shouldn't count on it.

    So what you're saying would solve the spell check problem, but not my problem of understanding what my options are when designing a spell check dictionary.

    Jeremy

     
  • Eleonora

    Eleonora - 2013-03-14

    Here is the simplest needaffix test from the hunspell source tree:

    needaffix.dic:
    2
    foo/YXA
    bar/Y

    needaffix.aff:
    NEEDAFFIX X
    COMPOUNDFLAG Y

    SFX A Y 1
    SFX A 0 s/Y .

    needaffix.good:
    bar
    foos
    barfoos

    needaffix.wrong:
    foo

    If this works on your system, you can build up your tests on this example.
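
One way to run such a test from a script is sketched below in Python. It assumes the `hunspell` command-line tool is installed and on the PATH, that `-d` accepts a dictionary base path such as `./needaffix`, and that `-l` prints the misspelled words read from standard input. The function names `misspelled`, `check`, and `main` are made up for this example.

```python
import subprocess


def misspelled(dict_base, words):
    """Return the words that `hunspell -l` reports as misspelled.

    Assumes dict_base points at dict_base.aff and dict_base.dic.
    """
    result = subprocess.run(
        ["hunspell", "-d", dict_base, "-l"],
        input="\n".join(words) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return set(result.stdout.split())


def check(good, wrong, rejected):
    """Compare expectations with the set of words the checker rejected."""
    false_rejects = [w for w in good if w in rejected]
    false_accepts = [w for w in wrong if w not in rejected]
    return false_rejects, false_accepts


def main():
    # Word lists taken from needaffix.good and needaffix.wrong above.
    good = ["bar", "foos", "barfoos"]
    wrong = ["foo"]
    rejected = misspelled("./needaffix", good + wrong)
    false_rejects, false_accepts = check(good, wrong, rejected)
    print("rejected but expected good:", false_rejects)
    print("accepted but expected wrong:", false_accepts)
```

Calling `main()` from the directory that holds needaffix.aff and needaffix.dic should print two empty lists if the dictionary behaves as the .good and .wrong files expect. The same helper can be pointed at the three Lingala toy dictionaries.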

     
  • Jeremy Brown

    Jeremy Brown - 2013-03-14

    It's also interesting to note that the behavior is different again when COMPLEXPREFIXES is used to enable 2 prefixes and 1 suffix (instead of 2 suffixes and 1 prefix).

    Take the word na-mi-yeb-aka
    prefix1: na-
    prefix2: mi-
    stem: yeb
    suffix: -aka

    Here, 2 of the paths through 3 affixes work right when the stem and all affixes have the NEEDAFFIX flag:

    Good: word -> prefix2 -> prefix1 -> suffix
    Good: word -> suffix -> prefix2 -> prefix1
    Bad: word -> prefix2 -> suffix -> prefix1 (it thinks namiyebaka is misspelled).

     
