Peculiarity with stems with identical flags

2008-06-18
2013-06-03
  • Børre Gaup
    2008-06-18

    I'm building Northern and Lule Sami Hunspell dictionaries and have run into this problem:
    The word ašeahtažan is accepted but ašeahtaža is not, even though ža and žan have identical flags.

    dic file:
    ašeahta/65517
    ža/28,65516,63005
    žan/28,65516,63005

    aff file:
    SET UTF-8
    FLAG num

    COMPOUNDBEGIN 63001
    COMPOUNDMIDDLE 63002
    COMPOUNDLAST 63003
    COMPOUNDFORBIDFLAG 63004
    ONLYINCOMPOUND 63005
    NEEDAFFIX 63006

    COMPOUNDRULE 1
    COMPOUNDRULE 65517,65516

    SFX 28 Y 1
    SFX 28 0 0/63003 . NIE

    One similar peculiarity:
    A short wordlist contains these four words. The third word is a constructed word, the other ones are real words.

    ađđamažžaseaskka
    ađđamaž
    ađđamažs
    ađđameamet

    ađđamaž is not accepted, but ađđamažs is. As above, the two stems (až and ažs) have identical flags.

    dic file:
    3
    až/2,65534
    ažs/2,65534
    ažža/1,65534
    ađđam/3,65535

    aff file:
    SET UTF-8
    FLAG num

    COMPOUNDBEGIN 63001
    COMPOUNDMIDDLE 63002
    COMPOUNDLAST 63003
    COMPOUNDFORBIDFLAG 63004
    ONLYINCOMPOUND 63005
    NEEDAFFIX 63006

    COMPOUNDRULE 1
    COMPOUNDRULE 65535,65534

    SFX 1 Y 1
    SFX 1 0 seaskka/63003 . NIE

    SFX 2 Y 1
    SFX 2 0 0/63003 . NIE

    SFX 3 Y 1
    SFX 3 0 eamet/63003 . NIE

    This is tested with hunspell 1.1.9 (kubuntu package) and 1.2.2 (self built) on Kubuntu 8.04.

    Is there an error in the dic or aff files? Could it be an issue with non-ASCII characters?

    regards,
    Børre Gaup

     
    • Børre Gaup
      2008-06-18

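      Answering myself.
      I was wondering what was wrong when it struck me that I needed COMPOUNDMIN in the aff file. Adding COMPOUNDMIN 2 to the aff file solves the problem. Hunspell's default minimum length for compound parts is 3 characters, which would explain why the two-letter ža and až were rejected while the three-letter žan and ažs were accepted.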
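
      For reference, a sketch of the header of the first aff file with the fix applied. Only the added COMPOUNDMIN 2 line differs from the file posted above; where the line sits among the other options is an assumption, and the comment is added for illustration.

      ```
      SET UTF-8
      FLAG num

      # Assumed placement: allow compound parts as short as 2 characters (e.g. ža)
      COMPOUNDMIN 2

      COMPOUNDBEGIN 63001
      COMPOUNDMIDDLE 63002
      COMPOUNDLAST 63003
      COMPOUNDFORBIDFLAG 63004
      ONLYINCOMPOUND 63005
      NEEDAFFIX 63006
      ```

      The same one-line addition should apply to the second aff file, since až is also only two characters long.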