
#270 Condition field correction (dot and 'no more characters')

v1.0 (example)
open
nobody
None
5
2015-05-09
2015-05-08
Andrey
No

Hello,
I was building a dictionary and ran into a situation where two words with similar endings must share one SFX flag.
A simple example:

file.dic

2
work/A
rework/A

file.aff

SFX A Y 2
SFX A 0 less work #work --> workless
SFX A 0 ing ework #work --> working; rework -> reworking

The goal is that the second word (rework) must not be affixed by the first rule, while the first word (work) must be.
It would be logical for a dot (.) as the first character of the condition field to permit affixing any longer word that has the same ending, and for a condition without a leading dot (.) to mean "only these characters", i.e. the whole word.
For example:

SFX A 0 ing .work #rework -> reworking; but no 'work -> working' affixing.

and

SFX A 0 ing work #work -> working; but no 'rework -> reworking' affixing.

So, is it possible to fix this in the next release?
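
To make the request concrete, here is a minimal Python sketch contrasting a simplified model of how a suffix condition is compared with the end of a word against the semantics asked for above. Both functions (`current_match`, `proposed_match`) are hypothetical illustrations, not Hunspell code, and they only handle literal characters and the dot, not bracket sets.

```python
def current_match(condition: str, word: str) -> bool:
    """Simplified model: the condition is compared with the END of the word.

    A dot stands for one arbitrary character. Bracket sets are not handled.
    """
    if len(word) < len(condition):
        return False  # the word is too short to satisfy every position
    tail = word[len(word) - len(condition):]
    return all(c == "." or c == w for c, w in zip(condition, tail))


def proposed_match(condition: str, word: str) -> bool:
    """Hypothetical semantics requested in this ticket (not implemented anywhere)."""
    if condition.startswith("."):
        tail = condition[1:]
        # A leading dot demands at least one extra character before the tail.
        return len(word) > len(tail) and word.endswith(tail)
    # No leading dot: the condition must be the whole word.
    return word == condition


for cond in ("work", ".work"):
    for word in ("work", "rework"):
        print(cond, word, current_match(cond, word), proposed_match(cond, word))
```

In this model the two semantics differ only for the condition `work` applied to `rework`: end-of-word matching accepts it, while the proposed whole-word reading rejects it.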

Discussion

  • swan

    swan - 2015-05-08

    I do not understand why you insist on having the same class for work and rework.

    You can simply say:

    file.dic
    work/A
    rework/B

    file.aff
    SFX A Y 2
    SFX A 0 less . #work --> workless
    SFX A 0 ing . #work --> working

    SFX B Y 1
    SFX B 0 ing . #rework --> reworking

     

    Last edit: swan 2015-05-08
  • Andrey

    Andrey - 2015-05-08

    Yes, I know. "work" and "rework" are just an abstract example in English. The actual words are Russian, with a great many mnemonics for each part of speech. With the approach you suggest the affix file becomes very complicated, and it is better to have one flag per grammar mnemonic. Anyway, that is beside the point...
    IINM, the condition field is a 'regular expression'. So maybe there is a way to meet my needs by regular means (without modifying the sources). The only thing needed is a way to indicate "no character at all" at the appropriate character position.

     
  • swan

    swan - 2015-05-09

    If your wish can be satisfied without code modification, then I have no objections.

    Some thoughts about affix/dict building:
    Because of language complexity, flags can be any UTF-8 character (at least 256 characters), and flags can even be 2 characters long, which means 256x256 possible flags.

    The .aff and .dic files for Hungarian and Finnish, and probably also for Turkish (if any), are not created manually but generated by scripts in perl or awk/shell. They rely on the fact that there are word classes that can use the same group of flags.

    see: http://manpages.ubuntu.com/manpages/dapper/man4/hunspell.4.html

    FLAG long
    SFX Y1 Y 1
    SFX Y1 0 s 1

    dictionary example using flags Y1, Z3 and F?
    foo/Y1Z3F?


    Maybe you should consider the above strategy for Russian as well, which doubtless has more complex affixes than English or German.
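
As a rough illustration of the scripted approach described above, the following Python sketch generates .aff and .dic text from a table of word classes, assigning one two-character flag per class (`FLAG long`). The function and parameter names (`build_dictionary`, `classes`, `lexicon`) and the sample data are assumptions made for this example; real generators for Hungarian or Finnish are far more elaborate.

```python
from itertools import product
import string


def flag_names():
    """Yield two-character flags AA, AB, AC, ... for use with FLAG long."""
    for first, second in product(string.ascii_uppercase, repeat=2):
        yield first + second


def build_dictionary(classes, lexicon):
    """Return (aff_text, dic_text).

    classes: {class_name: [(strip, add, condition), ...]}  suffix rules
    lexicon: {word: [class_name, ...]}                     classes per word
    Empty strip/add become "0" and an empty condition becomes ".".
    """
    flags = dict(zip(classes, flag_names()))
    aff = ["FLAG long"]
    for name, rules in classes.items():
        flag = flags[name]
        aff.append(f"SFX {flag} Y {len(rules)}")
        for strip, add, condition in rules:
            aff.append(f"SFX {flag} {strip or '0'} {add or '0'} {condition or '.'}")
    dic = [str(len(lexicon))]  # first line of a .dic file is the word count
    for word, names in sorted(lexicon.items()):
        dic.append(word + "/" + "".join(flags[n] for n in names))
    return "\n".join(aff) + "\n", "\n".join(dic) + "\n"


# Hypothetical input: "work" takes both suffixes, "rework" only "-ing".
classes = {"less": [("", "less", "")], "ing": [("", "ing", "")]}
lexicon = {"work": ["less", "ing"], "rework": ["ing"]}
aff_text, dic_text = build_dictionary(classes, lexicon)
print(aff_text)
print(dic_text)
```

With this layout the grammar knowledge lives in the `classes` and `lexicon` tables, and the flags themselves become an implementation detail that nobody has to maintain by hand.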

     
  • Andrey

    Andrey - 2015-05-09

    > If your wish can be satisfied without code modification, then I have no objections.

    Which metacharacters are allowed in the condition field? I have tried $?.()[]^\ but have not found one that limits the word length. But there must be a mechanism that calculates the word length.
    Maybe two empty brackets with a circumflex, [^], could be used to mark the word end.

    I see this comment in affentry.cxx:

    // upon entry suffix is 0 length or already matches the end of the word.
    // So if the remaining root word has positive length
    // and if there are enough chars in root word and added back strip chars
    // to meet the number of characters conditions, then test it

    Is this the relevant part? How does it work?

     

    Last edit: Andrey 2015-05-09
  • swan

    swan - 2015-05-09

    http://manpages.ubuntu.com/manpages/dapper/man4/hunspell.4.html

    (4) condition.

    Zero stripping or affix are indicated by zero. Zero condition is
    indicated by dot. Condition is a simplified, regular
    expression-like pattern, which must be met before the affix can
    be applied. (Dot signs an arbitrary character. Characters in
    braces sign an arbitrary character from the character subset.
    Dash hasn’t got special meaning, but circumflex (^) next the
    first brace sets the complementer character set.)

    Comment: It is very simple. The intelligence must be in the .aff/.dic
    generation program.
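
The syntax quoted above is small enough to model in a few lines. The sketch below translates a suffix condition into a Python regular expression anchored at the end of the word: dot is any character, brackets are a character set, a leading circumflex inside brackets negates the set, and a dash is a literal. The helper name `condition_to_regex` is hypothetical, and rejecting empty brackets is an assumption of this sketch, not a statement about what Hunspell itself does with them.

```python
import re


def condition_to_regex(condition: str):
    """Compile a suffix condition into a regex that must match the word's end."""
    parts = []
    i = 0
    while i < len(condition):
        ch = condition[i]
        if ch == ".":
            parts.append(".")  # one arbitrary character
            i += 1
        elif ch == "[":
            end = condition.index("]", i)  # raises ValueError if unclosed
            body = condition[i + 1:end]
            negate = body.startswith("^")
            if negate:
                body = body[1:]
            if not body:
                raise ValueError("empty character set in condition")
            # re.escape makes the dash literal, so "a-c" is NOT a range.
            parts.append("[" + ("^" if negate else "") + re.escape(body) + "]")
            i = end + 1
        else:
            parts.append(re.escape(ch))  # ordinary literal character
            i += 1
    return re.compile("(?:" + "".join(parts) + r")\Z", re.DOTALL)


for cond in ("work", ".work", "[^e]work"):
    for word in ("work", "rework"):
        print(cond, word, bool(condition_to_regex(cond).search(word)))
```

Every element of a condition consumes exactly one character, so in this model no pattern can say "nothing precedes `work`": `[^e]work` rejects `rework` but also rejects `work`, because it needs five characters. That is the gap the ticket describes.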