
#220 Hunspell should prefer inflection to compounding

open
nobody
None
5
2014-11-06
2012-07-09
Anonymous
No

The Hungarian word 'szekrénysort' is analysed as szekrény + sort, a nonsensical compound of two nouns, instead of szekrénysor + t, i.e. a noun + the accusative marker.

My best guess is that Hunspell prefers compounding to inflection during analysis, which is clearly wrong in this case. I cannot think of any more examples off the top of my head, so I cannot test this hypothesis, but I decided to write it down anyway in case it is of some help.
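To make the reported ambiguity concrete, here is a minimal Python sketch of a toy analyzer. It is not Hunspell's algorithm or API: the four-stem lexicon, the single accusative suffix `t`, and the functions `split_stems`, `analyses`, `rank` and `fmt` are all hypothetical. It enumerates every stems-plus-optional-suffix reading of a word, then applies the ordering this report asks for: fewer compound members first, then fewer morphemes overall.

```python
# Hypothetical toy lexicon for illustration; not Hunspell's real Hungarian dictionary.
STEMS = {"szekrény", "szekrénysor", "sor", "sort"}
SUFFIXES = {"t": "ACC"}  # only the accusative marker, for brevity


def split_stems(word):
    """Yield every way to cover `word` completely with one or more lexicon stems."""
    if word in STEMS:
        yield [word]
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head in STEMS:
            for rest in split_stems(tail):
                yield [head] + rest


def analyses(word):
    """Yield (stems, suffix) pairs: a bare stem/compound, or stems plus one suffix."""
    for stems in split_stems(word):
        yield stems, None
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            for stems in split_stems(word[: -len(suffix)]):
                yield stems, suffix


def rank(word):
    """Prefer inflection to compounding: fewer stems first, then fewer morphemes."""
    def key(analysis):
        stems, suffix = analysis
        return (len(stems), len(stems) + (suffix is not None))
    return sorted(analyses(word), key=key)


def fmt(analysis):
    """Render an analysis as 'stem + stem + suffix (TAG)'."""
    stems, suffix = analysis
    parts = stems + ([f"{suffix} ({SUFFIXES[suffix]})"] if suffix else [])
    return " + ".join(parts)


for analysis in rank("szekrénysort"):
    print(fmt(analysis))
# szekrénysor + t (ACC)
# szekrény + sort
# szekrény + sor + t (ACC)
```

Without the `rank` step, this toy happens to yield the compound reading first, simply because `analyses` tries bare stem splits before stripping suffixes; that is an artifact of the sketch's enumeration order, not a claim about how Hunspell searches.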

Discussion

  • Eleonora

    Eleonora - 2012-11-11

    If the compound word were 'vászonsort', then vászon+sort would be the most probable analysis, and the inflected reading vászon+sor+acc would be unlikely.
    'néppárt' is a word of the same type: nép+párt is the likely analysis, and nép+pár+acc the less likely one.
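Eleonora's examples suggest that a blanket "always prefer inflection" rule would be too blunt: for 'vászonsort' and 'néppárt' the compound reading without a suffix is the plausible one. One way to reconcile her examples with the original report is to give each analysis a cost in which a compound boundary is more expensive than an inflectional suffix. The sketch below assumes exactly that; the weights `COMPOUND_COST` and `SUFFIX_COST`, the function names, and the hand-written candidate lists are hypothetical, not Hunspell settings or corpus data.

```python
# Hypothetical weights: a compound boundary is assumed to cost more than a suffix.
COMPOUND_COST = 2
SUFFIX_COST = 1


def cost(stems, suffixes):
    """Lower is more plausible under this toy model."""
    return COMPOUND_COST * (len(stems) - 1) + SUFFIX_COST * len(suffixes)


def best(candidates):
    """Return the lowest-cost (stems, suffixes) analysis."""
    return min(candidates, key=lambda a: cost(*a))


# Candidate analyses written out by hand from the examples in this ticket.
EXAMPLES = {
    "szekrénysort": [(["szekrény", "sort"], []), (["szekrénysor"], ["t"])],
    "vászonsort": [(["vászon", "sort"], []), (["vászon", "sor"], ["t"])],
    "néppárt": [(["nép", "párt"], []), (["nép", "pár"], ["t"])],
}

for word, candidates in EXAMPLES.items():
    stems, suffixes = best(candidates)
    print(word, "->", " + ".join(stems + suffixes))
# szekrénysort -> szekrénysor + t
# vászonsort -> vászon + sort
# néppárt -> nép + párt
```

Any positive weights with `COMPOUND_COST > SUFFIX_COST` give the same three results: the inflected reading of 'szekrénysort' avoids a compound boundary altogether, while the inflected readings of the other two words only add a suffix on top of the same compound boundary.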

     
  • The Polytonic Project

    Hunspell does indeed prefer compounding to inflection.
    Here is what I found while testing a Greek dictionary (note that Greek has both inflection and compounding).

    If a misspelled non-compound word is checked, Hunspell first suggests all the compounds that can be formed from variations of the misspelled word, and only then does it suggest the possible correct spellings of the original (non-compound) word. This is a problem because the suggestion the user expects and needs may not appear at all, and if it does, it is far from the top of the list.

    This is indeed a problem.

    Possible workaround?

    Order suggestions by their Levenshtein distance from the misspelled word.
    Even simpler: first find suggestions with the same number of characters as the misspelled word, then add more by progressively increasing the word length.
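Both proposals can be sketched as re-ranking the suggestion list after the speller has produced it. This is a minimal post-processing sketch, not part of Hunspell's API; the function names and the English sample list are hypothetical, chosen only to mimic compounds being listed ahead of the intended word. The length-based variant reads "progressively increasing word length" as a growing length difference from the misspelled word, which is an interpretation.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (0 if equal)
        prev = curr
    return prev[-1]


def order_by_distance(misspelled: str, suggestions: list[str]) -> list[str]:
    """Proposal 1: closest suggestions first; sorted() is stable, so ties keep their order."""
    return sorted(suggestions, key=lambda s: levenshtein(misspelled, s))


def order_by_length(misspelled: str, suggestions: list[str]) -> list[str]:
    """Proposal 2 (as interpreted here): same length first, then growing length difference."""
    return sorted(suggestions, key=lambda s: abs(len(s) - len(misspelled)))


# Hypothetical list with compounds ahead of the intended word.
raw = ["houseboat", "housework", "house", "hour"]
print(order_by_distance("hous", raw))  # ['house', 'hour', 'houseboat', 'housework']
print(order_by_length("hous", raw))    # ['hour', 'house', 'houseboat', 'housework']
```

The length-only ordering is cheaper but cruder: here it ranks "hour" above "house" even though both are one edit away, whereas the edit-distance ordering keeps the speller's original order among equally close candidates.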