Reduce broadness of suggestion list

Help
2008-01-19
2013-06-03
  • Hunspell sometimes gives pretty ludicrous suggestions with the Danish dictionary. It seems to be too liberal in the number of differences it allows between the wrong word and the suggestions.

    For example, as replacements for "cyberspace" (which should be in the word list but isn't yet), it proposes:
    cyberpunk
    cockerspaniel
    racercyklers
    beachpartyers

    Is there any way to reduce this broadness of suggestions?

    Thanks!
    Jeppe

     
    • Yes, there is. I suggest using MAXNGRAMSUGS to limit the number of n-gram suggestions (for example, use 3 instead of 5). Suggestion analysis based on the different distance functions of the different parallel correction algorithms of Hunspell is not trivial. In fact, the best method is to extend the dictionary with "cyberspace" and other correct words.
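      As a rough illustration of what capping the n-gram suggestions does, here is a minimal Python sketch. It is not Hunspell's scoring: the similarity is a simple Dice coefficient over character bigrams, and the function names ngram_similarity and ngram_suggestions are hypothetical. The max_sugs argument plays the role that a line such as MAXNGRAMSUGS 3 plays in the affix (.aff) file.

```python
def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over the sets of character n-grams of a and b.

    Hypothetical stand-in for illustration, not Hunspell's n-gram score."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    if not grams_a and not grams_b:
        # Both strings are shorter than n: fall back to exact comparison.
        return 1.0 if a == b else 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def ngram_suggestions(word: str, candidates: list[str], max_sugs: int, n: int = 2) -> list[str]:
    """Rank candidates by n-gram similarity to word and keep at most max_sugs.

    max_sugs is a hypothetical parameter loosely mirroring MAXNGRAMSUGS;
    a cap of 0 returns no suggestions at all."""
    # sorted() is stable, so equally scored candidates keep their input order.
    ranked = sorted(candidates, key=lambda c: ngram_similarity(word, c, n), reverse=True)
    return ranked[:max(max_sugs, 0)]


words = ["cyberpunk", "cockerspaniel", "racercyklers", "beachpartyers"]
print(ngram_suggestions("cyberspace", words, 3))
# ['racercyklers', 'beachpartyers', 'cyberpunk']
print(ngram_suggestions("cyberspace", words + ["cyberspace"], 3))
# ['cyberspace', 'racercyklers', 'beachpartyers']
```

      With the four words from the question, a cap of 3 drops only the lowest-scoring candidate (cockerspaniel), but the remaining ones are still odd. Once "cyberspace" itself is in the word list it scores 1.0 and comes first, which is why extending the dictionary is the better fix.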

       
    • peter rejto
      2008-04-26

      "Suggestion analysis based on the different distance functions of the different parallel correction algorithms of Hunspell ....."

      Dear All,

      I am a mathematician, and of course I do not understand your delicate discussions. However, the concept of the distance function caught my eye. In fact, I have a hunch that you Hunspell developers have borrowed it from mathematics. If my hunch is correct, I would appreciate references to this concept.

      Thanks,

      -peter

       
      • Dear Peter,

        I plan to give more information about the correction algorithms. In the meantime, you can study them in the code (see src/hunspell/suggestmgr.* and src/hunspell/phonet.*). Keywords: 1-character distance, n-gram with affixation, weighted LCS (longest common subsequence), phonetic similarity (dictionary-based similarity and Aspell table-driven transformation), REP (string replacement for correcting typical language-dependent mistakes), and similarity based on predefined character sets (MAP suggestion).

        Regards, Laci
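        For readers coming from mathematics: the "1-character distance" idea is closely related to the classical edit (Levenshtein) distance, and LCS stands for longest common subsequence. The Python sketch below computes both with the standard dynamic-programming recurrences. It is a generic textbook version, not code taken from suggestmgr.*, and it uses an unweighted LCS rather than Hunspell's weighted variant; the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn a into b (unit-cost Levenshtein distance)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (unweighted)."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous prefix of a
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)  # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


print(levenshtein("cyberspace", "cyberpunk"))  # 4
print(lcs_length("cyberspace", "cyberpunk"))   # 6
```

        With unit costs, the Levenshtein distance satisfies the metric axioms (it is zero only for identical strings, it is symmetric, and it obeys the triangle inequality), so it is a distance function in the mathematical sense. LCS length is instead a similarity: larger means closer, as with the shared "cyber" plus "p" above.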