Hi,
With larger lexica (several hundred thousand words), there are
efficiency problems. Profiling data suggest that the data
structures and algorithms could be improved.
It might be worthwhile to read up on the topic.
(Sorry that this is rather vague: I didn't profile the
code myself (colleagues did), but I work with the program
very often, and as a computational linguist I am aware
of some of the issues.)
Hi,
I have measured only one enormous time (5 s) for 320,000 words
(a Hebrew dictionary), and long suggestion times.
A simple spell() call is reasonably fast.
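To see which phase dominates, here is a minimal timing sketch; it
assumes the current Hunspell C++ API in hunspell.hxx (older releases
expose a char***-based suggest() instead), and the dictionary file
names are only placeholders:

    #include <hunspell.hxx>   // Hunspell C++ API; link with -lhunspell
    #include <chrono>
    #include <cstdio>
    #include <string>
    #include <vector>

    int main() {
        using clk = std::chrono::steady_clock;
        auto ms = [](clk::time_point a, clk::time_point b) {
            return (long long)std::chrono::duration_cast<
                std::chrono::milliseconds>(b - a).count();
        };

        // Loading: the constructor parses the .aff/.dic pair and
        // builds the in-memory hash table (placeholder file names).
        auto t0 = clk::now();
        Hunspell h("he_IL.aff", "he_IL.dic");
        auto t1 = clk::now();

        // spell(): essentially one hash lookup plus affix stripping,
        // so it stays fast even with a very large lexicon.
        bool ok = h.spell("word");
        auto t2 = clk::now();

        // suggest(): the expensive path; with n-gram suggestions
        // enabled it scans the whole dictionary.
        std::vector<std::string> sugs = h.suggest("word");
        auto t3 = clk::now();

        std::printf("load %lld ms, spell %lld ms (ok=%d), "
                    "suggest %lld ms (%zu suggestions)\n",
                    ms(t0, t1), ms(t1, t2), (int)ok,
                    ms(t2, t3), sugs.size());
    }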
I can suggest some possible optimizations (see the sketches after
this list):
- use affix compression (see src/tools/munch);
- use alias compression (make alias-compressed .aff and .dic
files with the src/tools/makealias script);
- set a bigger hash size in the first line of the .dic file;
- don't use suggestions, or switch off n-gram suggestions with the
MAXNGRAMSUGS 0
affix file parameter;
- use twofold suffix compression (unfortunately, I haven't
implemented the right tool for it yet);
- try the spell checkers of Vim or Aspell. Vim spell and Aspell
use different (perhaps faster) algorithms (but they don't
support twofold suffix compression yet).
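To make some of these concrete, here is what the relevant lines can
look like. The word count, file names, and flags are only examples,
and the munch/makealias invocations are assumed, so please check the
usage messages of the in-tree tools for the exact arguments:

    # The first line of the .dic file is the hash table size hint;
    # declare at least the real word count for a 320,000-word list:
    320000
    ...320,000 entries...

    # In the .aff file, switch off the slow n-gram suggestion pass:
    MAXNGRAMSUGS 0

    # Twofold suffix compression in the .aff file: a suffix rule can
    # carry continuation flags after a slash, so a second suffix
    # class applies to its output (flags A and B are made up):
    SFX A Y 1
    SFX A 0 im/B .
    SFX B Y 1
    SFX B 0 a .

    # Affix and alias compression (assumed invocations):
    src/tools/munch words.txt he.aff > he.dic
    src/tools/makealias he.dic he.aff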
I also plan an improved version with some optimizations.
Was loading time or other run-time performance (suggestions)
the bigger problem for you?
Many thanks for your report.
Best regards,
Laci
> I have measured only one enormous time (5 s) for 320,000 words
By "time" I meant the loading time of the dictionary. Sorry.
Laci
Update (Sept 2008): current versions of Aspell do handle twofold suffix compression.
> current versions of Aspell do handle twofold suffix compression
That is good news, thank you. I believe it will be a big help for dictionary developers of agglutinative languages.
Does it make sense to keep this report open?