From: Geoff H. <ghu...@ws...> - 2002-12-03 05:01:35
> Indexing the stems is a good suggestion. It would
> certainly give faster searching. If it replaced the
> unstemmed inverted file then it would also save on storage
> requirements, but it would mean we couldn't search on the
> unstemmed version (if that is of concern).

The general strategy used by ht://Dig is post-indexing fuzzy matching. Certainly a Porter stemming fuzzy algorithm would be quite useful. But I'd say if we intend on indexing stems, it should definitely be optional. I can think of several instances where I'd want to search on one particular word, and *not* stemmed variants.

So I'd much rather see work into innovative fuzzy algorithms. Anyone want to add a real "spelling" fuzzy? What about a Porter endings fuzzy to replace/augment endings?

> I have also been wondering if it is possible to turn off
> word-level indexing, to give (much) smaller inverted files
> if people don't need phrase searching. Does anybody know?

Not at the moment. But you lose a lot more than phrase searching. You lose field-restricted searching. You lose scoring by proximity (like Google). You lose the ability to score "on the fly"--not to be discounted since many users wonder why they change their scoring factors and the results don't change.

If you look at other search products, the basic strategy now is "index everything" and let the search frontend filter if needed. Yes, some even index words like the, and, not, etc.

Just my $0.02,
-Geoff
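
For readers unfamiliar with why word-level indexing matters for phrase and proximity features, here is a rough Python sketch. It is not ht://Dig code and does not reflect its storage format; the function names (`build_index`, `phrase_search`, `min_distance`) and the whitespace tokenizer are assumptions made purely for illustration. The point is only that once positions are dropped from the index, neither function below can be answered.

```python
from collections import defaultdict


def build_index(docs):
    """Build a word-level index: word -> {doc_id: [positions]}.

    Hypothetical helper; tokenization is plain whitespace splitting.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def phrase_search(index, phrase):
    """Return sorted doc ids in which the words of `phrase` occur consecutively."""
    words = phrase.lower().split()
    if not words or any(w not in index for w in words):
        return []
    # Documents containing every word of the phrase.
    candidates = set(index[words[0]])
    for w in words[1:]:
        candidates &= set(index[w])
    hits = []
    for doc_id in candidates:
        # A match needs word i at position start + i for some start.
        for start in index[words[0]][doc_id]:
            if all(start + i in index[w][doc_id] for i, w in enumerate(words)):
                hits.append(doc_id)
                break
    return sorted(hits)


def min_distance(index, a, b, doc_id):
    """Smallest gap in word positions between `a` and `b` in one document, or None."""
    pos_a = index.get(a, {}).get(doc_id)
    pos_b = index.get(b, {}).get(doc_id)
    if not pos_a or not pos_b:
        return None
    return min(abs(x - y) for x in pos_a for y in pos_b)
```

A document-level index would keep only the set of doc ids per word, which is smaller but cannot distinguish "fuzzy matching" from "matching fuzzy", nor rank documents by how close two query terms are.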