select N most important words from query, not first N

Status: Alpha

Brought to you by: hughewilliams, nlester

#17 select N most important words from query, not first N

Status: open

Owner: nobody

Labels: None

Priority: 5

Updated: 2003-08-13

Created: 2003-08-13

Creator: Nicholas Lester

Private: No

Currently, lucy selects the first N words from a query,
where N is a compile-time constant (for efficiency
reasons). A better strategy would be to select the N
most selective words, ranked by inverse document
frequency (IDF), with a cutoff to avoid excessive parsing
of very long queries. Phrases and ANDs could be assumed
to be more selective than individual words. They could
also be included on a first-come basis, since this
simplifies the implementation and their IDF can only be
estimated anyway.
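A minimal C sketch of the proposed selection, assuming a hypothetical `struct query_term` whose document frequency has already been looked up, and a hypothetical `MAX_PARSE_TERMS` cutoff. Neither name comes from lucy's source. Phrases and ANDs fill slots first in query order; the remaining slots go to the single words with the highest IDF.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cutoff: query terms beyond this many are never parsed. */
#define MAX_PARSE_TERMS 64

/* Hypothetical query-term record; doc_freq would come from the vocabulary. */
struct query_term {
    const char *text;
    unsigned long doc_freq;  /* number of documents containing the term */
    int is_phrase_or_and;    /* nonzero for a phrase or an AND group */
    size_t position;         /* index in the original query (set below) */
};

/* Orders single words by descending IDF. Since idf = log(num_docs / df)
   falls as df rises, ascending df gives the same ranking without log().
   Terms with df == 0 occur in no document and are ranked last. Ties keep
   the original query order. */
static int by_idf_desc(const void *a, const void *b)
{
    const struct query_term *x = a, *y = b;
    unsigned long dx = x->doc_freq ? x->doc_freq : ULONG_MAX;
    unsigned long dy = y->doc_freq ? y->doc_freq : ULONG_MAX;

    if (dx != dy)
        return dx < dy ? -1 : 1;
    return (x->position > y->position) - (x->position < y->position);
}

/* Copies up to max_n terms into out (capacity >= max_n) and returns the
   count. Only the first MAX_PARSE_TERMS query terms are considered.
   Phrases and ANDs are taken first, first come first served; the remaining
   slots go to the single words with the highest IDF. */
size_t select_terms(struct query_term *terms, size_t nterms,
                    size_t max_n, struct query_term *out)
{
    struct query_term words[MAX_PARSE_TERMS];
    size_t parsed = nterms < MAX_PARSE_TERMS ? nterms : MAX_PARSE_TERMS;
    size_t nout = 0, nwords = 0;

    assert(terms != NULL && out != NULL);
    for (size_t i = 0; i < parsed; i++) {
        terms[i].position = i;
        if (terms[i].is_phrase_or_and) {
            if (nout < max_n)
                out[nout++] = terms[i];
        } else {
            words[nwords++] = terms[i];
        }
    }
    qsort(words, nwords, sizeof words[0], by_idf_desc);
    for (size_t i = 0; i < nwords && nout < max_n; i++)
        out[nout++] = words[i];
    return nout;
}
```

Ranking by ascending document frequency rather than computing `log()` is a design choice: it yields the same order as ranking by IDF for a fixed collection size and avoids a floating-point call per term. With N = 3, a query such as `the zettair search "inverted index" engine` would keep the phrase plus the two rarest words.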
