From: Gilles D. <gr...@sc...> - 2002-01-18 19:15:39
According to Geoff Hutchison:
> On Fri, 18 Jan 2002, Gilles Detillieux wrote:
> > dummy ResultList with all valid document IDs. All it needs is a method
> > to call to get that list of docIDs - that's the part I need help with.
>
> The reason I suggested in htsearch/main.cpp is that you could get this
> from the DocumentDB class if you're not in the parser. I guess htsearch
> could grab this for the parser as a callback, but this was why I was
> looking to skip the parser entirely.

OK, I've given this some more thought. The db.docdb in 3.1.x is keyed by
encoded URLs, so to get a list of docIDs from it, you'd essentially need
to read in and decode every record in that database just to get at the
docIDs. Wouldn't it be much quicker to get them from the db.docs.index,
which is keyed by docID in 3.1? It's a smaller database, and you'd just
need to traverse the "cursor" part of it to get the list of keys.

> > dummy record into the db.words.db with a list of cooked-up WordRecords.
> > That would work, but it's not as clean as I'd like.
>
> It could also be a pretty big record.

Yup, which is a big part of the reason this isn't as clean as I'd like.

> > combining * with and or or doesn't really get you anything, but it might
> > be nice to be able to do "* not foo".
>
> True. But we should make sure to remember the balance--how much code will
> we add versus the utility of the feature. I see the utility of "return all
> matches, then sort, restrict, etc." I also see the utility of "* not
> foo," but I'm not sure it's as bulletproof. Should we pass the DocumentDB
> to the parser too?

Well, I'd say either we pass the doc_index filename to the parser, which
would only create a database instance and open the database if it needs
it, or we do the opening part in main() regardless and pass the Database
pointer to the parser. I prefer the former.
Come to think of it, the parser already does some "config" lookups, and
I was going to add a lookup for prefix_match_character anyway, so why not
just look up doc_index too and forget adding extra stuff to pass?

You know, I just may be able to code this thing after all.

-- 
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
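
For illustration only, here is a rough C++ sketch of the idea discussed in this message: collect every valid docID by walking the keys of an index keyed by docID, then evaluate "* not foo" by dropping the docIDs that match the excluded word. It does not use the real ht://Dig classes. `DocIndex`, `allDocIDs` and `excludeMatches` are hypothetical names, and an in-memory `std::map` stands in for the on-disk db.docs.index.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for db.docs.index: docID -> encoded URL.
// The real index is an on-disk database, not an in-memory map.
using DocIndex = std::map<int, std::string>;

// Walk the index in key order and keep only the keys (the docIDs).
// The record values are never looked at, which is the advantage over
// reading and decoding every record of a URL-keyed database.
std::vector<int> allDocIDs(const DocIndex &index)
{
    std::vector<int> ids;
    ids.reserve(index.size());
    for (const auto &entry : index)
        ids.push_back(entry.first);
    return ids;
}

// "* not foo": start from every valid docID and drop the ones that
// matched the excluded word.
std::vector<int> excludeMatches(const std::vector<int> &all,
                                const std::set<int> &excluded)
{
    std::vector<int> result;
    for (int id : all)
        if (excluded.count(id) == 0)
            result.push_back(id);
    return result;
}
```

In this sketch the list of all docIDs plays the role of the dummy result list for "*", so the "not" step is just a filter over it and no oversized dummy word record is needed.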