From: Jeff B. <bre...@gm...> - 2005-10-21 04:01:00
> > Neal, are you tracking the Java Lucene dev lists? There's
> > some recent discussion with respect to index interoperability
> > that may be relevant.
>
> Not yet... just the Clucene list. I'll have a look.

Here's some starting points maybe worth half an eyeball:

  The UTF-8 interoperability thread
  http://www.mail-archive.com/jav...@lu.../msg01970.html

  Interoperability with Perl Lucene
  http://www.mail-archive.com/jav...@lu.../msg02187.html

  Features in the approaching Java Lucene 1.9
  http://www.mail-archive.com/jav...@lu.../msg02284.html

  Debian & Kaffe, Redhat & GCJ
  http://www.mail-archive.com/jav...@lu.../msg02092.html

> We have been able to verify that the Java Lucene tool 'luke' is able to
> read and query the indexes produced by CLucene.

Very cool.

> The names of the searchable-fields we are using at this point is likely
> different than nutch.

Might be worth a look to see how different.

As of Nutch 0.7.1, the crawler + indexer is getting close. If it had an
easy to configure equivalent to HtDig's "local_urls" and
"<!--htdig_noindex-->" features I think it would probably be good enough.
Running Java for these operations does not feel like such a big deal, and
maybe there would be GCJ magic to ease the pain.

The search portion is a different story and requiring Tomcat is kind of a
pain in the butt. If some miracle occurred and htdig 4.0 and nutch were
super-compatible, I could imagine wanting to use htsearch against a nutch
built index. Dropping a search program into cgi-bin is really convenient.

> If you look at the 4.0 cvs branch, we've devised a pretty cool method of
> using an STL map container to hold the fieldname & fieldtext pairs with
> index/noindex and store/nostore flags. These are filled per document
> during htdig's parsing.
>
> It makes the htdig<->clucene interface very elegant.

I'm a straight C guy, so STL is a little beyond me. But I like the sound
of elegant and am tracking the blog.
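
For anyone else coming at this from straight C, here is a rough sketch of what a per-document STL map of fieldname & fieldtext pairs with index/noindex and store/nostore flags might look like. It is not taken from the 4.0 cvs branch: the names FieldEntry, DocFields, sample_document and indexed_field_names, and the sample field values, are all made up for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical layout, NOT the actual htdig 4.0 code.
// One entry per field: the text plus the two flags mentioned above.
struct FieldEntry {
    std::string text;  // field text collected during parsing
    bool index;        // true = index (searchable), false = noindex
    bool store;        // true = store the text, false = nostore
};

// Field name -> entry; std::map keeps the keys sorted by name.
typedef std::map<std::string, FieldEntry> DocFields;

// Stand-in for the parser, which would fill one map per document.
// The field names and values here are invented.
DocFields sample_document() {
    DocFields fields;
    fields["title"] = FieldEntry{"CLucene notes", true, true};
    fields["body"]  = FieldEntry{"long page text", true, false};
    fields["url"]   = FieldEntry{"http://example.org/", false, true};
    return fields;
}

// The indexing side only has to walk the map; it does not need to
// know in advance which fields the parser produced.
std::vector<std::string> indexed_field_names(const DocFields& fields) {
    std::vector<std::string> names;
    for (DocFields::const_iterator it = fields.begin(); it != fields.end(); ++it) {
        if (it->second.index) {
            names.push_back(it->first);
        }
    }
    return names;
}
```

The closest plain-C equivalent would be an array of structs with a name member and a linear lookup; the map mostly saves writing that lookup by hand.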