From: Chris V. <cv...@gm...> - 2007-11-19 22:58:00
Hi all,

I'm back to working on this problem, but still with no success. Basically, what I want to do is add additional metadata to an index created by NutchWax. I am able to add new fields and values to documents using the standard Lucene classes IndexReader, IndexWriter, IndexSearcher, and SimpleAnalyzer, following the proper technique for updating Lucene documents (I think). New fields are added as stored, indexed, and un-tokenized.

After the documents are updated, there is some strange behavior during querying. Queries against collection, date, url, and the newly added fields work fine. Unfortunately, queries against content and title no longer work. So it seems that the technique I'm using to update the documents is either insufficient (further action on index components is needed) or damaging (mangling part of the index or documents).

If anyone is interested, I have a small sample index that exhibits the problem. Any insight is greatly appreciated.

Thanks,
Chris

On Oct 23, 2007 5:06 PM, Chris Vicary <cv...@gm...> wrote:
> Hi,
>
> I'd like to add extra metadata to indexes produced by NutchWax. The goal is
> to perform searches against this metadata and full text at the same time. My
> initial idea is to update documents similarly to suggested practices for
> updating documents in Lucene indexes: retrieve documents based on search
> term(s), delete documents from index, add new fields to documents, and then
> add documents back to index. I am able to follow this strategy using the
> Lucene 2.0 classes IndexSearcher, IndexReader and IndexWriter (or
> IndexModifier). After the index documents have been updated, I can query
> against the new metadata using the IndexSearcher class without any problem.
> I can also use Luke to view the contents of the index and verify that the
> metadata has been added to the documents. The problem is that once the
> Index* classes are done updating the index documents, the NutchWax webapp is
> unable to locate those documents (even after a restart).
>
> My question is what is the best way to add fields to NutchWax index
> documents? Are there any Nutch or NutchWax classes I should use instead of
> the Lucene Index* classes (I didn't see any likely candidates in either
> project)? Is it possible I am leaving out some important steps when using
> the Lucene Index* classes?
>
> Any help is appreciated,
>
> Chris
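
The update cycle described in this thread (retrieve a document, delete it, add fields, add it back) can be illustrated with a toy model. The sketch below uses only the Java standard library; it is not the Lucene or NutchWax API, and every class and method name in it is hypothetical. It models one documented Lucene behavior: a document read back from an index contains only its stored fields, so a field that was indexed but not stored is missing from the copy that gets re-added. Whether NutchWax indexes content without storing it, and whether this accounts for the symptoms reported here, are assumptions that the thread does not confirm. The model also does not cover the analyzer used when re-adding, which could matter for a tokenized field such as title.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for an index. This is NOT the Lucene or NutchWax API;
// every class and method name below is hypothetical.
final class ToyIndex {

    // A field value plus flags for whether it is stored and/or indexed.
    static final class Field {
        final String value;
        final boolean stored;
        final boolean indexed;

        Field(String value, boolean stored, boolean indexed) {
            this.value = value;
            this.stored = stored;
            this.indexed = indexed;
        }
    }

    // Slot i holds document i; a deleted document leaves a null slot.
    private final List<Map<String, Field>> docs = new ArrayList<>();

    // Adds a copy of the document and returns its id.
    int add(Map<String, Field> doc) {
        docs.add(new HashMap<>(doc));
        return docs.size() - 1;
    }

    void delete(int id) {
        docs.set(id, null);
    }

    // Reading a live document back yields only its stored fields.
    Map<String, Field> retrieve(int id) {
        Map<String, Field> out = new HashMap<>();
        for (Map.Entry<String, Field> e : docs.get(id).entrySet()) {
            if (e.getValue().stored) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    // True if any live document has `word` in an indexed field `name`.
    boolean matches(String name, String word) {
        for (Map<String, Field> doc : docs) {
            if (doc == null) {
                continue;
            }
            Field f = doc.get(name);
            if (f != null && f.indexed
                    && Arrays.asList(f.value.split(" ")).contains(word)) {
                return true;
            }
        }
        return false;
    }

    // Runs the retrieve / delete / add field / re-add cycle on one document.
    static ToyIndex updatedIndex() {
        ToyIndex index = new ToyIndex();
        Map<String, Field> doc = new HashMap<>();
        doc.put("url", new Field("http://example.org/", true, true));
        // Assumption for illustration: content is indexed but not stored.
        doc.put("content", new Field("archived page text", false, true));
        int id = index.add(doc);

        Map<String, Field> copy = index.retrieve(id); // "content" is absent here
        index.delete(id);
        copy.put("subject", new Field("history", true, true));
        index.add(copy);
        return index;
    }
}
```

In this model, after `ToyIndex.updatedIndex()` runs, searches on `url` and the new `subject` field still match, while a search on `content` no longer does, because the re-added copy never contained that field. If the same mechanism applies to a real index, the unstored text would have to be supplied again from its original source when the document is rebuilt.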