Re: [bailey-developers] lattice master

Yonik Seeley wrote:
> It depends on the documents I guess... if they are big, putting them
> in the index can be a burden because they get copied on every segment
> merge, and loading the other stored fields takes longer.

Didn't Mike change that?  Segments can now point to fields in a separate 
file, according to:

http://lucene.apache.org/java/docs/fileformats.html#Segments%20File

I think that's so that they don't have to be copied with every merge.

> There are also two levels of "Document"... things like PDF, Word, etc,
> also need to be parsed and have fields extracted to make a lucene-style
> Document.  I assume that's out of scope for this project though.

Yes, I think an application could implement that with, e.g., a binary 
field for the raw data, another field for the mime type, and a third for 
the extracted text to index.  The raw data and text might be compressed.
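
Roughly what I have in mind, as a minimal sketch in plain Java rather than against the Lucene API. The class name, the field layout, and the choice of java.util.zip for compression are all assumptions made for illustration. Nothing here is existing Lucene or bailey code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical application-level layout: three fields per document.
// None of these names come from Lucene or bailey.
class StoredDocSketch {
    final byte[] rawCompressed;   // binary field: original bytes (PDF, Word, ...), compressed
    final String mimeType;        // second field: mime type of the raw data
    final byte[] textCompressed;  // third field: extracted text, compressed for storage

    StoredDocSketch(byte[] raw, String mimeType, String extractedText) {
        this.rawCompressed = deflate(raw);
        this.mimeType = mimeType;
        this.textCompressed = deflate(extractedText.getBytes(StandardCharsets.UTF_8));
    }

    // The application would hand this text to the indexer.
    String extractedText() {
        return new String(inflate(textCompressed), StandardCharsets.UTF_8);
    }

    // Compress a byte array with zlib/deflate.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Reverse of deflate(); rejects truncated or malformed input.
    static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("malformed compressed data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "stand-in for the bytes of a PDF".getBytes(StandardCharsets.UTF_8);
        StoredDocSketch doc = new StoredDocSketch(raw, "application/pdf", "stand-in extracted text");
        System.out.println(doc.mimeType + ": " + doc.extractedText());
    }
}
```

Parsing the PDF or Word file into text stays in the application. The store only sees three opaque fields and never needs to understand the format.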

Doug