Fields and Metadata

David Fisher

There may be some confusion over the concepts of fields and metadata within the Lemur Toolkit.

In essence, "fields" are extents that span part of the textual content of a document (e.g. <h1>heading</h1> in an HTML document). Once a field name is indexed, it can be used in Indri query language queries. As a rule of thumb, any XML markup in a document can serve as an indexable field, as long as the field is specified in the index parameter file when the collection is indexed.
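To make the idea of an extent concrete, here is a minimal sketch in Python. It is a toy model, not Indri code: the Extent class and the term_in_field function are hypothetical names invented for illustration. It treats a field as a named span over token positions and checks whether a term falls inside that span, which is roughly what a field-restricted query term does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """A named span over token positions [begin, end). Hypothetical type."""
    name: str
    begin: int
    end: int


def term_in_field(tokens, extents, term, field_name):
    """Return True if `term` occurs inside any extent named `field_name`."""
    for ext in extents:
        if ext.name == field_name and term in tokens[ext.begin:ext.end]:
            return True
    return False


# Toy document: "<h1>lemur toolkit</h1> indexing fields and metadata"
tokens = ["lemur", "toolkit", "indexing", "fields", "and", "metadata"]
extents = [Extent("h1", 0, 2)]  # the h1 field covers the first two tokens

print(term_in_field(tokens, extents, "lemur", "h1"))     # True
print(term_in_field(tokens, extents, "metadata", "h1"))  # False
```

The second lookup fails because "metadata" appears in the document text but outside the h1 extent. A field that was not listed in the index parameter file would simply have no extents recorded at all.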

"Metadata" fields contain textual values that may or may not occur in the textual content of a document. Metadata fields cannot be indexed for use in an Indri query language query. Instead, the values of metadata fields, such as "docno" or "url", can be retrieved from the Indri Repository via the QueryEnvironment API. If a metadata field has been reverse indexed (the metadata.backward parameter), the QueryEnvironment API can also return the list of document ids that match a specified metadata value. The API also provides a method for retrieving the entire ParsedDocument representation, rather than just the document ids.
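The difference between plain metadata retrieval and a reverse-indexed lookup can be sketched with a small Python model. This is an illustration under stated assumptions, not the QueryEnvironment API: MetadataStore and its methods are hypothetical names, and the document ids and values are made up. Every metadata value can be read back by document id, but only keys declared as reverse indexed support the value-to-document-ids direction.

```python
from collections import defaultdict


class MetadataStore:
    """Toy model of metadata lookup (hypothetical, not Indri code)."""

    def __init__(self, backward_keys=()):
        self._forward = {}                   # (doc_id, key) -> value
        self._backward = defaultdict(list)   # (key, value) -> [doc_id, ...]
        self._backward_keys = set(backward_keys)

    def add(self, doc_id, key, value):
        self._forward[(doc_id, key)] = value
        # Only keys declared as reverse indexed get a value -> ids entry.
        if key in self._backward_keys:
            self._backward[(key, value)].append(doc_id)

    def value(self, doc_id, key):
        """Retrieve a metadata value for a known document id."""
        return self._forward[(doc_id, key)]

    def doc_ids(self, key, value):
        """Retrieve the document ids whose metadata `key` equals `value`."""
        if key not in self._backward_keys:
            raise KeyError(f"metadata key {key!r} is not reverse indexed")
        return list(self._backward.get((key, value), []))


store = MetadataStore(backward_keys=["docno"])
store.add(1, "docno", "DOC-001")
store.add(1, "url", "http://example.org/a")
store.add(2, "docno", "DOC-002")

print(store.value(1, "url"))               # http://example.org/a
print(store.doc_ids("docno", "DOC-002"))   # [2]
```

In this model, calling store.doc_ids("url", ...) raises KeyError, because "url" was stored without a reverse index.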


