Fields and Metadata

David Fisher

There may be some confusion over the concepts of fields and metadata within the Lemur Toolkit.

In essence, "fields" are extents that span part of the textual content of a document (e.g. <h1>heading</h1> in an HTML document). Once a field name is indexed, it is available for use in Indri query language queries. As a rule of thumb, any XML markup in a document can be used as an indexable field, as long as the field is specified in the index parameter file when the collection is indexed.
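
As an illustration of specifying fields at indexing time, the following is a minimal sketch in Python that generates an index parameter file. It assumes the usual IndriBuildIndex layout, in which each indexable field is declared with a <field> element containing a <name>; the paths, the corpus class and the field names used here are made-up example values, and the helper function is hypothetical rather than part of the toolkit.

```python
import xml.etree.ElementTree as ET


def build_index_parameters(index_path, corpus_path, corpus_class, field_names):
    """Return an IndriBuildIndex-style parameter file as an XML string.

    The element names (parameters, index, corpus, path, class, field, name)
    are assumed from the usual Indri parameter file layout; check them
    against the documentation for the toolkit version in use.
    """
    root = ET.Element("parameters")
    ET.SubElement(root, "index").text = index_path

    corpus = ET.SubElement(root, "corpus")
    ET.SubElement(corpus, "path").text = corpus_path
    ET.SubElement(corpus, "class").text = corpus_class

    # One <field> element per markup tag that should be queryable as a field.
    for name in field_names:
        field = ET.SubElement(root, "field")
        ET.SubElement(field, "name").text = name

    return ET.tostring(root, encoding="unicode")


# Example values only: index the <h1> and <title> extents of an HTML collection.
params_xml = build_index_parameters(
    "/tmp/example_index", "/tmp/example_docs", "html", ["h1", "title"]
)
print(params_xml)
```

With a parameter file of this shape, terms inside <h1> extents should become addressable by field name in a query (for example, a field-restricted term written as heading.h1), assuming the field restriction syntax described in the Indri query language documentation.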

"Metadata" fields contain textual values that may or may not occur in the textual content of a document. Metadata fields cannot be indexed for use in Indri query language queries. The values of metadata fields, such as "docno" or "url", can be retrieved from the Indri Repository via the QueryEnvironment API. If a metadata field has been reverse indexed (the metadata.backward parameter), the QueryEnvironment API can also return the list of document ids that match a specified metadata value. The API also provides a method for retrieving the entire ParsedDocument representation of each matching document, rather than just the document ids.
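
The metadata.backward parameter mentioned above lives in the same index parameter file. The following minimal sketch, again in Python, generates just that portion of the file. It assumes that a <metadata> element holds <forward> entries (lookup of a value by document id) and <backward> entries (lookup of document ids by value); the helper function and the choice of field names are illustrative assumptions, not toolkit code.

```python
import xml.etree.ElementTree as ET


def build_metadata_parameters(forward_fields, backward_fields):
    """Return the <metadata> portion of an index parameter file as XML.

    Assumed layout: <forward> names metadata fields retrievable by document
    id, <backward> names fields that are additionally reverse indexed so
    that document ids can be looked up by metadata value.
    """
    root = ET.Element("parameters")
    metadata = ET.SubElement(root, "metadata")

    for name in forward_fields:
        ET.SubElement(metadata, "forward").text = name

    # Only fields listed here support value-to-document-id lookups.
    for name in backward_fields:
        ET.SubElement(metadata, "backward").text = name

    return ET.tostring(root, encoding="unicode")


# Example values only: docno and url are retrievable; docno is also reverse indexed.
metadata_xml = build_metadata_parameters(["docno", "url"], ["docno"])
print(metadata_xml)
```

In this example only "docno" would support retrieving document ids from a metadata value; "url" could still be read for a known document but not searched by value.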

