[larm-dev] Dublin Core Metadata Engine
From: Jeff L. <je...@gr...> - 2003-06-14 16:14:22
Hi,

This would go under record processing. Let me know what you think about the design.

Jeff

Dublin Core Metadata Indexing

The record processor should optionally handle Dublin Core metadata elements, whether they appear inside the content being indexed or in an RDF record external to the content. Because these metadata elements are standardized, each one can be mapped to a field on the Lucene Document. This support is entirely optional and configurable.

1. The Metadata Retriever

The retriever reads the Dublin Core metadata from a content element. It will support HTML, XML using the Dublin Core schema, and RDF files using the Dublin Core schema. It is not responsible for fetching the content element or RDF file from its location; it only extracts the relevant metadata from the pages. The retriever will be pluggable so that additional content formats can be supported.

2. The Metadata Engine

The retriever feeds the data to the engine, which applies any validation rules configured for the metadata, for example to prevent spamming of the search engine or inappropriate results. In addition, some metadata elements may not be allowed, and they can be removed here. Other metadata elements may only be relevant for a certain subset of URLs, and that filter can be applied here as well.

3. The Metadata Builder

The builder retrieves the metadata from the engine and adds it to the Lucene Document as a set of fields. The mapping from metadata elements to document fields is taken from configuration; where none is given, defaults are used.

4. Dublin Core metadata elements
(from http://www.dublincore.org/documents/dces/)

Title
Creator
Subject
Description
Publisher
Contributor
Date
Type
Format
Identifier
Source
Language
Relation
Coverage
Rights

5. References

http://www.dublincore.org/
http://www.dublincore.org/documents/dces/
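
To make the flow in sections 1 to 3 concrete, here is a rough sketch of how the retriever, engine, and builder might fit together. It is written in Python purely for brevity and is not the proposed implementation. Every class, method, and parameter name is hypothetical, the "dc." default field prefix is an assumption, and a plain dict stands in for the Lucene Document. It also assumes the HTML case uses the common <meta name="DC.title" content="..."> convention, which the proposal above does not specify.

```python
from html.parser import HTMLParser

# The fifteen Dublin Core elements from section 4, lowercased.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


class HtmlDublinCoreRetriever(HTMLParser):
    """Hypothetical retriever: collects <meta name="DC.x" content="..."> pairs."""

    def __init__(self):
        super().__init__()
        self.metadata = []  # list of (element, value) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content")
        # Assumption: elements are named with a "DC." prefix in the HTML.
        if name.startswith("dc.") and content:
            element = name[3:]
            if element in DC_ELEMENTS:
                self.metadata.append((element, content.strip()))


class MetadataEngine:
    """Hypothetical engine: drops blocked, oversized, or URL-restricted elements."""

    def __init__(self, blocked=(), max_length=200, url_rules=None):
        self.blocked = set(blocked)
        self.max_length = max_length      # crude anti-spam rule (assumption)
        self.url_rules = url_rules or {}  # element -> required URL prefix

    def filter(self, url, metadata):
        kept = []
        for element, value in metadata:
            if element in self.blocked:
                continue
            if len(value) > self.max_length:
                continue
            prefix = self.url_rules.get(element)
            if prefix is not None and not url.startswith(prefix):
                continue
            kept.append((element, value))
        return kept


class MetadataBuilder:
    """Hypothetical builder: maps elements to field names on a document."""

    def __init__(self, field_map=None):
        self.field_map = field_map or {}

    def add_fields(self, document, metadata):
        for element, value in metadata:
            # Fall back to a default field name when no mapping is configured.
            field_name = self.field_map.get(element, "dc." + element)
            document.setdefault(field_name, []).append(value)
        return document


html = """<html><head>
<meta name="DC.title" content="LARM Crawler Notes">
<meta name="DC.creator" content="Example Author">
<meta name="DC.rights" content="All rights reserved">
<meta name="keywords" content="not dublin core">
</head><body></body></html>"""

retriever = HtmlDublinCoreRetriever()
retriever.feed(html)

engine = MetadataEngine(blocked={"rights"})
builder = MetadataBuilder(field_map={"title": "title"})

# A plain dict stands in for the Lucene Document in this sketch.
document = builder.add_fields(
    {}, engine.filter("http://example.org/a.html", retriever.metadata)
)
```

With this sample page, the "rights" element is removed by the engine, "title" is mapped through the configured field name, and "creator" falls back to the default "dc.creator" name. Keeping the three steps as separate objects mirrors the proposal: a retriever for another format (XML or RDF) would only need to produce the same list of (element, value) pairs.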