#1 Enable generation of textMD property for textual files


Here is a patch against version 1.4 that lets Jhove generate a property conforming to the textMD schema (see http://www.loc.gov/standards/textMD) for textual files.

The initial thought was to apply a simple XSLT transform to the output of jhove in order to generate this information, but this doesn't work well because
1/ not all the needed information is generated by jhove, or the output information is already bundled, and
2/ the correct handling of the charset and the language needs to be verified programmatically.

This patch modifies 4 modules: ASCII-hul, UTF8-hul, HTML-hul and XML-hul (the version numbers have been modified appropriately).
A parameter 'withTextMD=true' activates the generation of the property for each module (see jhove-withTextMD.conf for an example).
The default is not to generate it, so that the behavior is the same as before.
I added the determination of the line ending in HTML and XML to be able to generate the required element:
there is no performance penalty since the stream classes have been modified using the same algorithm as the one in the ASCII module.
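
To illustrate the idea of detecting line endings inside the stream class, here is a minimal sketch. It is not the code from the patch or from the ASCII module: the class name LineEndingTrackingStream, the LineEnding enum and the drain helper are hypothetical names chosen for this example. It only shows how a stream wrapper can record CR, LF and CRLF while the module reads the bytes it would read anyway, so no second pass over the file is needed.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: records which line endings pass through the stream.
class LineEndingTrackingStream extends FilterInputStream {
    enum LineEnding { CR, LF, CRLF }

    private final Set<LineEnding> found = EnumSet.noneOf(LineEnding.class);
    // True when the previous byte was CR and we do not yet know
    // whether it is a lone CR or the first half of CRLF.
    private boolean pendingCr = false;

    LineEndingTrackingStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b == -1) {
            endOfStream();
        } else {
            observe(b);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n == -1) {
            endOfStream();
        } else {
            for (int i = off; i < off + n; i++) {
                observe(buf[i] & 0xFF);
            }
        }
        return n;
    }

    private void observe(int b) {
        if (pendingCr) {
            pendingCr = false;
            if (b == '\n') {
                found.add(LineEnding.CRLF);
                return;
            }
            found.add(LineEnding.CR); // the earlier CR stood alone
        }
        if (b == '\r') {
            pendingCr = true;
        } else if (b == '\n') {
            found.add(LineEnding.LF);
        }
    }

    private void endOfStream() {
        if (pendingCr) {
            pendingCr = false;
            found.add(LineEnding.CR); // file ended with a lone CR
        }
    }

    // Complete only once the stream has been read to the end.
    Set<LineEnding> getLineEndings() {
        return EnumSet.copyOf(found);
    }

    // Helper for the example: reads the whole input and returns what was seen.
    static Set<LineEnding> drain(String text) throws IOException {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        try (LineEndingTrackingStream s =
                 new LineEndingTrackingStream(new ByteArrayInputStream(bytes))) {
            while (s.read() != -1) {
                // a real module would parse the content here
            }
            return s.getLineEndings();
        }
    }
}
```

Bytes passed over with skip() are not inspected in this sketch, and mark/reset is not handled; a real stream class would have to decide how to treat both.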

I decided NOT to add a TextMDMetadata property type, so that the schema jhove.xsd remains unchanged.
The TextMDMetadata property is therefore of OBJECT type.
The TextHandler and XmlHandler are modified to generate the information (their version numbers have been modified appropriately).

I hope this patch can be added to Jhove to enhance its handling of textual files.

Thanks for your attention.


  • Thomas

    Patch over 1.4 version to add the generation of a TextMDMetadata property