#340 Office convert to HTML results in incorrect metadata

Status: open
Owner: nobody
Labels: Normaliser (38)
Priority: 4
Updated: 2011-08-01
Created: 2011-07-28
Private: No

This bug relates to the xena-named-output branch.

When an Office file is normalised with HTML chosen as the output conversion type, the metadata is wrong. The process is done twice behind the scenes, so the resulting metadata refers to the temporary file used as an intermediate part of the processing rather than to the original file. This affects metadata items such as input_source_uri, etc.
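The two-pass behaviour described above can be illustrated with a minimal sketch. This is not Xena's code: the names `Metadata`, `normalise`, `office_to_html_buggy`, `office_to_html_fixed` and the temporary path are all hypothetical, chosen only to show how the original file's URI can be lost in the second pass and how carrying it through would avoid that.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metadata:
    input_source_uri: str


def normalise(path: str, original_uri: Optional[str] = None) -> Metadata:
    """Hypothetical normaliser step that records where its input came from.

    If the caller knows the input is only an intermediate file, it can pass
    the URI of the real original so the metadata does not point at the
    temporary file.
    """
    return Metadata(input_source_uri=original_uri or f"file:{path}")


def office_to_html_buggy(path: str) -> Metadata:
    # Pass 1 (assumed): Office document -> temporary HTML file.
    temp_html = "/tmp/xena-intermediate.html"  # hypothetical temp path
    # Pass 2: the temp file is normalised; the original URI is lost here.
    return normalise(temp_html)


def office_to_html_fixed(path: str) -> Metadata:
    temp_html = "/tmp/xena-intermediate.html"  # hypothetical temp path
    # Pass 2 again, but the original file's URI is carried through.
    return normalise(temp_html, original_uri=f"file:{path}")
```

In this sketch the buggy variant reports the temporary file as the source, while the fixed variant reports the file the user actually selected. Whether the real fix belongs in the normaliser or in the code that chains the two passes is not something this ticket establishes.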

Discussion

  • Allan Cunliffe

    Allan Cunliffe - 2011-07-28

    1. I changed the default settings for Office Properties so that the output format for word processing documents is HTML.
    2. I normalised a Word document with the Migrate to Open Format Only option *deselected*.

    Result: Xena created three files in the destination directory:

    * sample.doc.html.wsx_Website.xena [metadata indicates MIME of application/zip]
    * sample.doc.html_HTML.xena [not a proper Xena file]
    * sample.doc_Office.xena [not a proper Xena file]

    sample.doc.html_HTML.xena looks like an attempt to create an HTML page from the original. This shouldn't happen, as I should only see .xena files in the destination directory with the settings I selected.

    sample.doc_Office.xena has empty content tags.

    Something is definitely wrong here. Given the settings I selected and the single original file, I should only ever get one output file.

     
  • Michael Carden

    Michael Carden - 2011-08-01
    • priority: 8 --> 4
     
