Testing Branch - Xena 4.3.14
Ran several archive files through Xena and some failed normalisation: some GZIP and ZIP files pass, while others fail.
The current Stable Branch version of Xena (4.3.0) normalises all of these files without errors.
Some of the errors are:
For a GZIP file:
The supplied data appears to be in the Office 2007+ XML. POI only supports OLE2 Office documents
Trace:
org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:108)
org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151)
au.gov.naa.digipres.xena.plugin.office.MicrosoftOfficeGuesser.officeTypeMatched(MicrosoftOfficeGuesser.java:129)
au.gov.naa.digipres.xena.plugin.office.MicrosoftOfficeGuesser.guess(MicrosoftOfficeGuesser.java:103)
au.gov.naa.digipres.xena.plugin.office.spreadsheet.XlsxGuesser.guess(XlsxGuesser.java:70)
au.gov.naa.digipres.xena.kernel.guesser.GuesserManager.getBestGuess(GuesserManager.java:376)
au.gov.naa.digipres.xena.kernel.guesser.GuesserManager.mostLikelyType(GuesserManager.java:260)
au.gov.naa.digipres.xena.kernel.guesser.GuesserManager.mostLikelyType(GuesserManager.java:240)
au.gov.naa.digipres.xena.plugin.archive.ArchiveNormaliser.parse(ArchiveNormaliser.java:96)
au.gov.naa.digipres.xena.plugin.archive.gzip.GZipNormaliser.parse(GZipNormaliser.java:114)
au.gov.naa.digipres.xena.kernel.normalise.NormaliserManager.parse(NormaliserManager.java:817)
au.gov.naa.digipres.xena.kernel.normalise.NormaliserManager.normalise(NormaliserManager.java:1005)
au.gov.naa.digipres.xena.core.Xena.normalise(Xena.java:595)
au.gov.naa.digipres.xena.core.Xena.normalise(Xena.java:539)
au.gov.naa.digipres.xena.litegui.NormalisationThread.normaliseFile(NormalisationThread.java:324)
au.gov.naa.digipres.xena.litegui.NormalisationThread.normaliseStandard(NormalisationThread.java:246)
au.gov.naa.digipres.xena.litegui.NormalisationThread.run(NormalisationThread.java:187)
Another GZIP file:
org.xml.sax.SAXException: Cannot connect to OpenOffice.org - possibly something wrong with the input file
Trace:
au.gov.naa.digipres.xena.kernel.normalise.NormaliserManager.parse(NormaliserManager.java:826)
au.gov.naa.digipres.xena.kernel.normalise.NormaliserManager.normalise(NormaliserManager.java:1005)
au.gov.naa.digipres.xena.core.Xena.normalise(Xena.java:595)
au.gov.naa.digipres.xena.core.Xena.normalise(Xena.java:539)
au.gov.naa.digipres.xena.litegui.NormalisationThread.normaliseFile(NormalisationThread.java:324)
au.gov.naa.digipres.xena.litegui.NormalisationThread.normaliseStandard(NormalisationThread.java:246)
au.gov.naa.digipres.xena.litegui.NormalisationThread.run(NormalisationThread.java:187)
A ZIP file:
The supplied data appears to be in the Office 2007+ XML. POI only supports OLE2 Office documents
Trace:
org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:108)
org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151)
au.gov.naa.digipres.xena.plugin.office.MicrosoftOfficeGuesser.officeTypeMatched(MicrosoftOfficeGuesser.java:129)
au.gov.naa.digipres.xena.plugin.office.MicrosoftOfficeGuesser.guess(MicrosoftOfficeGuesser.java:103)
au.gov.naa.digipres.xena.plugin.office.spreadsheet.XlsxGuesser.guess(XlsxGuesser.java:70)
au.gov.naa.digipres.xena.kernel.guesser.GuesserManager.getBestGuess(GuesserManager.java:376)
au.gov.naa.digipres.xena.kernel.guesser.GuesserManager.mostLikelyType(GuesserManager.java:260)
au.gov.naa.digipres.xena.kernel.guesser.GuesserManager.mostLikelyType(GuesserManager.java:240)
au.gov.naa.digipres.xena.plugin.archive.ArchiveNormaliser.parse(ArchiveNormaliser.java:96)
au.gov.naa.digipres.xena.kernel.normalise.NormaliserManager.parse(NormaliserManager.java:817)
au.gov.naa.digipres.xena.kernel.normalise.NormaliserManager.normalise(NormaliserManager.java:1005)
au.gov.naa.digipres.xena.core.Xena.normalise(Xena.java:595)
au.gov.naa.digipres.xena.core.Xena.normalise(Xena.java:539)
au.gov.naa.digipres.xena.litegui.NormalisationThread.normaliseFile(NormalisationThread.java:324)
au.gov.naa.digipres.xena.litegui.NormalisationThread.normaliseStandard(NormalisationThread.java:246)
au.gov.naa.digipres.xena.litegui.NormalisationThread.run(NormalisationThread.java:187)
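The POI traces above suggest that the Office guesser hands archive content to POIFSFileSystem, which only accepts OLE2 data. A minimal sketch of a signature pre-check that could be run first is shown below. The class and method names (Ole2Check, looksLikeOle2) are hypothetical and are not part of Xena or POI; only the 8-byte OLE2 signature is a known constant.

```java
// Hypothetical pre-check, not Xena or POI code.
final class Ole2Check {

    // First 8 bytes of every OLE2 compound file (legacy .doc, .xls, .ppt).
    private static final int[] OLE2_SIGNATURE =
        {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    // Returns true only if the buffer starts with the OLE2 signature.
    static boolean looksLikeOle2(byte[] head) {
        if (head == null || head.length < OLE2_SIGNATURE.length) {
            return false;
        }
        for (int i = 0; i < OLE2_SIGNATURE.length; i++) {
            // Mask to compare as unsigned values, since Java bytes are signed.
            if ((head[i] & 0xFF) != OLE2_SIGNATURE[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A guesser could skip the POIFSFileSystem call when this returns false, so ZIP-based data (including Office 2007+ files) would never reach the OLE2 parser.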
There were a number of problems with archive files that needed to be fixed:
* 0-length archive files are not valid and cause problems when attempting to operate on them. 0-length files are now more likely to be guessed as binary files.
* The wrong magic number was being used for jar files, causing them to be guessed as .odt files.
* There were problems with the plaintext normaliser when it handled files inside the archives.
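The first two points can be illustrated with a small sketch that classifies a file by its leading bytes. This is not the Xena guesser code; ArchiveSniffer and its Kind enum are hypothetical names. It assumes only the standard signatures: 1F 8B for GZIP and "PK" 03 04 for a ZIP local file header.

```java
// Hypothetical helper, not part of Xena: classifies data by its leading bytes.
final class ArchiveSniffer {

    enum Kind { EMPTY, GZIP, ZIP, UNKNOWN }

    static Kind sniff(byte[] head) {
        // A 0-length file cannot be a valid archive, so report it separately
        // instead of letting an archive normaliser try to open it.
        if (head == null || head.length == 0) {
            return Kind.EMPTY;
        }
        // GZIP streams start with the two bytes 1F 8B.
        if (head.length >= 2
                && (head[0] & 0xFF) == 0x1F
                && (head[1] & 0xFF) == 0x8B) {
            return Kind.GZIP;
        }
        // ZIP local file headers start with 'P' 'K' 03 04.
        if (head.length >= 4
                && head[0] == 'P' && head[1] == 'K'
                && head[2] == 3 && head[3] == 4) {
            return Kind.ZIP;
        }
        return Kind.UNKNOWN;
    }
}
```

Note that .jar and .odt files are both ZIP containers and share the same leading signature, so the signature alone cannot tell them apart. Distinguishing them needs a look at the archive contents, which this sketch does not attempt.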
Fixes made in Xena v4.3.16, archive v1.2.4, plaintext v3.4.3 (testing branches)
Tested in Testing Branch - Xena v4.3.16, archive v1.2.4, plaintext v3.4.3.
Ran a selection of ZIP and GZIP files through Xena. All were normalised correctly.
Tested in Testing Branch - Xena v4.3.16, archive v1.2.4, plaintext v3.4.3.
I've tested this a bit more thoroughly and there appear to be some problems with exporting archive files.
I normalised a group of archive files (.zip, .gz, .jar).
Opened each archive from the Normalisation Results.
From within Xena Viewer, I attempted to export the archive.
Some archives export successfully and others fail with the error:
"java.util.zip.ZipException: duplicate entry. org/w3c/dom/UserDataHandler.class"
Terminal output:
"au.gov.naa.digipres.xena.kernel.XenaException: java.util.zip.ZipException: duplicate entry: org/w3c/dom/UserDataHandler.class
at au.gov.naa.digipres.xena.core.Xena.export(Xena.java:765)
at au.gov.naa.digipres.xena.viewer.NormalisedObjectViewFrame.exportXenaFile(NormalisedObjectViewFrame.java:275)
at au.gov.naa.digipres.xena.viewer.NormalisedObjectViewFrame.access$300(NormalisedObjectViewFrame.java:69)
at au.gov.naa.digipres.xena.viewer.NormalisedObjectViewFrame$2.actionPerformed(NormalisedObjectViewFrame.java:157)
..."
This only happens with some archive files. They are too large to attach here, but one of them is xena.jar.
Reopening based on the last comment (though this is a different problem from the one originally reported).
Tested in Stable Branch v5.0.0
Looks like this is still an issue in the current stable branch.
When exporting xena.jar, I'm still getting the error:
"java.util.zip.ZipException: duplicate entry. org/w3c/dom/UserDataHandler.class"
Tested in Xena imageMagicFix branch (Date: Thu Aug 25 13:49:19 2011 +1000)
Still an issue.
Console output:
Destination: /home/al/Xena/Destination/xena.jar_Zip.xena
au.gov.naa.digipres.xena.kernel.XenaException: org.xml.sax.SAXException: Problem exporting archive entry org/w3c/dom/UserDataHandler.class
java.util.zip.ZipException: duplicate entry: org/w3c/dom/UserDataHandler.class
at au.gov.naa.digipres.xena.core.Xena.export(Xena.java:911)
at au.gov.naa.digipres.xena.viewer.NormalisedObjectViewDialog.exportXenaFile(NormalisedObjectViewDialog.java:305)
at au.gov.naa.digipres.xena.viewer.NormalisedObjectViewDialog.access$300(NormalisedObjectViewDialog.java:72)
at au.gov.naa.digipres.xena.viewer.NormalisedObjectViewDialog$2.actionPerformed(NormalisedObjectViewDialog.java:174)
at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2012)
at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2335)
at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:404)
at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:253)
at java.awt.Component.processMouseEvent(Component.java:6203)
at javax.swing.JComponent.processMouseEvent(JComponent.java:3267)
at java.awt.Component.processEvent(Component.java:5968)
at java.awt.Container.processEvent(Container.java:2105)
at java.awt.Component.dispatchEventImpl(Component.java:4564)
at java.awt.Container.dispatchEventImpl(Container.java:2163)
at java.awt.Component.dispatchEvent(Component.java:4390)
at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4461)
at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4125)
at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4055)
at java.awt.Container.dispatchEventImpl(Container.java:2149)
at java.awt.Window.dispatchEventImpl(Window.java:2478)
at java.awt.Component.dispatchEvent(Component.java:4390)
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:649)
at java.awt.EventQueue.access$000(EventQueue.java:96)
at java.awt.EventQueue$1.run(EventQueue.java:608)
at java.awt.EventQueue$1.run(EventQueue.java:606)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:105)
at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:116)
at java.awt.EventQueue$2.run(EventQueue.java:622)
at java.awt.EventQueue$2.run(EventQueue.java:620)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:105)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:619)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:275)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:200)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:190)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:185)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:177)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:138)
Caused by: org.xml.sax.SAXException: Problem exporting archive entry org/w3c/dom/UserDataHandler.class
java.util.zip.ZipException: duplicate entry: org/w3c/dom/UserDataHandler.class
at au.gov.naa.digipres.xena.plugin.archive.ArchiveDeNormaliser.startElement(ArchiveDeNormaliser.java:166)
at org.xml.sax.helpers.XMLFilterImpl.startElement(XMLFilterImpl.java:551)
at au.gov.naa.digipres.xena.kernel.metadatawrapper.DefaultUnwrapper.startElement(DefaultUnwrapper.java:40)
at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
at org.apache.xerces.parsers.AbstractXMLDocumentParser.emptyElement(Unknown Source)
at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at au.gov.naa.digipres.xena.kernel.normalise.NormaliserManager.export(NormaliserManager.java:1666)
at au.gov.naa.digipres.xena.kernel.normalise.NormaliserManager.export(NormaliserManager.java:1420)
at au.gov.naa.digipres.xena.kernel.normalise.NormaliserManager.export(NormaliserManager.java:1391)
at au.gov.naa.digipres.xena.kernel.normalise.NormaliserManager.export(NormaliserManager.java:1347)
at au.gov.naa.digipres.xena.core.Xena.export(Xena.java:905)
... 39 more
Caused by: java.util.zip.ZipException: duplicate entry: org/w3c/dom/UserDataHandler.class
at java.util.zip.ZipOutputStream.putNextEntry(ZipOutputStream.java:192)
at au.gov.naa.digipres.xena.plugin.archive.ArchiveDeNormaliser.startElement(ArchiveDeNormaliser.java:151)
... 56 more
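The innermost cause above is ZipOutputStream.putNextEntry, which rejects a second entry with a name it has already written. Below is a minimal sketch of one way an exporter could guard against that, by tracking the names written so far and skipping repeats. DuplicateSafeZip and its Result type are hypothetical and do not reflect how ArchiveDeNormaliser is written. Whether to skip, rename, or report a duplicate is a design decision this sketch does not settle.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not Xena code: writes entries to a ZIP, skipping repeated names.
final class DuplicateSafeZip {

    // Holds the ZIP bytes plus the names that were skipped as duplicates.
    static final class Result {
        final byte[] bytes;
        final List<String> skipped;

        Result(byte[] bytes, List<String> skipped) {
            this.bytes = bytes;
            this.skipped = skipped;
        }
    }

    // names.get(i) is the entry name for contents.get(i).
    static Result write(List<String> names, List<byte[]> contents) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        Set<String> seen = new HashSet<>();
        List<String> skipped = new ArrayList<>();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (int i = 0; i < names.size(); i++) {
                String name = names.get(i);
                // Set.add returns false when the name was already written,
                // which is the case putNextEntry would reject.
                if (!seen.add(name)) {
                    skipped.add(name);
                    continue;
                }
                zip.putNextEntry(new ZipEntry(name));
                zip.write(contents.get(i));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Result(buffer.toByteArray(), skipped);
    }
}
```

Returning the skipped names lets the caller log or report them rather than silently dropping data, which matters for a preservation tool where the exported archive is expected to match the original.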