#100 JpgExtractor failing on some JPEG files

closed-invalid
Antoni Mylka
general (25)
7
2009-09-25
2009-08-19
No

When running tests of the XMPExtractor, some valid JPEG files are being flagged as not being JPEG files. The JpgExtractor raises the following exception:

Aug 18, 2009 8:30:08 PM org.semanticdesktop.aperture.extractor.jpg.JpgHeaderExtractor <init>
WARNING: error extracting metadata
com.drew.imaging.jpeg.JpegProcessingException: not a jpeg file
at com.drew.imaging.jpeg.JpegSegmentReader.readSegments(Unknown Source)
at com.drew.imaging.jpeg.JpegSegmentReader.<init>(Unknown Source)
at com.drew.imaging.jpeg.JpegMetadataReader.readMetadata(Unknown Source)
at org.semanticdesktop.aperture.extractor.jpg.JpgHeaderExtractor.<init>(JpgHeaderExtractor.java:91)
at org.semanticdesktop.aperture.extractor.jpg.JpgExtractor.extract(JpgExtractor.java:40)
at org.semanticdesktop.aperture.extractor.xmp.XMPTestHelper.extractMetadataRegistry(XMPTestHelper.java:100)
at org.semanticdesktop.aperture.extractor.xmp.XMPExtractorSDKSamplesTest.testXMPSDKSamplesWitRegistry(XMPExtractorSDKSamplesTest.java:98)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:73)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:46)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41)
at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173)
at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
at org.junit.runners.Suite.runChild(Suite.java:115)
at org.junit.runners.Suite.runChild(Suite.java:23)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41)
at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:46)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

The offending files can be found in the XMP Toolkit SDK 4.4, available here:

http://www.adobe.com/devnet/xmp/

Any of the JPEG files in the XMP SDK will cause this error.
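The "not a jpeg file" message is raised inside JpegSegmentReader.readSegments from the metadata-extractor library (com.drew), before any metadata is parsed. My understanding, which has not been verified against the exact library version used here, is that this step only checks whether the stream begins with the JPEG Start-Of-Image (SOI) marker, the bytes 0xFF 0xD8. The helper below is a minimal sketch of that kind of check for testing whether a sample file really starts with the marker. JpegSignatureCheck is a hypothetical name and is not part of Aperture or metadata-extractor.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not part of Aperture or metadata-extractor).
// Assumes the library's "not a jpeg file" test is the standard SOI check.
public final class JpegSignatureCheck {

    private JpegSignatureCheck() {}

    /**
     * Returns true if the next two bytes of the stream are the JPEG
     * Start-Of-Image marker 0xFF 0xD8. Consumes those two bytes.
     */
    public static boolean startsWithSoi(InputStream in) throws IOException {
        int b0 = in.read();
        int b1 = in.read();
        // read() returns -1 at end of stream, which never equals 0xFF or 0xD8
        return b0 == 0xFF && b1 == 0xD8;
    }

    public static void main(String[] args) throws IOException {
        byte[] jpegStart = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        System.out.println(startsWithSoi(new ByteArrayInputStream(jpegStart))); // true

        // Same bytes, but with the first byte already consumed by other code
        InputStream consumed = new ByteArrayInputStream(jpegStart);
        consumed.read();
        System.out.println(startsWithSoi(consumed)); // false: sees 0xD8 0xFF
    }
}
```

The check only looks at the first two bytes of whatever stream it receives. A stream that has already been partly read would therefore fail it, even when the underlying file is a valid JPEG.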

Discussion

  • It turns out that this is not limited to files in the XMP SDK. It also appears to affect JPEG files created with Adobe Photoshop CS3.

     
    • labels: 1113227 --> general
    • priority: 5 --> 7
     
  • According to filext.com, PDF files start with "%PDF-1." and Illustrator files start with "%!PS-". However, this particular .ai file starts with "%PDF-1.4", so it matches the PDF magic number. When I rename it to have a .pdf extension, it also opens fine in Acroread.

    I don't have Illustrator myself. Can you check whether other .ai files (preferably obtained from a variety of sources) also have the PDF magic number? If so, we could create an entry for .ai files in mimetypes.xml as a subtype of PDF: if a file matches the PDF magic number and has a .ai extension, it will be classified as an Illustrator file. However, I am only comfortable with this fix if it covers the general case of .ai files.
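The proposed rule can be sketched in Java as follows: a PDF magic number combined with a .ai extension gives an Illustrator subtype, and a PDF magic number alone gives plain PDF. This only illustrates the decision logic. It is not Aperture's mimetypes.xml mechanism. The class name IllustratorSubtypeRule and the MIME string "application/illustrator" are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical sketch of the proposed subtype rule; NOT Aperture's
// mimetypes.xml mechanism. "application/illustrator" is an assumed MIME string.
public final class IllustratorSubtypeRule {

    // PDF magic number as quoted from filext.com in the comment above
    private static final byte[] PDF_MAGIC =
            "%PDF-1.".getBytes(StandardCharsets.US_ASCII);

    private IllustratorSubtypeRule() {}

    // True if data begins with every byte of prefix
    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    /**
     * Classifies a file from its first bytes and its name.
     * Returns null when the PDF magic number does not match.
     */
    public static String classify(byte[] firstBytes, String fileName) {
        if (!startsWith(firstBytes, PDF_MAGIC)) {
            return null; // not PDF-based; other rules would have to apply
        }
        // Subtype of PDF: both the magic number AND the .ai extension must match
        String name = fileName.toLowerCase(Locale.ROOT);
        return name.endsWith(".ai") ? "application/illustrator" : "application/pdf";
    }

    public static void main(String[] args) {
        byte[] header = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(classify(header, "BlueSquare.ai"));  // application/illustrator
        System.out.println(classify(header, "BlueSquare.pdf")); // application/pdf
    }
}
```

The extension is consulted only after the magic number matches. As a result, a genuine PDF that has been misnamed as .ai would also be labelled as Illustrator. This is why the rule is only safe if .ai files in general carry the PDF magic number.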

     
  • Whoops, wrong issue :) The last comment belongs in issue #2839986.

     
  • Sorry, but I just created a test for all files in the XMP Toolkit SDK 4.4, and they all work: I get no exceptions.
    The columns in this table are: file name, detected MIME type, and number of triples extracted by the extractor.

    File name         Detected MIME type       Triples extracted
    BlueSquare.ai     application/pdf          9
    BlueSquare.avi    video/x-msvideo          nothing
    BlueSquare.eps    application/postscript   nothing
    BlueSquare.indd   null
    BlueSquare.jpg    image/jpeg               3
    BlueSquare.mov    video/quicktime          nothing
    BlueSquare.mp3    audio/mpeg               nothing
    BlueSquare.pdf    application/pdf          9
    BlueSquare.png    image/png                nothing
    BlueSquare.psd    null
    BlueSquare.tif    image/tiff               nothing
    BlueSquare.wav    audio/x-wav              nothing
    Image1.jpg        image/jpeg               12
    Image2.jpg        image/jpeg               11

    Please paste the code of your XMP extractor. Maybe there is something wrong with it. I will close this issue in a couple of days.

     
  • Antoni Mylka
    2009-09-25

    See my comment on issue #2839986. I am closing this ticket as invalid. Ryan, if you manage to reproduce this problem with the current trunk, please reopen it.

     
  • Antoni Mylka
    2009-09-25

    • milestone: 893322 -->
    • assigned_to: nobody --> mylka
    • status: open --> closed-invalid