

#63 MagicMimeTypeIdentifier mixes up subcrawled URIs

Milestone: 1.2.0
Status: closed-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2008-10-20
Created: 2008-10-14
Creator: Antoni Mylka
Private: No

I have a .eml file with a ZIP attachment, and the ZIP contains three documents, two of which are in the new DOCX format. When AutoFocus processes them, the DOCX files are also seen as ZIP files.

The MagicMimeTypeIdentifier says it's a ZIP file, but it should have classified it with the DOCX MIME type; i.e. the magic-number test for ZIP passes, but the file-extension check doesn't.
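For context on why the extension check matters: DOCX files (like XLSX and PPTX) are ZIP containers, so they start with the same local-file-header signature `PK\003\004` as a plain ZIP archive, and the magic number alone cannot tell them apart. The Java below is a minimal sketch of that two-step idea, assuming a hypothetical `ZipFamilySniffer` class with a small, illustrative extension table; it is not Aperture's actual MagicMimeTypeIdentifier code.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch (not Aperture's code): the ZIP magic number matches
 * both plain ZIP archives and Office Open XML documents, so the file
 * extension is needed to refine the result.
 */
final class ZipFamilySniffer {

    // Local file header signature that starts a ZIP archive: "PK\003\004".
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Illustrative subset of ZIP-based formats, keyed by lower-case extension.
    private static final Map<String, String> ZIP_SUBTYPES = Map.of(
        "docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        "xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        "pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation");

    static boolean hasZipMagic(byte[] firstBytes) {
        if (firstBytes == null || firstBytes.length < ZIP_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < ZIP_MAGIC.length; i++) {
            if (firstBytes[i] != ZIP_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns a MIME type, or null if the bytes do not look like a ZIP container. */
    static String identify(byte[] firstBytes, String extension) {
        if (!hasZipMagic(firstBytes)) {
            return null;
        }
        String refined = extension == null ? null
            : ZIP_SUBTYPES.get(extension.toLowerCase(Locale.ROOT));
        // Without a recognised extension, the magic number can only say "ZIP".
        return refined != null ? refined : "application/zip";
    }

    public static void main(String[] args) {
        byte[] header = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00};
        System.out.println(identify(header, "docx")); // the DOCX MIME type
        System.out.println(identify(header, "com"));  // application/zip
    }
}
```

With a wrong extension such as "com", the sketch falls back to `application/zip`, which matches the symptom described in this report.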

I ran the code in a debugger. One of the DOCX files gets the following URI:

zip:mime:file:/C:/Users/Chris/Desktop/docx%20problem/Useful%20documents1.eml!/86b313dc282850fef1762fb400171750%2540amrapali.com#1!/Board+paper.docx

This looks OK to me; the hex part looks a bit scary, but it doesn't cause any problems. It seems to go wrong when I apply the MIME type identifier, which I invoke as follows:

String result = identifier.identify(bytes, null, metadata.getDescribedUri());

When you look at MagicMimeTypeIdentifier from line 429 onwards, you'll see that it tries to determine the file extension from the file name. When there is no file name, it uses the URI instead. Before the last dot is searched for, the URI has its fragment identifier and query string removed. For this URI, that removes the part containing the actual file name (everything from the "#" onwards); instead, it finds "com" as the extension, which is total nonsense.
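The failing lookup can be reproduced with a simplified model. The hypothetical `NaiveExtension.extensionOf` below assumes the identifier simply strips the fragment and query string and then takes the text after the last dot; it illustrates the behaviour described above and is not the Aperture source.

```java
/**
 * Hypothetical, simplified model of the URI-based extension lookup:
 * strip fragment and query, then take whatever follows the last dot.
 */
final class NaiveExtension {

    static String extensionOf(String uri) {
        String s = uri;
        int hash = s.indexOf('#');
        if (hash >= 0) {
            s = s.substring(0, hash); // drop the fragment identifier
        }
        int query = s.indexOf('?');
        if (query >= 0) {
            s = s.substring(0, query); // drop the query string
        }
        int dot = s.lastIndexOf('.');
        return dot >= 0 ? s.substring(dot + 1) : null;
    }

    public static void main(String[] args) {
        String uri = "zip:mime:file:/C:/Users/Chris/Desktop/docx%20problem/Useful%20documents1.eml"
            + "!/86b313dc282850fef1762fb400171750%2540amrapali.com#1!/Board+paper.docx";
        // The real file name sits after '#', so it is cut away before the dot search.
        System.out.println(extensionOf(uri)); // prints "com"
    }
}
```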

Possible solutions:
(1) Refine the code for deriving a file extension from a URI, e.g. by adding knowledge about the subcrawler URIs and extracting the child part.
(2) See if the DataObjects have a file name property and pass that value to the MagicMimeTypeIdentifier, since it looks at the file name for the extension before it falls back to the URI.
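Both options can be combined in one helper. The sketch below is hypothetical (it is not the patch attached to this issue): it prefers an explicit file name, as in option (2), and otherwise applies option (1) by assuming that `!/` separates a container from the path of its child entry, so only the text after the last `!/` is inspected. The class and method names are illustrative.

```java
/**
 * Hypothetical sketch of both proposed fixes: prefer an explicit file name;
 * otherwise derive the name from the innermost child part of a subcrawler URI.
 */
final class SubcrawlerAwareExtension {

    static String extensionOf(String fileName, String uri) {
        String name = (fileName != null && !fileName.isEmpty()) ? fileName : childName(uri);
        if (name == null) {
            return null;
        }
        int dot = name.lastIndexOf('.');
        // Ignore names without a dot or ending in a dot.
        return (dot >= 0 && dot < name.length() - 1) ? name.substring(dot + 1) : null;
    }

    // Assumption: "!/" separates a container URI from the path of its child entry.
    static String childName(String uri) {
        if (uri == null) {
            return null;
        }
        int sep = uri.lastIndexOf("!/");
        String child = sep >= 0 ? uri.substring(sep + 2) : uri;
        // Strip fragment and query only from the innermost part.
        int hash = child.indexOf('#');
        if (hash >= 0) {
            child = child.substring(0, hash);
        }
        int query = child.indexOf('?');
        if (query >= 0) {
            child = child.substring(0, query);
        }
        // Keep only the last path segment so dots in directory names are ignored.
        int slash = child.lastIndexOf('/');
        return slash >= 0 ? child.substring(slash + 1) : child;
    }

    public static void main(String[] args) {
        String uri = "zip:mime:file:/C:/Users/Chris/Desktop/docx%20problem/Useful%20documents1.eml"
            + "!/86b313dc282850fef1762fb400171750%2540amrapali.com#1!/Board+paper.docx";
        System.out.println(extensionOf(null, uri));               // docx (from the child part)
        System.out.println(extensionOf("Board paper.docx", uri)); // docx (from the file name)
    }
}
```

With the URI from this report and no file name, the helper returns "docx"; passing a file name skips the URI parsing entirely, which is why option (2) alone would also avoid the problem.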

(1) would be a fix in Aperture; (2) would be a fix in my own code.

Discussion

  • Antoni Mylka
    2008-10-14

    Long description - short fix. Please review and apply the patch.
    File Added: aperture-sf2166890.patch

     
  • Antoni Mylka
    2008-10-14

    • status: open --> pending
     
  • Antoni Mylka
    2008-10-14

    Switched the status to PENDING.

     
  • Antoni Mylka
    2008-10-14

    ...and added it to the 1.2.0.beta group.

     
  • Antoni Mylka
    2008-10-14

    • milestone: --> 1.2.0
    • status: pending --> open
     
  • Antoni Mylka
    2008-10-14

    • status: open --> pending
     
  • Antoni Mylka
    2008-10-20

    Committed the patch in r1455.

    I'm closing this issue.

     
  • Antoni Mylka
    2008-10-20

    • status: pending --> closed-fixed