#63 MagicMimeTypeIdentifier mixes up subcrawled uris

Milestone: 1.2.0
Status: closed-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2008-10-20
Created: 2008-10-14
Private: No

I have a .eml file with a zip attachment, and the zip contains three documents. Two of the three are in the new docx format. When AutoFocus processes them, the docx files are also seen as ZIP files.

The MagicMimeTypeIdentifier says it's a zip file, but it should have classified it using the docx MIME type, i.e. the magic number test for ZIP passes but the file extension check doesn't.

I ran the code in a debugger. One of the docx files gets the following URI:

zip:mime:file:/C:/Users/Chris/Desktop/docx%20problem/Useful%20documents1.eml!/86b313dc282850fef1762fb400171750%2540amrapali.com#1!/Board+paper.docx

This looks OK to me. The hex part looks a bit scary, but that doesn't give any problems. It seems to go wrong when I apply the MIME type identifier. I invoke it as follows:

String result = identifier.identify(bytes, null, metadata.getDescribedUri());

When you look at the MagicMimeTypeIdentifier, from line 429 down, you'll see that it tries to determine the file extension from the file name, and when there is no file name, it takes the URI instead. The URI has the fragment identifier and query string removed before the last dot is searched for. For this URI, this removes the part that contains the actual file name. Instead, it finds "com" as the extension, which is total nonsense.
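
To make the failure concrete, here is a minimal sketch of the behaviour described above. It is not the actual Aperture code: the class and method names (NaiveExtensionSketch, naiveExtension) are made up for illustration, and the logic is only my reading of what the identifier does with a URI when no file name is given.

```java
import java.util.Locale;

// Illustrative only: mimics the described "strip fragment and query,
// then look for the last dot" logic. Not taken from Aperture.
final class NaiveExtensionSketch {

    static String naiveExtension(String uri) {
        String s = uri;
        // Drop the fragment identifier (everything from the first '#').
        int hash = s.indexOf('#');
        if (hash >= 0) {
            s = s.substring(0, hash);
        }
        // Drop the query string (everything from the first '?').
        int query = s.indexOf('?');
        if (query >= 0) {
            s = s.substring(0, query);
        }
        // Whatever follows the last dot is treated as the extension.
        int dot = s.lastIndexOf('.');
        if (dot < 0 || dot == s.length() - 1) {
            return null;
        }
        return s.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String uri = "zip:mime:file:/C:/Users/Chris/Desktop/docx%20problem/"
                + "Useful%20documents1.eml!/86b313dc282850fef1762fb400171750"
                + "%2540amrapali.com#1!/Board+paper.docx";
        System.out.println(naiveExtension(uri)); // prints "com"
    }
}
```

The "#1" in the middle of the subcrawled URI is treated as the start of a fragment, so "!/Board+paper.docx" is cut off before the search for the last dot begins.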

Possible solutions:
(1) Refine the code for deriving a file extension from a URI, e.g. by adding knowledge about the subcrawler URIs and extracting the child part.
(2) See if the DataObjects have a file name property and pass that value to the MagicMimeTypeIdentifier, as it first looks at the file name for the extension before it falls back to the URI.

(1) is a fix in Aperture, (2) is a fix in my own code.
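
A rough sketch of what option (1) could look like, assuming that "!/" always separates the parent URI from the child path in subcrawler URIs (that is how it appears in the URI above, but I have not checked it for all subcrawlers). The names SubcrawlerExtensionSketch and extensionOf are hypothetical, and this is not necessarily how an actual fix in Aperture would be written.

```java
import java.util.Locale;

// Illustrative only: take the child part after the last "!/" before
// stripping fragment and query, so a '#' in the parent part does no harm.
final class SubcrawlerExtensionSketch {

    static String extensionOf(String uri) {
        String s = uri;
        // Assumption: the last "!/" marks the start of the innermost child path.
        int bang = s.lastIndexOf("!/");
        if (bang >= 0) {
            s = s.substring(bang + 2);
        }
        // Only now drop fragment and query, from the child part alone.
        int hash = s.indexOf('#');
        if (hash >= 0) {
            s = s.substring(0, hash);
        }
        int query = s.indexOf('?');
        if (query >= 0) {
            s = s.substring(0, query);
        }
        // Restrict the dot search to the last path segment.
        s = s.substring(s.lastIndexOf('/') + 1);
        int dot = s.lastIndexOf('.');
        if (dot < 0 || dot == s.length() - 1) {
            return null;
        }
        return s.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String uri = "zip:mime:file:/C:/Users/Chris/Desktop/docx%20problem/"
                + "Useful%20documents1.eml!/86b313dc282850fef1762fb400171750"
                + "%2540amrapali.com#1!/Board+paper.docx";
        System.out.println(extensionOf(uri)); // prints "docx"
    }
}
```

URIs without a "!/" part go through the same fragment and query stripping as before, so ordinary file URIs should behave the same way they do now.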

Discussion

  • Antoni Mylka

    Antoni Mylka - 2008-10-14

    Long description - short fix. Please review and apply the patch.
    File Added: aperture-sf2166890.patch

     
  • Antoni Mylka

    Antoni Mylka - 2008-10-14
    • status: open --> pending
     
  • Antoni Mylka

    Antoni Mylka - 2008-10-14

    switched the status to PENDING

     
  • Antoni Mylka

    Antoni Mylka - 2008-10-14

    and added to the 1.2.0.beta group

     
  • Antoni Mylka

    Antoni Mylka - 2008-10-14
    • milestone: --> 1.2.0
    • status: pending --> open
     
  • Antoni Mylka

    Antoni Mylka - 2008-10-14
    • status: open --> pending
     
  • Antoni Mylka

    Antoni Mylka - 2008-10-20

    Committed the patch in r1455.

    I am closing this issue.

     
  • Antoni Mylka

    Antoni Mylka - 2008-10-20
    • status: pending --> closed-fixed
     
