#55 MagicMimeTypeIdentifier & Open Office docs

Milestone: 1.6.0 - features
Status: closed
Owner: nobody
Labels: general (27)
Priority: 5
Updated: 2011-11-28
Created: 2008-10-30
Creator: Greg Power
Private: No

Use case:

Uploading files from web app, storing to temp location, extracting metadata/text prior to client approval/edit and committal to document repository.

Fault:

MagicMimeTypeIdentifier only works for Open Office files if they retain their original extensions (.odt, .ods, etc.). Otherwise it detects them as ZIP files (application/zip), so the extractors don't work. For PDFs and MS Office documents, on the other hand, it works fine regardless of extension.

Work around:

In my case, I get the extension from the file upload and use it for the temporary file names, but that obviously won't work for identifying arbitrary binaries.

Regards
greg
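
The workaround described above could be sketched roughly as follows. This is only an illustration: the class `UploadTempFiles` and its methods are hypothetical names, not part of Aperture, and the sketch assumes the web framework hands over the client-supplied file name as a plain string.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Aperture: keeps the uploaded file's
// extension on the temp file so that extension-based detection still works.
class UploadTempFiles {

    /** Returns the extension including the dot (e.g. ".odt"), or "" if there is no usable one. */
    public static String extensionOf(String uploadedName) {
        // Some browsers send a full client-side path; keep only the last segment.
        int slash = Math.max(uploadedName.lastIndexOf('/'), uploadedName.lastIndexOf('\\'));
        String base = uploadedName.substring(slash + 1);
        int dot = base.lastIndexOf('.');
        if (dot <= 0) {
            return ""; // no dot, or a leading dot only (e.g. ".bashrc")
        }
        String ext = base.substring(dot);
        // The name is client input, so accept only short alphanumeric extensions.
        return ext.matches("\\.[A-Za-z0-9]{1,10}") ? ext : "";
    }

    /** Creates an empty temp file whose name ends with the upload's extension. */
    public static Path createTempFor(String uploadedName) {
        try {
            return Files.createTempFile("upload-", extensionOf(uploadedName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path temp = createTempFor("C:\\docs\\report.final.ods");
        System.out.println(temp.getFileName()); // name ends with ".ods"
        temp.toFile().delete();
    }
}
```

As the report notes, this only helps when the original file name is known; it does nothing for arbitrary binaries.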

Discussion

  • Antoni Mylka
    2008-11-13

    The mime type identifier works with so-called magic numbers: byte sequences that appear at the beginning of a file. .odt files are ZIP files; you can confirm this by changing the extension to .zip and uncompressing the file. There is no way to tell an ordinary zip from an odt without the extension, because both have the same magic number (they begin with the bytes 'PK'). You'd have to unzip it and see what's inside, but that's beyond the scope of the mime type identifier (at least for now).

    There are other zip-based formats out there, the OpenOffice and MS Office 2007 ones being the most prominent examples. Making the mime type identifier aware of these is a much more complex issue; it is definitely not a bug, more like a feature request.

    Therefore I am marking this issue as a feature request.
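
To illustrate what "unzip it and see what's inside" could look like, here is a minimal sketch using only the JDK. It is not how MagicMimeTypeIdentifier or any Aperture class works; the class `ZipContainerSniffer` and its methods are hypothetical. It relies on the OpenDocument packaging convention that an ODF archive contains an entry named `mimetype` whose content is the document's MIME type as plain text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch, not part of Aperture.
class ZipContainerSniffer {

    /** Magic-number check: a non-empty ZIP archive starts with 'P' 'K' 0x03 0x04. */
    public static boolean looksLikeZip(byte[] data) {
        return data.length >= 4
                && data[0] == 'P' && data[1] == 'K' && data[2] == 3 && data[3] == 4;
    }

    /**
     * Returns null if the data is not a ZIP, the content of the "mimetype"
     * entry if there is one, and "application/zip" otherwise.
     */
    public static String detect(byte[] data) {
        if (!looksLikeZip(data)) {
            return null;
        }
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("mimetype")) {
                    // Read the current entry to its end; read() returns -1 there.
                    ByteArrayOutputStream content = new ByteArrayOutputStream();
                    byte[] buffer = new byte[256];
                    int n;
                    while ((n = zip.read(buffer)) != -1) {
                        content.write(buffer, 0, n);
                    }
                    return new String(content.toByteArray(), StandardCharsets.US_ASCII).trim();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return "application/zip";
    }

    /** Demo helper: builds an in-memory ZIP holding a single text entry. */
    public static byte[] zipWithEntry(String name, String content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(content.getBytes(StandardCharsets.US_ASCII));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] odfLike = zipWithEntry("mimetype", "application/vnd.oasis.opendocument.text");
        byte[] plainZip = zipWithEntry("notes.txt", "hello");
        System.out.println(detect(odfLike));  // application/vnd.oasis.opendocument.text
        System.out.println(detect(plainZip)); // application/zip
        System.out.println(detect("%PDF-1.4".getBytes(StandardCharsets.US_ASCII))); // null
    }
}
```

MS Office 2007 packages carry no such `mimetype` entry; they contain a `[Content_Types].xml` part instead, which would have to be parsed to tell the individual formats apart. That extra work is one reason container-aware detection is more involved than a magic-number lookup.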

     
  • Antoni Mylka
    2008-11-13

    • labels: 827280 --> general
     
  • Antoni Mylka
    2011-11-28

    • milestone: --> 1.6.0 - features
     
  • Antoni Mylka
    2011-11-28

    Fixed with TikaMimeTypeIdentifier. It detects OpenOffice docs without the file name.

     
  • Antoni Mylka
    2011-11-28

    • status: open --> closed