#55 MagicMimeTypeIdentifier & Open Office docs

1.6.0 - features
general (27)
Greg Power

Use case:

Uploading files from a web app, storing them in a temporary location, and extracting metadata/text before client approval/editing and committal to a document repository.


MagicMimeTypeIdentifier only identifies OpenOffice files correctly if they retain their original extensions (.odt, .ods, etc.). Otherwise it detects them as ZIP files (application/zip), so the extractors don't work. For PDFs and MS Office documents, on the other hand, it works fine regardless of the extension.

Workaround:

In my case, I take the extension from the file upload and use it for the temporary file name, but that obviously won't work for identifying arbitrary binaries.
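
The workaround could look roughly like the following. This is only a minimal sketch using the standard Java library; the class name `UploadTempFiles` and the methods `extensionOf` and `storeUpload` are hypothetical and not part of Aperture or of the reporter's application.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

// Hypothetical helper, not part of Aperture: keeps the uploaded file's
// extension on the temporary file so an extension-aware identifier can use it.
public class UploadTempFiles {

    /** Returns the lower-cased extension including the dot (".odt"), or "" if there is none. */
    static String extensionOf(String originalName) {
        // Some browsers send a full client-side path; keep only the last segment.
        int slash = Math.max(originalName.lastIndexOf('/'), originalName.lastIndexOf('\\'));
        String base = originalName.substring(slash + 1);
        int dot = base.lastIndexOf('.');
        // A leading dot (".bashrc") or a trailing dot does not count as an extension.
        if (dot <= 0 || dot == base.length() - 1) {
            return "";
        }
        return base.substring(dot).toLowerCase(Locale.ROOT);
    }

    /** Copies the upload to a temp file whose name ends with the original extension. */
    static Path storeUpload(InputStream content, String originalName) throws IOException {
        String ext = extensionOf(originalName);
        Path temp = Files.createTempFile("upload-", ext.isEmpty() ? ".tmp" : ext);
        Files.copy(content, temp, StandardCopyOption.REPLACE_EXISTING);
        return temp;
    }
}
```

The extension here is whatever the client supplied, so this only helps when the upload carries a trustworthy file name.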



  • Antoni Mylka

    Antoni Mylka - 2008-11-13

    The MIME type identifier works with so-called magic numbers: byte sequences that appear at the beginning of a file. .odt files are ZIP files; you can confirm this by changing the extension to .zip and uncompressing the file. There is no way to tell an ordinary ZIP from an .odt without the extension, because both have the same magic number (they begin with the bytes 'PK'). You'd have to unzip it and see what's inside, but that's beyond the scope of the MIME type identifier (at least for now).

    There are other ZIP-based formats out there, the OpenOffice and MS Office 2007 ones being the most prominent examples. Making the MIME type identifier aware of these is a much more complex issue; it is definitely not a bug, more of a feature request.

    Therefore I am marking this issue as a feature request.
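
For illustration, the "unzip it and see what's inside" step might be sketched as below. This is not how MagicMimeTypeIdentifier works; it is a standalone sketch using only `java.util.zip`, and the class name `ZipContainerSniffer` and method `detect` are hypothetical. It assumes the OpenDocument convention that the package contains an entry named `mimetype` holding the media type string; MS Office 2007 files do not follow that convention and would still be reported as plain ZIP here.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical sketch, not part of Aperture: looks inside a ZIP container
// for an OpenDocument-style "mimetype" entry.
public class ZipContainerSniffer {

    private static final int MAX_MIMETYPE_BYTES = 256;

    /** Returns null for non-ZIP input, the embedded type if found, else "application/zip". */
    static String detect(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);

        // Step 1: the magic-number check, i.e. the first two bytes must be 'P' 'K'.
        in.mark(2);
        int b0 = in.read();
        int b1 = in.read();
        in.reset();
        if (b0 != 'P' || b1 != 'K') {
            return null;
        }

        // Step 2: walk the entries and look for one named "mimetype".
        ZipInputStream zip = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.getName().equals("mimetype")) {
                // Read at most a few hundred bytes; a media type string is short.
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[MAX_MIMETYPE_BYTES];
                int n;
                while (out.size() < MAX_MIMETYPE_BYTES && (n = zip.read(buf, 0, buf.length)) != -1) {
                    out.write(buf, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.US_ASCII).trim();
            }
        }
        return "application/zip";
    }
}
```

Scanning every entry can be slow for large archives, which is one reason this kind of check goes beyond a simple magic-number lookup.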

  • Antoni Mylka

    Antoni Mylka - 2008-11-13
    • labels: 827280 --> general
  • Antoni Mylka

    Antoni Mylka - 2011-11-28
    • milestone: --> 1.6.0 - features
  • Antoni Mylka

    Antoni Mylka - 2011-11-28

    Fixed with TikaMimeTypeIdentifier. It detects OpenOffice docs without the file name.

  • Antoni Mylka

    Antoni Mylka - 2011-11-28
    • status: open --> closed
