ZIP File "Family" Detection

  • Chris Bamford


    Is there an API I can use to detect whether a file is actually a ZIP file? For example, JAR and DOCX files are identified with the MIME types "application/java-archive" and "application/vnd.openxmlformats-officedocument.wordprocessingml", respectively, but both are really just glorified PKZIP files.
    If not, I can look at the header myself to get this information, but if something already exists …


    - Chris

  • Antoni Mylka

    Short answer: nothing like this is implemented.
    All these types are marked in our mimetypes.xml file as subtypes of application/zip, though this information isn't exposed anywhere in the API. I guess that checking if the first two bytes are PK might be easier than trying to hack some isSubtypeOf functionality into MagicMimeTypeIdentifier.
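
    A minimal sketch of the header check, for illustration only. The class and method names (ZipMagic, looksLikeZip) are hypothetical and not part of the API discussed here. It compares against the four-byte local file header signature "PK\3\4" rather than just the two letters "PK", which is an assumption on the stricter side: an empty archive starts with "PK\5\6" instead and would be rejected.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of any library mentioned in this thread.
final class ZipMagic {
    // ZIP local file header signature: 'P', 'K', 0x03, 0x04.
    private static final byte[] LOCAL_HEADER = {0x50, 0x4B, 0x03, 0x04};

    private ZipMagic() {}

    /** Returns true if the given leading bytes start with the ZIP local file header signature. */
    static boolean looksLikeZip(byte[] head) {
        if (head == null || head.length < LOCAL_HEADER.length) {
            return false;
        }
        for (int i = 0; i < LOCAL_HEADER.length; i++) {
            if (head[i] != LOCAL_HEADER[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads up to four bytes from the stream and checks them; the stream is not reset or closed. */
    static boolean looksLikeZip(InputStream in) throws IOException {
        byte[] head = new byte[LOCAL_HEADER.length];
        int read = 0;
        // A single read() may return fewer bytes than requested, so loop until full or EOF.
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                break;
            }
            read += n;
        }
        return read == head.length && looksLikeZip(head);
    }
}
```

    Because JAR and DOCX files are ordinary ZIP containers, both pass this check regardless of the MIME type reported for them. Wrap the stream in a BufferedInputStream and use mark/reset if the same stream has to be handed to another consumer afterwards.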

    An alternative would be something along the lines of

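    // mmti is an instance of a MagicMimeTypeIdentifier subclass that exposes getMimeTypeDescriptions()
    Set<String> zipSubTypes = new HashSet<String>();
    for (Object o : mmti.getMimeTypeDescriptions()) {
       MimeTypeDescription mtd = (MimeTypeDescription) o;
       String superType = mtd.getParentType();
       if ("application/zip".equals(superType)) {
          // assumes MimeTypeDescription exposes its own type via getMimeType()
          zipSubTypes.add(mtd.getMimeType());
       }
    }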

    This gives you the set of all "glorified zip" types, provided they are marked as such in our mimetypes.xml file. Note that getMimeTypeDescriptions() is protected, so you need to create a subclass of MagicMimeTypeIdentifier that exposes it as public.

    The advantage of the longer approach is that you reuse the knowledge already in mimetypes.xml.

  • Chris Bamford

    That's a good idea - thanks.