#127 genxref binary file detection flawed

Status: closed-fixed
Owner: Malcolm Box
Labels: genxref (49)
Priority: 7
Updated: 2009-04-22
Created: 2007-03-30
Creator: Paul D. Smith
Private: No

Genxref uses File::MMagic to determine what kind of file it's indexing, and that's great. However, I noticed that a number of binary files in my tree were not being detected properly as "binary".

Looking more closely, I see that Perl's File::MMagic module uses an older, less complete table of file types as its hardcoded defaults, which causes it to miss some types of non-text files. However, it turns out that you can tell File::MMagic to use an external definition table for file types rather than the default built-in table.

So, I got the latest version of magic.mime from my Ubuntu system and added it to the LXR directory. I needed to uncomment one or two lines, which magic.mime had commented out with the note 'Formats for "compress" proper have been moved into "compress.c"', which we don't have in File::MMagic apparently.
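
For anyone following along, here is a minimal sketch of the idea, not the attached patch itself. It relies on the documented File::MMagic behaviour that new() accepts an optional path to a magic file and that checktype_filename() returns a MIME type string. The helper names (make_detector, looks_binary), the 'magic.mime' location, and the rule "anything that is not text/* counts as binary" are assumptions made for illustration; genxref's real logic may differ.

```perl
use strict;
use warnings;
use File::MMagic;

# Build a detector. Given a path, File::MMagic loads its definitions
# from that magic file; with no argument it uses the built-in table.
sub make_detector {
    my ($magic_file) = @_;
    return defined $magic_file
        ? File::MMagic->new($magic_file)
        : File::MMagic->new();
}

# Assumption for illustration only: treat every MIME type outside
# text/* as binary. This is not necessarily what genxref does.
sub looks_binary {
    my ($mm, $path) = @_;
    my $type = $mm->checktype_filename($path);
    return $type !~ m{^text/};
}

# Hypothetical usage: perl detect.pl magic.mime file1 file2 ...
unless (caller) {
    my ($magic_file, @files) = @ARGV;
    if (@files) {
        my $mm = make_detector($magic_file);
        for my $file (@files) {
            printf "%s: %s\n", $file,
                looks_binary($mm, $file) ? 'binary' : 'text';
        }
    }
}
```

Running it once with the built-in table and once with the updated magic.mime on the same files is a quick way to see which classifications change.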

This helped a LOT in classifying my files. It still gets a few wrong, though: apparently its method for detecting binary-ness of files that don't match any of the mime types is more liberal than the file(1) method. But, this works much better than before (detects .z compressed files, RPMs, etc.)

Patch attached.

Discussion

  • Paul D. Smith
    2007-03-30

    Use comprehensive mmagic file to detect binaries

     
  • Malcolm Box
    2009-03-26

    • priority: 5 --> 7
     
  • Malcolm Box
    2009-04-22

    • assigned_to: nobody --> mbox
    • status: open --> closed-fixed
     
  • Malcolm Box
    2009-04-22

    Fixed as described - thanks for the patch.