Thread: [Lxr-dev] [ lxr-Bugs-1691407 ] genxref binary file detection flawed
From: SourceForge.net <no...@so...> - 2007-03-30 16:04:54
Bugs item #1691407, was opened at 2007-03-30 12:04
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=390117&aid=1691407&group_id=27350

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: genxref
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Paul D. Smith (psmith)
Assigned to: Nobody/Anonymous (nobody)
Summary: genxref binary file detection flawed

Initial Comment:
Genxref uses File::MMagic to determine what kind of file it's indexing, and that's great. However, I noticed that a number of binary files in my tree were not being detected as "binary". Looking more closely, I see that Perl's File::MMagic module uses an older, less complete table of file types as its hardcoded default, which is why it fails to detect some types of non-text files.

However, it turns out that you can tell File::MMagic to use an external definition table for file types instead of the default built-in table. So I took the latest version of magic.mime from my Ubuntu system and added it to the LXR directory. I needed to uncomment one or two lines that magic.mime had commented out with the note 'Formats for "compress" proper have been moved into "compress.c"'. File::MMagic apparently has no counterpart to compress.c.

This helped a LOT in classifying my files. It still gets a few wrong, though: apparently its method for deciding whether a file that matches none of the MIME types is binary is more liberal than the file(1) method. But this works much better than before (it detects .z compressed files, RPMs, etc.).

Patch attached.

----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=390117&aid=1691407&group_id=27350
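The report involves two separate mechanisms. The first is a magic table that matches a file's leading bytes against known signatures. The second is a fallback guess for files that match nothing. The rough Python sketch below illustrates that two-stage idea. It is not genxref's or File::MMagic's code. The four-entry signature table, the MIME strings, the TEXT_BYTES set and the max_nontext_ratio parameter are all assumptions chosen for illustration. Adding entries such as the compress(1) and RPM signatures fixes the files the table knows about. The fallback threshold alone decides everything else.

```python
"""Illustrative two-stage file classifier (hypothetical, not File::MMagic)."""

# Hypothetical signature table: (offset, magic bytes, MIME type).
MAGIC_TABLE = [
    (0, b"\x1f\x8b", "application/x-gzip"),         # gzip
    (0, b"\x1f\x9d", "application/x-compress"),     # compress(1) output (.Z)
    (0, b"\xed\xab\xee\xdb", "application/x-rpm"),  # RPM package lead
    (0, b"\x7fELF", "application/x-executable"),    # ELF binary
]

# ASCII bytes treated as "text": printable characters plus BEL, BS, TAB,
# LF, FF, CR and ESC. Bytes >= 0x80 are not counted as non-text, so
# UTF-8 or Latin-1 text is not penalised.
TEXT_BYTES = frozenset(range(0x20, 0x7F)) | {0x07, 0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B}


def is_binary(data: bytes, max_nontext_ratio: float = 0.0) -> bool:
    """Fallback guess: binary if the share of ASCII control bytes is too high."""
    if not data:
        return False
    nontext = sum(1 for b in data if b < 0x80 and b not in TEXT_BYTES)
    return nontext / len(data) > max_nontext_ratio


def classify(data: bytes, max_nontext_ratio: float = 0.0) -> str:
    """Return a MIME-like type for the leading bytes of a file."""
    # Stage 1: a matching signature in the magic table decides the type.
    for offset, magic, mime in MAGIC_TABLE:
        if data[offset:offset + len(magic)] == magic:
            return mime
    # Stage 2: no signature matched, so guess from the byte distribution.
    if is_binary(data, max_nontext_ratio):
        return "application/octet-stream"
    return "text/plain"


if __name__ == "__main__":
    sample = b"a" * 95 + b"\x01" * 5  # 5% control bytes, no known signature
    print(classify(b"\x1f\x9d\x90compressed"))      # application/x-compress
    print(classify(sample))                         # application/octet-stream
    print(classify(sample, max_nontext_ratio=0.1))  # text/plain
```

With the default ratio of 0.0, a single stray control byte marks an unmatched file as binary. Raising the ratio (for example to 0.1) makes the fallback more liberal, so more unmatched binary data is reported as text. That is the kind of misclassification the report still sees after the magic table is updated.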
From: SourceForge.net <no...@so...> - 2009-03-26 15:44:21
Bugs item #1691407, was opened at 2007-03-30 17:04
Message generated for change (Settings changed) made by mbox

Category: genxref
Group: None
Status: Open
Resolution: None
>Priority: 7
Private: No
Submitted By: Paul D. Smith (psmith)
Assigned to: Nobody/Anonymous (nobody)
Summary: genxref binary file detection flawed
From: SourceForge.net <no...@so...> - 2009-04-22 15:19:19
Bugs item #1691407, was opened at 2007-03-30 17:04
Message generated for change (Comment added) made by mbox

Category: genxref
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 7
Private: No
Submitted By: Paul D. Smith (psmith)
>Assigned to: Malcolm Box (mbox)
Summary: genxref binary file detection flawed

----------------------------------------------------------------------
>Comment By: Malcolm Box (mbox)
Date: 2009-04-22 16:19

Message:
Fixed as described - thanks for the patch.