[Lxr-dev] [ lxr-Feature Requests-3578666 ] Add ignorefiles and extend ignoredirs
From: SourceForge.net <no...@so...> - 2012-11-02 13:04:59
Feature Requests item #3578666, was opened at 2012-10-20 04:14
Message generated for change (Comment added) made by ajlittoz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=390120&aid=3578666&group_id=27350

Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request, not
just the latest update.

>Category: General
Group: None
Status: Open
Priority: 5
Private: No
Submitted By: Lukasz M (myny)
>Assigned to: Andre-Littoz (ajlittoz)
Summary: Add ignorefiles and extend ignoredirs

Initial Comment:
It would be nice to have an ignorefiles option for files, just like
ignoredirs for directories. I have added this to my lxr and it's just
3 lines of code.

Would it also be possible for the ignoredirs option to handle regexps?
I would like to exclude the /include dir from indexing (as header files
are also within libs).

----------------------------------------------------------------------

>Comment By: Andre-Littoz (ajlittoz)
Date: 2012-11-02 06:04

Message:
I rearchitected the "storage" backend by factoring out the common
'ignoredirs' and file-filtering processing. They are now located in a
single Files.pm method which can be referenced from the specific
classes.

dirs: I can add a new parameter to filter out based on the full path
instead of the last segment. It is preferentially a regexp to allow
accurate exclusion. However, I fear a performance impact on kernel
indexing (more than 38,000 files which would trigger the regexp,
mostly to tell "go ahead"). What would you suggest for the name of the
global directory-excluding parameter?

files: I replaced the various hard-coded regexps in the storage
backends by a call to the new method, which uses the regexp contained
in 'ignorefiles'. I also removed the filter in source's direxpand since
the regexp already excludes the previously discarded files (and it is
more efficient since the removal is done when enumerating the
directory).
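
Editor's illustration of the filtering described in the comment above: a
minimal Perl sketch, not the actual Files.pm method. The sub name
keep_entry, its argument list and the 'ignorepaths' parameter are all
hypothetical (the thread leaves the name of the full-path parameter
open). The sketch assumes 'ignoredirs' is a list compared against the
last path segment and 'ignorefiles' is a single regexp, and it only
applies the full-path regexp when one is configured, which is one
possible way to limit the cost on kernel-sized trees.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the real LXR::Files method.
#   ignoredirs  : names compared with the last path segment only
#   ignorefiles : regexp applied to the file name
#   ignorepaths : hypothetical regexp applied to the full directory path
sub keep_entry {
    my ($config, $dir, $node, $is_dir) = @_;

    if ($is_dir) {
        # Cheap exact comparison on the last segment
        foreach my $ignored (@{ $config->{'ignoredirs'} || [] }) {
            return 0 if $node eq $ignored;
        }
        # Full-path test, run only when the parameter is defined
        my $re = $config->{'ignorepaths'};
        return 0 if defined $re && "$dir$node/" =~ /$re/;
        return 1;
    }

    # Plain file: a single regexp replaces the hard-coded tests
    my $re = $config->{'ignorefiles'};
    return 0 if defined $re && $node =~ /$re/;
    return 1;
}

# Illustrative configuration (values are assumptions)
my $config = {
    'ignoredirs'  => [ 'CVS', '.git' ],
    'ignorefiles' => '\\.(o|a)$|^core$',
    'ignorepaths' => '^/include/$',
};

# [ parent directory, entry name, is-a-directory flag ]
my @entries = (
    [ '/',         'include', 1 ],
    [ '/somelib/', 'include', 1 ],
    [ '/',         '.git',    1 ],
    [ '/',         'main.o',  0 ],
    [ '/',         'main.c',  0 ],
);

foreach my $e (@entries) {
    my ($dir, $node, $is_dir) = @$e;
    my $verdict = keep_entry($config, $dir, $node, $is_dir) ? 'keep' : 'skip';
    print "$verdict $dir$node\n";
}
```

With these values the top-level /include directory is skipped while
/somelib/include is kept, which is the behaviour requested in the
initial comment.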
----------------------------------------------------------------------

Comment By: Andre-Littoz (ajlittoz)
Date: 2012-11-01 01:21

Message:
Transferred from "support request" to "feature request"

----------------------------------------------------------------------

Comment By: Andre-Littoz (ajlittoz)
Date: 2012-10-24 07:55

Message:
Mmmh! Your "specification" is hard to twist into the present
implementation.

It was designed to be rather efficient: 'ignoredirs' is taken into
consideration when function getdir() is invoked to enumerate the
content of a directory. 'ignoredirs' subdirectories are filtered here.
This is also where 'ignorefiles' could be filtered. But only this very
"local" path element is compared, not the whole absolute path. This is
very good for large projects such as the Linux kernel (~37,000 files
and hundreds, maybe thousands, of directories). I want to keep
performance on such projects.

'ignoredirs' is also scanned in the toreal() function with pattern
matching. This is compatible with a longer path fragment (i.e.
containing path separators). But this function exists only in Plain.pm
and CVS.pm, not in GIT.pm nor Subversion.pm. Consequently, this is not
the place for the implementation.

While I think about an angle of attack, what about the following
strategy, since your concern is to prevent duplicates from entering the
DB:
- before the genxref step, disable (or remove) the links (ln) causing
  the duplicates,
- launch genxref to create the DB without duplicates,
- recreate the links.

This could temporarily solve your problem. If there are too many links,
you can write a small script so that you only type a short command to
do the removal/creation.

----------------------------------------------------------------------

Comment By: Lukasz M (myny)
Date: 2012-10-21 08:48

Message:
Exactly, I have duplicate files in the /include folder. Because of
that, any search returns duplicate results.
I also cannot add this folder to ignoredirs because I have some other
include dirs in some libs. So really I would like to ignore only the
/include folder and not /somelib/include.

----------------------------------------------------------------------

Comment By: Andre-Littoz (ajlittoz)
Date: 2012-10-21 08:28

Message:
I experimented with 'filter' and finally got it right.

To include only Perl files for instance, add in lxr.conf:

    , 'filter' => '(\\/$|\\.pm$)'

The first alternative keeps directories (they have a canonical trailing
slash as fixed in LXR::Common::httpinit); the second keeps only .pm
files.

I admit that this INCLUDE rule is probably less flexible than an
EXCLUDE rule. Second, it does not prevent genxref from indexing.

I'll add an 'ignorefiles' parameter for both genxref and source.

Could you better explain your "exclude /include dir from indexing (as
header files are also within libs)"? Do you mean there is a link
resulting in duplicate files: one set accessed through /include and
another accessed through /libs? I'll see if the "already indexed"
feature can cope with this. Otherwise, add one of the sets to
'ignoredirs'.

----------------------------------------------------------------------

Comment By: Lukasz M (myny)
Date: 2012-10-21 02:00

Message:
Actually, I do not mind displaying the file/directory. I would rather
it not be indexed.

For ignoring files (from indexing) I just modified the following:

LXR/Files/Plain.pm

        # Check directories to ignore
        if (-d $dir . $node) {
            foreach my $ignoredir (@{$config->{'ignoredirs'}}) {
                next FILE if $node eq $ignoredir;
            }
            # Directory to keep: suffix name with a slash
            push(@dirs, $node . '/');
        } else {
-->         foreach my $ignorefile ($config->ignorefiles) {
-->             next FILE if $node eq $ignorefile;
-->         }
            # File: don't change the name
            push(@files, $node);
        }

----------------------------------------------------------------------

Comment By: Andre-Littoz (ajlittoz)
Date: 2012-10-20 07:12

Message:
Well, this is a new feature which could be included in the next
release.

1/ regexp for 'ignoredirs'
I had a quick look at the code sections related to 'ignoredirs'. The
change seems to involve the 'getdir' sub in the various Files/
handlers.

2/ files exclusion
I wonder if it is not already there. Look at the end of the 'source'
script (lines 401-406 in release 1.0). There is an undocumented call to
lxr.conf's parameter 'filter'. It looks like it should be a regexp
SELECTING (not excluding) which file or directory is displayed. This
has been lurking in 'source' for ages and I never really succeeded in
setting it up correctly. The main difficulty is that the regexp must be
valid both for directories (otherwise they can't be listed) and for
wanted files. All failures end up with 'fil does not exist' (which is
what I always got!). You might experiment with it. Note that this does
not exclude files from indexing, meaning you get no speed improvement
in genxref.

I suppose your 3-line solution is something equivalent to line 242
(release 1.0), which excludes *.o, *.a, core files (and also the index
files of the initial LXR implementation).

Can you send your patch?

Best regards
ajl

----------------------------------------------------------------------
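
Editor's illustration of the 'filter' value reported in the 2012-10-21
comment: a small Perl sketch showing which directory entries the regexp
selects. It assumes that entries are simply matched against the regexp
and that directory names already carry their trailing slash, as that
comment describes; the entry names are made up, and how the 'source'
script really applies the parameter is not shown in this thread.

```perl
use strict;
use warnings;

# Value as written in lxr.conf; inside single quotes the doubled
# backslashes yield the regexp (\/$|\.pm$)
my $filter = '(\\/$|\\.pm$)';

# Made-up directory listing; directories end with a slash
my @entries = ('lib/', 'Common.pm', 'genxref', 'README.txt', 'Files/');

# An INCLUDE rule: keep only the entries matching the regexp
my @kept = grep { /$filter/ } @entries;

print join(' ', @kept), "\n";    # prints "lib/ Common.pm Files/"
```

Dropping the first alternative would also drop 'lib/' and 'Files/',
which matches the remark that the regexp has to be valid for
directories as well as for the wanted files.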