#34 Add ignorefiles and extend ignoredirs

closed
Andre-Littoz
General (16)
5
2012-11-15
2012-10-20
Lukasz M
No

It would be nice to have an ignorefiles option for files, just like ignoredirs for directories. I have added this to my LXR and it's just 3 lines of code.
Would it also be possible for the ignoredirs option to handle regexps? I would like to exclude the /include dir from indexing (as header files are also within libs).

Discussion

<< < 1 2 (Page 2 of 2)
  • Andre-Littoz
    2012-11-05

    I was thinking of a new pair of parameters.

    In your specification proposal, you want to be able to filter the full path.

    Presently, 'ignoredirs' and the new 'ignorefiles' are applied in sub getdir() when scanning the "current" directory, so it is very fast to check only the last segment of the path. I could extend 'ignoredirs' to be a mixed list of strings and regexps (if I can find an efficient Perl way to discriminate between them), but it would still apply only to the last path segment.
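To illustrate the "mixed list of strings and regexps" idea, here is a minimal sketch in Python (not LXR's Perl code). The list contents and the function name `is_ignored_dir` are hypothetical. Plain strings are compared for equality and compiled patterns are searched, always against the last path segment only. In Perl, the analogous discrimination could presumably rely on `ref($rule) eq 'Regexp'` for `qr//` objects.

```python
import re

# Hypothetical mixed list: plain strings must match the segment exactly,
# compiled patterns are searched as regexps.
IGNOREDIRS = ["CVS", ".git", re.compile(r"^include$"), re.compile(r"\.bak$")]

def is_ignored_dir(name, rules=IGNOREDIRS):
    """Return True if the last path segment `name` matches any rule."""
    for rule in rules:
        if isinstance(rule, re.Pattern):
            # Regexp rule: slower, but more flexible.
            if rule.search(name):
                return True
        elif rule == name:
            # String rule: cheap equality test.
            return True
    return False
```

With this list, `is_ignored_dir("CVS")` and `is_ignored_dir("old.bak")` are true, while `is_ignored_dir("CVSROOT")` is false because string rules are not treated as prefixes.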

    The new set (or maybe a single parameter; a path is a string after all) would be an indication that full-path filtering is wanted. The reason I'd like to keep both sets separate is that I fear the cost of repeatedly regexp-testing the full path when genxref'ing the kernel (38,000 files and hundreds of directories, with an average path length over 60 characters and a maximum around 110 characters). Presently, my best indexing time on my high-end computer (3.4GHz) is 2 hours 40 minutes on a 3.1 kernel. I had a hard time squeezing it from 3:50 down to 2:40. This was achieved through restructuring the DB requests, but directory tree traversal also seems expensive. I know the worst step is reference collecting, because the parser is written in Perl [interpreted, not compiled!!] with regexps instead of a good LR finite state automaton.

    If the set does not exist, I can quickly skip the test. If it exists, I can launch a "long" test on the full path.
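A rough sketch of this two-set approach, in Python rather than LXR's Perl: the cheap last-segment test always runs, and the full-path regexp test runs only when the optional second set is present. The function and parameter names are hypothetical, and paths are assumed to use `/` separators.

```python
import re

def should_skip(path, ignore_names, full_path_filters=None):
    """Decide whether `path` is excluded from indexing.

    ignore_names      -- set of names compared to the last path segment (fast)
    full_path_filters -- optional list of compiled regexps tested on the full path
    """
    # Fast test: only the last segment, by set membership.
    last = path.rstrip("/").rsplit("/", 1)[-1]
    if last in ignore_names:
        return True
    # No second set configured: skip the expensive test entirely.
    if not full_path_filters:
        return False
    # "Long" test: every regexp is tried against the whole path.
    return any(rx.search(path) for rx in full_path_filters)
```

For example, `should_skip("/lib/include/", {"CVS"})` is false, while adding `[re.compile(r"^/lib/include/")]` as the third argument makes it true.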

    In the single set solution, I don't see how I can keep the fast last-segment test and switch to the long full-path regexp test.

    On what kind of tree do you need such detailed exclusion control? (number of files/directories, any conventional pattern in names?, mixture of languages, ...) This information could give me leads in better understanding your needs.

    ajl

    PS I've uploaded a beta version of the User Manual with a description of 'ignorefiles'. You can download it through a link in http://lxr.sf.net/en/index.html. Please give me your feedback.

     
  • Andre-Littoz
    2012-11-15

    • status: open --> closed
     
  • Andre-Littoz
    2012-11-15

    Extension implemented as 3 new configuration parameters while retaining the present simple and fast 'ignoredirs'.

    a/ 'ignorefiles' is a regexp tested against the final path segment (i.e. the filename). If it matches, the file is skipped.
    b/ 'filterdirs' is an array of regexps tested against the full path. If one matches, the directory is skipped.
    c/ 'filterfiles' is an array of regexps tested against the full path. If one matches, the file is skipped.

    These exclusion rules are tested inside the getdir() function, a method of the storage engines (Files/*.pm). It returns the content of a directory, listing sub-directories and files separately. The rules are applied in order: 'ignore***' first, then 'filter***' if the first rule did not exclude the candidate directory/file.
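The rule order can be sketched as follows. This is an illustrative Python model, not the actual Perl code of the storage engines. The configuration values, the `getdir` signature (which takes the directory content as arguments instead of reading it from storage) and the full-path format with a trailing slash for directories are all assumptions.

```python
import re

# Hypothetical configuration values, for illustration only.
CONFIG = {
    "ignoredirs": ["CVS", ".git"],                    # names, last segment
    "ignorefiles": re.compile(r"^\.|~$|\.orig$"),     # one regexp, last segment
    "filterdirs": [re.compile(r"^/lib/include/$")],   # regexps, full path
    "filterfiles": [re.compile(r"^/doc/.*\.txt$")],   # regexps, full path
}

def getdir(parent, subdirs, files, cfg=CONFIG):
    """Return (kept_dirs, kept_files) for the directory `parent`.

    `parent` ends with "/"; `subdirs` and `files` are the raw entry names.
    """
    kept_dirs = []
    for name in subdirs:
        if name in cfg["ignoredirs"]:        # 'ignore' rule first
            continue
        full = parent + name + "/"
        if any(rx.search(full) for rx in cfg["filterdirs"]):  # then 'filter'
            continue
        kept_dirs.append(name)

    kept_files = []
    for name in files:
        if cfg["ignorefiles"].search(name):  # 'ignore' rule first
            continue
        full = parent + name
        if any(rx.search(full) for rx in cfg["filterfiles"]):  # then 'filter'
            continue
        kept_files.append(name)
    return kept_dirs, kept_files
```

With these sample values, `getdir("/lib/", ["include", "src", "CVS"], ["a.c", ".hidden", "b.c~"])` returns `(["src"], ["a.c"])`, whereas a top-level `include` directory is kept because the 'filterdirs' pattern only matches `/lib/include/`.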

    The exclusion rules are checked only in getdir(). This makes it possible to bypass them by typing an otherwise forbidden path as a URL in the browser address bar. Of course, the locally declared variables or functions will not be highlighted since they have not been indexed by genxref. There's no free lunch!

     