#82 Regexp file matching for inclusions/exclusions

Milestone: someday
Status: open
Owner: nobody
Labels: None
Priority: 1
Updated: 2013-06-10
Created: 2013-04-19
Creator: Fatti Miei
Private: No
4 comments

I would like to be able to specify inclusion and exclusion masks with regular expressions.

I have several multipart RAR archives, which in some cases make up most of the contents of a disk. Since compressed files are best left without defragmenting, I would like to exclude all files with the extensions .rar, .r00, .r01 ... .r99. The same goes for .s00, ..., .s99. Simple regexps such as .r[0-9][0-9] or .s[0-9][0-9] would do, but UltraDefrag only understands simple wildcards. Of course, *.r?? and *.s?? would match far too many files.

I think we don't need full-fledged regexps with back-substitution etc. For this particular case, only character classes are necessary (expressions such as [a-z] or [0-9]).
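
To illustrate how small such an extension could be, here is a minimal sketch in C of a wildcard matcher that understands `*`, `?` and character classes such as `[0-9]`. The names `wildcard_match` and `match_class` are hypothetical and are not taken from the UltraDefrag sources. The sketch assumes plain `char` strings and case-insensitive matching; a real filter would presumably have to work on whatever string type the engine uses.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: checks character c against a class such as
 * [0-9] or [abc]. *pp points just past the opening '['. On success
 * *pp is moved just past the closing ']'. Returns false if the class
 * is unterminated or c is not a member. */
static bool match_class(const char **pp, char c)
{
    const char *p = *pp;
    bool found = false;
    int lc = tolower((unsigned char)c);

    while (*p != '\0' && *p != ']') {
        int lo = tolower((unsigned char)p[0]);
        int hi = lo;
        if (p[1] == '-' && p[2] != '\0' && p[2] != ']') {
            hi = tolower((unsigned char)p[2]);  /* range such as 0-9 */
            p += 3;
        } else {
            p += 1;                             /* single character */
        }
        if (lc >= lo && lc <= hi)
            found = true;
    }
    if (*p != ']')
        return false;                           /* unterminated class */
    *pp = p + 1;
    return found;
}

/* Hypothetical matcher: '*' matches any run of characters, '?' any
 * single character, '[...]' one character from a class. The whole
 * name must match. Comparison is case-insensitive. */
bool wildcard_match(const char *pattern, const char *name)
{
    assert(pattern != NULL && name != NULL);

    while (*pattern != '\0') {
        if (*pattern == '*') {
            pattern++;
            /* try every possible length for the run matched by '*' */
            for (;; name++) {
                if (wildcard_match(pattern, name))
                    return true;
                if (*name == '\0')
                    return false;
            }
        }
        if (*name == '\0')
            return false;
        if (*pattern == '[') {
            pattern++;
            if (!match_class(&pattern, *name))
                return false;
        } else {
            if (*pattern != '?' &&
                tolower((unsigned char)*pattern) !=
                tolower((unsigned char)*name))
                return false;
            pattern++;
        }
        name++;
    }
    return *name == '\0';
}
```

With this sketch, `wildcard_match("*.r[0-9][0-9]", "archive.r07")` succeeds, while `archive.rar` and `readme.rtf` are rejected. The recursion on `*` is the simplest approach and can get slow for patterns with many asterisks, which should not matter for short extension masks.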

Discussion

  • Stefan Pendl
    2013-04-19

    • status: unread --> open
     
  • Stefan Pendl
    2013-04-19

    We will have to investigate how easy it would be to implement this in native code, since the engine is written for boot-time processing.


    Stefan

     
  • Fatti Miei
    2013-06-10

    There are several open source regexp libraries available. Perhaps it's an issue of executable size? If so, here are two small ones.

    1) T-Rex is a minimalistic regular expression library written in ANSI C. It supports the following POSIX expressions: ?, *, +, ^, $, ., [a-b], () plus the Perl-style greedy closures {n}. It can be conditionally compiled to support 8-bit or 16-bit character strings.
    http://tiny-rex.sourceforge.net/

    2) SLRE is an ANSI C library that implements a tiny subset of Perl regular expressions. It is primarily targeted at developers who want to parse configuration files, where speed is unimportant. It is in a single .c file, easily modifiable for custom needs. For example, if one wants to introduce a new metacharacter, '\i', that means 'IP address', it is easy to do so.
    http://slre.sourceforge.net/

    T-Rex supports character ranges, such as [0-9] or [a-d], while SLRE seems not to. Therefore the config files would have to say something like:
    .r[0123456789][0123456789] (T-Rex, SLRE)
    .r[0-9][0-9] (T-Rex only)
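
    If the chosen library really lacks ranges, the long form would not have to be typed by hand. Below is a rough sketch in C of a preprocessing step that rewrites ranges inside [...] as explicit lists before the pattern is passed on. The function name `expand_ranges` is hypothetical, and the sketch assumes 8-bit patterns and does not handle backslash escapes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: copies pattern to out, rewriting ranges inside
 * [...] (for example 0-9) as explicit lists (0123456789). Returns
 * false if the result, including the terminating NUL, does not fit
 * into out_size bytes. Backslash escapes are not handled. */
bool expand_ranges(const char *pattern, char *out, size_t out_size)
{
    size_t n = 0;
    bool in_class = false;

    assert(pattern != NULL && out != NULL);

    for (const char *p = pattern; *p != '\0'; p++) {
        unsigned char lo = (unsigned char)p[0];
        unsigned char hi = lo;

        if (in_class && *p != ']' && p[1] == '-' &&
            p[2] != '\0' && p[2] != ']') {
            hi = (unsigned char)p[2];   /* range: emit lo..hi below */
            p += 2;                     /* skip '-' and the upper bound */
        } else if (*p == '[') {
            in_class = true;
        } else if (*p == ']') {
            in_class = false;
        }

        /* emits a single character, or every character of a range */
        for (unsigned int c = lo; c <= hi; c++) {
            if (n + 1 >= out_size)
                return false;           /* keep room for the NUL */
            out[n++] = (char)c;
        }
    }
    if (n >= out_size)
        return false;
    out[n] = '\0';
    return true;
}
```

    Called with ".r[0-9][0-9]" and a large enough buffer, this produces ".r[0123456789][0123456789]", so the short form could be kept in the config files regardless of which library ends up being used.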

     
    Last edit: Fatti Miei 2013-06-10
  • Stefan Pendl
    2013-06-10

    Thanks for the hints.


    Stefan