From: Joseph F. R. <rya...@os...> - 2001-11-17 07:14:03
>You're right. There's no good reason for not using File::Find. But in
>this case doesn't it change the behaviour of the script?
>
>There are a couple of differences that I can see:
>
>1/ The old version only searched the directories it was given, the new one
> searches subdirectories too.
>
>2/ In the old version, you could tell it to search files using wildcards
> like 'doc*.txt', can you still do that in your version?
>
>Feel free to implement your change, but please use the $emulate_matts_code
>flag that we discussed a couple of days ago. When that flag is true then
>we must emulate Matt's code _exactly_.
Ah, complete oversight on my part. Thank you for pointing it out; took a
bit to get it up to par, but it should be alright now:
-----------------------------------------------
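use File::Find;
use vars qw($typelist @blocked @search_files);

my $basedir     = '/indigo';
my @files       = ('robot*', 'pod', 'ftp*.html', 'txt', 'jpg');
my @directories = ('/indigo/lib/Pod');
@blocked = @directories;

# Plain lowercase words are extensions; everything else is a wildcard.
my @filetypes = grep { !/[^a-z]/ } @files;
my @wildcards = grep {  /[^a-z]/ } @files;

# Build one alternation: 'pod' -> (\.pod), 'ftp*.html' -> (ftp.*?\.html)
my @patterns = map { '(\.' . $_ . ')' } @filetypes;
foreach my $wildcard (@wildcards) {
    # Quote each literal chunk, then rejoin the chunks with .*?
    my @chunks = map { quotemeta($_) } split /\*/, $wildcard, -1;
    push @patterns, '(' . join('.*?', @chunks) . ')';
}
$typelist = join '|', @patterns;

# find() does not return the matches; wanted() collects them instead.
find(\&wanted, $basedir);

sub wanted
{
    return if /^\./;               # skip dotfiles such as .htaccess
    return unless /$typelist/i;    # keep only the wanted file types
    return if -d $_;               # skip directories
    return unless -r _;            # skip unreadable files
    foreach my $blocked (@blocked) {
        return if $File::Find::dir eq $blocked;
    }
    push @search_files, $File::Find::name;
}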
----------------------------------
That large amount of junk at the beginning constructs a regex to be used by
wanted. It works like this: first, every item in @files that does not have
a wildcard is extracted. Those are treated as file extensions and
translated into (\.ext). Next, every item that does have a wildcard is
extracted and treated as a pseudo-regex: each * is translated to .*?, and
each . is translated to \. So robot* is translated into (robot.*?), and
ftp*.html is translated into (ftp.*?\.html). These mini-regexes are then
joined with pipes and used in the wanted function to match the desired
file types.

The rest of the wanted sub is pretty standard; it filters out dotfiles
(.htaccess etc.), directories, and unreadable files. Finally, wanted
filters out any file in a directory that is listed in @directories
(@blocked is set equal to @directories so that wanted can use it; there is
no way to pass arguments to wanted that I know of). I know that is the
opposite of how Matt's script works; however, it makes much more sense.
Why should you have to list every directory in your website to be searched
just to filter out one directory? The good news is that the fix is easy if
we still want to emulate Matt: the "return if" inside the @blocked loop
needs to become "return unless" to obtain his functionality (with more
than one directory listed, the check would have to cover the whole list
before returning).
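
For anyone who wants to poke at the two pieces in isolation, here is a
rough sketch of the pattern translation and the directory check as
stand-alone subs. The names build_typelist and dir_allowed are made up for
this example and are not in the script, and treating a true $emulate flag
as "only search the listed directories" is my reading of Matt's behaviour
as described above, not something checked against his code.

```perl
use strict;
use warnings;

# Hypothetical helper: turn ('robot*', 'pod', 'ftp*.html') into a single
# alternation, extensions first and wildcards second.
sub build_typelist {
    my @files = @_;

    # Items made only of lowercase letters are treated as extensions.
    my @filetypes = grep { !/[^a-z]/ } @files;
    my @wildcards = grep {  /[^a-z]/ } @files;

    my @patterns = map { '(\.' . $_ . ')' } @filetypes;
    foreach my $wildcard (@wildcards) {
        # Quote the literal chunks, then put .*? where each * was.
        my @chunks = map { quotemeta($_) } split /\*/, $wildcard, -1;
        push @patterns, '(' . join('.*?', @chunks) . ')';
    }
    return join '|', @patterns;
}

# Hypothetical helper: should files in $dir be searched?
# $emulate true  -> @directories lists the only directories to search.
# $emulate false -> @directories lists the directories to skip.
sub dir_allowed {
    my ($dir, $emulate, @directories) = @_;

    # Check the whole list first, so several entries are handled correctly.
    my $listed = grep { $_ eq $dir } @directories;
    return $emulate ? $listed : !$listed;
}

print build_typelist('robot*', 'pod', 'ftp*.html', 'txt', 'jpg'), "\n";
# prints (\.pod)|(\.txt)|(\.jpg)|(robot.*?)|(ftp.*?\.html)
```

Note that the patterns are not anchored, so (\.pod) also matches a name
like notes.pod.bak; adding a trailing $ to each piece would be one way to
tighten that up if it turns out to matter.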