#3 Option for a recursion limit on walking directories


The recommendation for Google is to submit only HTML pages
(as opposed to .gif, .jpg, etc.). So either give an example
in the config, if there is a simple way to reject everything
else using, say, "regexp" filters, for those not familiar
with regexp, or add a new switch that makes it easier to pass
ONLY .htm/.html. As it stands, we had to list and filter every
other possible extension with wildcard filters, because simply
passing ALL .htm files was not acceptable: the logs contained
some .htm requests with parameters, which we did NOT want to
include.

Also, it would be nice to have an option for the number of
directory levels walked. For instance, we wanted our root
walked, plus only a PORTION of the subdirectories it
contains. Since walking the root apparently walks ALL
subdirectories automatically, the only way we could think
of to do this was to filter the rest out by name. It would be
nice to be able to specify whether walking the root includes
ALL subdirectories or only specified ones.
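
To make the request concrete, here is a minimal Python sketch of a depth-limited directory walk. It is only an illustration of the idea, not code from the tool itself: the function name `walk_limited` and the `max_depth` parameter are hypothetical.

```python
import os


def walk_limited(root, max_depth):
    """Yield (dirpath, filenames) for root and at most max_depth levels below it.

    Hypothetical helper: max_depth=0 visits only root itself,
    max_depth=1 also visits its immediate subdirectories, and so on.
    """
    root = os.path.abspath(root)
    base_depth = root.rstrip(os.sep).count(os.sep)
    for dirpath, dirnames, filenames in os.walk(root):
        depth = dirpath.rstrip(os.sep).count(os.sep) - base_depth
        if depth >= max_depth:
            # Emptying dirnames in place stops os.walk from descending further.
            dirnames[:] = []
        yield dirpath, sorted(filenames)
```

Calling `walk_limited("/var/www", 1)` would visit the root and its immediate subdirectories only. Restricting the walk to a named subset of subdirectories could be done the same way, by filtering `dirnames` in place instead of clearing it.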


  • Wyvern

    Wyvern - 2005-07-03

    Logged In: YES

    For the first part, the functionality is there already. Try
    something like this:

    <filter action="pass" type="wildcard" pattern="*.htm*" />
    <filter action="drop" type="wildcard" pattern="*" />

    I'll see if I can get this example worked into the

    I'm changing the title of your post to reflect your second
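
The two filters suggested above can be read as an ordered rule list. The Python sketch below models that reading with `fnmatch`. It assumes the first matching filter decides the outcome and that a URL matching no filter is passed, neither of which is confirmed in this thread. The leading `*[?]*` rule is a hypothetical addition to cover the original poster's case of .htm requests with parameters; note that a bare `?` is itself a wildcard in this pattern syntax, so it has to be written as `[?]`.

```python
from fnmatch import fnmatchcase

# Ordered (action, pattern) pairs; the first rule is a hypothetical extra.
FILTERS = [
    ("drop", "*[?]*"),   # URLs carrying a query string
    ("pass", "*.htm*"),  # .htm and .html pages
    ("drop", "*"),       # everything else
]


def decide(url, filters=FILTERS):
    """Return the action of the first filter whose pattern matches url."""
    for action, pattern in filters:
        if fnmatchcase(url, pattern):
            return action
    return "pass"  # assumed default when nothing matches
```

With these rules, `decide("http://example.com/a.html")` gives `"pass"`, while an image URL or `a.htm?id=3` gives `"drop"`.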

  • Wyvern

    Wyvern - 2005-07-03
    • summary: Easier to exclude everything but .htm and .html --> Option for a recursion limit on walking directories
  • Klaus Johannes Rusch

    Logged In: YES

    Wouldn't wildcard filters work for restricting the number of
    levels? For example,

    <filter action="drop" type="wildcard" pattern="/*/*" />

    to disregard all content in subdirectories.

    This also works nicely with directory patterns, e.g.

    <filter action="drop" type="wildcard"
    <filter action="drop" type="wildcard"
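
To see what the `/*/*` pattern from this comment actually matches, here is a small Python sketch using `fnmatch`-style wildcards. The thread does not say whether a pattern is compared against the full URL or only its path, so the sketch applies `/*/*` to the path and also shows a hypothetical full-URL variant, `*://*/*/*`. The helper name `in_subdirectory` is made up for illustration.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

PATH_PATTERN = "/*/*"      # pattern from the comment above
URL_PATTERN = "*://*/*/*"  # hypothetical variant for matching a full URL


def in_subdirectory(url):
    """True if the URL path lies at least one directory level below the root."""
    return fnmatchcase(urlsplit(url).path, PATH_PATTERN)
```

Because `*` also matches `/`, this pattern cannot tell one level of nesting from several: it matches everything below the root, so it works as a limit of zero levels rather than as a general depth setting.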

