From: Brad T. <br...@ar...> - 2007-04-27 21:16:59
Hi Ignacio,
The Wayback documentation has fallen a bit behind new features. This
should be rectified after a check-in that will significantly change the
configuration system (Wayback will no longer be configured via web.xml),
and also will switch the build system to maven 2 and continuum. We're
hoping to have all this complete sometime in May.
The exclusion system that is (barely) documented, and to which you are
referring, has significant performance issues: Every record retrieved
from the ResourceIndex requires an HTTP request to an external
"exclusion service". We are not recommending use of this exclusion
system until these performance issues have been addressed.
Recent versions of the Wayback software include an alternate
"static-map" exclusion system, which monitors the contents of a text
file, and excludes URLs and URL prefixes placed in the file.
Until we switch standard distributions to maven 2 and continuum, you can
grab a "preview" .war which includes this "static-map" exclusion
component, but otherwise is compatible with the older web.xml
configuration system. This .war should be a drop-in replacement for what
you're working with now, but will allow you to add the following
configuration in place of whatever exclusion configuration you're using:
===============
<context-param>
  <param-name>exclusion.factorytype</param-name>
  <param-value>static-map</param-value>
</context-param>
<context-param>
  <param-name>resourceindex.exclusionpath</param-name>
  <param-value>/tmp/wb-excludes.txt</param-value>
</context-param>
===============
Here's where you can grab the .war:
http://builds.archive.org:8080/maven2/org/archive/wayback/wayback-webapp/0.9.0-SNAPSHOT/wayback-webapp-0.9.0-20070418.010333-23.war
Then the contents of /tmp/wb-excludes.txt might look something like:
==================
www.foo.com/private/
foo.com/private/
www.foo.com/extras/secure/
foo.com/extras/secure/
www.example.com/
example.com/
==================
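If you already have a list of full URLs, a small script can produce a file in this shape. The following is only a sketch, not part of Wayback: the function names are made up for illustration, and it assumes (based on the sample above) that entries are written without the scheme and that both the "www." and bare-host forms should be listed.

```python
from urllib.parse import urlsplit


def exclusion_entries(url):
    """Return scheme-less entries for one URL, with and without 'www.'.

    Hypothetical helper; the two-variant layout mirrors the sample file.
    """
    # urlsplit only fills in netloc when the string contains "//".
    parts = urlsplit(url if "//" in url else "//" + url)
    host = parts.netloc.lower()
    path = parts.path or "/"
    bare = host[4:] if host.startswith("www.") else host
    return ["www." + bare + path, bare + path]


def build_exclusion_file(urls):
    """Build the text of an exclusion file, one entry per line, no duplicates."""
    lines = []
    for url in urls:
        for entry in exclusion_entries(url):
            if entry not in lines:
                lines.append(entry)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    urls = [
        "http://www.foo.com/private/",
        "http://www.foo.com/extras/secure/",
        "http://www.example.com/",
    ]
    # Redirect this output to the file named in resourceindex.exclusionpath.
    print(build_exclusion_file(urls), end="")
```

Run against the three URLs from your message, this prints the six lines shown above.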
Updates to the file should be noticed automatically and take effect
within 10 seconds.
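To sanity-check a file before relying on it, you can approximate the prefix matching offline. This is a rough sketch under assumptions, not Wayback's actual code: it treats every non-empty line as a plain string prefix of the scheme-less URL, and it does not reproduce whatever URL canonicalization Wayback applies internally. The function names are hypothetical.

```python
def load_prefixes(text):
    """Return the non-empty, stripped lines of an exclusion file."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_excluded(url, prefixes):
    """Return True if the scheme-less URL starts with any listed prefix.

    Assumption: simple string-prefix matching, no canonicalization.
    """
    key = url.split("://", 1)[-1]  # drop "http://" or similar if present
    return any(key.startswith(prefix) for prefix in prefixes)


if __name__ == "__main__":
    with open("/tmp/wb-excludes.txt") as handle:
        prefixes = load_prefixes(handle.read())
    for candidate in ("http://www.foo.com/private/a.html",
                      "http://www.foo.com/public/index.html"):
        print(candidate, is_excluded(candidate, prefixes))
```

If a URL you expect to be hidden comes back False here, the most likely cause is a missing "www." or bare-host variant in the file.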
Please let me know how this works for you, and if you have other
suggestions for how this would be useful to you.
Brad
Ignacio Garcia wrote:
> Hello,
>
> I have a question regarding the Resource Index Exclusions,
>
> I want to create a manual list of URLs that should not be exposed by
> wayback. As far as I understand from reading the online user manual, I
> have to point the option "adminexclusion.dbpath" to the location where
> my exclusion list is.
> My question is: what format does the BDB exclusion file have, and how
> can I create it?
>
> The command line tools included with wayback let you maintain BDB files
> or create CDX files, but nowhere does it say anything about creating new
> BDB files based on a list of URLs.
> How would I create an exclusion list that will hold the following URLs:
>
> http://www.foo.com/private/
> http://www.foo.com/extras/secure/
> http://www.example.com/
>
> In this case I want to hide all URLs from the domain example.com and all
> URLs under the private and extras/secure directories in the foo.com
> domain.
> Is that possible? Do I have to specify absolute URLs on the exclusion
> list?
>
> Thank you.
>
> _______________________________________________
> Archive-access-discuss mailing list
> Arc...@li...
> https://lists.sourceforge.net/lists/listinfo/archive-access-discuss
>