From: Brad T. <br...@ar...> - 2007-05-21 19:25:04
There's significant work going into the Wayback right now to
simplify the configuration of multiple collections.
When completed, this should also minimize the server resources used, so it
should be possible to host hundreds of collections on a modest server.
Until the next release is available, this can be accomplished using
multiple servlet contexts and CDX files:
1) create a CDX file for each individual collection you want to be able
to search independently
2) deploy the war under the webapps directory with the name "COLLECTION.war"
3) edit the web.xml under the COLLECTION webapp, customizing the
ResourceIndex to use the appropriate CDX file for that collection with
the "resourceindex.cdxpaths" configuration parameter
4) to create an aggregate collection which searches multiple CDX files,
configure that collection to search all needed CDX files by separating
multiple CDX files with commas (",") in the "resourceindex.cdxpaths"
configuration parameter.
5) edit the WaybackUI.properties file under WEB-INF/classes to alter the
text displayed for the not-in-archive exception:

Exception.resourceNotInArchive.message=The Resource you requested is not in this archive.

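As a rough illustration of steps 3-5 (this is not part of Wayback itself), here is a minimal Python sketch that works out, for each collection, the war name, the comma-separated value for "resourceindex.cdxpaths", and a per-collection message line. The collection names, CDX paths, message wording, the aggregate name "all", and the function names are all hypothetical; only the parameter name, the comma separator, and the message key come from the steps above.

```python
# Sketch only: collection names, CDX paths and helper names are hypothetical.
# The parameter name, the comma separator and the message key come from the steps above.


def cdxpaths_value(cdx_files):
    """Join CDX file paths with commas for "resourceindex.cdxpaths"."""
    return ",".join(cdx_files)


def not_in_archive_line(collection_label):
    """Build a WaybackUI.properties line with a per-collection message (wording is made up)."""
    return ("Exception.resourceNotInArchive.message="
            "The Resource you requested is not in the %s collection." % collection_label)


def collection_settings(collections):
    """Map each collection name to its settings, plus an aggregate "all" entry."""
    settings = {}
    all_cdx = []
    for name, cdx_files in collections.items():
        settings[name] = {
            "war": name + ".war",
            "cdxpaths": cdxpaths_value(cdx_files),
            "message": not_in_archive_line(name),
        }
        all_cdx.extend(cdx_files)
    # The aggregate collection searches every CDX file listed above.
    settings["all"] = {
        "war": "all.war",
        "cdxpaths": cdxpaths_value(all_cdx),
        "message": not_in_archive_line("all"),
    }
    return settings


if __name__ == "__main__":
    example = {
        "science": ["/data/cdx/science.cdx"],
        "sports": ["/data/cdx/sports.cdx"],
    }
    for name, conf in collection_settings(example).items():
        print(name, conf["war"], conf["cdxpaths"])
```

The values still have to be placed into each webapp's web.xml and WaybackUI.properties by hand or by your own tooling; the sketch does not touch those files.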
Let me know if you have problems or questions setting this up. We're
currently hosting dozens of collections on a single machine with 2GB of
RAM using this method. We use a simple shell script to generate and
customize each webapp based on a text file listing the collections needed.
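A script along these lines might look like the following. This is a Python sketch, not the actual shell script mentioned above, and it only covers step 2 (copying the war once per collection). The listing format (one collection name per line, "#" for comments), the file paths, and the function names are assumptions.

```python
import os
import shutil


def read_collections(listing_path):
    """Return collection names from a text file, one per line; skip blanks and '#' comments."""
    names = []
    with open(listing_path) as listing:
        for line in listing:
            line = line.strip()
            if line and not line.startswith("#"):
                names.append(line)
    return names


def deploy_wars(template_war, webapps_dir, names):
    """Copy the template war to webapps_dir/NAME.war for each collection name."""
    deployed = []
    for name in names:
        target = os.path.join(webapps_dir, name + ".war")
        shutil.copyfile(template_war, target)
        deployed.append(target)
    return deployed
```

For example, deploy_wars("wayback.war", "webapps", read_collections("collections.txt")) would leave one war per listed collection under webapps; the web.xml and WaybackUI.properties edits from steps 3-5 still need to be applied to each one.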
With the new release, it will be possible to share rendering .jsp files
across multiple collections, which should simplify institution-level
.jsp customization, and to easily configure and use per-collection text
within those .jsp files.
Brad
Ignacio Garcia wrote:
> Hello,
>
> I have a question regarding collections within wayback.
> In the older Perl versions, there was a way to specify different collections
> within wayback, and each collection was handled as a separate set of arc
> files, with specific messages identifying the collections and searching
> within collections only...
> I was wondering if the latest Java versions have such functionality built
> in?
>
> What I'm trying to achieve is the following:
>
> Imagine I have a set of 100 arc files: 25 are from crawls related to
> science magazine articles, 25 are related to sports magazines, and the
> other 50 are misc. crawls.
> I would like to create 2 collections: one for science magazines and one for
> sports magazines.
> Once the collections are created, I would like to be able to search either
> ALL the arcs (100), or search by collection: I select one of the 2
> collections created and then only the specific arcs will be searched.
> Also, if I search for http://espn.magazine.com/* within the science
> magazines collection and I get NO RESULTS, wayback would show a
> message specific to that particular collection,
> something like: No results within SCIENCE MAGAZINES collection.
>
> Since the old wayback was able to handle such configurations, I was
> wondering if this is still doable in the newest Java versions, or if I need
> to modify the actual source code to fit my needs?
>
> Thank you.
>
> _______________________________________________
> Archive-access-discuss mailing list
> Arc...@li...
> https://lists.sourceforge.net/lists/listinfo/archive-access-discuss
>