Hi Arnaud,
This is a good question. It has come up several times in the past
month or so, which hopefully means we'll address it better in the
1.4 release, in the next 2-3 months. For now, you do need to put all the
ARC/WARC files in a single directory to serve them all from a single
collection. You can do this by copying or moving them to a single
directory, or by using symbolic links.
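To illustrate the symbolic link approach, here is a minimal Python sketch. It assumes your jobs all live under one root directory, each with an 'arcs' subdirectory as you describe, and that the files end in .arc, .arc.gz, .warc or .warc.gz. The function name link_arcs and both of its arguments are hypothetical names for this example, not part of Wayback or Heritrix.

```python
from pathlib import Path

# Assumed file name endings for ARC/WARC files; adjust to match your crawl output.
ARC_SUFFIXES = (".arc", ".arc.gz", ".warc", ".warc.gz")


def link_arcs(jobs_root, data_dir):
    """Symlink every ARC/WARC file under jobs_root/*/arcs into data_dir.

    Returns the list of newly created links. Existing entries in data_dir
    are left untouched, so the function can be re-run after each crawl.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    linked = []
    for src in sorted(Path(jobs_root).glob("*/arcs/*")):
        # Skip subdirectories and anything that is not a finished ARC/WARC file.
        if not src.is_file() or not src.name.endswith(ARC_SUFFIXES):
            continue
        dest = data_dir / src.name
        if dest.is_symlink() or dest.exists():
            continue  # already linked, or a name clash; leave it alone
        dest.symlink_to(src.resolve())
        linked.append(dest)
    return linked
```

You would call it with your own paths, for example link_arcs("/path/to/jobs", "/path/to/wayback-data"), and then point the Wayback data directory at the second one. Running it again later only adds links for files that are not there yet.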
As another, more complex alternative, you could set up an ARC Proxy that
exposes all the ARCs in their various directories via HTTP 1.1, possibly
reachable only from the local machine, and then configure Wayback to
use a RemoteResourceStore. The primary downside to doing this is that
you need to manage updating your index yourself.
Or, possibly better, you could create multiple collections, each with
a distinct LocalResourceStore pointing at the directory containing the
appropriate ARC files. This would also require creating a separate
ResourceIndex and a separate AccessPoint for each collection. IA and
other users have been doing this extensively in our deployments. The
downside is that you won't be able to search across all collections
with a single query, though that may be what you want.
So, the simplest approach is to move the files or use symbolic links,
but there are other options that can accomplish this.
Brad
Arn...@he... wrote:
> Hello,
> sorry certainly for this stupid question but after spending time into
> heritrix manuals, wayback manuals and mailing list archive , I hope
> users of this mailing can help me!
> 1/ In heritrix Arcs files are created in 'arcs' directories under each
> different Job. So several directories.
> 2/ In 'wayback.xml' I have to define the 'dataDir'. So one directory.
> How to organized my arcs files in one directory to be used by the
> wayback machine?
> Do I need to regularly copy the arcs files in a specific directory?
>
> Currently I tested by setting dataDir to one of my job arcs directory
> but I obtain this error message when I hit the 'take me back' button
>
>
> Etat HTTP 404 - /wayback-webapp-1.2.0/query
>
> I haven't error messages in tomcat log file.
>
> Arnaud.
>
> _______________________________________________
> Archive-access-discuss mailing list
> Arc...@li...
> https://lists.sourceforge.net/lists/listinfo/archive-access-discuss