From: Brad T. <br...@ar...> - 2007-04-10 18:19:48
Hey Jimmy,
I just downloaded and installed wayback on a machine running Cygwin.
Two things I found:
1) You need to set the JAVACMD environment variable:
   export JAVACMD=`which java`
2) I was only able to get things running when passing a relative path
for the ARC directory argument. There's some path resolution happening
that I haven't tracked down yet.
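
The "C:\Program: command not found" message quoted below is what a shell
typically reports when a command path containing a space is expanded
without quotes. The following is a minimal sketch of that word splitting,
assuming (not confirmed against the script) that this is what happens at
line 81 of bin/index-client. The Java path and the count_words helper are
hypothetical and exist only for this illustration.

```shell
# Hypothetical Windows-style Java path; only the "C:\Program" prefix
# appears in the reported error, the rest is made up for illustration.
JAVACMD='C:\Program Files\Java\bin\java'

# Print how many separate words the shell passes along.
count_words() { echo "$#"; }

count_words $JAVACMD     # unquoted: split at the space, prints 2
count_words "$JAVACMD"   # quoted: kept as one word, prints 1
```

If that is the cause, pointing JAVACMD at a java path without spaces may be
enough to get past the error.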
Also, please make sure that your Cygwin /tmp is exported over HTTP (on
port 8081, going by your example) on the node holding the ARC data, so
that a file in the Cygwin folder at /tmp/foo.arc.gz is accessible at
http://archostip:8081/arc/foo.arc.gz.
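
The path-to-URL mapping described above can be sketched as a small shell
helper. This is only an illustration: arc_url is a hypothetical function,
not part of wayback, and the directory and base URL are the values from
the example in this thread.

```shell
# Map a local ARC path to the URL it is expected to be served at.
# Arguments: local file path, exported directory, base URL of the export.
arc_url() {
    local_path=$1
    arc_dir=$2
    base_url=$3
    # Strip the exported directory prefix and any leading slash left over.
    rel=${local_path#"$arc_dir"}
    rel=${rel#/}
    # Drop a trailing slash from the base URL before joining.
    printf '%s/%s\n' "${base_url%/}" "$rel"
}

arc_url /tmp/foo.arc.gz /tmp/ http://archostip:8081/arc
```

Requesting the resulting URL from the wayback host, for example with
curl -I, is one quick way to check that the export is reachable.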
Let me know if you're still having problems,
Brad
> Brad,
>
> I tried running index-client today. Ran it from the machine that is
> hosting the arc files. I received an error:
>
> bin/index-client: line 81: C:\Program: command not found
>
> This is the line that I used in cygwin:
>
> bin/index-client \tmp\ http://waybackip:8080/wayback/index-incoming/
> http://waybackip/arc-proxy \apache-tomcat-5.5.23\webapps\arc
> http://archostip:8081/arc
>
> I also tried:
>
> bin/index-client /tmp/ http://waybackip:8080/wayback/index-incoming/
> http://waybackip:8080/arc-proxy /apache-tomcat-5.5.23/webapps/arc
> http://archostip:8081/arc
>
> And I tried:
>
> bin/index-client C:\tmp\ http://waybackip:8080/wayback/index-incoming/
> http://waybackip/arc-proxy C:\apache-tomcat-5.5.23\webapps\arc
> http://archostip:8081/arc
>
> I received the same error each time. Any thoughts?
>
> Jimmy
>
>
> -----Original Message-----
> From: Brad Tofel [mailto:br...@ar...]
> Sent: Monday, April 09, 2007 5:05 PM
> To: Lin, Jimmy
> Cc: br...@ar...
> Subject: RE: [Archive-access-discuss] getting http resourcestore to work
> in wayback
>
> You're right, you may not need to use the location-client if you use
> the second usage of the index-client. The index-client scans through
> ARC files and outputs a CDX-format record for each document found.
>
> In usage 1, the CDX output is sent to STDOUT, for later (manual) sorting
> and merging to generate an aggregated CDX file from many ARC input
> files. The location-client tool is aimed primarily at installations
> that use this form to generate index files.
>
> In usage 2, the CDX data is sent directly to the ResourceIndex via HTTP
> PUT (this can only be done with the BDB ResourceIndex implementation).
> In this second usage, the index-client will also notify the ArcProxy's
> LocationDB of where that ARC can be found, which means you don't need
> to use the location-client tool at all.
>
> I haven't tested the codebase on Cygwin for a long time -- please send
> feedback on how it works for you.
>
> Automation of large scale indexing is the next key feature for the
> wayback project, so all this should get easier in the near term, but
> what's there now is hopefully enough to get smaller scale indexes
> built, or larger scale indexes built with a little shell scripting.
>
> We use these tools at the archive to maintain indexes for tens of TB of
> ARC data, but we'd be happy to receive other feature suggestions that
> would make things simpler for you.
>
> Brad
>
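
For usage 1 above, the manual sort-and-merge step could look roughly like
the sketch below. It assumes the per-ARC CDX output was saved to separate
files and that a byte-order sort (LC_ALL=C) is what the index expects;
check the manual for your version. The merge_cdx helper, the file names,
and the sample lines are placeholders, not real CDX records.

```shell
# Merge several CDX files into one sorted file, dropping duplicate lines.
# Arguments: output file, then one or more input files.
merge_cdx() {
    out=$1
    shift
    # LC_ALL=C forces plain byte-order comparison regardless of locale.
    LC_ALL=C sort -u "$@" > "$out"
}

# Placeholder input standing in for the output of two index-client runs.
tmp=$(mktemp -d)
printf 'b.example/ 2007\na.example/ 2007\n' > "$tmp/one.cdx"
printf 'c.example/ 2007\na.example/ 2007\n' > "$tmp/two.cdx"

merge_cdx "$tmp/all.cdx" "$tmp/one.cdx" "$tmp/two.cdx"
cat "$tmp/all.cdx"
```
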
>> Brad,
>>
>> Thanks. I did do that; however, I never followed through with
>> location-client. It's good to see that I was somewhat on the right
>> track. A couple of follow-up questions: can you run location-client
>> through Cygwin (we are working on Windows machines), and do I not need
>> to run index-client? The two shell scripts seem to be very similar.
>>
>> Jimmy
>>
>> -----Original Message-----
>> From: Brad Tofel [mailto:br...@ar...]
>> Sent: Monday, April 09, 2007 4:18 PM
>> To: Lin, Jimmy
>> Cc: arc...@li...
>> Subject: Re: [Archive-access-discuss] getting http resourcestore to work
>> in wayback
>>
>> I'll take a crack at improving the docs for other users later today, but
>> here are a couple quick tips:
>>
>> * the idea is to set up the ArcProxy to reverse proxy all HTTP 1.1 range
>> requests to the actual storage node that holds the ARC files. If your
>> ArcProxy server is set up on arc-proxy.foo.org:8080/arc-proxy/ (implies
>> you placed the wayback.war under the webapps dir on arc-proxy.foo.org,
>> with the name arc-proxy.war) then all ARCs can be accessed at:
>>
>> http://arc-proxy.foo.org:8080/arc-proxy/bar.arc.gz
>> http://arc-proxy.foo.org:8080/arc-proxy/baz.arc.gz
>>
>> even if bar.arc.gz and baz.arc.gz are on different nodes. To do this, you
>> need to modify the arc-proxy web.xml, after it's been unpacked,
>> uncommenting the ArcProxy section of the configuration (and commenting
>> out the UI, ResourceStore, and ResourceIndex sections), and restart
>> Tomcat.
>>
>> * the last step is to inform the ArcProxy where all the ARC files live,
>> so it knows where to forward requests for the various ARCs stored on the
>> ARC storage machines. This can be done with the location-client script.
>>
>> * I'm not sure which symbolic link you're referring to in the user
>> manual. Which version of the software are you using?
>>
>> Let me know if there's still missing info, and thanks for using the
>> tools!
>>
>> Brad
>>
>>
>>> Hello,
>>>
>>>
>>>
>>> I need some guidance in getting this up and running. The user manual
>>> states the following steps:
>>>
>>>
>>>
>>> 1. Set up a singleton ArcProxy webapp. This webapp maintains a BDB
>>> that maps ARC filenames to their actual absolute URL, and creates an
>>> indirection, so all ARC files are accessible within a single HTTP
>>> exported directory.
>>>
>>> How do I go about doing this? Do I rename the wayback.war file to
>>> arc-proxy and install it in tomcat?
>>>
>>> 2. Export your ARC files via HTTP 1.1, on all hosts that hold them,
>>> to the node running the ArcProxy webapp. Some examples of HTTP 1.1
>>> webservers you can use to export your ARC files are Apache, Tomcat, and
>>> thttpd. Any other webserver that supports HTTP 1.1 will also work.
>>>
>>> This requires that I have one of the webservers listed above installed
>>> on each machine that holds ARC files, right? I have installed Tomcat on
>>> such a machine; how do I go about creating the "symbolic link" that the
>>> user manual refers to?
>>>
>>> 3. Populate the ArcProxy BDB with the locations of all ARC files in
>>> your repository. See instructions for using the location-client
>>> command-line tool, within this document, to populate the ArcProxy BDB.
>>>
>>> Thanks in advance,
>>>
>>> Jimmy Lin
>>>