From: Kaisa K. <kau...@cc...> - 2006-11-04 08:24:01

Hi all,

I don't seem to find a combination of hadoop-0.5.0 and nutchwax-0.6.x or nutchwax-0.7.x that will index on my machines.

hadoop-0.5.0 + nutchwax-0.6.1 (latest official) fails (for different reasons than 0.7.0-200611030343).
hadoop-0.5.0 + nutchwax-0.7.0-200611030343 (latest build artifact) fails.

Attached is a log from the 0.7.0 run when trying to index one ARC. The run stops with 'A record version mismatch occurred. Expecting v3, found v5'.

Best,
Kaisa Kaunonen
Nat.Lib.Finland

From: Michael S. <st...@ar...> - 2006-11-03 15:24:20

Shay Lawless wrote:
> Hi,
>
> I am using nutchWax to index a series of ARC files created in a
> webcrawl using the Heritrix crawler.
>
> My problem occurs when I perform a query on nutchWax and attempt to
> view the results: nutch attempts to send me to the URL in question
> rather than the archived content item. As a result I am getting an
> error as the URL is not being correctly formed.

That's right. You need something to serve up the archived content. Nutchwax has traditionally been paired with WERA: http://archive-access.sourceforge.net/projects/wera/. Check it out.

We also need to make it so Nutchwax works using the opensource wayback machine. It's been reported recently that the bridge between the two is broken at the moment. It needs to be fixed.

Yours,
St.Ack

> Has anyone any experience with displaying content from an ARC content
> archive rather than directly from the URL? Do I require an ARC-access
> redisplay tool such as the 'Wayback Machine' to achieve this? If so, can
> anyone give advice on this or other similar tools for ARC redisplay?
>
> Any help would be greatly appreciated. Thanks in advance,
>
> Seamus

From: Shay L. <sea...@gm...> - 2006-11-03 14:33:16

Hi,

I am using nutchWax to index a series of ARC files created in a webcrawl using the Heritrix crawler.

My problem occurs when I perform a query on nutchWax and attempt to view the results: nutch attempts to send me to the URL in question rather than the archived content item. As a result I am getting an error, as the URL is not being correctly formed.

Has anyone any experience with displaying content from an ARC content archive rather than directly from the URL? Do I require an ARC-access redisplay tool such as the 'Wayback Machine' to achieve this? If so, can anyone give advice on this or other similar tools for ARC redisplay?

Any help would be greatly appreciated. Thanks in advance,

Seamus

From: Kaisa K. <kau...@cc...> - 2006-11-03 07:56:24

I had something similar and was given the advice to use the very latest version of nutchwax with hadoop-0.5.0 (and not hadoop-0.7.2, for example).

On Thu, 2 Nov 2006, James Grahn wrote:
> Greets,
> I have been attempting to follow the tutorial to get NutchWAX up and
> running in standalone mode, but I've reached an error that confounds me.
>
> The printlns seem to indicate that NutchWAX does successfully import the
> ARC files. I see this line:
>
>   opening /tmp/mirror/heretrix/IAH-20061026194403-00000.arc.gz
>
> And after many individual pages being imported, I see this line:
>
>   061102 115327 opening /tmp/mirror/heretrix/IAH-20061026194522-00001.arc.gz
>
> This is followed by more individual pages. So that seems fine. But no
> index is generated and the printlns end like this:
>
>   ...
>   061102 115345 adding http://www.cnn.com/CNN/Programs/student.news/ 24869 text/html
>   061102 115345 adding http://www.cnn.com/CNN/Programs/people/ 367 text/html
>   Exception in thread "main" java.io.IOException: Job failed!
>           at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
>           at org.archive.access.nutch.ImportArcs.importArcs(ImportArcs.java:519)
>           at org.archive.access.nutch.IndexArcs.doImport(IndexArcs.java:154)
>           at org.archive.access.nutch.IndexArcs.doAll(IndexArcs.java:139)
>           at org.archive.access.nutch.IndexArcs.doJob(IndexArcs.java:246)
>           at org.archive.access.nutch.IndexArcs.main(IndexArcs.java:439)
>           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>           at java.lang.reflect.Method.invoke(Method.java:585)
>           at org.apache.hadoop.util.RunJar.main(RunJar.java:130)
>
> Any suggestions for this error? I am using a hadoop installation I
> acquired with the current version of nutch, and am running the "all"
> command as per the tutorial:
>
>   ${HADOOP_HOME}/bin/hadoop jar ${NUTCHWAX_HOME}/nutchwax.jar all /tmp/inputs /tmp/outputs test
>
> Thanks,
> James

From: James G. <jg...@si...> - 2006-11-02 17:46:04

Greets,

I have been attempting to follow the tutorial to get NutchWAX up and running in standalone mode, but I've reached an error that confounds me.

The printlns seem to indicate that NutchWAX does successfully import the ARC files. I see this line:

  opening /tmp/mirror/heretrix/IAH-20061026194403-00000.arc.gz

And after many individual pages being imported, I see this line:

  061102 115327 opening /tmp/mirror/heretrix/IAH-20061026194522-00001.arc.gz

This is followed by more individual pages. So that seems fine. But no index is generated, and the printlns end like this:

  ...
  061102 115345 adding http://www.cnn.com/CNN/Programs/student.news/ 24869 text/html
  061102 115345 adding http://www.cnn.com/CNN/Programs/people/ 367 text/html
  Exception in thread "main" java.io.IOException: Job failed!
          at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
          at org.archive.access.nutch.ImportArcs.importArcs(ImportArcs.java:519)
          at org.archive.access.nutch.IndexArcs.doImport(IndexArcs.java:154)
          at org.archive.access.nutch.IndexArcs.doAll(IndexArcs.java:139)
          at org.archive.access.nutch.IndexArcs.doJob(IndexArcs.java:246)
          at org.archive.access.nutch.IndexArcs.main(IndexArcs.java:439)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:585)
          at org.apache.hadoop.util.RunJar.main(RunJar.java:130)

Any suggestions for this error? I am using a hadoop installation I acquired with the current version of nutch, and am running the "all" command as per the tutorial:

  ${HADOOP_HOME}/bin/hadoop jar ${NUTCHWAX_HOME}/nutchwax.jar all /tmp/inputs /tmp/outputs test

Thanks,
James
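For anyone reproducing this setup: as a later post in this thread notes, the import step reads the inputs directory for a file listing the ARC locations, one per line. A minimal sketch, assuming local paths (the manifest file name and ARC names below are illustrative only):

  mkdir -p /tmp/inputs
  # one ARC location per line
  cat > /tmp/inputs/arcs.txt <<'EOF'
  /tmp/mirror/heretrix/IAH-20061026194403-00000.arc.gz
  /tmp/mirror/heretrix/IAH-20061026194522-00001.arc.gz
  EOF
  ${HADOOP_HOME}/bin/hadoop jar ${NUTCHWAX_HOME}/nutchwax.jar all /tmp/inputs /tmp/outputs test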

From: Maximilian S. <sch...@ci...> - 2006-10-27 17:32:20

I've distilled Michael Stack's instructions into a shell script which I'd like to share. It seems to work quite well for me, but I've only used it on smaller archives (several hundred MBs) with the latest NutchWAX (CVS HEAD) and under Cygwin. Please let me know if it works for you and whether you still find everything with the new indices:

http://www.cip.ifi.lmu.de/~schoefma/howto/incremental_indexing_with_nutchwax/incr_index.sh

Usage:
  ./incr_index.sh input_dir target_dir [collection_name]
or
  ./incr_index.sh --arcs dir_with_arc_files target_dir [collection_name]

Example:
  ./incr_index.sh --arcs heritrix/jobs/MyJob-12345/arcs myarch/output mycoll

Preconditions:
- HADOOP_HOME and NUTCHWAX_HOME must be set.
- You need an existing index in "target_dir" to operate on, e.g. one generated by running NutchWAX's "all" task on a set of arc files.

Hints:
- Save your production index directory before running this script on it!
- When using Cygwin, use relative paths, especially for the input dir.
- Either shut down NutchWAX when running this script, or operate on a copy of your live index (to avoid permission-denied errors).

Return codes (usable by other scripts):
  0 - Everything went fine.
  1 - The script failed to start (directory not found, etc.).
  2 - The importing/indexing process was already started and the index in the target directory might have been damaged. You should restore it from your backup in this case.

- Max
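Since the script advertises machine-readable exit codes, a caller can branch on them; a minimal wrapper sketch (the backup path is an assumption, following the "save your index first" hint above):

  ./incr_index.sh --arcs heritrix/jobs/MyJob-12345/arcs myarch/output mycoll
  case $? in
    0) echo "index updated" ;;
    1) echo "failed to start: check directories and env vars" >&2 ;;
    2) echo "index may be damaged; restoring backup" >&2
       rm -rf myarch/output && cp -r myarch/output.bak myarch/output ;;
  esac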

From: Michael S. <st...@ar...> - 2006-10-26 14:50:52

Maximilian Schoefmann wrote:
> Hi,
> ...
> Am I missing some important configuration thing here or did the Nutch part
> of wayback just not get enough love the last months (-: ?

The latter is likely the problem. Let me talk to Brad (Mr. Wayback). It's a priority that it gets fixed.

St.Ack

> Cheers,
> Max

From: Maximilian S. <sch...@ci...> - 2006-10-26 12:32:07

Hi,

I'm trying to get Wayback with nutchWax working, but I'm running into this error from nutchwax: field "link" does not appear to be indexed.

Now I don't know whether nutchWax or Wayback is to blame here, or if "link" _should_ be in my index but isn't somehow?!

When I just remove &sort=link from the query URL, the query works fine. I found it being added here:

  org.archive.wayback.resourceindex.NutchResourceIndex.java (285):
  ms.append("&sort=link");

But even without the exception being thrown, I don't get any results, as the exact date is added to the query each time. And by exact I mean _second_! Even when I select "All" from the years select box, "date%3A20061231235959" is added (?); when I select "2003", date%3A20031231235959 is added. Nutch will then only search for documents with this specific timestamp.

Am I missing some important configuration thing here or did the Nutch part of wayback just not get enough love the last months (-: ?

Cheers,
Max

Full stacktrace:

  java.lang.RuntimeException: field "link" does not appear to be indexed
    org.apache.lucene.search.FieldCacheImpl.getAuto(FieldCacheImpl.java:356)
    org.apache.lucene.search.FieldSortedHitQueue.comparatorAuto(FieldSortedHitQueue.java:341)
    org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:184)
    org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:58)
    org.apache.lucene.search.TopFieldDocCollector.<init>(TopFieldDocCollector.java:44)
    org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:108)
    org.apache.lucene.search.Searcher.search(Searcher.java:76)
    org.apache.nutch.searcher.LuceneQueryOptimizer.optimize(LuceneQueryOptimizer.java:268)
    org.apache.nutch.searcher.IndexSearcher.search(IndexSearcher.java:95)
    org.apache.nutch.searcher.NutchBean.search(NutchBean.java:180)
    org.apache.nutch.searcher.NutchBean.search(NutchBean.java:242)
    org.apache.nutch.searcher.OpenSearchServlet.doGet(OpenSearchServlet.java:136)
    org.archive.access.nutch.NutchwaxOpenSearchServlet.doGet(NutchwaxOpenSearchServlet.java:69)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
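A quick way to confirm the sort parameter is the trigger is to hit the OpenSearch servlet directly, with and without it; a sketch, with host, port, and webapp path assumed (these will differ per deployment):

  curl 'http://localhost:8080/nutchwax/opensearch?query=foo'            # returns results
  curl 'http://localhost:8080/nutchwax/opensearch?query=foo&sort=link'  # throws the RuntimeException above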

From: Alex Wu <aw...@sd...> - 2006-10-24 19:06:20

Hi Brad,

Another update. Thank you for the input.

We modified org.archive.wayback.cdx.RemoteCDXIndex.java to search multiple remote indexes. One wayback instance (the frontend) has been configured to use RemoteCDX in the web.xml, and the other wayback instances are using the LocalBDB configuration.
[code block]
  // list of remote wayback XML-query endpoints to fan out to
  private String sdscSearchUrlBases[] = {
      "http://machine:9000/wayback/xmlquery",
      "http://machine:9002/wayback/xmlquery", ...};

  public SearchResults query(WaybackRequest wbRequest) {
      ...
      SearchResults searchResults = new SearchResults();
      // query each remote index in turn and accumulate the results
      for (int i = 0; i < sdscSearchUrlBases.length; i++) {
          try {
              doc = queryOneUrl(sdscSearchUrlBases[i], wbRequest);
              ...
          }
          ...
      }
      ...
  }
[end code block]
On Oct 18, 2006, at 2:58 PM, Brad Tofel wrote:
> Sorry for the delayed response -- lots of balls in the air right now..
>
> I just did a large check-in of the software that supports sorted
> flat files, but will not have time to update the docs for another
> week or so. There are some comments in the code and in the new
> web.xml (which has changed pretty significantly) that might be
> enough to make sense of how to use the new functionality.
>
>
> Currently, there is no support for querying multiple remote
> indexes, but it seems like this should be relatively
> straightforward, in the good-guy case, using the new software:
> you'd just need to make a RemoteSearchResultSource out of the
> RemoteResourceIndex, and modify the SearchResultSourceFactory to
> build a composite from several of them...
>
> I say "in the good-guy case" because the failure modes might get
> complicated in terms of timeouts, failed connections, etc. However,
> if your hardware is stable, then the "easy solution" I outlined
> might be good enough.
>
> I'll drop you another line when the documentation has been updated.
>
> Brad
>
> Bing Zhu wrote:
>> Dear Mr. Brad,
>>
>> This is Bing Zhu from the University of California, San Diego.
>>
>> We really appreciate you taking the time to answer our questions.
>>
>> Is it possible for a Wayback machine to query multiple index
>> sources (e.g. index info
>> in multiple Wayback machines) when using RemoteCDXIndex? If yes,
>> would you
>> let us know how to do so? Many thanks.
>>
>> Sincerely,
>> Bing
Alex Wu
858-534-5074

From: Maximilian S. <sch...@ci...> - 2006-10-11 09:33:49

Ok, CVS works. But someone should update the webpage, as the instructions there don't work anymore. This works (archive-access.cvs.sourcef... instead of cvs.sourcef...):

  cvs -d:pserver:ano...@ar...:/cvsroot/archive-access login
  cvs -z3 -d:pserver:ano...@ar...:/cvsroot/archive-access co -P archive-access

> It seems that the cruisecontrol server has been down for a while. As
> sourceforge CVS also doesn't work (neither directly nor with CVSGrab), this
> makes it a bit hard to test the latest versions.

From: Maximilian S. <sch...@ci...> - 2006-10-11 09:21:11

Hi,

It seems that the cruisecontrol server has been down for a while. As sourceforge CVS also doesn't work (neither directly nor with CVSGrab), this makes it a bit hard to test the latest versions.

Could it be that sourceforge neglects its CVS services a bit...? I'm seeing this quite often :-(

Max

From: Alex Wu <aw...@sd...> - 2006-10-10 19:11:16

Hi Brad,

Thank you for your input. I wanted to give an update on our experience with the wayback application within Tomcat.

We tried one setup where, on one machine, we ran 12 instances of the wayback application, each in its own Tomcat container, and gave about 2,700 ARC files to each instance. Each Tomcat was allocated 1GB of memory. This was done over the weekend, and over 30,000 ARCs were processed.

Another setup was tried on the same machine, where 3 Tomcat instances were run, each with 6 wayback applications. Each wayback application handles 2,700 ARC files. Each Tomcat was allocated 1 to 3 GB of memory. Within the instance of Tomcat with 3GB allocated, the result in about 48 hours was just over 3,000 ARCs processed. The other two Tomcat instances are mostly idle, having indexed/merged their respective sets of ARCs almost completely over the weekend.

We are experimenting with different setups that involve many variables, such as the varying size of ARC files, non-wayback load on the machine, etc., so it's difficult to give a more accurate performance comparison without controlling the variables more.

We modified org/archive/wayback/cdx/indexer/IndexPipeline.class slightly so that the indexing, queuing, and merging run in separate threads, sleeping at different intervals. With this, 3 indexing threads are running.

Lastly, I was not able to view the CVS at http://crawltools.archive.org:8080/cruisecontrol/buildresults/HEAD-archive-access ("Firefox can't establish a connection to the server at crawltools.archive.org:8080").

Thank you again,
Alex
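For reference, a sketch of the one-container-per-webapp setup described above -- CATALINA_BASE and JAVA_OPTS are standard Tomcat knobs, but the paths and heap size here are illustrative:

  # one Tomcat per wayback instance, each with its own CATALINA_BASE and 1GB heap
  for i in 1 2 3; do
    CATALINA_BASE=/srv/tomcat$i JAVA_OPTS="-Xmx1024m" \
      /opt/tomcat/bin/catalina.sh start
  done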

From: Lukas M. <mat...@ce...> - 2006-10-09 14:28:37

On Monday, 09 October 2006 16:13, Shay Lawless wrote:
> Hi all,
>
> I'm attempting to use NutchWax to index a number of .arc files generated by
> a web crawl. I can get the indexing step to run fine, and when I perform a
> keyword search results are returned and ranked by nutch. However when I
> click on any of the results, the content cannot be displayed. The message
> returned is as follows,

Did you fill in the collection name during the indexing process? Which version of NutchWax are you using?

lukas

> Not Found: The requested URL /null/20060930150000/http://blah.blah.com/ was
> not found on this server.
>
> Additionally, a 404 Not Found error was encountered while trying to use an
> ErrorDocument to handle the request.
>
> Any help you can provide would really be appreciated,
>
> Thanks,
>
> Séamus Lawless
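For reference, the "null" in that URL is where the collection name would normally appear. The indexing command earlier in this thread passes the collection name as the last argument; a sketch ("test" is just an example, and the link between an omitted name and the /null/ URLs is an inference from the error above):

  # the final argument is the collection name
  ${HADOOP_HOME}/bin/hadoop jar ${NUTCHWAX_HOME}/nutchwax.jar all /tmp/inputs /tmp/outputs test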

From: Shay L. <sea...@gm...> - 2006-10-09 14:13:22

Hi all,

I'm attempting to use NutchWax to index a number of .arc files generated by a web crawl. I can get the indexing step to run fine, and when I perform a keyword search results are returned and ranked by nutch. However, when I click on any of the results, the content cannot be displayed. The message returned is as follows:

  Not Found
  The requested URL /null/20060930150000/http://blah.blah.com/ was not found on this server.

  Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

Any help you can provide would really be appreciated,

Thanks,

Séamus Lawless

From: Brad T. <br...@ar...> - 2006-09-26 20:20:28

Hi Alex,

Good questions, all of them.

First off, your collection is larger than any collection we've implemented using the current WM, but we are in the process, right now, of creating an installation of about 5TB, or about 50K ARCs, so you're not completely out in front of the crowd.

Firstly, the BDBJE has performance issues at larger scales when inserting in random order, both in insert and in subsequent lookup. We haven't yet done serious performance analysis on this. Our solution has been to externally sort the index data. This makes insert linear in performance, and lookup performance has been good on BDBJEs created this way (see the answer to #2 below for a few more hints on implementing this, or the online User Manual in the near future). I'll add some notes on how we've been implementing this to the User Manual.

0.8.0, which will hopefully be available soon, will include modules for distributing an index across multiple nodes, in alphabetic regions. This code is mostly done now, but is not checked in. 0.8.0 will also include several new index-related features, including:

+ the capability to use sorted flat files as a Wayback index (which will allow external sort tools to be used to generate the index; long term (1.0.0) we're planning on using Hadoop for this)
+ the capability to merge results found from multiple index sources, which could involve multiple sorted flat files and a BDBJE, for example.

We expect that the combination of these features will allow indexes of arbitrarily large sizes to be created and searched efficiently. Today, 48K ARCs is pushing the edge.

I can probably do a check-in in the next few days of most of the functionality I've described above, if you're interested in helping to test this new software.

Specific answers to your questions below.

Alex Wu wrote:
> Hi,
>
> We have a project with about 48000 ARC files, and would like inputs on
> the best way to implement the wayback machine 0.6.0.
>
> Our setup is Tomcat 5.5.17, JDK 1.5, 1GB memory for the JVM. We have only
> 6000 ARCs indexed at this point over a 1-week period. We would like to
> increase this rate significantly.
>
> Some questions we have are:
>
> 1. Suggested environment setup for this number of ARC files and greater.

Your current setup should be fine for this, but when the distributed index option is available, it would be advisable to move to this configuration.

> 2. Parallel indexing option for the current version, or additional
> tools that will allow for this.

The pipeline-client command line tool has a new option to generate a flat-file version of the index data on STDOUT. This process could be executed in parallel across multiple nodes, and their outputs sorted and merged together to form a single flat file. This flat file can be used today with the BDBJE option, by manually placing the file into the "toBeMerged" directory on the host holding the index. We've seen acceptable performance inserting large sorted files in this manner. With the new flat-file binary-searching ResourceIndex code, this sorted flat file could be used as-is, bypassing the BDBJE altogether. I'll let you know when it's checked in.

> 3. The index is tied to the machine name. How to avoid this.

Not sure what you mean. Do you mean there is data internal to the BDBJE that is aware of the host where it was created and cannot be used on other hosts? Can you elaborate?

> 4. Is it possible to have multiple wayback installations, each with
> its own JVM, use the same arc files and/or index.

Yes. We have a couple of installations that include front-end UIs for Proxy, Timeline, and Archival URL replay modes on top of the same index, where each installation uses a RemoteCDXIndex. I'll add some documentation to the User Manual outlining this configuration in the next day or two.

> 5. The user manual at
> http://archive-access.sourceforge.net/projects/wayback/user_manual.html
> mentions a non-LocalBDBResourceIndex resource implementation that
> communicates with a remote wayback installation. The user manual does
> not cover the preparation of the index data. What are the steps for
> this setup, including index data preparation.

As mentioned in #4, I'll outline this configuration in the User Manual, but the basics: set up one webapp with a LocalBDBResourceIndex, making sure it has a QueryUI with the QueryXMLUI jsps set up. This will allow HTTP-XML queries of the index. Then you set up one or more webapps, using whatever replay modes you prefer, using the RemoteCDXIndex ResourceIndex implementation to connect to the HTTP-XML exported ResourceIndex.

> 6. Is there a limitation to the number of ARCs wayback will handle.

With the 0.8.0 features, we expect the WM to be able to scale to arbitrarily large numbers of ARC files. Generating indexes for larger installations will be handled offline, and will be a manual process until the 1.0.0 release.

Thanks for the feedback and questions. We're very interested in your experiences and in making this software as easy to use as possible.

Brad
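A minimal sketch of the sort-then-merge flow Brad describes; the pipeline-client option for flat-file output is left elided since its exact flag isn't given here, and all paths are illustrative:

  # on each node: pipeline-client <flat-file option elided> > part-<node>.cdx
  # then gather the parts and byte-order sort them into one sorted flat file
  export LC_ALL=C
  sort part-*.cdx > merged-sorted.cdx
  # feed it to the BDBJE by dropping it into the index host's toBeMerged dir
  mv merged-sorted.cdx /path/to/index/toBeMerged/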

From: Alex Wu <aw...@sd...> - 2006-09-26 19:32:49

Hi,

We have a project with about 48000 ARC files, and would like inputs on the best way to implement the wayback machine 0.6.0.

Our setup is Tomcat 5.5.17, JDK 1.5, 1GB memory for the JVM. We have only 6000 ARCs indexed at this point over a 1-week period. We would like to increase this rate significantly.

Some questions we have are:

1. Suggested environment setup for this number of ARC files and greater.
2. Parallel indexing option for the current version, or additional tools that will allow for this.
3. The index is tied to the machine name. How to avoid this.
4. Is it possible to have multiple wayback installations, each with its own JVM, use the same arc files and/or index.
5. The user manual at http://archive-access.sourceforge.net/projects/wayback/user_manual.html mentions a non-LocalBDBResourceIndex resource implementation that communicates with a remote wayback installation. The user manual does not cover the preparation of the index data. What are the steps for this setup, including index data preparation.
6. Is there a limitation to the number of ARCs wayback will handle.

Thank you for your input.

Alex Wu
858-534-5074

From: roger p <ro...@ho...> - 2006-08-27 23:16:20

Can anyone tell me how to overcome the following problem?

I use:
- Fedora Core 5
- Sun Java 1.5

and I have followed the instructions throughout the manual, but still I get an error retrieving the records in the arc files. The following error occurs:

  :WARN: /nutchwax/opensearch:
  java.lang.RuntimeException: java.lang.NullPointerException
          at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:204)
          at org.apache.nutch.searcher.NutchBean.getSummary(NutchBean.java:346)
          at org.archive.access.nutch.NutchwaxBean.getSummary(NutchwaxBean.java:53)
          at org.apache.nutch.searcher.OpenSearchServlet.doGet(OpenSearchServlet.java:155)
          at org.archive.access.nutch.NutchwaxOpenSearchServlet.doGet(NutchwaxOpenSearchServlet.java:69)
          at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
          at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
          at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:442)
          at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:357)
          at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:226)
          at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:615)
          at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:150)
          at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:123)
          at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:141)
          at org.mortbay.jetty.Server.handle(Server.java:272)
          at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:404)
          at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:650)
          at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:488)
          at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:198)
          at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:319)
          at org.mortbay.jetty.nio.HttpChannelEndPoint.run(HttpChannelEndPoint.java:270)
          at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:475)

Before this it has found 14 hits for the searched word:

  2006-08-27 17:11:30,038 INFO NutchBean - found 14 raw hits
  2006-08-27 17:11:30,039 INFO NutchBean - total hits: 1829

So the problem does not seem to be searching in the indices. Maybe there is a problem with accessing the arcs to get the specific page. I put the correct path (searcher.dir) inside nutch-site.xml and hadoop-site.xml.

Has anyone any idea about how to solve this problem?

CK
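One thing worth checking for this kind of NullPointerException in getSummary: the searcher reads summaries from the segments, so searcher.dir has to contain them alongside the index. A quick layout check, with directory names following the NutchWAX discussion elsewhere in this thread (the diagnosis itself is an assumption):

  SEARCHER_DIR=/path/to/searcher.dir   # the value set in nutch-site.xml
  ls "$SEARCHER_DIR"
  # expect something like:  index/ (or indexes/)  segments/  crawldb/  linkdb/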

From: Maximilian S. <sch...@ci...> - 2006-08-01 08:28:38

Thanks for the fast and informative answer!

> It's not so much that it's broken.

Good to hear. Btw, I'm now using a 0.7 version from the integration server. Don't know if that changes anything.

> + If no 'index' sub-directory in ${searcher.dir}, then the nutch
> searcher NutchBean in the webapp opens all indices in the 'indexes'
> subdir. Usually, under 'indexes', there are subdirectories holding an
> index per segment. I've tested mixing in ${searcher.dir}/indexes the
> indexes of merged segments and individual segment indices. This works
> as long as the indices under searcher.dir/indexes have an (empty)
> 'index.done' file added (merged indexes don't have this file present --
> you may have to add it manually). So, you can ingest new ARCs, then add the
> new segment to the crawldb, do a new link inversion (the new segment's
> links will be added to the old linkdb, as for the crawldb), and then
> index your new segment. When done, add the new index to
> ${searcher.dir}/indexes and perhaps move the old, big merged index here
> (adding in the index.done file) and restart your webapp. You should be
> able to search the old and new.
> + But you may find that you might have to merge your new incremental
> segment indices into the large index and then 'sort' the merged index to
> get good, 'balanced' results. Sorting is a recent feature added to nutch
> that allows sorting the index by rank so the highest-ranked pages are
> returned first. Generally you sort to get the best results returned
> faster than you would from the unsorted index. But, I've observed that,
> querying across multiple indices, one index may be favored. To fix, I
> found I had to merge and sort all indices (to sort, do
> '$HADOOP_HOME/bin/hadoop jar nutchwax.jar class
> org.apache.nutch.indexer.IndexSorter').

Seems complicated, but I will give it a try as soon as I have crawled a few smaller arcs which don't take too long to index.

But as the URLs of the site I'm crawling don't change too often, and searching across multiple versions doesn't work right now, having a merged index won't mean much to me.

Would it make sense to just name the collections after the crawl date, to be able to distinguish between different versions?

Regards,
Max

From: Michael S. <st...@ar...> - 2006-07-31 20:56:53

Maximilian Schoefmann wrote:
> Hi *,
>
> I want to do regular crawls of a bigger website. I've already crawled it
> successfully with heritrix, indexed the resulting arcs with nutchwax and
> also searched/browsed them with wera. Works pretty well!

I'm glad to hear.

> Now I wanted to do a second crawl, but I've read that incremental indexing
> is broken in nutchwax 0.6 (which I am using).

It's not so much that it's broken. It's more that I don't yet have a good story to tell on how to do incremental indexing in 0.6+ of nutchwax. Here is what I currently know (I've been kinda waiting on getting more practice under my belt before starting in on writing a recipe for others):

+ If no 'index' sub-directory in ${searcher.dir}, then the nutch searcher NutchBean in the webapp opens all indices in the 'indexes' subdir. Usually, under 'indexes', there are subdirectories holding an index per segment. I've tested mixing in ${searcher.dir}/indexes the indexes of merged segments and individual segment indices. This works as long as the indices under searcher.dir/indexes have an (empty) 'index.done' file added (merged indexes don't have this file present -- you may have to add it manually). So, you can ingest new ARCs, then add the new segment to the crawldb, do a new link inversion (the new segment's links will be added to the old linkdb, as for the crawldb), and then index your new segment. When done, add the new index to ${searcher.dir}/indexes and perhaps move the old, big merged index here (adding in the index.done file) and restart your webapp. You should be able to search the old and new.

+ But you may find that you might have to merge your new incremental segment indices into the large index and then 'sort' the merged index to get good, 'balanced' results. Sorting is a recent feature added to nutch that allows sorting the index by rank so the highest-ranked pages are returned first. Generally you sort to get the best results returned faster than you would from the unsorted index. But, I've observed that, querying across multiple indices, one index may be favored. To fix, I found I had to merge and sort all indices (to sort, do '$HADOOP_HOME/bin/hadoop jar nutchwax.jar class org.apache.nutch.indexer.IndexSorter').

> I guess I need incremental indexing if I want to be able to search across
> all versions of the site?

Yes. Sort of. The not-so-good news is that in new nutch(wax), the key it uses doing all of the mapreduce indexing steps is the URL (not URL+date, but URL only). What this means is that only the latest version of a page is searchable; unlike old nutch, you can't search a single URL across all page versions. This feature was lost when we moved on to new nutch. Recently I made nutchwax use URL + collection as the key, end-to-end in indexing and at query time. This makes it so I can have the same URL in the index multiple times, distinguished by collection. Next will be to key by URL + date (see '[ 1518431 ] [nutchwax] Search multiple versions of one URL broken', http://sourceforge.net/tracker/index.php?func=detail&aid=1518431&group_id=118427&atid=681137).

> Now I think I have three options:
> 1. wait until incremental indexing is fixed
> 2. use the 4.3 branch

The 4.3 branch is dead and no longer supported.

> 3. index only the newly crawled arcs and let the user select on which date
> she wants to search
>
> So my questions are:
> - Is it foreseeable when incremental indexing will be fixed - and if so -
> what performance can I expect compared to completely reindexing all arc
> files?

Soon (smile). It's about time we had a new NutchWAX release. Lots of changes of late (not the least of which is that there is an official 0.8 nutch release). I'm currently working on making incremental updates work for us internally. Once the internal client is satisfied, I'll document and release. I'd SWAG a month.

> - Will the 4.3 branch be maintained beside the 0.6 branch, and will it be
> possible to convert the webdb/indices later (doesn't seem to be the case
> right now)?

No to both questions.

St.Ack
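A condensed sketch of the mix-in steps above (all paths, including the hadoop part-00000 output name, are illustrative; the IndexSorter invocation is quoted from the text, with any further arguments left elided):

  SEARCHER_DIR=/path/to/searcher.dir
  # drop the freshly built segment index in beside the big merged one
  cp -r outputs/indexes/part-00000 "$SEARCHER_DIR/indexes/new-segment-index"
  # merged indexes lack the (empty) index.done marker; add it by hand
  touch "$SEARCHER_DIR/indexes/new-segment-index/index.done"
  # if one index dominates results, merge and then sort by rank
  $HADOOP_HOME/bin/hadoop jar nutchwax.jar class org.apache.nutch.indexer.IndexSorter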

From: Maximilian S. <sch...@ci...> - 2006-07-31 12:22:39

Hi *,

I want to do regular crawls of a bigger website. I've already crawled it successfully with heritrix, indexed the resulting arcs with nutchwax, and also searched/browsed them with wera. Works pretty well!

Now I wanted to do a second crawl, but I've read that incremental indexing is broken in nutchwax 0.6 (which I am using). I guess I need incremental indexing if I want to be able to search across all versions of the site?

Now I think I have three options:
1. wait until incremental indexing is fixed
2. use the 4.3 branch
3. index only the newly crawled arcs and let the user select on which date she wants to search

So my questions are:
- Is it foreseeable when incremental indexing will be fixed - and if so - what performance can I expect compared to completely reindexing all arc files?
- Will the 4.3 branch be maintained beside the 0.6 branch, and will it be possible to convert the webdb/indices later (doesn't seem to be the case right now)?

What solution would you suggest?

Thanks & best regards,
Max

From: Michael S. <st...@ar...> - 2006-07-20 16:23:04

Natalia Torres wrote:
> Thanks Michael, I'll experiment with the indexing job this way.
>
> About the indexing process...
>
> I'm testing how it works (Heritrix+Hadoop+NutchWax+Wera) with our web,
> and I'm running it in standalone mode with one crawled job (about 7 arcs,
> 700MB).

How long is it taking you to index your 7 ARCs?

> I want to start a hadoop cluster but I don't know how many slaves to use
> or the hardware requirements for it. I'm looking for information about
> benchmarks, indexing performance, etc. to learn more about the hardware
> needed, but I can't find anything.

When the software settles more -- hadoop, nutch, and nutchwax -- I'll put up some figures on our experience here at the Archive. Meantime, here are a few coarse stats:

+ A cluster should have at least 3, probably 4, machines to make distribution worth the bother.
+ Here at the Archive, we have a rack of between 16 and 30 machines that we've been running/debugging indexing jobs on over the last bunch of months (the number of slaves participating varies because the hardware we use is not of the best quality, and these indexing jobs, lasting days and checksumming everything read and written, are a good way of finding flaky RAM sticks and erroring motherboards). We find on this rack that total processing of an ARC, from ingest through indexing, takes about 3 minutes (machines are 4GB, 2GHz dual-core Athlons with 4x400 SATA disks).

Other things to consider:

+ Make all slave nodes exactly the same -- same RAM and disk configuration. It'll save you headache down the road.
+ Set up rsync so you can pull ARCs into your cluster with it. Once done, you can then feed nutchwax lists of ARCs using rsync URLs. This way, you can leave your ARCs out on storage nodes and the indexing software will take care of making the ARCs local to the indexing cluster.
+ DFS cannot be trusted. It'll be fixed soon, but for now, as soon as an indexing job is completed, make a backup of the produced data -- segments and indices -- to local storage.

Yours,
St.Ack
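A sketch of the rsync-URL idea: the ARC listing fed to nutchwax names each ARC by rsync URL instead of a local path (the hostnames and paths below are made up for illustration):

  # /tmp/inputs/arcs.txt -- one ARC per line
  rsync://storage1.example.org/arcs/IAH-20060619172903-00000.arc.gz
  rsync://storage2.example.org/arcs/IAH-20061026194403-00000.arc.gz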

From: Natalia T. <nt...@ce...> - 2006-07-07 11:45:11

Thanks Michael, I'll experiment with the indexing job this way.

About the indexing process...

I'm testing how it works (Heritrix+Hadoop+NutchWax+Wera) with our web, and I'm running it in standalone mode with one crawled job (about 7 arcs, 700MB).

I want to start a hadoop cluster but I don't know how many slaves to use or the hardware requirements for it. I'm looking for information about benchmarks, indexing performance, etc. to learn more about the hardware needed, but I can't find anything.

Thanks,
Natalia

From: Michael S. <st...@ar...> - 2006-07-07 00:41:45

Natalia Torres wrote:
> Hello,
>
> I tried to add the new job, moving the indexes directory before starting
> the index process, and it works fine. Thanks!!
>
> So, every time I want to index a new job I need to move the indexes
> directory? If I move this directory, does the nutchwax search still work?

I presume you are using the 'all' command each time? It will complain if there are already indices in place from a previous run.

The 'all' command is a convenience. It assumes you want to do a single-pass indexing of a set of ARCs. Running the 'all' command to bring in a new set of ARCs will run through all steps and index all the new additions, as well as reindex all ARCs added previously.

It sounds like you want to do incremental updates to your index. Experiment by calling the jobs that comprise the 'all' command individually. For example, run the import, passing it a directory that contains a file that points to just the new ARCs you want to ingest. Then do 'update' and 'invert'. Next, run indexing on just the segments that were added by the ingest step (save aside the indexes made previously first). Run your deduplication. Finally, merge the new indices and the old.

I'm working currently on tools and documentation to better support incremental updates to indices. They'll form the core of the next release (coming soon -- a month or so).

> This process takes many hours...

Yes, it can. It depends on the number of ARCs you have. It sounds, too, like you are running in standalone mode. You might consider starting a small hadoop cluster. That should improve your throughput.

Yours,
St.Ack
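A hedged sketch of those individual steps as commands. The job names (import, update, invert, index) are the ones named above, but the exact arguments are assumptions and should be checked against nutchwax.jar's usage text:

  J="${HADOOP_HOME}/bin/hadoop jar ${NUTCHWAX_HOME}/nutchwax.jar"
  $J import /tmp/new-inputs /tmp/outputs mycoll   # ingest only the new ARCs
  $J update /tmp/outputs                          # add the new segment to the crawldb
  $J invert /tmp/outputs                          # invert links into the linkdb
  $J index  /tmp/outputs                          # index just the new segments
  # ...then deduplicate, and merge the new indices with the old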

From: Michael S. <st...@du...> - 2006-07-07 00:18:01

From: Natalia T. <nt...@ce...> - 2006-07-05 12:10:28

I have the same problem searching any document (gif, html, ...) as JCL, using these versions of Wera and Nutchwax, while Wayback works fine. I tried to change the arc path in documentDispatcher, but it doesn't work.

Natalia

arc...@li... wrote:
> Today's Topics:
>
> 1. Re: Nutchwax+Wera problems (Michael Stack)
>
> Message: 1
> Date: Mon, 03 Jul 2006 16:03:27 -0700
> From: Michael Stack <st...@ar...>
> Subject: Re: [Archive-access-discuss] Nutchwax+Wera problems
> To: João Cláudio Luzio <jl...@ex...>
>
> João Cláudio Luzio wrote:
>> Oops.. forgot to say that the arcs were in the
>> /var/local/webarchive/heritrix/jobs/bn_18_test-20060619172505727/
>> directory, but with .arc.gz instead of .arc.
>
> This should be fine.
>
>> João Cláudio Luzio wrote:
>>> Hi,
>>> I've been trying to get the pair up and running for a while now but
>>> had some problems..
>>> Using nutchwax 0.4.3 and wera (0.4.2RC1 & 0.5.0) I managed to
>>> get it running, but some of the related files (images) aren't displayed.
>>> Those get:
>>>
>>>   <retrievermessage>
>>>     <head>
>>>       <errorcode>4</errorcode>
>>>       <errormessage>Unable to parse Archive Identifier</errormessage>
>>>     </head>
>>>   </retrievermessage>
>>>
>>> Using wera debug I found that "[archiveidentifier] =>
>>> 2770/IAH-20060619172903-00000-webarchive1" for a specific search I made.
>>> (Starting tomcat from the nutchwax indexed data)
>
> So, it generally works but some of the images don't show sometimes?
>
>>> Using wayback I don't have the same problems (I don't use nutchwax with
>>> wayback..).
>>>
>>> I've tried to get nutchwax 0.6.1 and wera running, but the opensearch
>>> servlet for the rss from nutchwax gives an exception..
>
> Do you still have the exception?
>
>>> So I tried nutchwax 0.7.0 (with latest hadoop - standalone), but now
>>> the arcretriever gives an exception when trying to get the document.
>>> Using wera debug I found that "[archiveidentifier] =>
>>> 2234331/filedesc://IAH-20060619172903-00000-webarchive1.arc" for the
>>> same search I made.
>>> (Starting tomcat from anywhere)
>>>
>>> The exception:
>>>
>>>   7 Bad function argument Cause: java.io.FileNotFoundException:
>>>   /var/local/webarchive/heritrix/jobs/bn_18_test-20060619172505727/filedesc:/IAH-20060619172903-00000-webarchive1.arc
>>>   does not exist.
>>>   Stack trace:
>>>   org.archive.io.arc.ARCUtils.isReadable(ARCUtils.java:171)
>>>   org.archive.io.arc.ARCUtils.testCompressedARCFile(ARCUtils.java:94)
>>>   org.archive.io.arc.ARCReaderFactory.get(ARCReaderFactory.java:200)
>>>   org.archive.io.arc.ARCReaderFactory.get(ARCReaderFactory.java:194)
>>>   no.nb.nwa.retriever.ARCRetriever.getDocument(ARCRetriever.java:410)
>>>   no.nb.nwa.retriever.ARCRetriever.doGet(ARCRetriever.java:131)
>
> Looks like we shouldn't be putting the 'filedesc:' on the front of the ARC
> name? Does ARCRetriever work if you make a request with
> IAH-20060619172903-00000-webarchive1.arc instead of
> filedesc:/IAH-20060619172903-00000-webarchive1.arc?
>
> St.Ack

--
Natalia Torres
Dept. de Sistemes
CESCA - Centre de Supercomputació de Catalunya
Gran Capità, 2-4 (Edifici Nexus) • 08034 Barcelona
T. 93 205 6464 • F. 93 205 6979 • nt...@ce...