From: Aaron B. <aa...@ar...> - 2012-05-16 20:42:44
|
Erik Hetzner <eri...@uc...> writes:

> A quick question. UURI [1] is located in Heritrix Commons. HandyUrl is
> located in archive-commons. Which should I use?

Hmmm, it might depend on your needs.

AFAIK, UURI is geared towards Heritrix's needs, which include a pretty light "normalization" of the URL. From an archival capture point of view, I think the idea is that Heritrix shouldn't munge the URL very much.

However, HandyURL is geared towards access/playback/Wayback needs, and as such incorporates stronger URL normalization/canonicalization.

I haven't spent much time in the code for either; the above is just my thoughts based on informal discussions with Gordon and Brad.

Aaron |
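To make the capture-versus-playback distinction a bit more concrete, here is a rough Python sketch of what a "light" normalization and a stronger canonicalization might look like. The rule sets are illustrative assumptions only: they are not the actual rules implemented by UURI or HandyURL, and the function names light_normalize and strong_canonicalize are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Default ports that carry no information and can be dropped.
DEFAULT_PORTS = {"http": 80, "https": 443}


def light_normalize(url):
    """Capture-style normalization (assumed rules): lowercase the scheme
    and host and drop a default port, but leave path, query and fragment
    exactly as given. Userinfo, if any, is dropped for simplicity."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    return urlunsplit((scheme, netloc, parts.path, parts.query, parts.fragment))


def strong_canonicalize(url):
    """Playback-style canonicalization (assumed rules): on top of the
    light rules, strip a leading "www.", sort query parameters, drop the
    fragment and ignore a trailing slash, so that variant spellings of
    one page map to the same lookup key."""
    parts = urlsplit(light_normalize(url))
    host = parts.netloc
    if host.startswith("www."):
        host = host[len("www."):]
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, host, path, query, ""))
```

The design point is the trade-off described above: the light form stays close to what was actually requested at crawl time, while the strong form deliberately collapses variants so that a replay lookup for http://abc.ca.gov finds a capture stored under http://www.abc.ca.gov/.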
From: Erik H. <eri...@uc...> - 2012-05-16 19:35:14
|
Hi all,

A quick question. UURI [1] is located in Heritrix Commons. HandyURL [2] is located in archive-commons. Which should I use?

Thank you!

best, Erik

1. https://github.com/internetarchive/heritrix3/blob/master/commons/src/main/java/org/archive/net/UURI.java
2. https://github.com/internetarchive/archive-commons/blob/master/archive-commons/src/main/java/org/archive/url/HandyURL.java |
From: Bjarne A. <bj...@st...> - 2012-05-11 21:03:48
|
Hi.

A website owner is asking for an extract of material from a specific domain. Is anybody aware of a tool that, given either complete URLs or a URL regexp, would run through an ARC file and write all matching records into a new (W)ARC file?

Best,
Bjarne Andersen

Sent from my iPhone |
From: raffaele m. <raf...@at...> - 2012-05-08 12:31:26
|
ok, it was simple. https://gist.github.com/2634523 ciao -- raf...@at... |
From: raffaele m. <raf...@at...> - 2012-05-07 19:25:56
|
On May 7, 2012, at 8:19 PM, Pranay Pandey wrote:

> What version of wayback are you using?
> I recall the XML API went missing for 1.6.0 (not very sure on the version number).
>
> May be 1.7.x has it enabled back.
> http://builds.archive.org:8080/maven2/org/archive/wayback/dist/

Ciao Pranay,

yes, i'm on 1.6.0. i've just tried 1.7.0 and 1.7.1, but the behaviour is the same. probably i'm missing something very obvious.

thanks for your support

--
raf...@at... |
From: Pranay P. <pra...@gm...> - 2012-05-07 18:19:50
|
Raffaele,

What version of wayback are you using? I recall the XML API went missing for 1.6.0 (not very sure on the version number).

Maybe 1.7.x has it enabled again.
http://builds.archive.org:8080/maven2/org/archive/wayback/dist/

Best,
Pranay
Software Developer
Library of Congress (Contractor)

On Mon, May 7, 2012 at 8:17 AM, raffaele messuti <raf...@at...> wrote:

> http://inkdroid.org/journal/2012/05/03/way-way-back/comment-page-1/#comment-85380
>
> @gojomo replied me on this blog post about enabling xml api.
> but i've still not understand, someone could point me at working example?
>
> i should modify/add the bean org.archive.wayback.query.Renderer
> and then? where do i map that new renderer to the url /xmlquery ?
>
> thank you
>
> --
> raf...@at...

--
Regards,
Pranay |
From: raffaele m. <raf...@at...> - 2012-05-07 12:43:07
|
http://inkdroid.org/journal/2012/05/03/way-way-back/comment-page-1/#comment-85380

@gojomo replied to me on this blog post about enabling the xml api, but i still don't understand. could someone point me at a working example?

i should modify/add the bean org.archive.wayback.query.Renderer, and then? where do i map that new renderer to the url /xmlquery ?

thank you

--
raf...@at... |
From: Herbert v. de S. <hv...@gm...> - 2012-04-25 20:29:50
|
On Wed, Apr 25, 2012 at 12:35 PM, Erik Hetzner <eri...@uc...> wrote:

> I am having some problems with the Firefox plugin, but it looks like
> we’ve got Memento set up right now.

Any info on this would be very welcome too.

Thanks

Herbert

> Thanks again!
>
> best, Erik
>
> GET /memento/timegate/http://www.abc.ca.gov/ HTTP/1.1
> User-Agent: curl/7.21.6 (x86_64-pc-linux-gnu) libcurl/7.21.6 OpenSSL/1.0.0e zlib/1.2.3.4 libidn/1.22 librtmp/2.3
> Host: XXX.cdlib.org:8090
> Accept: */*
> Accept-Datetime: Sun, 04 Feb 2007 12:00:00 GMT
>
> HTTP/1.1 302 Moved Temporarily
> Server: Apache-Coyote/1.1
> Vary: negotiate,accept-datetime
> Link: <http://XXX.cdlib.org:8090/list/timebundle/http://www.abc.ca.gov/>;rel="timebundle", <http://www.abc.ca.gov/>;rel="original", <http://XXX.cdlib.org:8090/memento/20090204072340/http://www.abc.ca.gov/>;rel="first memento"; datetime="Wed, 04 Feb 2009 07:23:40 GMT", <http://XXX.cdlib.org:8090/memento/20120203005345/http://www.abc.ca.gov/>;rel="last memento"; datetime="Fri, 03 Feb 2012 00:53:45 GMT", <http://XXX.cdlib.org:8090/memento/20101208015408/http://www.abc.ca.gov/>;rel="next "; datetime="Wed, 08 Dec 2010 01:54:08 GMT" , <http://XXX.cdlib.org:8090/list/timemap/link/http://www.abc.ca.gov/>;rel="timemap"; type="application/link-format"
> Location: http://XXX.cdlib.org:8090/memento/20090204072340/http://www.abc.ca.gov/
> Content-Type: text/html
> Content-Length: 0
> Date: Wed, 25 Apr 2012 18:28:23 GMT
>
> Sent from my free software system <http://fsf.org/>.

--
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/
== |
From: Erik H. <eri...@uc...> - 2012-04-25 18:42:24
|
At Tue, 24 Apr 2012 22:40:01 -0700, Ahmed AlSum wrote:
>
> Hello Eric,
>
> Additional to the details explanation from Herbert, I would like to add
> two things based on my experience in helping in fixing a memento problem
> with global wayback machine.
>
> Coding Inconsistency: two months ago, the wayback memento code has
> inconsistency, some classes were missing so it has a strange behavior
> (not similar to yours). IA has fixed it and it should be working now.
> Also, IA has moved wayback open source to GitHub
> (https://github.com/internetarchive/wayback ). Could you send me the
> link of the repository that you grab the code from? I'm sorry I couldn't
> find r3625 version.
>
> Configuration: If the code is up-to-date, then we may have a
> configuration problem. I believe your attached configuration is correct.
> I have the same configuration on my machine and it works fine.
>
> So if you can confirm the code version and I will try to make sure of
> the consistency of this version.

Hi Ahmed,

Thanks for the pointer. I hadn’t realized that Wayback development had moved. I will be checking out the new code. I was using the svn repo linked on the archive access page.

It looks like the issue was me using the wrong timegate URL. Whoops! Everything seems to be working as expected now.

Thanks for your help! I hope to be able to report a working Memento installation at CDL soon.

best, Erik |
From: Erik H. <eri...@uc...> - 2012-04-25 18:35:31
|
At Wed, 25 Apr 2012 15:43:25 +0000, Balakireva, Lyudmila L wrote:
>
> I downloaded version 1.6 of wayback .
> I have to correct that timegate is expected at http://[myarchive]/memento/timegate/[url]

Hi Lyudmila,

Thanks so much! This seems to be exactly the issue I was having. I didn’t realize the timegate was located under /memento/timegate.

Now when I try to request it works for me (see below).

I am having some problems with the Firefox plugin, but it looks like we’ve got Memento set up right now.

Thanks again!

best, Erik

GET /memento/timegate/http://www.abc.ca.gov/ HTTP/1.1
User-Agent: curl/7.21.6 (x86_64-pc-linux-gnu) libcurl/7.21.6 OpenSSL/1.0.0e zlib/1.2.3.4 libidn/1.22 librtmp/2.3
Host: XXX.cdlib.org:8090
Accept: */*
Accept-Datetime: Sun, 04 Feb 2007 12:00:00 GMT

HTTP/1.1 302 Moved Temporarily
Server: Apache-Coyote/1.1
Vary: negotiate,accept-datetime
Link: <http://XXX.cdlib.org:8090/list/timebundle/http://www.abc.ca.gov/>;rel="timebundle", <http://www.abc.ca.gov/>;rel="original", <http://XXX.cdlib.org:8090/memento/20090204072340/http://www.abc.ca.gov/>;rel="first memento"; datetime="Wed, 04 Feb 2009 07:23:40 GMT", <http://XXX.cdlib.org:8090/memento/20120203005345/http://www.abc.ca.gov/>;rel="last memento"; datetime="Fri, 03 Feb 2012 00:53:45 GMT", <http://XXX.cdlib.org:8090/memento/20101208015408/http://www.abc.ca.gov/>;rel="next "; datetime="Wed, 08 Dec 2010 01:54:08 GMT" , <http://XXX.cdlib.org:8090/list/timemap/link/http://www.abc.ca.gov/>;rel="timemap"; type="application/link-format"
Location: http://XXX.cdlib.org:8090/memento/20090204072340/http://www.abc.ca.gov/
Content-Type: text/html
Content-Length: 0
Date: Wed, 25 Apr 2012 18:28:23 GMT |
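For anyone scripting the same check, the request and response above can be approximated offline with a minimal Python sketch. It assumes the /memento/timegate/ layout described in this thread; the helper names timegate_url, accept_datetime and parse_links are hypothetical, and the Link parser is a simplification that only extracts the URI, rel and datetime of each entry (it is not a full Web Linking parser).

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime

# One Link entry: <uri>;rel="..." optionally followed by ; datetime="..."
LINK_RE = re.compile(
    r'<([^>]*)>\s*;\s*rel="([^"]*)"(?:\s*;\s*datetime="([^"]*)")?'
)


def timegate_url(memento_prefix, original_url):
    """Build the TimeGate URL, assuming the layout
    <memento prefix>/timegate/<original url> reported in this thread."""
    return memento_prefix.rstrip("/") + "/timegate/" + original_url


def accept_datetime(when):
    """Format a timezone-aware datetime as an Accept-Datetime value,
    e.g. 'Sun, 04 Feb 2007 12:00:00 GMT'."""
    if when.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


def parse_links(header_value):
    """Map each rel value (whitespace-stripped) to (uri, datetime or None).
    Datetime values contain commas, so the header is matched entry by
    entry with a regex instead of being split on ','."""
    links = {}
    for uri, rel, when in LINK_RE.findall(header_value):
        links[rel.strip()] = (uri, when or None)
    return links
```

A client could send timegate_url(...) with an Accept-Datetime header built by accept_datetime(...), then inspect parse_links(response_link_header)["first memento"] and friends. Stripping the rel value is there because the responses in this thread contain rel="next " with a trailing space.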
From: Balakireva, L. L <lu...@la...> - 2012-04-25 15:43:42
|
I downloaded version 1.6 of wayback.
I have to correct that: the timegate is expected at http://[myarchive]/memento/timegate/[url]

curl -D my.txt http://lanlproto.santafe.edu:8080/memento/timegate/http://dans.knaw.nl/

[ludab@megalodon ~]$ more my.txt
HTTP/1.1 302 Moved Temporarily
Server: Apache-Coyote/1.1
Set-Cookie: JSESSIONID=8C88E2D937C605D65390B0347E183D3D; Path=/
Vary: negotiate,accept-datetime
Link: <http://lanlproto.santafe.edu:8080/list/timebundle/http://dans.knaw.nl/>;rel="timebundle", <http://dans.knaw.nl/>;rel="original", <http://lanlproto.santafe.edu:8080/memento/20120409233226/http://dans.knaw.nl/>;rel="last memento"; datetime="Mon, 09 Apr 2012 23:32:26 GMT", <http://lanlproto.santafe.edu:8080/memento/20120328164435/http://dans.knaw.nl/>;rel="first memento"; datetime="Wed, 28 Mar 2012 16:44:35 GMT", <http://lanlproto.santafe.edu:8080/memento/20120409233006/http://dans.knaw.nl/>;rel="prev memento"; datetime="Mon, 09 Apr 2012 23:30:06 GMT" , <http://lanlproto.santafe.edu:8080/list/timemap/link/http://dans.knaw.nl/>;rel="timemap"; type="application/link-format"
Location: http://lanlproto.santafe.edu:8080/memento/20120409233226/http://dans.knaw.nl/
Content-Type: text/html
Content-Length: 0
Date: Wed, 25 Apr 2012 15:05:10 GMT

curl -D my.txt -H Accept-Datetime:"Wed, 28 Mar 2012 16:44:36 GMT" http://lanlproto.santafe.edu:8080/memento/timegate/http://dans.knaw.nl/

[ludab@megalodon ~]$ more my.txt
HTTP/1.1 302 Moved Temporarily
Server: Apache-Coyote/1.1
Set-Cookie: JSESSIONID=767A080EF3BB8B6E384541324C504BDA; Path=/
Vary: negotiate,accept-datetime
Link: <http://lanlproto.santafe.edu:8080/list/timebundle/http://dans.knaw.nl/>;rel="timebundle", <http://dans.knaw.nl/>;rel="original", <http://lanlproto.santafe.edu:8080/memento/20120328164435/http://dans.knaw.nl/>;rel="first memento"; datetime="Wed, 28 Mar 2012 16:44:35 GMT", <http://lanlproto.santafe.edu:8080/memento/20120409233226/http://dans.knaw.nl/>;rel="last memento"; datetime="Mon, 09 Apr 2012 23:32:26
GMT", <http://lanlproto.santafe.edu:8080/memento/20120328165056/http://dans.knaw.nl/>;rel="next "; datetime="Wed, 28 Mar 2012 16:50:56 GMT" , <http://lanlproto.santafe.edu:8080/list/timemap/link/http://dans.knaw.nl/>;rel="timemap"; type="application/link-format" Location: http://lanlproto.santafe.edu:8080/memento/20120328164435/http://dans.knaw.nl/ Content-Type: text/html Content-Length: 0 Date: Wed, 25 Apr 2012 15:14:34 GMT ________________________________ From: mem...@go... [mem...@go...] on behalf of Ahmed AlSum [aa...@cs...] Sent: Tuesday, April 24, 2012 11:40 PM To: mem...@go... Cc: Herbert Van de Sompel; eri...@uc...; archive-access-discuss; Abhishek Salve Subject: Re: Problem with memento configuration Hello Eric, Additional to the details explanation from Herbert, I would like to add two things based on my experience in helping in fixing a memento problem with global wayback machine. Coding Inconsistency: two months ago, the wayback memento code has inconsistency, some classes were missing so it has a strange behavior (not similar to yours). IA has fixed it and it should be working now. Also, IA has moved wayback open source to GitHub (https://github.com/internetarchive/wayback ). Could you send me the link of the repository that you grab the code from? I'm sorry I couldn't find r3625 version. Configuration: If the code is up-to-date, then we may have a configuration problem. I believe your attached configuration is correct. I have the same configuration on my machine and it works fine. So if you can confirm the code version and I will try to make sure of the consistency of this version. Thanks for your interest in memento. Best regards, Ahmed AlSum Internet Archive, Software Engineer Intern Old Dominion University, PhD Student On 04/24/2012 07:20 PM, Herbert Van de Sompel wrote: Erik, I will discuss in detail with my team, tomorrow. A few observations: - I am not sure what the status of Memento compliance is of the latest Wayback. 
I know work has been going on at IA to revise the approach towards Memento compliance (closer integration between regular Wayback operation and Memento operation, e.g. Mementos with the same URI for both access points). But, again, I am not sure what the status of the work is. Next week, several of us will be at IIPC and that would be a good occasion to talk into some detail about the status. - In your below, I assume (I am it sure) /memento/http://www.abc.ca.gov/ is a TimeGate for http://www.abc.ca.gov/. A TimeGate that is approached without an Accept-Datetime value is supposed to redirect to the most recent Memento. Now, I see that the request has an Accept-Datetime. If this is indeed a TimeGate, as suggested by the redirect to the most recent Memento, it seems that the TimeGate does't "see" the Accept-Datetime. Then again, if this is a TimeGate, it doesn't behave like one with other regards: there are is no HTTP Link header, and a TimeGate should provide all kinds of HTTP Links. - I have no immediate explanation for the second redirect, except that - maybe - that could just be an internal Wayback redirect that doesn't have to do with Memento. But, I'm guessing. - Another clear indication that something is really wrong can be seen in the response of the final Memento: the Link header contains a lot of want-to-be HTTP links but not a single URI. Not a lot of linking gong on there ;-) Again, I will discuss in detail with my team, tomorrow. And hopefully get back to you with a better interpretation. And, next week at IIPC, we can discuss with colleagues from IA. Will you be there? Thanks a lot for your continued interest in Memento! Cheers Herbert Sent from my iPad On Apr 24, 2012, at 17:34, Erik Hetzner <eri...@uc...><mailto:eri...@uc...> wrote: Hi, We are experimenting with the Memento configuration for Open Source Wayback, but are experiencing some difficulties. We have built and installed the latest wayback from svn (r3625). 
We have made minimal changes to the config, changing only hostnames and ports, and enabling a CDX collection. (See cleaned results of diff -r below). Wayback is working fine. We can visit, e.g., http://XXX.cdlib.org:8090/wayback/*/http://www.abc.ca.gov Unfortunately, memento keeps redirecting us to the current date, and then to the closest version to that (in the below case, 3 Feb). As you can see by the Link headers in the last response, a closer version to the requested Accept-Datetime does exist. Does anybody have an idea of what is happening here? Let me know if there is any other information that would help with this. Thank you! best, Erik Here is an example of a session, from Firebug, using Mementofox: GET /memento/http://www.abc.ca.gov/ HTTP/1.1 Host: XXX.cdlib.org:8090 User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0) Gecko/20100101 Firefox/11.0 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 Accept-Language: en-us,en;q=0.5 Accept-Encoding: gzip, deflate Connection: keep-alive Accept-Datetime: Sun, 04 Feb 2007 12:00:00 GMT HTTP/1.1 302 Moved Temporarily Server: Apache-Coyote/1.1 Location: http://XXX.cdlib.org:8090/memento/20120424231906/http://www.abc.ca.gov/ Content-Length: 0 Date: Tue, 24 Apr 2012 23:19:06 GMT GET /memento/20120424231906/http://www.abc.ca.gov/ HTTP/1.1 Host: XXX.cdlib.org:8090 User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0) Gecko/20100101 Firefox/11.0 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 Accept-Language: en-us,en;q=0.5 Accept-Encoding: gzip, deflate Connection: keep-alive Accept-Datetime: Sun, 04 Feb 2007 12:00:00 GMT HTTP/1.1 302 Moved Temporarily Server: Apache-Coyote/1.1 Location: http://XXX.cdlib.org:8090/memento/20120203005345/http://www.abc.ca.gov/ Content-Length: 0 Date: Tue, 24 Apr 2012 23:19:06 GMT GET /memento/20120203005345/http://www.abc.ca.gov/ HTTP/1.1 Host: XXX.cdlib.org:8090 User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0) 
Gecko/20100101 Firefox/11.0 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 Accept-Language: en-us,en;q=0.5 Accept-Encoding: gzip, deflate Connection: keep-alive Accept-Datetime: Sun, 04 Feb 2007 12:00:00 GMT HTTP/1.1 200 OK Server: Apache-Coyote/1.1 Memento-Datetime: Fri, 03 Feb 2012 00:53:45 GMT Link: ;rel="timebundle", ;rel="original", ;rel="last memento"; datetime="Fri, 03 Feb 2012 00:53:45 GMT", ;rel="first memento"; datetime="Wed, 04 Feb 2009 07:23:40 GMT", ;rel="prev memento"; datetime="Mon, 27 Jun 2011 01:01:27 GMT" , ;rel="timemap"; type="application/link-format",;rel="timegate" X-Archive-Guessed-Charset: cp1252 X-Archive-Orig-Connection: close X-Archive-Orig-Content-Length: 30168 X-Archive-Orig-Content-Type: text/html X-Archive-Orig-X-Powered-By: ASP.NET X-Archive-Orig-Server: Microsoft-IIS/6.0 X-Archive-Orig-Date: Fri, 03 Feb 2012 00:53:45 GMT Content-Type: text/html;charset=cp1252 Content-Length: 30835 Date: Tue, 24 Apr 2012 23:19:06 GMT diff -r wayback-1.7.1/WEB-INF/CDXCollection.xml wayback-1.7.1-ours//WEB-INF/CDXCollection.xml 32,37c32,33 < <bean class="org.archive.wayback.resourcestore.LocationDBResourceStore"> < <property name="db"> < <bean class="org.archive.wayback.resourcestore.locationdb.FlatFileResourceFileLocationDB"> < <property name="path" value="${wayback.basedir}/path-index.txt" /> < </bean> < </property> --- <bean class="org.archive.wayback.resourcestore.SimpleResourceStore"> <property name="prefix" value="http://XXX.cdlib.org:YYYYY/arcs/"/> 50c46 < <property name="path" value="${wayback.basedir}/cdx-index/index.cdx" /> --- <property name="path" value="/was/wayback.public.index/everything.cdx"/> diff -r wayback-1.7.1/WEB-INF/wayback.xml wayback-1.7.1-ours//WEB-INF/wayback.xml 18c18 < wayback.urlprefix=http://localhost.archive.org:8080/wayback/ --- wayback.urlprefix=http://XXX.cdlib.org:8090/wayback/ 72d71 < <import resource="BDBCollection.xml"/> 73a73,74 <import resource="BDBCollection.xml"/> --> 74a76 <!-- 
148c150 < <property name="matchPort" value="8080" /> --- <property name="matchPort" value="8090" /> 152c154 < <bean name="8080:wayback" class="org.archive.wayback.webapp.AccessPoint"> --- <bean name="8090:wayback" class="org.archive.wayback.webapp.AccessPoint"> 176d177 < <property name="collection" ref="localbdbcollection" /> 178c179 < <property name="collection" ref="localcdxcollection" /> --- <property name="collection" ref="localbdbcollection" /> 179a181 <property name="collection" ref="localcdxcollection" /> 227d228 < <!-- 229,231c230,232 < <bean name="8080:memento" parent="8080:wayback"> < <property name="replayPrefix" value="http://localhost.archive.org:8080/memento/"<http://localhost.archive.org:8080/memento/> /> < <property name="queryPrefix" value="http://localhost.archive.org:8080/list/"<http://localhost.archive.org:8080/list/> /> --- <bean name="8090:memento" parent="8090:wayback"> <property name="replayPrefix" value="http://XXX.cdlib.org:8090/memento/"<http://XXX.cdlib.org:8090/memento/> /> <property name="queryPrefix" value="http://XXX.cdlib.org:8090/list/"<http://XXX.cdlib.org:8090/list/> /> 234c235 < <prop key="aggregationPrefix">http://localhost.archive.org:8080/list/</prop> --- <prop key="aggregationPrefix">http://XXX.cdlib.org:8090/list/</prop> 247c248 < <property name="replayURIPrefix" value="http://localhost.archive.org:8080/memento/"<http://localhost.archive.org:8080/memento/>/> --- <property name="replayURIPrefix" value="http://XXX.cdlib.org:8090/memento/"<http://XXX.cdlib.org:8090/memento/>/> 264,267c265,268 < <bean name="8080:list" parent="8080:memento"> < <property name="replayPrefix" value="http://localhost.archive.org:8080/memento/"<http://localhost.archive.org:8080/memento/> /> < <property name="queryPrefix" value="http://localhost.archive.org:8080/list/"<http://localhost.archive.org:8080/list/> /> < <property name="staticPrefix" value="http://localhost.archive.org:8080/list/"<http://localhost.archive.org:8080/list/> /> --- <bean 
name="8090:list" parent="8090:memento"> <property name="replayPrefix" value="http://XXX.cdlib.org:8090/memento/"<http://XXX.cdlib.org:8090/memento/> /> <property name="queryPrefix" value="http://XXX.cdlib.org:8090/list/"<http://XXX.cdlib.org:8090/list/> /> <property name="staticPrefix" value="http://XXX.cdlib.org:8090/list/"<http://XXX.cdlib.org:8090/list/> /> 270c271 < <prop key="Prefix">http://localhost.archive.org:8080/memento/</prop> --- <prop key="Prefix">http://XXX.cdlib.org:8090/memento/</prop> 283c284 < <property name="replayURIPrefix" value="http://memento.localhost.archive.org:8080/list/"<http://memento.localhost.archive.org:8080/list/>/> --- <property name="replayURIPrefix" value="http://XXX.cdlib.org:8090/list/"<http://XXX.cdlib.org:8090/list/>/> 287d287 < --> Sent from my free software system <http://fsf.org/><http://fsf.org/>. |
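The two curl runs at the top of this message show the TimeGate behavior discussed in the thread: with no Accept-Datetime the redirect goes to the most recent Memento, and with one it goes to a Memento near the requested time. The following is a minimal Python sketch of that selection, assuming a "closest in time" rule; the exact rule Wayback applies is not confirmed here, and select_memento and parse_timestamp are hypothetical names. The timestamps in the usage note are the ones visible in the Link headers above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_timestamp(timestamp):
    """Parse a 14-digit Wayback timestamp (YYYYMMDDhhmmss) as UTC."""
    parsed = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)


def select_memento(timestamps, accept_datetime=None):
    """Pick the capture a TimeGate would redirect to.

    Without an Accept-Datetime value, return the most recent capture.
    With one (an HTTP date such as 'Wed, 28 Mar 2012 16:44:36 GMT'),
    return the capture closest in time to it (assumed rule)."""
    if not timestamps:
        raise ValueError("no captures available")
    if accept_datetime is None:
        return max(timestamps, key=parse_timestamp)
    wanted = parsedate_to_datetime(accept_datetime)
    return min(timestamps, key=lambda ts: abs(parse_timestamp(ts) - wanted))
```

With the captures 20120328164435, 20120328165056, 20120409233006 and 20120409233226, calling select_memento without a datetime yields 20120409233226, and calling it with "Wed, 28 Mar 2012 16:44:36 GMT" yields 20120328164435, which matches the Location headers in the two responses.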
From: Ahmed A. <aa...@cs...> - 2012-04-25 05:54:24
|
Hello Eric, Additional to the details explanation from Herbert, I would like to add two things based on my experience in helping in fixing a memento problem with global wayback machine. Coding Inconsistency: two months ago, the wayback memento code has inconsistency, some classes were missing so it has a strange behavior (not similar to yours). IA has fixed it and it should be working now. Also, IA has moved wayback open source to GitHub (https://github.com/internetarchive/wayback ). Could you send me the link of the repository that you grab the code from? I'm sorry I couldn't find r3625 version. Configuration: If the code is up-to-date, then we may have a configuration problem. I believe your attached configuration is correct. I have the same configuration on my machine and it works fine. So if you can confirm the code version and I will try to make sure of the consistency of this version. Thanks for your interest in memento. Best regards, Ahmed AlSum Internet Archive, Software Engineer Intern Old Dominion University, PhD Student On 04/24/2012 07:20 PM, Herbert Van de Sompel wrote: > Erik, > > I will discuss in detail with my team, tomorrow. A few observations: > > - I am not sure what the status of Memento compliance is of the latest Wayback. I know work has been going on at IA to revise the approach towards Memento compliance (closer integration between regular Wayback operation and Memento operation, e.g. Mementos with the same URI for both access points). But, again, I am not sure what the status of the work is. Next week, several of us will be at IIPC and that would be a good occasion to talk into some detail about the status. > > - In your below, I assume (I am it sure) /memento/http://www.abc.ca.gov/ is a TimeGate for http://www.abc.ca.gov/. A TimeGate that is approached without an Accept-Datetime value is supposed to redirect to the most recent Memento. Now, I see that the request has an Accept-Datetime. 
If this is indeed a TimeGate, as suggested by the redirect to the most recent Memento, it seems that the TimeGate does't "see" the Accept-Datetime. Then again, if this is a TimeGate, it doesn't behave like one with other regards: there are is no HTTP Link header, and a TimeGate should provide all kinds of HTTP Links. > > - I have no immediate explanation for the second redirect, except that - maybe - that could just be an internal Wayback redirect that doesn't have to do with Memento. But, I'm guessing. > > - Another clear indication that something is really wrong can be seen in the response of the final Memento: the Link header contains a lot of want-to-be HTTP links but not a single URI. Not a lot of linking gong on there ;-) > > Again, I will discuss in detail with my team, tomorrow. And hopefully get back to you with a better interpretation. And, next week at IIPC, we can discuss with colleagues from IA. Will you be there? > > Thanks a lot for your continued interest in Memento! > > Cheers > > Herbert > > Sent from my iPad > > On Apr 24, 2012, at 17:34, Erik Hetzner<eri...@uc...> wrote: > >> Hi, >> >> We are experimenting with the Memento configuration for Open Source >> Wayback, but are experiencing some difficulties. >> >> We have built and installed the latest wayback from svn (r3625). We >> have made minimal changes to the config, changing only hostnames and >> ports, and enabling a CDX collection. (See cleaned results of diff -r >> below). Wayback is working fine. We can visit, e.g., >> http://XXX.cdlib.org:8090/wayback/*/http://www.abc.ca.gov >> >> Unfortunately, memento keeps redirecting us to the current date, and >> then to the closest version to that (in the below case, 3 Feb). >> >> As you can see by the Link headers in the last response, a closer >> version to the requested Accept-Datetime does exist. >> >> Does anybody have an idea of what is happening here? Let me know if >> there is any other information that would help with this. 
>> >> Thank you! >> >> best, Erik >> >> Here is an example of a session, from Firebug, using Mementofox: >> >> GET /memento/http://www.abc.ca.gov/ HTTP/1.1 >> Host: XXX.cdlib.org:8090 >> User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0) Gecko/20100101 Firefox/11.0 >> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 >> Accept-Language: en-us,en;q=0.5 >> Accept-Encoding: gzip, deflate >> Connection: keep-alive >> Accept-Datetime: Sun, 04 Feb 2007 12:00:00 GMT >> >> HTTP/1.1 302 Moved Temporarily >> Server: Apache-Coyote/1.1 >> Location: http://XXX.cdlib.org:8090/memento/20120424231906/http://www.abc.ca.gov/ >> Content-Length: 0 >> Date: Tue, 24 Apr 2012 23:19:06 GMT >> >> GET /memento/20120424231906/http://www.abc.ca.gov/ HTTP/1.1 >> Host: XXX.cdlib.org:8090 >> User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0) Gecko/20100101 Firefox/11.0 >> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 >> Accept-Language: en-us,en;q=0.5 >> Accept-Encoding: gzip, deflate >> Connection: keep-alive >> Accept-Datetime: Sun, 04 Feb 2007 12:00:00 GMT >> >> HTTP/1.1 302 Moved Temporarily >> Server: Apache-Coyote/1.1 >> Location: http://XXX.cdlib.org:8090/memento/20120203005345/http://www.abc.ca.gov/ >> Content-Length: 0 >> Date: Tue, 24 Apr 2012 23:19:06 GMT >> >> GET /memento/20120203005345/http://www.abc.ca.gov/ HTTP/1.1 >> Host: XXX.cdlib.org:8090 >> User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0) Gecko/20100101 Firefox/11.0 >> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 >> Accept-Language: en-us,en;q=0.5 >> Accept-Encoding: gzip, deflate >> Connection: keep-alive >> Accept-Datetime: Sun, 04 Feb 2007 12:00:00 GMT >> >> HTTP/1.1 200 OK >> Server: Apache-Coyote/1.1 >> Memento-Datetime: Fri, 03 Feb 2012 00:53:45 GMT >> Link: ;rel="timebundle", ;rel="original", ;rel="last memento"; datetime="Fri, 03 Feb 2012 00:53:45 GMT", ;rel="first memento"; datetime="Wed, 04 Feb 2009 07:23:40 
GMT", ;rel="prev memento"; datetime="Mon, 27 Jun 2011 01:01:27 GMT" , ;rel="timemap"; type="application/link-format",;rel="timegate" >> X-Archive-Guessed-Charset: cp1252 >> X-Archive-Orig-Connection: close >> X-Archive-Orig-Content-Length: 30168 >> X-Archive-Orig-Content-Type: text/html >> X-Archive-Orig-X-Powered-By: ASP.NET >> X-Archive-Orig-Server: Microsoft-IIS/6.0 >> X-Archive-Orig-Date: Fri, 03 Feb 2012 00:53:45 GMT >> Content-Type: text/html;charset=cp1252 >> Content-Length: 30835 >> Date: Tue, 24 Apr 2012 23:19:06 GMT >> >> diff -r wayback-1.7.1/WEB-INF/CDXCollection.xml wayback-1.7.1-ours//WEB-INF/CDXCollection.xml >> 32,37c32,33 >> < <bean class="org.archive.wayback.resourcestore.LocationDBResourceStore"> >> < <property name="db"> >> < <bean class="org.archive.wayback.resourcestore.locationdb.FlatFileResourceFileLocationDB"> >> < <property name="path" value="${wayback.basedir}/path-index.txt" /> >> < </bean> >> < </property> >> --- >>> <bean class="org.archive.wayback.resourcestore.SimpleResourceStore"> >>> <property name="prefix" value="http://XXX.cdlib.org:YYYYY/arcs/"/> >> 50c46 >> < <property name="path" value="${wayback.basedir}/cdx-index/index.cdx" /> >> --- >>> <property name="path" value="/was/wayback.public.index/everything.cdx"/> >> diff -r wayback-1.7.1/WEB-INF/wayback.xml wayback-1.7.1-ours//WEB-INF/wayback.xml >> 18c18 >> < wayback.urlprefix=http://localhost.archive.org:8080/wayback/ >> --- >>> wayback.urlprefix=http://XXX.cdlib.org:8090/wayback/ >> 72d71 >> < <import resource="BDBCollection.xml"/> >> 73a73,74 >>> <import resource="BDBCollection.xml"/> >>> --> >> 74a76 >>> <!-- >> 148c150 >> < <property name="matchPort" value="8080" /> >> --- >>> <property name="matchPort" value="8090" /> >> 152c154 >> < <bean name="8080:wayback" class="org.archive.wayback.webapp.AccessPoint"> >> --- >>> <bean name="8090:wayback" class="org.archive.wayback.webapp.AccessPoint"> >> 176d177 >> < <property name="collection" ref="localbdbcollection" /> >> 178c179 
>> < <property name="collection" ref="localcdxcollection" /> >> --- >>> <property name="collection" ref="localbdbcollection" /> >> 179a181 >>> <property name="collection" ref="localcdxcollection" /> >> 227d228 >> < <!-- >> 229,231c230,232 >> < <bean name="8080:memento" parent="8080:wayback"> >> < <property name="replayPrefix" value="http://localhost.archive.org:8080/memento/" /> >> < <property name="queryPrefix" value="http://localhost.archive.org:8080/list/" /> >> --- >>> <bean name="8090:memento" parent="8090:wayback"> >>> <property name="replayPrefix" value="http://XXX.cdlib.org:8090/memento/" /> >>> <property name="queryPrefix" value="http://XXX.cdlib.org:8090/list/" /> >> 234c235 >> < <prop key="aggregationPrefix">http://localhost.archive.org:8080/list/</prop> >> --- >>> <prop key="aggregationPrefix">http://XXX.cdlib.org:8090/list/</prop> >> 247c248 >> < <property name="replayURIPrefix" value="http://localhost.archive.org:8080/memento/"/> >> --- >>> <property name="replayURIPrefix" value="http://XXX.cdlib.org:8090/memento/"/> >> 264,267c265,268 >> < <bean name="8080:list" parent="8080:memento"> >> < <property name="replayPrefix" value="http://localhost.archive.org:8080/memento/" /> >> < <property name="queryPrefix" value="http://localhost.archive.org:8080/list/" /> >> < <property name="staticPrefix" value="http://localhost.archive.org:8080/list/" /> >> --- >>> <bean name="8090:list" parent="8090:memento"> >>> <property name="replayPrefix" value="http://XXX.cdlib.org:8090/memento/" /> >>> <property name="queryPrefix" value="http://XXX.cdlib.org:8090/list/" /> >>> <property name="staticPrefix" value="http://XXX.cdlib.org:8090/list/" /> >> 270c271 >> < <prop key="Prefix">http://localhost.archive.org:8080/memento/</prop> >> --- >>> <prop key="Prefix">http://XXX.cdlib.org:8090/memento/</prop> >> 283c284 >> < <property name="replayURIPrefix" value="http://memento.localhost.archive.org:8080/list/"/> >> --- >>> <property name="replayURIPrefix" 
value="http://XXX.cdlib.org:8090/list/"/> >> 287d287 >> < --> >> Sent from my free software system<http://fsf.org/>. |
From: Herbert V. de S. <hv...@gm...> - 2012-04-25 02:20:50
|
Erik, I will discuss this in detail with my team tomorrow. A few observations:
- I am not sure what the status of Memento compliance is in the latest Wayback. I know work has been going on at IA to revise the approach towards Memento compliance (closer integration between regular Wayback operation and Memento operation, e.g. Mementos with the same URI for both access points). But, again, I am not sure what the status of that work is. Next week, several of us will be at IIPC and that would be a good occasion to talk in some detail about the status.
- In your session below, I assume (I am not sure) /memento/http://www.abc.ca.gov/ is a TimeGate for http://www.abc.ca.gov/. A TimeGate that is approached without an Accept-Datetime value is supposed to redirect to the most recent Memento. Now, I see that the request has an Accept-Datetime. If this is indeed a TimeGate, as suggested by the redirect to the most recent Memento, it seems that the TimeGate doesn't "see" the Accept-Datetime. Then again, if this is a TimeGate, it doesn't behave like one in other regards: there is no HTTP Link header, and a TimeGate should provide all kinds of HTTP Links.
- I have no immediate explanation for the second redirect, except that, maybe, it could just be an internal Wayback redirect that doesn't have to do with Memento. But I'm guessing.
- Another clear indication that something is really wrong can be seen in the response of the final Memento: the Link header contains a lot of want-to-be HTTP links but not a single URI. Not a lot of linking going on there ;-)
Again, I will discuss in detail with my team tomorrow, and hopefully get back to you with a better interpretation. And, next week at IIPC, we can discuss with colleagues from IA. Will you be there? Thanks a lot for your continued interest in Memento!
Cheers Herbert Sent from my iPad On Apr 24, 2012, at 17:34, Erik Hetzner <eri...@uc...> wrote: > Hi, > > We are experimenting with the Memento configuration for Open Source > Wayback, but are experiencing some difficulties. > > We have built and installed the latest wayback from svn (r3625). We > have made minimal changes to the config, changing only hostnames and > ports, and enabling a CDX collection. (See cleaned results of diff -r > below). Wayback is working fine. We can visit, e.g., > http://XXX.cdlib.org:8090/wayback/*/http://www.abc.ca.gov > > Unfortunately, memento keeps redirecting us to the current date, and > then to the closest version to that (in the below case, 3 Feb). > > As you can see by the Link headers in the last response, a closer > version to the requested Accept-Datetime does exist. > > Does anybody have an idea of what is happening here? Let me know if > there is any other information that would help with this. > > Thank you! > > best, Erik > > Here is an example of a session, from Firebug, using Mementofox: > > GET /memento/http://www.abc.ca.gov/ HTTP/1.1 > Host: XXX.cdlib.org:8090 > User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0) Gecko/20100101 Firefox/11.0 > Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 > Accept-Language: en-us,en;q=0.5 > Accept-Encoding: gzip, deflate > Connection: keep-alive > Accept-Datetime: Sun, 04 Feb 2007 12:00:00 GMT > > HTTP/1.1 302 Moved Temporarily > Server: Apache-Coyote/1.1 > Location: http://XXX.cdlib.org:8090/memento/20120424231906/http://www.abc.ca.gov/ > Content-Length: 0 > Date: Tue, 24 Apr 2012 23:19:06 GMT > > GET /memento/20120424231906/http://www.abc.ca.gov/ HTTP/1.1 > Host: XXX.cdlib.org:8090 > User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0) Gecko/20100101 Firefox/11.0 > Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 > Accept-Language: en-us,en;q=0.5 > Accept-Encoding: gzip, deflate > Connection: keep-alive > 
Accept-Datetime: Sun, 04 Feb 2007 12:00:00 GMT > > HTTP/1.1 302 Moved Temporarily > Server: Apache-Coyote/1.1 > Location: http://XXX.cdlib.org:8090/memento/20120203005345/http://www.abc.ca.gov/ > Content-Length: 0 > Date: Tue, 24 Apr 2012 23:19:06 GMT > > GET /memento/20120203005345/http://www.abc.ca.gov/ HTTP/1.1 > Host: XXX.cdlib.org:8090 > User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0) Gecko/20100101 Firefox/11.0 > Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 > Accept-Language: en-us,en;q=0.5 > Accept-Encoding: gzip, deflate > Connection: keep-alive > Accept-Datetime: Sun, 04 Feb 2007 12:00:00 GMT > > HTTP/1.1 200 OK > Server: Apache-Coyote/1.1 > Memento-Datetime: Fri, 03 Feb 2012 00:53:45 GMT > Link: ;rel="timebundle", ;rel="original", ;rel="last memento"; datetime="Fri, 03 Feb 2012 00:53:45 GMT", ;rel="first memento"; datetime="Wed, 04 Feb 2009 07:23:40 GMT", ;rel="prev memento"; datetime="Mon, 27 Jun 2011 01:01:27 GMT" , ;rel="timemap"; type="application/link-format",;rel="timegate" > X-Archive-Guessed-Charset: cp1252 > X-Archive-Orig-Connection: close > X-Archive-Orig-Content-Length: 30168 > X-Archive-Orig-Content-Type: text/html > X-Archive-Orig-X-Powered-By: ASP.NET > X-Archive-Orig-Server: Microsoft-IIS/6.0 > X-Archive-Orig-Date: Fri, 03 Feb 2012 00:53:45 GMT > Content-Type: text/html;charset=cp1252 > Content-Length: 30835 > Date: Tue, 24 Apr 2012 23:19:06 GMT > > diff -r wayback-1.7.1/WEB-INF/CDXCollection.xml wayback-1.7.1-ours//WEB-INF/CDXCollection.xml > 32,37c32,33 > < <bean class="org.archive.wayback.resourcestore.LocationDBResourceStore"> > < <property name="db"> > < <bean class="org.archive.wayback.resourcestore.locationdb.FlatFileResourceFileLocationDB"> > < <property name="path" value="${wayback.basedir}/path-index.txt" /> > < </bean> > < </property> > --- >> <bean class="org.archive.wayback.resourcestore.SimpleResourceStore"> >> <property name="prefix" value="http://XXX.cdlib.org:YYYYY/arcs/"/> > 50c46 > 
< <property name="path" value="${wayback.basedir}/cdx-index/index.cdx" /> > --- >> <property name="path" value="/was/wayback.public.index/everything.cdx"/> > diff -r wayback-1.7.1/WEB-INF/wayback.xml wayback-1.7.1-ours//WEB-INF/wayback.xml > 18c18 > < wayback.urlprefix=http://localhost.archive.org:8080/wayback/ > --- >> wayback.urlprefix=http://XXX.cdlib.org:8090/wayback/ > 72d71 > < <import resource="BDBCollection.xml"/> > 73a73,74 >> <import resource="BDBCollection.xml"/> >> --> > 74a76 >> <!-- > 148c150 > < <property name="matchPort" value="8080" /> > --- >> <property name="matchPort" value="8090" /> > 152c154 > < <bean name="8080:wayback" class="org.archive.wayback.webapp.AccessPoint"> > --- >> <bean name="8090:wayback" class="org.archive.wayback.webapp.AccessPoint"> > 176d177 > < <property name="collection" ref="localbdbcollection" /> > 178c179 > < <property name="collection" ref="localcdxcollection" /> > --- >> <property name="collection" ref="localbdbcollection" /> > 179a181 >> <property name="collection" ref="localcdxcollection" /> > 227d228 > < <!-- > 229,231c230,232 > < <bean name="8080:memento" parent="8080:wayback"> > < <property name="replayPrefix" value="http://localhost.archive.org:8080/memento/" /> > < <property name="queryPrefix" value="http://localhost.archive.org:8080/list/" /> > --- >> <bean name="8090:memento" parent="8090:wayback"> >> <property name="replayPrefix" value="http://XXX.cdlib.org:8090/memento/" /> >> <property name="queryPrefix" value="http://XXX.cdlib.org:8090/list/" /> > 234c235 > < <prop key="aggregationPrefix">http://localhost.archive.org:8080/list/</prop> > --- >> <prop key="aggregationPrefix">http://XXX.cdlib.org:8090/list/</prop> > 247c248 > < <property name="replayURIPrefix" value="http://localhost.archive.org:8080/memento/"/> > --- >> <property name="replayURIPrefix" value="http://XXX.cdlib.org:8090/memento/"/> > 264,267c265,268 > < <bean name="8080:list" parent="8080:memento"> > < <property name="replayPrefix" 
value="http://localhost.archive.org:8080/memento/" /> > < <property name="queryPrefix" value="http://localhost.archive.org:8080/list/" /> > < <property name="staticPrefix" value="http://localhost.archive.org:8080/list/" /> > --- >> <bean name="8090:list" parent="8090:memento"> >> <property name="replayPrefix" value="http://XXX.cdlib.org:8090/memento/" /> >> <property name="queryPrefix" value="http://XXX.cdlib.org:8090/list/" /> >> <property name="staticPrefix" value="http://XXX.cdlib.org:8090/list/" /> > 270c271 > < <prop key="Prefix">http://localhost.archive.org:8080/memento/</prop> > --- >> <prop key="Prefix">http://XXX.cdlib.org:8090/memento/</prop> > 283c284 > < <property name="replayURIPrefix" value="http://memento.localhost.archive.org:8080/list/"/> > --- >> <property name="replayURIPrefix" value="http://XXX.cdlib.org:8090/list/"/> > 287d287 > < --> > Sent from my free software system <http://fsf.org/>. |
From: Erik H. <eri...@uc...> - 2012-04-24 23:51:38
|
Hi, We are experimenting with the Memento configuration for Open Source Wayback, but are experiencing some difficulties. We have built and installed the latest wayback from svn (r3625). We have made minimal changes to the config, changing only hostnames and ports, and enabling a CDX collection. (See cleaned results of diff -r below). Wayback is working fine. We can visit, e.g., http://XXX.cdlib.org:8090/wayback/*/http://www.abc.ca.gov Unfortunately, memento keeps redirecting us to the current date, and then to the closest version to that (in the below case, 3 Feb). As you can see by the Link headers in the last response, a closer version to the requested Accept-Datetime does exist. Does anybody have an idea of what is happening here? Let me know if there is any other information that would help with this. Thank you! best, Erik Here is an example of a session, from Firebug, using Mementofox: GET /memento/http://www.abc.ca.gov/ HTTP/1.1 Host: XXX.cdlib.org:8090 User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0) Gecko/20100101 Firefox/11.0 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 Accept-Language: en-us,en;q=0.5 Accept-Encoding: gzip, deflate Connection: keep-alive Accept-Datetime: Sun, 04 Feb 2007 12:00:00 GMT HTTP/1.1 302 Moved Temporarily Server: Apache-Coyote/1.1 Location: http://XXX.cdlib.org:8090/memento/20120424231906/http://www.abc.ca.gov/ Content-Length: 0 Date: Tue, 24 Apr 2012 23:19:06 GMT GET /memento/20120424231906/http://www.abc.ca.gov/ HTTP/1.1 Host: XXX.cdlib.org:8090 User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0) Gecko/20100101 Firefox/11.0 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 Accept-Language: en-us,en;q=0.5 Accept-Encoding: gzip, deflate Connection: keep-alive Accept-Datetime: Sun, 04 Feb 2007 12:00:00 GMT HTTP/1.1 302 Moved Temporarily Server: Apache-Coyote/1.1 Location: http://XXX.cdlib.org:8090/memento/20120203005345/http://www.abc.ca.gov/ Content-Length: 0 
Date: Tue, 24 Apr 2012 23:19:06 GMT GET /memento/20120203005345/http://www.abc.ca.gov/ HTTP/1.1 Host: XXX.cdlib.org:8090 User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0) Gecko/20100101 Firefox/11.0 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 Accept-Language: en-us,en;q=0.5 Accept-Encoding: gzip, deflate Connection: keep-alive Accept-Datetime: Sun, 04 Feb 2007 12:00:00 GMT HTTP/1.1 200 OK Server: Apache-Coyote/1.1 Memento-Datetime: Fri, 03 Feb 2012 00:53:45 GMT Link: ;rel="timebundle", ;rel="original", ;rel="last memento"; datetime="Fri, 03 Feb 2012 00:53:45 GMT", ;rel="first memento"; datetime="Wed, 04 Feb 2009 07:23:40 GMT", ;rel="prev memento"; datetime="Mon, 27 Jun 2011 01:01:27 GMT" , ;rel="timemap"; type="application/link-format",;rel="timegate" X-Archive-Guessed-Charset: cp1252 X-Archive-Orig-Connection: close X-Archive-Orig-Content-Length: 30168 X-Archive-Orig-Content-Type: text/html X-Archive-Orig-X-Powered-By: ASP.NET X-Archive-Orig-Server: Microsoft-IIS/6.0 X-Archive-Orig-Date: Fri, 03 Feb 2012 00:53:45 GMT Content-Type: text/html;charset=cp1252 Content-Length: 30835 Date: Tue, 24 Apr 2012 23:19:06 GMT diff -r wayback-1.7.1/WEB-INF/CDXCollection.xml wayback-1.7.1-ours//WEB-INF/CDXCollection.xml 32,37c32,33 < <bean class="org.archive.wayback.resourcestore.LocationDBResourceStore"> < <property name="db"> < <bean class="org.archive.wayback.resourcestore.locationdb.FlatFileResourceFileLocationDB"> < <property name="path" value="${wayback.basedir}/path-index.txt" /> < </bean> < </property> --- > <bean class="org.archive.wayback.resourcestore.SimpleResourceStore"> > <property name="prefix" value="http://XXX.cdlib.org:YYYYY/arcs/"/> 50c46 < <property name="path" value="${wayback.basedir}/cdx-index/index.cdx" /> --- > <property name="path" value="/was/wayback.public.index/everything.cdx"/> diff -r wayback-1.7.1/WEB-INF/wayback.xml wayback-1.7.1-ours//WEB-INF/wayback.xml 18c18 < 
wayback.urlprefix=http://localhost.archive.org:8080/wayback/ --- > wayback.urlprefix=http://XXX.cdlib.org:8090/wayback/ 72d71 < <import resource="BDBCollection.xml"/> 73a73,74 > <import resource="BDBCollection.xml"/> > --> 74a76 > <!-- 148c150 < <property name="matchPort" value="8080" /> --- > <property name="matchPort" value="8090" /> 152c154 < <bean name="8080:wayback" class="org.archive.wayback.webapp.AccessPoint"> --- > <bean name="8090:wayback" class="org.archive.wayback.webapp.AccessPoint"> 176d177 < <property name="collection" ref="localbdbcollection" /> 178c179 < <property name="collection" ref="localcdxcollection" /> --- > <property name="collection" ref="localbdbcollection" /> 179a181 > <property name="collection" ref="localcdxcollection" /> 227d228 < <!-- 229,231c230,232 < <bean name="8080:memento" parent="8080:wayback"> < <property name="replayPrefix" value="http://localhost.archive.org:8080/memento/" /> < <property name="queryPrefix" value="http://localhost.archive.org:8080/list/" /> --- > <bean name="8090:memento" parent="8090:wayback"> > <property name="replayPrefix" value="http://XXX.cdlib.org:8090/memento/" /> > <property name="queryPrefix" value="http://XXX.cdlib.org:8090/list/" /> 234c235 < <prop key="aggregationPrefix">http://localhost.archive.org:8080/list/</prop> --- > <prop key="aggregationPrefix">http://XXX.cdlib.org:8090/list/</prop> 247c248 < <property name="replayURIPrefix" value="http://localhost.archive.org:8080/memento/"/> --- > <property name="replayURIPrefix" value="http://XXX.cdlib.org:8090/memento/"/> 264,267c265,268 < <bean name="8080:list" parent="8080:memento"> < <property name="replayPrefix" value="http://localhost.archive.org:8080/memento/" /> < <property name="queryPrefix" value="http://localhost.archive.org:8080/list/" /> < <property name="staticPrefix" value="http://localhost.archive.org:8080/list/" /> --- > <bean name="8090:list" parent="8090:memento"> > <property name="replayPrefix" 
value="http://XXX.cdlib.org:8090/memento/" /> > <property name="queryPrefix" value="http://XXX.cdlib.org:8090/list/" /> > <property name="staticPrefix" value="http://XXX.cdlib.org:8090/list/" /> 270c271 < <prop key="Prefix">http://localhost.archive.org:8080/memento/</prop> --- > <prop key="Prefix">http://XXX.cdlib.org:8090/memento/</prop> 283c284 < <property name="replayURIPrefix" value="http://memento.localhost.archive.org:8080/list/"/> --- > <property name="replayURIPrefix" value="http://XXX.cdlib.org:8090/list/"/> 287d287 < --> |
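The point made in the message above, that a capture closer to the requested Accept-Datetime exists, can be checked by hand from the datetimes in the final Link header. The following is only a minimal sketch: the helper name `closest_memento` is hypothetical, and it assumes that "closest" means the smallest absolute time difference, which is not necessarily the selection rule Wayback itself applies.

```python
from email.utils import parsedate_to_datetime


def closest_memento(accept_datetime, memento_datetimes):
    """Return the HTTP-date string nearest in time to accept_datetime.

    Assumption: nearest means smallest absolute difference. This is an
    illustration, not the selection logic used by Wayback.
    """
    target = parsedate_to_datetime(accept_datetime)
    return min(
        memento_datetimes,
        key=lambda value: abs(parsedate_to_datetime(value) - target),
    )


# Accept-Datetime sent by the client in the session above.
accept = "Sun, 04 Feb 2007 12:00:00 GMT"

# Datetimes listed in the Link header of the final 200 response.
listed = [
    "Fri, 03 Feb 2012 00:53:45 GMT",  # last memento (the one served)
    "Wed, 04 Feb 2009 07:23:40 GMT",  # first memento
    "Mon, 27 Jun 2011 01:01:27 GMT",  # prev memento
]

print(closest_memento(accept, listed))
```

Under that assumption the first memento (04 Feb 2009) is the nearest of the three listed captures, not the 03 Feb 2012 capture that was served.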
From: Rudolf K. <wes...@gm...> - 2012-04-16 12:26:15
|
Hi, I already sent this message to the heritrix group ( http://tech.groups.yahoo.com/group/archive-crawler/message/7653 ), but it probably makes more sense here. We are looking for the most efficient way (in terms of speed and precision) to count the number of harvested documents in our archives. We don't have crawl-report.logs or crawl.logs for the whole history of our harvests, and we don't have a Hadoop infrastructure for fast map-reduce operations. So our current approach is to work with the arcreader utility:
find /archives/ -name '*arc.gz*' -exec ./arcreader -d false '{}' \; > arcs.cdx
then get rid of lines matching the patterns "CDX b e a m s c v n g" and "filedesc://", and finally count the remaining lines with wc -l, which gives the number of documents in our archive. Is there a faster or more precise approach? Occasionally arcreader doesn't like entries in ARC files because of corrupt zips (invalid stored block lengths, corrupt GZIP trailers etc.) or because the whole ARC is corrupt (*.arc.gz is not an Internet Archive ARC file). Such statistics depend on arcreader's ability to read ARC files properly, and there is a chance that the statistics will change with a new version of arcreader. Thank you very much, rudolf |
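The filtering and counting step described above (drop the CDX header lines and the filedesc:// records, then count what remains) could also be done in one pass. This is a minimal sketch, not a replacement for arcreader: the function name `count_documents` is hypothetical, and the sample lines are invented placeholders rather than real arcreader output.

```python
def count_documents(cdx_lines):
    """Count CDX lines that describe harvested documents.

    Skips blank lines, CDX header lines (which begin with "CDX" once
    leading whitespace is stripped) and ARC file-description records
    (which contain "filedesc://").
    """
    count = 0
    for line in cdx_lines:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("CDX "):
            continue
        if "filedesc://" in stripped:
            continue
        count += 1
    return count


# Invented sample lines for illustration only.
sample = [
    " CDX b e a m s c v n g",
    "20120416122615 0.0.0.0 filedesc://example.arc.gz text/plain - - 0 1234 example.arc.gz",
    "20120416122616 192.0.2.1 http://example.org/ text/html 200 abc 1300 5000 example.arc.gz",
    "20120416122617 192.0.2.1 http://example.org/a.png image/png 200 def 6300 900 example.arc.gz",
]

print(count_documents(sample))
```

To run it over the real output, the lines of arcs.cdx can be passed in directly, for example `count_documents(open("arcs.cdx", errors="replace"))`. The result is still only as precise as the arcreader output it reads.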
From: Armin S. <sch...@gm...> - 2012-04-16 08:01:55
|
Hello, I am currently working on a web archive project for our university. First of all, thank you for the tools you provide; they are really neat and save a lot of work. We are currently experiencing problems with our local Wayback installation. We moved a set of archives to a new deployment of Wayback. When I search for a URL, Wayback correctly shows me all the capture dates. The problem is that I can only replay the captures where the asterisk is shown (indicating that the content has changed on this capture). When I click on one of the other capture dates, it seems to load for a while and then times out. Is this a configuration problem? It worked just fine before we moved it to the new deployment. I really appreciate every hint. Thanks a lot in advance! Kind Regards, Armin |
From: <Dom...@sw...> - 2012-04-10 16:08:01
|
I get this java.lang.NullPointerException error message when I try to open an archived URL. However, this error does not appear all the time. It seems only URLs that are redirected (got an HTTP 302 response at crawl time) are affected. Does anyone have a clue and can give me a hint? I have Wayback 1.6.0 installed on apache-tomcat-6.0.35, java version (JRE) 1.6.0, and use a CDX index. Thank you and best regards, Dominik
2012-04-10 16:13:51.071 SEVERE thread-4 org.apache.catalina.core.StandardWrapperValve.invoke() Servlet.service() for servlet default threw exception
java.lang.NullPointerException
  at java.lang.String.compareTo(String.java:441)
  at org.archive.wayback.resourceindex.filters.SelfRedirectFilter.filterObject(SelfRedirectFilter.java:63)
  at org.archive.wayback.resourceindex.filters.SelfRedirectFilter.filterObject(SelfRedirectFilter.java:36)
  at org.archive.wayback.util.ObjectFilterChain.filterObject(ObjectFilterChain.java:81)
  at org.archive.wayback.util.ObjectFilterIterator.hasNext(ObjectFilterIterator.java:61)
  at org.archive.wayback.resourceindex.LocalResourceIndex.doCaptureQuery(LocalResourceIndex.java:185)
  at org.archive.wayback.resourceindex.LocalResourceIndex.query(LocalResourceIndex.java:275)
  at org.archive.wayback.webapp.AccessPoint.handleReplay(AccessPoint.java:309)
  at org.archive.wayback.webapp.AccessPoint.handleRequest(AccessPoint.java:213)
  at org.archive.wayback.util.webapp.RequestMapper.handleRequest(RequestMapper.java:183)
  at org.archive.wayback.util.webapp.RequestFilter.doFilter(RequestFilter.java:109)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
  at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
  at java.lang.Thread.run(Thread.java:736) |
From: raffaele m. <raf...@at...> - 2012-03-02 10:47:37
|
On Feb 18, 2012, at 3:22 PM, raffaele messuti wrote: > i'm not finding documentation for enabling api (1.6.0 or later) > https://webarchive.jira.com/wiki/display/wayback/OS+Wayback+API+Documentation could you help me with this? how can i enable /xmlquery? i've deployed the wayback in webapps/ROOT and the AccessPoint is configured as follows: <bean name="+" class="org.archive.wayback.webapp.ServerRelativeArchivalRedirect"> <property name="matchPort" value="8080" /> <property name="useCollection" value="true" /> </bean> <bean name="8080" class="org.archive.wayback.webapp.AccessPoint"> but /xmlquery gives me a 404. thanks -- raf...@at... |
From: Mat K. <mk...@cs...> - 2012-02-28 20:40:41
|
I wish to use cdx-indexer as a means to validate WARCs but would like to do so without the need for a full Wayback installation. Is there a way to decouple cdx-indexer from wayback for stand-alone execution? I hope to use cdx-indexer's archive validation capabilities on a machine without net access or server software (i.e. a file system with many WARC files). Thank you, Mat |
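As a partial, stand-alone check that needs neither Wayback nor network access, the gzip layer of compressed (W)ARC files can be read end to end with the Python standard library. This is a different and much weaker technique than cdx-indexer: it only detects damaged or truncated gzip data and does not parse or validate WARC records. The function name `gzip_is_readable` is hypothetical.

```python
import gzip
import zlib


def gzip_is_readable(path, chunk_size=1 << 20):
    """Return True if every gzip member in the file decompresses cleanly.

    Reads the whole file in chunks and discards the data. Truncation,
    bad headers, CRC mismatches and corrupt deflate streams all make
    the function return False.
    """
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        return False
    return True


# Example use over a directory tree (the path is a placeholder):
#
# import pathlib
# for warc in pathlib.Path("/path/to/warcs").rglob("*.warc.gz"):
#     if not gzip_is_readable(warc):
#         print("unreadable:", warc)
```

Files that pass this check can still contain malformed records, so it is best treated as a first filter before a real indexing run.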
From: Ko, L. <Lau...@un...> - 2012-02-22 22:07:03
|
Is there support for replaying uncompressed ARC files in Wayback 1.6? I get a Resource Not Available--The Resource you have requested is temporarily unavailable message when I try to replay them. In Tomcat, it logs: "Feb 21, 2012 5:22:47 PM org.archive.wayback.resourcestore.LocationDBResourceStore retrieveResource WARNING: Unable to retrieve resource from..." However, in the Apache logs of the server where the ARC is hosted I see a 206 response to the Wayback request. Thanks in advance! Lauren Ko Web Archiving Programmer UNT Libraries |
From: Kris C. N. <kca...@ar...> - 2012-02-22 01:01:17
|
All: We decided to extend the deadline until Feb 29 for getting applications in. Can you send the attached out to the places you posted previously, and highlight that the application deadline has been extended? Thank you. Bill Conference Chair, 2013 iConference http://www.ischools.org/iConference13/2013index/ ________________________ William Moen, Ph.D. Associate Dean for Research Director, Texas Center for Digital Knowledge (TxCDK) Associate Professor, Department of Library and Information Sciences College of Information, University of North Texas 1155 Union Circle 311068, Denton, Texas 76203-5017 Email: wil...@un... Website: http://www.unt.edu/wmoen Voice: 940-565-2473 Fax: 940-369-7872 -----Original Message----- From: Moen, William Sent: Tuesday, January 03, 2012 12:09 PM To: 'Potter, Abigail'; Aaron Binns (aa...@ar...); Hartman, Cathy; Kris Carpenter Negulescu (kca...@ar...) Cc: Miksa, Shawne Subject: RE: IIPC Doctoral Support Award -- Please Distribute Please find attached final version of the Call for Applications announcement for the award. Thanks. Bill ________________________ William Moen, Ph.D. Associate Dean for Research Director, Texas Center for Digital Knowledge (TxCDK) Associate Professor, Department of Library and Information Sciences College of Information, University of North Texas 1155 Union Circle 311068, Denton, Texas 76203-5017 Email: Wil...@un... Website: http://www.unt.edu/wmoen Voice: 940-565-2473 Fax: 940-369-7872 |
From: raffaele m. <raf...@at...> - 2012-02-18 14:22:53
|
i'm not finding documentation for enabling api (1.6.0 or later) https://webarchive.jira.com/wiki/display/wayback/OS+Wayback+API+Documentation hints? ciao -- raf...@at... |
From: Matjaž K. <Mat...@nu...> - 2012-02-02 08:44:22
|
Hi again, I've done some more research. The situation seems to be much clearer now, but I would definitely need some help from you too. The XMLquery is http://nukrobi2.nuk.uni-lj.si:8080/wayback/xmlquery?type=urlquery&url=http://www.delo.si/ and the behaviour depends on which Wayback jar is in the lib directory:
- wayback-core-1.6.0.jar: XMLquery doesn't work; I get redirected to the table with the harvest dates for the specified site.
- wayback-core-1.6.1.jar: XMLquery works fine, but when I try to go to the archived site http://nukrobi2.nuk.uni-lj.si:8080/wayback/20110307160805/http://www.delo.si/ I get "Resource Not Available. The Resource you have requested is temporarily unavailable. Please try again later."
- wayback-core-1.7.0.jar: XMLquery doesn't work; I get redirected to the table with the harvest dates for the specified site.
As I said, I used to run both (1.6.1 and 1.6.0, in two different sites), but I'm sure that having two indexes and two sites is not the right way to solve the problem. Another thing: with two instances of Wayback, RAM consumption is enormous and the JVM runs out of memory constantly. Does anybody know what mystery is going on here? Best, matjaz |
From: Youssef E. <you...@gm...> - 2012-02-02 08:28:09
|
At the Bibliotheca Alexandrina, we are in the process of migrating to the open-source Wayback. Our uncompressed CDX is around 13.5 TB. Compressed, those should come down to around 2 TB by rough interpolation. We have successfully deployed a Hadoop instance. How do we compress the 13.5 TB of CDX in HDFS such that the result is usable by the Wayback? Does the open-source Wayback expect the compressed CDX to be in ziplines format? Any hints or recommendations are much appreciated. - Youssef Eldakar |
From: Matjaž K. <Mat...@nu...> - 2012-02-01 17:41:59
|
Hello again, After running a bunch of tests, trying several JVMs and two Tomcat versions (6.35 and 7.25), etc., I think I know why Tomcat is slow. The situation is like this: we have WCT and Wayback 1.6.0. Because Wayback 1.6.0 doesn't work properly with XMLquery (as described before on this list - http://sourceforge.net/mailarchive/forum.php?thread_name=23CFE6A5AD05E34EBCAD84B004C313BA01F511BA1D%40LCXCLMB01.LCDS.LOC.GOV&forum_name=archive-access-discuss ), Brad made a version 1.6.1. Version 1.6.1 works fine with XMLquery but has (maybe only on our server) a memory leak. Within an hour the JVM runs out of memory (I'm talking about 20 GB of RAM). Another thing: version 1.6.1 doesn't redirect to the result page (that may be just in our implementation); it keeps saying something about resource not found. So I had to have two installations of Wayback: 1.6.1 to get date results in XML (XMLquery) and 1.6.0 to redirect to the destination. After removing 1.6.1 from webapps, Tomcat runs smoothly even though I have SOLR for indexing with approx. 50 GB of index. And now the question: could anyone check this 1.6.1 version (http://home.us.archive.org/~brad/wayback-1.6.1RC3.tar.gz) for memory leaks and/or why it doesn't redirect as 1.6.0 used to, or make 1.6.0 work with XMLquery? Version 1.6.0 keeps redirecting to the site, just as Gina Jones mentioned in the thread (link is above). The web page which uses this code is http://arhiv.nuk.uni-lj.si. At the moment you cannot browse through categories (I had to remove 1.6.1), but you can use the fulltext index and then be redirected to the archived site via Wayback 1.6.0. Thanks in advance, everyone. I'll be glad for help and for checking the site http://arhiv.nuk.uni-lj.si. Best wishes, Matjaž |