From: Ahmed A. <aa...@cs...> - 2012-04-25 05:54:24
Hello Erik,

In addition to the detailed explanation from Herbert, I would like to add two things based on my experience helping to fix a Memento problem with the global Wayback Machine.

Coding inconsistency: two months ago, the Wayback Memento code was inconsistent; some classes were missing, which caused strange behavior (though not the same as yours). IA has fixed it and it should be working now. Also, IA has moved the Wayback open source code to GitHub (https://github.com/internetarchive/wayback). Could you send me the link to the repository you got the code from? I'm sorry, I couldn't find revision r3625.

Configuration: if the code is up to date, then we may have a configuration problem. I believe your attached configuration is correct. I have the same configuration on my machine and it works fine. So if you can confirm the code version, I will try to verify the consistency of that version.

Thanks for your interest in Memento.

Best regards,
Ahmed AlSum
Internet Archive, Software Engineer Intern
Old Dominion University, PhD Student

On 04/24/2012 07:20 PM, Herbert Van de Sompel wrote:
> Erik,
>
> I will discuss in detail with my team, tomorrow. A few observations:
>
> - I am not sure what the status of Memento compliance is in the latest Wayback. I know work has been going on at IA to revise the approach towards Memento compliance (closer integration between regular Wayback operation and Memento operation, e.g. Mementos with the same URI for both access points). But, again, I am not sure what the status of the work is. Next week, several of us will be at IIPC and that would be a good occasion to talk in some detail about the status.
>
> - In your example below, I assume (I am not sure) /memento/http://www.abc.ca.gov/ is a TimeGate for http://www.abc.ca.gov/. A TimeGate that is approached without an Accept-Datetime value is supposed to redirect to the most recent Memento. Now, I see that the request has an Accept-Datetime.
> If this is indeed a TimeGate, as suggested by the redirect to the most recent Memento, it seems that the TimeGate doesn't "see" the Accept-Datetime. Then again, if this is a TimeGate, it doesn't behave like one in other regards: there is no HTTP Link header, and a TimeGate should provide all kinds of HTTP Links.
>
> - I have no immediate explanation for the second redirect, except that, maybe, it could just be an internal Wayback redirect that doesn't have to do with Memento. But I'm guessing.
>
> - Another clear indication that something is really wrong can be seen in the response of the final Memento: the Link header contains a lot of would-be HTTP links but not a single URI. Not a lot of linking going on there ;-)
>
> Again, I will discuss in detail with my team, tomorrow. And hopefully get back to you with a better interpretation. And, next week at IIPC, we can discuss with colleagues from IA. Will you be there?
>
> Thanks a lot for your continued interest in Memento!
>
> Cheers
>
> Herbert
>
> Sent from my iPad
>
> On Apr 24, 2012, at 17:34, Erik Hetzner<eri...@uc...> wrote:
>
>> Hi,
>>
>> We are experimenting with the Memento configuration for Open Source
>> Wayback, but are experiencing some difficulties.
>>
>> We have built and installed the latest wayback from svn (r3625). We
>> have made minimal changes to the config, changing only hostnames and
>> ports, and enabling a CDX collection. (See cleaned results of diff -r
>> below). Wayback is working fine. We can visit, e.g.,
>> http://XXX.cdlib.org:8090/wayback/*/http://www.abc.ca.gov
>>
>> Unfortunately, memento keeps redirecting us to the current date, and
>> then to the closest version to that (in the below case, 3 Feb).
>>
>> As you can see by the Link headers in the last response, a closer
>> version to the requested Accept-Datetime does exist.
>>
>> Does anybody have an idea of what is happening here? Let me know if
>> there is any other information that would help with this.
>>
>> Thank you!
>>
>> best, Erik
>>
>> Here is an example of a session, from Firebug, using Mementofox:
>>
>> GET /memento/http://www.abc.ca.gov/ HTTP/1.1
>> Host: XXX.cdlib.org:8090
>> User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0) Gecko/20100101 Firefox/11.0
>> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
>> Accept-Language: en-us,en;q=0.5
>> Accept-Encoding: gzip, deflate
>> Connection: keep-alive
>> Accept-Datetime: Sun, 04 Feb 2007 12:00:00 GMT
>>
>> HTTP/1.1 302 Moved Temporarily
>> Server: Apache-Coyote/1.1
>> Location: http://XXX.cdlib.org:8090/memento/20120424231906/http://www.abc.ca.gov/
>> Content-Length: 0
>> Date: Tue, 24 Apr 2012 23:19:06 GMT
>>
>> GET /memento/20120424231906/http://www.abc.ca.gov/ HTTP/1.1
>> Host: XXX.cdlib.org:8090
>> User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0) Gecko/20100101 Firefox/11.0
>> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
>> Accept-Language: en-us,en;q=0.5
>> Accept-Encoding: gzip, deflate
>> Connection: keep-alive
>> Accept-Datetime: Sun, 04 Feb 2007 12:00:00 GMT
>>
>> HTTP/1.1 302 Moved Temporarily
>> Server: Apache-Coyote/1.1
>> Location: http://XXX.cdlib.org:8090/memento/20120203005345/http://www.abc.ca.gov/
>> Content-Length: 0
>> Date: Tue, 24 Apr 2012 23:19:06 GMT
>>
>> GET /memento/20120203005345/http://www.abc.ca.gov/ HTTP/1.1
>> Host: XXX.cdlib.org:8090
>> User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0) Gecko/20100101 Firefox/11.0
>> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
>> Accept-Language: en-us,en;q=0.5
>> Accept-Encoding: gzip, deflate
>> Connection: keep-alive
>> Accept-Datetime: Sun, 04 Feb 2007 12:00:00 GMT
>>
>> HTTP/1.1 200 OK
>> Server: Apache-Coyote/1.1
>> Memento-Datetime: Fri, 03 Feb 2012 00:53:45 GMT
>> Link: ;rel="timebundle", ;rel="original", ;rel="last memento"; datetime="Fri, 03 Feb 2012 00:53:45 GMT", ;rel="first memento"; datetime="Wed, 04 Feb 2009 07:23:40 GMT", ;rel="prev memento"; datetime="Mon, 27 Jun 2011 01:01:27 GMT" , ;rel="timemap"; type="application/link-format",;rel="timegate"
>> X-Archive-Guessed-Charset: cp1252
>> X-Archive-Orig-Connection: close
>> X-Archive-Orig-Content-Length: 30168
>> X-Archive-Orig-Content-Type: text/html
>> X-Archive-Orig-X-Powered-By: ASP.NET
>> X-Archive-Orig-Server: Microsoft-IIS/6.0
>> X-Archive-Orig-Date: Fri, 03 Feb 2012 00:53:45 GMT
>> Content-Type: text/html;charset=cp1252
>> Content-Length: 30835
>> Date: Tue, 24 Apr 2012 23:19:06 GMT
>>
>> diff -r wayback-1.7.1/WEB-INF/CDXCollection.xml wayback-1.7.1-ours//WEB-INF/CDXCollection.xml
>> 32,37c32,33
>> < <bean class="org.archive.wayback.resourcestore.LocationDBResourceStore">
>> < <property name="db">
>> < <bean class="org.archive.wayback.resourcestore.locationdb.FlatFileResourceFileLocationDB">
>> < <property name="path" value="${wayback.basedir}/path-index.txt" />
>> < </bean>
>> < </property>
>> ---
>>> <bean class="org.archive.wayback.resourcestore.SimpleResourceStore">
>>> <property name="prefix" value="http://XXX.cdlib.org:YYYYY/arcs/"/>
>> 50c46
>> < <property name="path" value="${wayback.basedir}/cdx-index/index.cdx" />
>> ---
>>> <property name="path" value="/was/wayback.public.index/everything.cdx"/>
>> diff -r wayback-1.7.1/WEB-INF/wayback.xml wayback-1.7.1-ours//WEB-INF/wayback.xml
>> 18c18
>> < wayback.urlprefix=http://localhost.archive.org:8080/wayback/
>> ---
>>> wayback.urlprefix=http://XXX.cdlib.org:8090/wayback/
>> 72d71
>> < <import resource="BDBCollection.xml"/>
>> 73a73,74
>>> <import resource="BDBCollection.xml"/>
>>> -->
>> 74a76
>>> <!--
>> 148c150
>> < <property name="matchPort" value="8080" />
>> ---
>>> <property name="matchPort" value="8090" />
>> 152c154
>> < <bean name="8080:wayback" class="org.archive.wayback.webapp.AccessPoint">
>> ---
>>> <bean name="8090:wayback" class="org.archive.wayback.webapp.AccessPoint">
>> 176d177
>> < <property name="collection" ref="localbdbcollection" />
>> 178c179
>> < <property name="collection" ref="localcdxcollection" />
>> ---
>>> <property name="collection" ref="localbdbcollection" />
>> 179a181
>>> <property name="collection" ref="localcdxcollection" />
>> 227d228
>> < <!--
>> 229,231c230,232
>> < <bean name="8080:memento" parent="8080:wayback">
>> < <property name="replayPrefix" value="http://localhost.archive.org:8080/memento/" />
>> < <property name="queryPrefix" value="http://localhost.archive.org:8080/list/" />
>> ---
>>> <bean name="8090:memento" parent="8090:wayback">
>>> <property name="replayPrefix" value="http://XXX.cdlib.org:8090/memento/" />
>>> <property name="queryPrefix" value="http://XXX.cdlib.org:8090/list/" />
>> 234c235
>> < <prop key="aggregationPrefix">http://localhost.archive.org:8080/list/</prop>
>> ---
>>> <prop key="aggregationPrefix">http://XXX.cdlib.org:8090/list/</prop>
>> 247c248
>> < <property name="replayURIPrefix" value="http://localhost.archive.org:8080/memento/"/>
>> ---
>>> <property name="replayURIPrefix" value="http://XXX.cdlib.org:8090/memento/"/>
>> 264,267c265,268
>> < <bean name="8080:list" parent="8080:memento">
>> < <property name="replayPrefix" value="http://localhost.archive.org:8080/memento/" />
>> < <property name="queryPrefix" value="http://localhost.archive.org:8080/list/" />
>> < <property name="staticPrefix" value="http://localhost.archive.org:8080/list/" />
>> ---
>>> <bean name="8090:list" parent="8090:memento">
>>> <property name="replayPrefix" value="http://XXX.cdlib.org:8090/memento/" />
>>> <property name="queryPrefix" value="http://XXX.cdlib.org:8090/list/" />
>>> <property name="staticPrefix" value="http://XXX.cdlib.org:8090/list/" />
>> 270c271
>> < <prop key="Prefix">http://localhost.archive.org:8080/memento/</prop>
>> ---
>>> <prop key="Prefix">http://XXX.cdlib.org:8090/memento/</prop>
>> 283c284
>> < <property name="replayURIPrefix" value="http://memento.localhost.archive.org:8080/list/"/>
>> ---
>>> <property name="replayURIPrefix" value="http://XXX.cdlib.org:8090/list/"/>
>> 287d287
>> < -->
>> Sent from my free software system<http://fsf.org/>.
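
To make the expected TimeGate behavior in the session above concrete, here is a minimal Python sketch. It is not Wayback code. It assumes a TimeGate should pick the Memento whose datetime is nearest to the Accept-Datetime value, and it uses only the three datetimes that appear in the Link header of the final response. The function names closest_memento and wayback_timestamp are hypothetical, chosen for illustration.

```python
from email.utils import parsedate_to_datetime


def closest_memento(accept_datetime, memento_datetimes):
    """Return the HTTP-date string nearest in time to accept_datetime.

    Assumption: "best match" means smallest absolute time difference.
    """
    target = parsedate_to_datetime(accept_datetime)
    return min(
        memento_datetimes,
        key=lambda value: abs(parsedate_to_datetime(value) - target),
    )


def wayback_timestamp(http_date):
    """Format an HTTP-date as the 14-digit timestamp seen in the replay URLs."""
    return parsedate_to_datetime(http_date).strftime("%Y%m%d%H%M%S")


# Accept-Datetime from the request in the session above.
requested = "Sun, 04 Feb 2007 12:00:00 GMT"

# Datetimes listed in the Link header of the final 200 response.
known = [
    "Wed, 04 Feb 2009 07:23:40 GMT",  # rel="first memento"
    "Mon, 27 Jun 2011 01:01:27 GMT",  # rel="prev memento"
    "Fri, 03 Feb 2012 00:53:45 GMT",  # rel="last memento"
]

best = closest_memento(requested, known)
print(best, wayback_timestamp(best))
```

Under that nearest-match assumption, a request with Accept-Datetime of 4 Feb 2007 would be expected to land on the 2009 Memento rather than on the 3 Feb 2012 one that the server redirected to.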