From: Bradley T. <br...@ar...> - 2011-03-01 12:11:17
|
I'm not aware of any way to record fetched-but-not-stored metadata in ARC files, only in WARC files.

Is the information about the second (third, etc.) downloaded-but-not-stored captures being recorded only in the crawl logs? Forging CDX records from information in crawl logs may be possible, but as far as I know it has never been attempted.

We use WARC files with content-digest duplicate reduction (as opposed to sending If-Modified-Since/If-None-Match headers, which has only been used and replayed via Wayback experimentally).

Brad |
From: Natalia T. <nt...@ce...> - 2011-03-01 09:43:33
|
Hi Brad,

thanks a lot for your advice. I added the "dedupeRecords" property to the LocalResourceIndex bean in CDXCollection.xml and restarted Tomcat, but I can't view the crawls correctly as before: when viewing the first crawl everything is correct, but when viewing the second version the images/css/pdf (crawled only the first time) aren't displayed...

We are using ARC files. Is the behavior the same as with WARC, or do we need to change to WARC?

Here is the CDXCollection.xml file:

    <?xml version="1.0" encoding="UTF-8"?>
    <beans xmlns="http://www.springframework.org/schema/beans"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.springframework.org/schema/beans
               http://www.springframework.org/schema/beans/spring-beans-2.5.xsd"
           default-init-method="init">

      <bean id="localcdxcollection" class="org.archive.wayback.webapp.WaybackCollection">
        <property name="resourceStore">
          <bean class="org.archive.wayback.resourcestore.LocationDBResourceStore">
            <property name="db">
              <bean class="org.archive.wayback.resourcestore.locationdb.FlatFileResourceFileLocationDB">
                <property name="path" value="${wayback.basedir}/path-index.txt" />
              </bean>
            </property>
          </bean>
        </property>

        <property name="resourceIndex">
          <bean class="org.archive.wayback.resourceindex.LocalResourceIndex">
            <property name="canonicalizer" ref="waybackCanonicalizer" />
            <property name="source">

              <!--
                A single CDX SearchResultSource example.
              -->
              <bean class="org.archive.wayback.resourceindex.cdx.CDXIndex">
                <property name="path" value="${wayback.basedir}/dedup2011.cdx" />
              </bean>

            </property>
            <property name="maxRecords" value="10000" />
            <property name="dedupeRecords" value="true" />
          </bean>
        </property>
      </bean>

    </beans>

thanks,

natalia |
From: Bradley T. <br...@ar...> - 2011-03-01 00:08:24
|
Hi Natalia,

I just looked at the default Spring config files and noticed there's no "LocalResourceIndex.dedupeRecords" property mentioned, not even commented out.

Adding this property to a LocalResourceIndex bean:

    <property name="dedupeRecords" value="true" />

will cause Wayback to attempt to use data from a previously seen capture to populate subsequent dedupe records. Since records that were actually stored will appear in the index before the dedupe records, a forward scan through them will encounter the data for the copy that was stored. This data is overlaid onto subsequent matching dedupe records, and from the perspective of the rest of Wayback, they look like normal captures. When replaying these records, the capture is fetched from the most recent copy that was stored.

At IA we've been using "duplicate reduction WARC" files with Wayback for a couple of years now.

Let me know if you run into problems with this, or have more questions,

Brad |
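The forward scan Brad describes can be pictured with a minimal Python sketch. This is not Wayback's code: the `Capture` class, its field names, and the `overlay_dedupe` function are hypothetical, and the sketch assumes the index is already sorted so that a stored capture of a URL comes before the dedupe records that refer to it.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Capture:
    """Hypothetical index record; a dedupe record has no file location."""
    url_key: str
    timestamp: str
    digest: str
    filename: Optional[str]
    offset: Optional[int]


def overlay_dedupe(records: Iterable[Capture]) -> Iterator[Capture]:
    """Scan forward, remembering the most recent stored copy per (url, digest).

    Dedupe records that match a remembered copy get its file location
    overlaid, so later stages can treat them like normal captures.
    """
    last_stored: Dict[Tuple[str, str], Capture] = {}
    for rec in records:
        key = (rec.url_key, rec.digest)
        if rec.filename is not None:
            # A capture that was actually stored: remember where it lives.
            last_stored[key] = rec
            yield rec
        elif key in last_stored:
            # Dedupe record: borrow the location of the stored copy.
            src = last_stored[key]
            yield replace(rec, filename=src.filename, offset=src.offset)
        else:
            # No stored copy seen earlier in the scan: nothing to overlay.
            yield rec


if __name__ == "__main__":
    index = [
        Capture("example.org/logo.png", "20110101000000", "sha1:AAA", "crawl1.warc.gz", 1234),
        Capture("example.org/logo.png", "20110201000000", "sha1:AAA", None, None),
    ]
    for capture in overlay_dedupe(index):
        print(capture)
```

The sketch also shows why the second crawl can look broken when the index holds no dedupe records at all: with nothing to overlay onto, the later capture simply does not exist for replay.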
From: Natalia T. <nt...@ce...> - 2011-02-28 17:03:30
|
Hi all,

Is anyone using dedup crawling with Heritrix and then displaying the crawls with Wayback? Has anybody encountered any problems?

Can anybody help us? |
From: Erik H. <eri...@uc...> - 2011-02-26 00:25:20
|
At Fri, 25 Feb 2011 16:40:05 -0700 (MST), Lyudmila L. Balakireva wrote:
> Thank you for information.
>
> I am wondering if any parameter can be set at wayback.xml to fix my "not
> host only " url prefix
>
> if I access
> http://mementoarchive.lanl.gov/store/waybacktest/
>
> and put http://news.bbc.co.uk/ to the form I got redirected to
> http://mementoarchive.lanl.gov/query?type=urlquery&url=http%3A%2F%2Fnews.bbc.co.uk%2F&date=&Submit=Take+Me+Back
> not http://mementoarchive.lanl.gov/store/waybacktest/query
> even I set
>
> <property name="urlRoot"
> value="http://mementoarchive.lanl.gov/store/waybacktest/"/>
> and tried
> <property name="queryPrefix"
> value="http://mementoarchive.lanl.gov/store/waybacktest/" />
> at the wayback.xml

Hi Lyudmila,

We don’t use the query interface, so I’m afraid I can’t help here. Sorry!

best, Erik |
From: Lyudmila L. B. <lu...@la...> - 2011-02-25 23:40:14
|
Thank you for the information.

I am wondering whether any parameter can be set in wayback.xml to fix my "not host only" URL prefix.

If I access

    http://mementoarchive.lanl.gov/store/waybacktest/

and put http://news.bbc.co.uk/ into the form, I get redirected to

    http://mementoarchive.lanl.gov/query?type=urlquery&url=http%3A%2F%2Fnews.bbc.co.uk%2F&date=&Submit=Take+Me+Back

and not to

    http://mementoarchive.lanl.gov/store/waybacktest/query

even though I set

    <property name="urlRoot" value="http://mementoarchive.lanl.gov/store/waybacktest/"/>

and tried

    <property name="queryPrefix" value="http://mementoarchive.lanl.gov/store/waybacktest/" />

in wayback.xml.

thanks,
Lyudmila |
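To make the symptom concrete, here is a small Python sketch that takes the redirect URL from the message above apart and compares its path with the expected prefix. The function names `describe_redirect` and `build_query_url` are made up for illustration; nothing here is part of Wayback, and the sketch only shows that the redirect drops the /store/waybacktest/ part, not why.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def describe_redirect(redirect_url: str, expected_prefix: str) -> dict:
    """Report the redirect's path, whether it sits under the prefix, and its form fields."""
    parts = urlsplit(redirect_url)
    expected_path = urlsplit(expected_prefix).path
    fields = parse_qs(parts.query, keep_blank_values=True)
    return {
        "path": parts.path,
        "under_expected_prefix": parts.path.startswith(expected_path),
        "params": {name: values[0] for name, values in fields.items()},
    }


def build_query_url(prefix: str, target_url: str) -> str:
    """Build the query URL the form would be expected to produce under the prefix."""
    query = urlencode(
        {"type": "urlquery", "url": target_url, "date": "", "Submit": "Take Me Back"}
    )
    return prefix.rstrip("/") + "/query?" + query


if __name__ == "__main__":
    prefix = "http://mementoarchive.lanl.gov/store/waybacktest/"
    got = (
        "http://mementoarchive.lanl.gov/query?type=urlquery"
        "&url=http%3A%2F%2Fnews.bbc.co.uk%2F&date=&Submit=Take+Me+Back"
    )
    print(describe_redirect(got, prefix))
    print(build_query_url(prefix, "http://news.bbc.co.uk/"))
```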
From: Erik H. <eri...@uc...> - 2011-02-25 20:59:04
|
At Thu, 24 Feb 2011 16:48:23 -0700 (MST), Lyudmila L. Balakireva wrote:
>
> Hi,
> I need help with configuring wayback.xml.
>
> We have httpd apache configured with tomcat via ajp connector
>
> LoadModule proxy_module modules/mod_proxy.so
> LoadModule proxy_ajp_module modules/mod_proxy_ajp.so
>
> <IfModule mod_proxy.c>
> # turn off forward proxy
> ProxyRequests Off
> # turn on reverse proxy for tomcat
>
> ProxyPass /store/ ajp://localhost:8009/
> ProxyPassReverse /store/ ajp://localhost:8009/
> <Proxy *>
> Order deny,allow
> Allow from all
> </Proxy>
> </IfModule>
>
> so any application installed under tomcat webapps for example webapps/test
> can be accessed as http://myarchive.org/store/test
> I am deploying wayback application as wayback-1.6.0.war
> what should be settings in wayback.xml
> I have
>
> wayback.urlprefix=http://myarchive.org/store/wayback-1.6.0/wayback/
>
> <bean name="80:wayback" class="org.archive.wayback.webapp.AccessPoint">
>
> and I have 404 with http://myarchive.org/store/wayback-1.6.0/wayback/
>
> please not that tomcat internally sees request url as
> http://myarchive.org:80/test/ for my sample application.

Hi Lyudmila,

Our config is similar, but uses http proxy rather than AJP. Here is what we have.

In our apache.conf:

    LoadModule rewrite_module /cdlcommon/products/httpd-2.2.8/modules/mod_rewrite.so
    LoadModule proxy_module /cdlcommon/products/httpd-2.2.8/modules/mod_proxy.so
    LoadModule proxy_balancer_module /cdlcommon/products/httpd-2.2.8/modules/mod_proxy_balancer.so
    LoadModule proxy_http_module /cdlcommon/products/httpd-2.2.8/modules/mod_proxy_http.so

    # workaround for Wayback https
    RewriteCond %{REQUEST_URI} ^(.*)https:(.*)
    RewriteRule ^/wayback.public.* %1http:%2 [P]

    # Redirect wayback requests to wayback cluster
    RewriteRule ^/wayback\.public/(.*)$ balancer://wayback_cluster%{REQUEST_URI} [P,QSA,L]

    # Define the wayback cluster
    <Proxy balancer://wayback_cluster>
      BalancerMember http://XXX:37264 max=1 acquire=1
      BalancerMember http://YYY:37264 max=1 acquire=1
    </Proxy>

We have 2 load balanced wayback servers. They listen on port 37264. Here is part of the config:

    <bean name="37264:was" class="org.archive.wayback.webapp.AccessPoint">
      <property name="urlRoot" value="http://webarchives.cdlib.org/wayback.public/"/>
      ...
    </bean>

As you can see, all requests to webarchives.cdlib.org/wayback.public/ are sent to the wayback servers. The name of the webapp itself (in the tomcat/webapps directory) is wayback.public.

Hope that helps.

best, Erik |
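For readers less used to mod_rewrite, the following Python sketch mimics only the pattern matching of the rules quoted above, using the same regular expressions. It is an approximation: it does not model the [P], [QSA] or [L] flags, proxying, or any URL normalisation Apache may apply, and the `rewrite` function is a made-up name.

```python
import re
from typing import Optional

# Patterns copied from the quoted RewriteCond / RewriteRule lines.
HTTPS_COND = re.compile(r"^(.*)https:(.*)")
HTTPS_RULE = re.compile(r"^/wayback.public.*")
CLUSTER_RULE = re.compile(r"^/wayback\.public/(.*)$")


def rewrite(request_uri: str) -> Optional[str]:
    """Return the substitution the rules would produce, or None if none applies."""
    cond = HTTPS_COND.match(request_uri)
    if cond and HTTPS_RULE.match(request_uri):
        # %1 and %2 are the groups captured by the RewriteCond pattern.
        return f"{cond.group(1)}http:{cond.group(2)}"
    if CLUSTER_RULE.match(request_uri):
        # Everything under /wayback.public/ is handed to the balancer.
        return "balancer://wayback_cluster" + request_uri
    return None


if __name__ == "__main__":
    for uri in (
        "/wayback.public/20110102214924/https://example.org/",
        "/wayback.public/20110102214924/http://example.org/",
        "/other/",
    ):
        print(uri, "->", rewrite(uri))
```

The example URIs are invented; the point is only that an archived https: URL is turned into its http: form first, and that plain /wayback.public/ requests keep their full path when passed to the cluster.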
From: Lyudmila L. B. <lu...@la...> - 2011-02-24 23:48:32
|
Hi,

I need help with configuring wayback.xml.

We have Apache httpd configured with Tomcat via the AJP connector:

    LoadModule proxy_module modules/mod_proxy.so
    LoadModule proxy_ajp_module modules/mod_proxy_ajp.so

    <IfModule mod_proxy.c>
      # turn off forward proxy
      ProxyRequests Off
      # turn on reverse proxy for tomcat

      ProxyPass /store/ ajp://localhost:8009/
      ProxyPassReverse /store/ ajp://localhost:8009/
      <Proxy *>
        Order deny,allow
        Allow from all
      </Proxy>
    </IfModule>

so any application installed under tomcat webapps, for example webapps/test, can be accessed as http://myarchive.org/store/test

I am deploying the wayback application as wayback-1.6.0.war. What should the settings in wayback.xml be? I have

    wayback.urlprefix=http://myarchive.org/store/wayback-1.6.0/wayback/

    <bean name="80:wayback" class="org.archive.wayback.webapp.AccessPoint">

and I get a 404 with http://myarchive.org/store/wayback-1.6.0/wayback/

Please note that Tomcat internally sees the request URL as http://myarchive.org:80/test/ for my sample application.

thanks for help,
Lyudmila Balakireva |
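The prefix substitution done by the ProxyPass line above can be sketched in a few lines of Python. The function names `proxy_pass` and `tomcat_path` are illustrative only; the mount point and backend are the values from the quoted config.

```python
from typing import Optional
from urllib.parse import urlsplit


def proxy_pass(public_path: str,
               mount: str = "/store/",
               backend: str = "ajp://localhost:8009/") -> Optional[str]:
    """Map a public path to the backend URL by swapping the mount prefix."""
    if not public_path.startswith(mount):
        return None  # not covered by this ProxyPass line
    return backend + public_path[len(mount):]


def tomcat_path(public_path: str) -> Optional[str]:
    """Path component Tomcat receives for a given public path."""
    target = proxy_pass(public_path)
    return None if target is None else urlsplit(target).path


if __name__ == "__main__":
    print(tomcat_path("/store/test/"))
    print(tomcat_path("/store/wayback-1.6.0/wayback/"))
```

This matches the observation in the message that Tomcat sees /test/ rather than /store/test/: the /store/ part exists only on the Apache side, so anything Wayback derives from the request path as Tomcat sees it will lack that prefix.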
From: Awakash B. <abo...@ac...> - 2011-02-24 15:15:16
|
Hello Brad,

I've made this update to read from another location. After the restart, the settings seem good. But I'm still seeing a 404 error after I click on the 'Take Me Back' submit button. Here is the URL:

    http://ipaddress:8080/query?type=urlquery&url=http%3A%2F%2Fwww.mysite.com&date=2009&Submit=Take+Me+Back

Any suggestions on the issue? I think Tomcat still isn't reading the .warc and .warc.gz files (or the .manifest, .log files).

Best,
Awakash

________________________________
From: Bradley Tofel [mailto:br...@ar...]
Sent: Friday, February 04, 2011 2:44 AM
To: Awakash Bodiwala
Cc: arc...@li...; Jennie Corman
Subject: Re: [Archive-access-discuss] Instructions on running wayback and to unpack files

Hi Awakash,

You can change the basedir to whatever is simpler for your installation, in this case, likely /home/site/archivefiles/ or wherever they will show up - no need to move them to /tmp/wayback - that's just the default directory wayback uses in the default configuration.

Let me know how this works for you,

Brad |
From: Crawford, L. <Lew...@bl...> - 2011-02-16 16:51:16
|
Hi,

We have run into this problem when testing 1.6 with a view to moving our production instance.

With 1.4.2 we had configured wayback to run in a "wayback" context on tomcat with an AccessPoint named "archive" to give a url http://webarchive.org.uk:8080/wayback/archive/

However I have not been able to recreate this url structure with the 1.6 configuration.

Running wayback in the ROOT context with AccessPoint "archive" works but gives us http://webarchive.org.uk:8080/archive/ and running wayback in a "wayback" context with no named AccessPoint gives http://webarchive.org.uk:8080/wayback/

Running in a "wayback" context with AccessPoint "access" simply doesn't work. The staticPrefix value comes through as "/" so image links are broken on the initial wayback page, but more importantly /wayback/archive/ results in a 404.

Am I missing something? Are there any workarounds, as we need to preserve the url structure /wayback/archive/

Thanks
Lewis.

-----Original Message-----
From: Ko, Lauren [mailto:Lau...@un...]
Sent: 13 January 2011 15:28
To: Erik Hetzner; arc...@li...
Subject: Re: [Archive-access-discuss] Wayback configuration error...

Thanks for the reply. Adding that property didn't seem to fix it. I will keep looking at other changes and the code.

Lauren

________________________________________
From: Erik Hetzner [eri...@uc...]
Sent: Tuesday, January 11, 2011 5:23 PM
To: arc...@li...
Subject: Re: [Archive-access-discuss] Wayback configuration error...

At Tue, 11 Jan 2011 17:05:00 -0600, Ko, Lauren wrote:
>
> Gérard, Thanks for the quick response. Removing the first_path
> component did get things working, but in production I would like to
> use the first_path to expose multiple collections as I am currently
> doing with wayback-1.4.2. Hopefully I can find a different solution
> for my configuration.

Hi Lauren,

I believe you need to set a urlRoot, e.g.:

    <bean name="37254:wayback" class="org.archive.wayback.webapp.AccessPoint">
      <property name="urlRoot" value="http://example.org/wayback/"/>
    ...

best, Erik |
From: Natalia T. <nt...@ce...> - 2011-02-10 15:55:44
|
Hi all,

Some time ago we started to use Heritrix to crawl web sites and Wayback to display these crawls (using CDX indexes). We recently upgraded to version 1.6 of Wayback.

Now we are testing dedup in Heritrix and how it affects the display of a crawl in Wayback. I had understood that when Wayback finds a file that was not crawled because it was not modified between two crawls, it goes to the nearest version to find the file. Do you need to configure some additional settings in Wayback to use this type of crawl?

So far in our tests we have found two different behaviours:

a) in a second capture all pages were recaptured except the images and PDFs
b) a second capture did not catch anything because there was no change in the page

In both cases, when viewing the first capture everything is correct. But when viewing the second version, in case a) the images aren't displayed, and in case b) a message appears saying that it has found a redirect to "-", and it fails.

Thank you very much for your help.

Natalia |
From: Graham, L. <lg...@lo...> - 2011-02-05 11:37:31
|
Apologies, a small configuration fix in wayback that I overlooked--problem fixed.

Laura Graham

________________________________________
From: Graham, Laura
Sent: Friday, February 04, 2011 1:35 PM
To: 'arc...@li...'
Subject: Configuring a collection in wayback 1.4 over https |
From: Graham, L. <lg...@lo...> - 2011-02-04 18:34:25
|
We are having problems setting up a Wayback with user:pass access over https: if offsite user:pass, if onsite no user:pass. Both conditions over https. This is so external, offsite partners can view specialized crawls we're doing in our QR Wayback, which is on webarchiveqr.loc.gov. We're not implementing any authentication or access restrictions in the wayback app configuration itself and would actually like to avoid that. And, just noting in case I'm not clear, the issue is not that an archived site was itself on https, but that the wayback app is configured on https.

Setup:

--Wayback 1.4 on Redhat linux.
--Tomcat: we have an institutional rule that root has to own tomcat if tomcat is on port 80; but we need our shared user to own tomcat, so this workaround:
---Proxy rule telling the apache server that anything that goes to port 80 shall be redirected to the AJP port (8010).
----Tomcat server.xml connector:

    <!-- Define an AJP 1.3 Connector on port 8009 -->
    <Connector port="8010" maxHttpHeaderSize="8192" connectionTimeout="20000"
               maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
               enableLookups="false" redirectPort="8843" protocol="AJP/1.3" />

--SSL server on same machine: set to "Satisfy Any", which presents a login box to external users but will allow internal users access based on their IP address. In both cases, all URLs will be forwarded to the tomcat application on port 8010.
--wayback.xml: collection bean

    <bean name="80:lcwa0021"

and

    <property name="replayURIPrefix" value="https://webarchiveqr.loc.gov/lcwa0021/"/>

RESULTS:

Onsite:
Browser: 200 on https://webarchiveqr.loc.gov; 404 on https://webarchiveqr.loc.gov/lcwa0021/
To see what happened, tried http://webarchiveqr.loc.gov:
200 on http://webarchiveqr.loc.gov/lcwa0021
and 200 on http://webarchiveqr.loc.gov/lcwa0021/*/ackerman.house.gov/
but 404 on https://webarchiveqr.loc.gov/lcwa0021/20110102214924/http://ackerman.house.gov/

Offsite:
User:Pass on https://webarchiveqr.loc.gov/ AOK
Browser: 404 https://webarchiveqr.loc.gov/lcwa0021

So it seems that wayback 1.4, when configured on https, has a problem replaying archived web sites? Any thoughts on what we might have overlooked or what we might do to make this work? We'd like to avoid having to manage this in wayback authentication properties but would like to do it at the server level if possible.

Thanks!

Laura Graham
Library of Congress Web Archive Team |
From: Awakash B. <abo...@ac...> - 2011-02-04 17:52:00
|
Hey Brad,

I've made this update to read from another location. After the restart, the settings seem good. But I'm still seeing a 404 error after I click on the 'Take Me Back' submit button. Here is the URL:

    http://ipaddress:8080/query?type=urlquery&url=http%3A%2F%2Fwww.mysite.com&date=2009&Submit=Take+Me+Back

Any suggestions on the issue? I think Tomcat still isn't reading the .warc and .warc.gz files (or the .manifest, .log files).

Thanks,
Awakash

________________________________
From: Bradley Tofel [mailto:br...@ar...]
Sent: Friday, February 04, 2011 2:44 AM
To: Awakash Bodiwala
Cc: arc...@li...; Jennie Corman
Subject: Re: [Archive-access-discuss] Instructions on running wayback and to unpack files

Hi Awakash,

You can change the basedir to whatever is simpler for your installation, in this case, likely /home/site/archivefiles/ or wherever they will show up - no need to move them to /tmp/wayback - that's just the default directory wayback uses in the default configuration.

Let me know how this works for you,

Brad

On 2/2/11 11:55 PM, Awakash Bodiwala wrote:

Hello Brad,

Thanks for your reply - I did not receive your reply by e-mail but did find your answer from Lori.

I've set up the wayback program under /webapps successfully. The next step is to figure out how to view the contents contained within the web archive files (.warc/.warc.gz). Here are the config changes I made within the wayback.xml file (I replaced the IP we are using with 'ipaddress'):

    wayback.basedir=/tmp/wayback
    wayback.urlprefix=http://ipaddress:8080/wayback-1.6.0/

The http://ipaddress:8080/wayback-1.6.0 can only be accessed if connected through a vpn - is this perhaps the reason why Wayback cannot read the archived files?

Another question is whether the basedir should be pointed to the location of the web archive files (/home/site/archivefiles/) or whether the web archive files should be brought into the /tmp/wayback directory.

Thanks for your help again,
Awakash

________________________________
From: Lori Donovan [mailto:lo...@ar...]
Sent: Tuesday, February 01, 2011 10:13 AM
To: Awakash Bodiwala
Cc: Jennie Corman
Subject: Re: Instructions on running wayback and to unpack files

Hi Awakash,

I'm not sure why the response wouldn't have come to your email, but it's posted in the arc...@li... list archives here: http://sourceforge.net/mailarchive/forum.php?thread_name=4D468A88.3090902%40archive.org&forum_name=archive-access-discuss.

Definitely send a follow up email if the answers are not sufficient or if you get stuck again.

Best,
Lori

On Feb 1, 2011, at 6:15 AM, Awakash Bodiwala wrote:

Hello Lori,

Although I did send an e-mail to Brad at this e-mail address (arc...@li...), I did not receive a reply from him. Should I send him another e-mail?

Regards,
Awakash

________________________________
From: Lori Donovan [mailto:lo...@ar...]
Sent: Monday, January 31, 2011 7:09 PM
To: Awakash Bodiwala
Cc: Jennie Corman
Subject: Re: Instructions on running wayback and to unpack files

Hi Awakash,

I looked around to see if there are any more detailed or user-friendly instructions on how to deploy Wayback, and noticed that you had also sent your question to the archive-access listserv, and had gotten an answer there. Brad, who answered your question there, will be your best resource if you have any further questions.

Best,
Lori

Lori Donovan
Partner Specialist
Internet Archive
lo...@ar...
(415) 561-6799 x 4

On Jan 27, 2011, at 6:20 AM, Awakash Bodiwala wrote:

Hello Lori,

I've set up Apache Tomcat 6, Wayback 1.6, and JDK 1.6 and now find the instructions to be a bit confusing.

I have a collection of .warc.gz files in one location (filenames similar to ARCHIVEIT-1623-20091009194808-00000-crawling015.us.archive.org.warc.gz) while the wayback setup is in another directory under /webapps; in the browser I see the wayback homepage screen. When I hit the 'query' button, I get a 404 page:

    type Status report
    message /query
    description The requested resource (/query) is not available.

I'm trying to follow the instructions here: http://archive-access.sourceforge.net/projects/wayback/administrator_manual.html, but am stuck at the second step. Here is what I have completed so far:

1 - I've edited the wayback.xml file, specifically wayback.basedir and wayback.urlprefix, to be /tmp/wayback and the url for the wayback homepage respectively.
2 - How will Tomcat unpack .war files? Should I unzip first? Does .warc differ from .war? Is there a command to unpack these files?

Any help on this would be great. Thanks in advance for the help.

Regards,
Awakash |
From: Bradley T. <br...@ar...> - 2011-02-04 07:38:28
|
Hi Awakash,

You can change the basedir to whatever is simplest for your installation. In this case that is likely /home/site/archivefiles/, or wherever the archive files will show up. There is no need to move them to /tmp/wayback - that's just the directory Wayback uses in the default configuration.

Let me know how this works for you,
Brad

On 2/2/11 11:55 PM, Awakash Bodiwala wrote:
> Hello Brad,
> Thanks for your reply - I did not receive your reply by e-mail but did
> find your answer from Lori.
> I've set up the wayback program under /webapps successfully. The next
> step is to figure out how to view the contents contained within the
> web archive files (.warc/.warc.gz). Here are the config changes I made
> within the wayback.xml file (I replaced the IP we are using with
> 'ipaddress'):
> wayback.basedir=/tmp/wayback
> wayback.urlprefix=http://ipaddress:8080/wayback-1.6.0/
> The http://ipaddress:8080/wayback-1.6.0 can only be accessed if
> connected through a vpn - is this perhaps the reason why Wayback
> cannot read the archived files?
> Another question is if the basedir should be pointed to the location
> of the web archive files (/home/site/archivefiles/) or if the web
> archive files should be brought into the /tmp/wayback directory?
> Thanks for your help again,
> Awakash |
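Before pointing wayback.basedir at a directory, it can help to confirm which archive files are actually sitting there. The following is only a minimal sketch in Python, not part of Wayback: the helper name `find_archives` and the list of suffixes are assumptions made for illustration.

```python
from pathlib import Path

# Suffixes treated as web archive files in this sketch (an assumption,
# not a list taken from Wayback's own code).
ARCHIVE_SUFFIXES = (".arc", ".arc.gz", ".warc", ".warc.gz")


def find_archives(directory):
    """Return the sorted names of ARC/WARC files directly inside `directory`."""
    root = Path(directory)
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_file() and p.name.endswith(ARCHIVE_SUFFIXES)
    )
```

Calling `find_archives("/home/site/archivefiles/")` would list the candidate files in the directory from the message above; an empty list suggests the path is wrong or the files live somewhere else.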
From: <sar...@bn...> - 2011-02-02 16:55:27
|
Hello everyone,

We are still using Wayback 1.4.1. When we try to display a page that has the following headers:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fr" lang="fr">
<head>
<title></title>
<meta http-equiv="content-type" content="application/xhtml+xml; charset=utf-8" />

and the following MIME type: application/xhtml+xml;charset=utf-8, we get the following error message:

XML analysis error, badly structured on newText = "<";

Has anyone come across this problem and found a solution?

Best,
Sara Aubry |
From: Awakash B. <abo...@ac...> - 2011-02-02 16:55:13
|
Hello Brad,

Thanks for your reply - I did not receive your reply by e-mail but did find your answer from Lori.

I've set up the wayback program under /webapps successfully. The next step is to figure out how to view the contents contained within the web archive files (.warc/.warc.gz). Here are the config changes I made within the wayback.xml file (I replaced the IP we are using with 'ipaddress'):

wayback.basedir=/tmp/wayback
wayback.urlprefix=http://ipaddress:8080/wayback-1.6.0/

The http://ipaddress:8080/wayback-1.6.0 can only be accessed if connected through a vpn - is this perhaps the reason why Wayback cannot read the archived files?

Another question is if the basedir should be pointed to the location of the web archive files (/home/site/archivefiles/) or if the web archive files should be brought into the /tmp/wayback directory?

Thanks for your help again,
Awakash

________________________________
From: Lori Donovan [mailto:lo...@ar...]
Sent: Tuesday, February 01, 2011 10:13 AM
To: Awakash Bodiwala
Cc: Jennie Corman
Subject: Re: Instructions on running wayback and to unpack files

Hi Awakash,

I'm not sure why the response wouldn't have come to your email, but it's posted in the arc...@li... list archives here: http://sourceforge.net/mailarchive/forum.php?thread_name=4D468A88.3090902%40archive.org&forum_name=archive-access-discuss. Definitely send a follow up email if the answers are not sufficient or if you get stuck again.

Best,
Lori

On Feb 1, 2011, at 6:15 AM, Awakash Bodiwala wrote:

Hello Lori,

Although I did send an e-mail to Brad at this e-mail address (arc...@li...), I did not receive a reply from him. Should I send him another e-mail?

Regards,
Awakash

________________________________
From: Lori Donovan [mailto:lo...@ar...]
Sent: Monday, January 31, 2011 7:09 PM
To: Awakash Bodiwala
Cc: Jennie Corman
Subject: Re: Instructions on running wayback and to unpack files

Hi Awakash,

I looked around to see if there are any more detailed or user-friendly instructions on how to deploy Wayback, and noticed that you had also sent your question to the archive-access listserv, and had gotten an answer there. Brad, who answered your question there, will be your best resource if you have any further questions.

Best,
Lori

Lori Donovan
Partner Specialist
Internet Archive
lo...@ar...
(415) 561-6799 x 4

On Jan 27, 2011, at 6:20 AM, Awakash Bodiwala wrote:

Hello Lori,

I've set up Apache Tomcat 6, Wayback 1.6, and JDK 1.6 and now find the instructions to be a bit confusing. I have a collection of .warc.gz files in one location (filenames similar to ARCHIVEIT-1623-20091009194808-00000-crawling015.us.archive.org.warc.gz) while the wayback setup is in another directory under /webapps; in the browser I see the wayback homepage screen. When I hit the 'query' button, I get a 404 page:

type Status report
message /query
description The requested resource (/query) is not available.

I'm trying to follow the instructions here: http://archive-access.sourceforge.net/projects/wayback/administrator_manual.html, but am stuck at the second step. Here is what I have completed so far:

1 - I've edited the wayback.xml file, specifically wayback.basedir and wayback.urlprefix, to be /tmp/wayback and the url for the wayback homepage respectively.
2 - How will Tomcat unpack .war files? Should I unzip first? Does .warc differ from .war? Is there a command to unpack these files? Any help on this would be great.

Thanks in advance for the help.

Regards,
Awakash |
From: <Mac...@nb...> - 2011-02-02 14:31:46
|
After no response at all, I'll give it another try:

Hi all,

We want to provide access to our web archive in 4 different languages (German, English, French and Italian) with Wayback 1.6.0. Can anybody provide the French and/or Italian version of the WaybackUI.properties file?

Your help is very much appreciated.

Mac

Mac Kobus
Digitale Archivierung ¦ e-Helvetica
Eidgenössisches Departement des Innern EDI
Bundesamt für Kultur BAK
Schweizerische Nationalbibliothek NB
Hallwylstrasse 15, 3003 Bern
tel +41 31 322 89 93
fax +41 31 322 84 63
mac...@nb...
www.nb.admin.ch ¦ http://www.nb.admin.ch/e-helvetica |
From: Bradley T. <br...@ar...> - 2011-02-01 02:41:42
|
Hi Natalia,

Looks like the problem is a combination of JavaScript in the page and Wayback's new URL rewriting: Wayback now parses JavaScript and attempts to replace any http://HOSTNAME/ URLs within it with a version of the URL prefixed with the replay prefix.

The site in question consists of a pair of 1-frame framesets, with the actual content nested two layers deep in the framesets. Each of the top two framesets has JavaScript which checks the hostname currently loaded, and forces the browser to reload the page from the "real" website if the content is not being served from there:

======================
<SCRIPT type='text/javascript'LANGUAGE="JAVASCRIPT">
<!--
// By Jordi Martínez Lobo ren...@re...
// advocatstarragona
if ((document.location.host != "www.advocatstarragona.com")) {
document.location.href = "http://www.advocatstarragona.com";
}
// -->
</SCRIPT>
======================

Wayback is changing that to:

======================
<SCRIPT type='text/javascript'LANGUAGE="JAVASCRIPT">
<!--
// By Jordi Martínez Lobo ren...@re...
// advocatstarragona
if ((document.location.host != "www.advocatstarragona.com")) {
document.location.href = "http://YOURWAYBACK-HOST:PORT/PATH/DATE/http://www.advocatstarragona.com";
}
// -->
</SCRIPT>
======================

This has the effect of making the page notice it's not being loaded from "www.advocatstarragona.com", so it reloads, but the URL it reloads now includes your Wayback info, so it goes into the loop.

We hope in the near future to try to locate "document.location.href" assignments and handle them better. This is hard - the code could be related to normal site functionality, or could be an attempt like this one to keep the content being served from the origin server only. Perhaps Wayback will end up with a way to configure specific site behavior to prevent these redirect loops on a case-by-case basis, but there's no functionality to handle this at the moment.

Of possible usefulness: some links in the innermost frame do not seem to contain the same JavaScript (I just glanced at it, didn't look too deep), so you may be able to access some of the archived site at:

http://www.advocatstarragona.com/lameva/login.php

But some of the top-navigation links include *another* frameset. I'm guessing this may be unintentional on the author's part - if I understand what's happening on the site, a user navigating through the live website will end up getting an ever-growing set of nested framesets, one more frameset with each navigation link clicked.

Let me know if you have questions, suggestions, corrections, etc.

Brad

On 1/31/11 6:49 PM, Natalia Torres wrote:
> Hi all
>
> We are testing the new version of Wayback, 1.6, on our website (we
> currently use Wayback 1.4.2). In general we found that most crawls that
> had display problems with the previous version have improved. However,
> some crawls that use redirects behave strangely and enter an endless
> loop. An example is http://www.advocatstarragona.com, which in the
> previous version goes to the "live site", but Wayback 1.6 keeps
> refreshing the Wayback page.
>
> Is there any problem with redirects in this version?
>
> Thank you very much for your help
>
> Natalia |
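The loop described in the message above can be reproduced in a few lines. This is a rough Python simulation under stated assumptions, not Wayback's rewriting code: `REPLAY_PREFIX` is a made-up stand-in for http://YOURWAYBACK-HOST:PORT/PATH/DATE/, and `rewrite_js`, `host_of` and `simulate` are hypothetical helpers written for this illustration.

```python
import re

# Hypothetical replay prefix, standing in for
# http://YOURWAYBACK-HOST:PORT/PATH/DATE/ from the message above.
REPLAY_PREFIX = "http://wayback.example.org:8080/wayback/20110131000000/"

ORIGIN_HOST = "www.advocatstarragona.com"

# The host check and redirect from the archived page, without the wrapper tags.
ORIGINAL_JS = (
    'if ((document.location.host != "www.advocatstarragona.com")) {\n'
    '    document.location.href = "http://www.advocatstarragona.com";\n'
    '}\n'
)


def rewrite_js(js, prefix):
    """Naively prefix every absolute http:// URL found in a script.

    This only imitates the effect described above; it is not Wayback's
    actual rewriting logic.
    """
    return re.sub(r'http://[^\s"]+', lambda m: prefix + m.group(0), js)


def host_of(url):
    """Return the host[:port] part of an http:// URL."""
    return url[len("http://"):].split("/", 1)[0]


def simulate(js, start_url, max_hops=5):
    """Follow the script's redirect until its host check passes or hops run out."""
    target = re.search(r'location\.href = "([^"]+)"', js).group(1)
    url, hops = start_url, 0
    while host_of(url) != ORIGIN_HOST and hops < max_hops:
        url = target  # the script forces a reload of its hard-coded URL
        hops += 1
    return url, hops
```

With the original script, `simulate` leaves the archive after one hop and lands on the live site. With the output of `rewrite_js`, the hard-coded target points back at the replay host, the host check never passes, and the hop limit is reached.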
From: Natalia T. <nt...@ce...> - 2011-01-31 11:49:45
|
Hi all,

We are testing the new version of Wayback, 1.6, on our website (we currently use Wayback 1.4.2). In general we found that most crawls that had display problems with the previous version have improved. However, some crawls that use redirects behave strangely and enter an endless loop. An example is http://www.advocatstarragona.com, which in the previous version goes to the "live site", but Wayback 1.6 keeps refreshing the Wayback page.

Is there any problem with redirects in this version?

Thank you very much for your help,
Natalia |
From: Bradley T. <br...@ar...> - 2011-01-31 10:01:06
|
Hi Awakash,

By default, Tomcat will "unpack" .war files it finds under its .../webapps/ directory. .war files are actually just .zip files, so really this means it "unzips" them, using the .war file name to determine the webapp directory. Placing "cool.war" in the webapps directory will result in:

.../webapps/cool.war
.../webapps/cool/

(Tomcat leaves cool.war in place after unpacking it.)

Wayback has a .war file which contains a file ./WEB-INF/wayback.xml. Placing the Wayback .war file in the webapps directory with the name "archive.war" will result in:

.../webapps/archive/WEB-INF/wayback.xml

(among many other files ; )

Once you've placed the .war file in the webapps directory, and waited for Tomcat to "unpack" it, you can edit wayback.xml (or other configuration files as needed). Be sure to restart Tomcat after editing these xml config files!

.war files are different from .warc or .warc.gz files, which are web archives, containing metadata and transcript records of HTTP requests and responses.

Let us know if you're still having problems!

Brad

On 1/28/11 9:12 PM, Awakash Bodiwala wrote:
> Hello,
>
> I've set up Apache Tomcat 6, Wayback 1.6, and JDK 1.6 and now find the
> instructions to be a bit confusing. I have a collection of .warc.gz
> files in one location (filenames similar to
> ARCHIVEIT-1623-20091009194808-00000-crawling015.us.archive.org.warc.gz)
> while the wayback setup is in another directory under /webapps; in the
> browser I see the wayback homepage screen. When I hit the 'query'
> button, I get a 404 page:
>
> type Status report
> message /query
> description The requested resource (/query) is not available.
>
> I'm trying to follow the instructions here:
> http://archive-access.sourceforge.net/projects/wayback/administrator_manual.html,
> but am stuck at the second step. Here is what I have completed so far:
>
> 1 - I've edited the wayback.xml file, specifically wayback.basedir and
> wayback.urlprefix, to be /tmp/wayback and the url for the wayback
> homepage respectively.
>
> 2 - How will Tomcat unpack .war files? Should I unzip first? Does
> .warc differ from .war? Is there a command to unpack these files? Any
> help on this would be great.
>
> Thanks in advance for the help.
>
> Regards,
> Awakash |
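To make the .war versus .warc distinction above concrete, here is a small Python sketch. It is an illustration only, not how Tomcat is implemented: `build_war`, `unpack_war` and `looks_like_zip` are hypothetical helpers, and the one-line `<beans/>` content is a placeholder rather than a real wayback.xml.

```python
import gzip
import io
import zipfile
from pathlib import Path


def build_war():
    """Build a tiny stand-in for a .war file in memory (it is just a zip)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as war:
        # Placeholder content, not a real Wayback configuration.
        war.writestr("WEB-INF/wayback.xml", "<beans/>\n")
    return buf.getvalue()


def unpack_war(war_path):
    """Unzip `name.war` into a sibling directory `name/`, leaving the .war in place."""
    war_path = Path(war_path)
    target = war_path.with_suffix("")  # .../webapps/cool.war -> .../webapps/cool
    with zipfile.ZipFile(war_path) as war:
        war.extractall(target)
    return target


def looks_like_zip(data):
    """True if `data` is a zip archive (as a .war is), False otherwise."""
    return zipfile.is_zipfile(io.BytesIO(data))


# A .warc.gz, by contrast, is gzip-compressed WARC records, not a zip file.
WARC_GZ_SAMPLE = gzip.compress(b"WARC/1.0\r\nWARC-Type: warcinfo\r\n\r\n")
```

Running `unpack_war` on a file named cool.war produces a cool/ directory next to it, mirroring the layout in the message, while `looks_like_zip(WARC_GZ_SAMPLE)` is false: a web archive is not something Tomcat would unpack.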
From: raffaele m. <raf...@at...> - 2011-01-29 10:08:29
|
On Jan 28, 2011, at 1:58 AM, Bradley Tofel wrote:
> You should be able to alter the (currently cumbersome) URL prefix
> settings to force Wayback to generate URLs for port 80, just omit :8080,
> in all the URL prefix configurations.
>
> You do however, need to inform Wayback that requests will still be
> received on port 8080 - so the bean name should still include 8080. The
> "host" component of the AccessPoint bean name allows you to set up
> virtual hosts, but with only one AccessPoint, you should be able to use
> the name "8080".

Fine, it works. I was missing the AccessPoint bean name. Now I realize that there is also this page explaining it:
http://archive-access.sourceforge.net/projects/wayback/access_point_naming.html

Thank you

--
raf...@at... |
From: Awakash B. <abo...@ac...> - 2011-01-28 14:24:51
|
Hello,

I've set up Apache Tomcat 6, Wayback 1.6, and JDK 1.6 and now find the instructions to be a bit confusing. I have a collection of .warc.gz files in one location (filenames similar to ARCHIVEIT-1623-20091009194808-00000-crawling015.us.archive.org.warc.gz) while the wayback setup is in another directory under /webapps; in the browser I see the wayback homepage screen. When I hit the 'query' button, I get a 404 page:

type Status report
message /query
description The requested resource (/query) is not available.

I'm trying to follow the instructions here: http://archive-access.sourceforge.net/projects/wayback/administrator_manual.html, but am stuck at the second step. Here is what I have completed so far:

1 - I've edited the wayback.xml file, specifically wayback.basedir and wayback.urlprefix, to be /tmp/wayback and the url for the wayback homepage respectively.
2 - How will Tomcat unpack .war files? Should I unzip first? Does .warc differ from .war? Is there a command to unpack these files? Any help on this would be great.

Thanks in advance for the help.

Regards,
Awakash |
From: Bradley T. <br...@ar...> - 2011-01-28 00:58:12
|
Hi Raffaele,

You should be able to alter the (currently cumbersome) URL prefix settings to force Wayback to generate URLs for port 80: just omit :8080 in all the URL prefix configurations.

You do, however, need to inform Wayback that requests will still be received on port 8080, so the bean name should still include 8080. The "host" component of the AccessPoint bean name allows you to set up virtual hosts, but with only one AccessPoint, you should be able to use the name "8080".

Let me know if you're still having problems,
Brad

On 1/27/11 1:52 AM, raffaele messuti wrote:
> Hi,
> i've a wayback (1.6.0), deployed on tomcat, in ROOT
> working smoothly from http://mydomain.it:8080
>
> in wayback.xml i got the following conf:
>
> wayback.urlprefix=http://mydomain.it:8080/
> ..
> <bean name="mydomain.it:8080" class="org.archive.wayback.webapp.AccessPoint">
>
> now i want to put tomcat behind a reverse proxy, just because i'm not so
> confident with tomcat security.
> i was thinking to use nginx, it's simple and fast
>
> this piece of conf should be enough to set up the proxy
> https://gist.github.com/5b1bbc64ef20a0bf1844
>
> but how should i modify the AccessPoint bean to let the proxy work?
>
> i see that waybackmachine.org uses tomcat behind varnish,
> is it possible to look at your configuration?
>
> ~ curl -I waybackmachine.org
> Server: Apache-Coyote/1.1
> Via: 1.1 varnish
>
> thank you, ciao.
>
> --
> raf...@at... |
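The split Brad describes, a public URL prefix without :8080 and a bean name that still carries the port Tomcat listens on, can be written out as a small Python sketch. This is only an illustration of the two values under the thread's setup; `public_prefix` and `port_of` are hypothetical helpers, and nothing here is Wayback configuration syntax.

```python
from urllib.parse import urlsplit


def public_prefix(host, path="/"):
    """URL prefix that browsers see through the proxy (port 80, so no :port)."""
    return f"http://{host}{path}"


def port_of(url_prefix):
    """Port implied by a URL prefix; http defaults to 80 when none is given."""
    parts = urlsplit(url_prefix)
    return parts.port or 80


# Hypothetical values mirroring the thread: the proxy answers on port 80,
# while Tomcat keeps listening on 8080 behind it.
TOMCAT_PORT = 8080
URL_PREFIX = public_prefix("mydomain.it")   # value for wayback.urlprefix
ACCESS_POINT_NAME = str(TOMCAT_PORT)        # bean name still refers to 8080
```

The point of keeping the two apart is that `port_of(URL_PREFIX)` is 80, the port visitors use, while `ACCESS_POINT_NAME` stays "8080", the port on which requests actually reach Tomcat.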