From: <Mac...@nb...> - 2011-08-02 11:57:04
|
Hi Robin, the workflow for reindexing the ARC files did work for us, although our Wayback instance wasn't located in a /tmp/ folder either. We just replaced that path with the actual location of our Wayback machine. Another thing that gave us a hard time when indexing the ARCs was file permissions. We use different machines for harvesting and reviewing, and after the file transfer all the files had mode 644, which could not be indexed in our setup, so we set them to 664. Did you get it done? Best regards Mac
Mac Kobus Digitale Archivierung ¦ e-Helvetica Eidgenössisches Departement des Innern EDI Bundesamt für Kultur BAK Schweizerische Nationalbibliothek NB Hallwylstrasse 15, 3003 Bern tel +41 31 322 89 93 fax +41 31 322 84 63 mac...@nb... www.nb.admin.ch ¦ http://www.nb.admin.ch/e-helvetica
-----Original Message----- From: Robin Davis [mailto:rob...@gm...] Sent: Tuesday, 19 July 2011 17:38 To: arc...@li... Subject: [Archive-access-discuss] Reindexing in Wayback Hello, I'm the web preservation intern at the Smithsonian Institution Archives. We've been using Heritrix and Wayback to crawl and view websites affiliated with the Institution. In general, it's been a success. We did run into a problem, however. We'd had Heritrix configured to write both ARC and WARC files, although we were only interested in WARCs. To make space, I deleted the ARC files. All the affected resources are now "temporarily unavailable" and won't display. The WARCs are all still there (uncompressed), as are their associated pointer files in /wayback/index-data/merged. "Searching all pages under" the domains displays all the documents with the correct dates, but they are all unavailable. 
I have tried to reindex three test files by --shutting down Tomcat --removing the three crawl job folders from /smithsonian-archive (where all of our crawl job folders live) --removing the associated files from /wayback/index-data/merged --restarting Tomcat --copying over the three crawl job folders again into /smithsonian-archive. New files corresponding to these are added automatically to index-data/merged... but the resource remains "temporarily unavailable." Clearing our browser's history and cache had no effect. As an experiment, I have also --shut down Tomcat --removed two crawl job folders from /smithsonian-archive and their pointer files in /wayback/index-data/merged --restarted Tomcat I thought the files would be gone completely - but the URLs with the correct dates still show up in searches. When these "ghost links" are clicked, the same error page appears: "temporarily unavailable." I've looked at the only related mailing list thread I could rustle up: http://sourceforge.net/mailarchive/message.php?msg_id=25800307 The problems Jerome Kowalczyk and Mac Kobus were having seem similar to ours. Brad suggested stopping Tomcat, removing the WARC files from the /tmp/wayback/ folder, typing the command find /tmp/wayback/ -type f -print0 | xargs -0 -r rm -fv, moving the files back, and restarting Tomcat... But the /tmp/wayback/ directory doesn't seem to exist on our machine (probably /smithsonian-archive for us?), and I'm also not sure what the command is supposed to do and so haven't tried tweaking it. What's in the way of getting our WARC files to display? How can we reindex and/or completely delete crawled sites from Wayback? Any insights are appreciated. For reference: we have around 700 WARCs total in our collection. We're using Wayback 1.6.0 on a Linux machine, set up by a contractor (support expired). All recent crawls were written as WARCs only, and they display without issue in Wayback. 
Best, Robin -- Robin Camille Davis Smithsonian Institution Archives Intern, Digital Services Division _______________________________________________ Archive-access-discuss mailing list Arc...@li... https://lists.sourceforge.net/lists/listinfo/archive-access-discuss |
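The permission change Mac describes above (644 to 664) can be applied to a whole directory of transferred files in one pass. The following is a minimal sketch, not part of Wayback: the function name `make_group_writable` is made up for illustration, and whether an indexer really needs group write access depends on which user it runs as on your machine.

```python
import os
import stat


def make_group_writable(root):
    """Add the group-write bit to every regular file under root (e.g. 644 -> 664)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = stat.S_IMODE(os.stat(path).st_mode)
            if not mode & stat.S_IWGRP:
                # Keep all existing bits and only switch on group write.
                os.chmod(path, mode | stat.S_IWGRP)
                changed.append(path)
    return changed
```

Calling it on the directory that holds the transferred ARC/WARC files returns the list of files whose mode was changed; files that are already group-writable are left alone.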
From: Coufal L. <Lib...@nk...> - 2011-07-25 09:00:05
|
Hi Roger, At the National Library of the Czech Republic, we encountered a similar problem. We used a heuristic that takes the first N bytes and chooses, among the Czech character sets (windows-1250, ISO 8859-2, UTF-8), the one under which those bytes contain the most characters with diacritics. However, your case might be different. The problem could be a mismatch between the encoding specified in the HTTP header (Content-Type) and the one declared in the body of the document. It should be possible to set Wayback to ignore the encoding declared in the document body and use the encoding specified in the HTTP header. Best, Libor Libor Coufal =================================== Dept. of web archiving National Library of the Czech Republic Klementinum 190, 110 01 Praha 1 T: ?221663-256 F: ?221663-301 lib...@nk... www.webarchiv.cz www.nkp.cz ===================================
________________________________ From: Coram, Roger [mailto:Rog...@bl...] Sent: Wednesday, July 20, 2011 12:27 PM To: arc...@li... Subject: [Archive-access-discuss] Encoding Issue We appear to have found a problem with the replay of some sites, an example of which is here: http://www.webarchive.org.uk/wayback/archive/20110604080034/http://www.chilton-computing.org.uk/ After doing some digging it seems that a UTF-16 BOM is added to the response - the culprit appears to be this line in the original site which (incorrectly, I'm guessing) specifies the encoding: <meta http-equiv="Content-type" content="text/html;charset=utf-16"> As a test, if we switch the ArchivalUrlReplay.xml to exclusively use the 'identityreplayrenderer' then this doesn't happen. Presumably when Wayback needs to amend the response it sets the encoding which results in the above. Has anyone else seen anything similar? Or know how to prevent it? Thanks, Roger G. Coram Web Archiving Engineer The British Library T: +44 (0)1937 546607 F: +44 (0)1937 546872 E: rog...@bl... |
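The heuristic Libor describes can be sketched roughly as follows. This is an illustration of the idea only, not the National Library's actual code: the function name, the default sample size, and the exact set of characters counted as "Czech diacritics" are all assumptions.

```python
# Characters treated as Czech diacritics for scoring (an assumption for this sketch).
CZECH_DIACRITICS = set("áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ")

# Python codec names for windows-1250, ISO 8859-2 and UTF-8.
CANDIDATES = ("cp1250", "iso8859_2", "utf_8")


def guess_czech_charset(data, n=4096):
    """Return the candidate codec under which the first n bytes show the most Czech diacritics."""
    sample = data[:n]

    def score(codec):
        # errors="ignore" so bytes that are invalid in a codec simply do not count.
        text = sample.decode(codec, errors="ignore")
        return sum(ch in CZECH_DIACRITICS for ch in text)

    # On a tie, max() keeps the earliest candidate in the tuple.
    return max(CANDIDATES, key=score)
```

The approach works because the wrong codec turns most accented letters into other symbols or undecodable bytes, so the correct codec tends to score highest on real Czech text. It says nothing useful about documents with no diacritics in the sampled bytes.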
From: Thakur, P. <Pra...@on...> - 2011-07-20 15:01:53
|
Hello Everyone, I need to crawl and index some pages from the territory of Nunavut. Their site contains pages in Aboriginal languages like Mikmac and Inuit. For display I can download the font, but I don't know how to handle searching. Has anyone dealt with languages like these before? If so, I would appreciate some guidance. An example website would be http://www.assembly.nu.ca/ius Thanks, --Pramila Thakur
________________________________ From: Coram, Roger [mailto:Rog...@bl...] Sent: Wednesday, July 20, 2011 6:27 AM To: arc...@li... Subject: [Archive-access-discuss] Encoding Issue We appear to have found a problem with the replay of some sites, an example of which is here: http://www.webarchive.org.uk/wayback/archive/20110604080034/http://www.chilton-computing.org.uk/ After doing some digging it seems that a UTF-16 BOM is added to the response - the culprit appears to be this line in the original site which (incorrectly, I'm guessing) specifies the encoding: <meta http-equiv="Content-type" content="text/html;charset=utf-16"> As a test, if we switch the ArchivalUrlReplay.xml to exclusively use the 'identityreplayrenderer' then this doesn't happen. Presumably when Wayback needs to amend the response it sets the encoding which results in the above. Has anyone else seen anything similar? Or know how to prevent it? Thanks, Roger G. Coram Web Archiving Engineer The British Library T: +44 (0)1937 546607 F: +44 (0)1937 546872 E: rog...@bl... |
From: Coram, R. <Rog...@bl...> - 2011-07-20 10:39:39
|
We appear to have found a problem with the replay of some sites, an example of which is here: http://www.webarchive.org.uk/wayback/archive/20110604080034/http://www.chilton-computing.org.uk/ After doing some digging it seems that a UTF-16 BOM is added to the response - the culprit appears to be this line in the original site which (incorrectly, I'm guessing) specifies the encoding: <meta http-equiv="Content-type" content="text/html;charset=utf-16"> As a test, if we switch the ArchivalUrlReplay.xml to exclusively use the 'identityreplayrenderer' then this doesn't happen. Presumably when Wayback needs to amend the response it sets the encoding which results in the above. Has anyone else seen anything similar? Or know how to prevent it? Thanks, Roger G. Coram Web Archiving Engineer The British Library T: +44 (0)1937 546607 F: +44 (0)1937 546872 E: rog...@bl... |
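The situation in this message, where a page's meta tag declares utf-16 even though the content may be served in another encoding, is the kind of conflict that a "trust the HTTP header first" policy is meant to resolve. The sketch below shows that policy in isolation. It is not Wayback's implementation: `pick_charset`, its regular expression, and the 2048-byte window are assumptions made for illustration.

```python
import re

# Matches charset=... in either a Content-Type header or a <meta> tag.
_CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def pick_charset(content_type_header, body, default="utf-8"):
    """Prefer the charset from the HTTP Content-Type header; fall back to the document body."""
    header_match = _CHARSET_RE.search(content_type_header.encode("ascii", "ignore"))
    if header_match:
        return header_match.group(1).decode("ascii").lower()
    # Only look near the top of the document, where <meta> tags normally live.
    meta_match = _CHARSET_RE.search(body[:2048])
    if meta_match:
        return meta_match.group(1).decode("ascii").lower()
    return default
```

With the meta line quoted above as the body, a header of `text/html; charset=ISO-8859-1` wins over the utf-16 declaration, while a bare `text/html` header falls through to the meta value.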
From: Robin D. <rob...@gm...> - 2011-07-19 15:38:02
|
Hello, I’m the web preservation intern at the Smithsonian Institution Archives. We’ve been using Heritrix and Wayback to crawl and view websites affiliated with the Institution. In general, it's been a success. We did run into a problem, however. We'd had Heritrix configured to write both ARC and WARC files, although we were only interested in WARCs. To make space, I deleted the ARC files. All the affected resources are now "temporarily unavailable" and won't display. The WARCs are all still there (uncompressed), as are their associated pointer files in /wayback/index-data/merged. "Searching all pages under" the domains displays all the documents with the correct dates, but they are all unavailable. I have tried to reindex three test files by --shutting down Tomcat --removing the three crawl job folders from /smithsonian-archive (where all of our crawl job folders live) --removing the associated files from /wayback/index-data/merged --restarting Tomcat --copying over the three crawl job folders again into /smithsonian-archive. New files corresponding to these are added automatically to index-data/merged... but the resource remains "temporarily unavailable." Clearing our browser's history and cache had no effect. As an experiment, I have also --shut down Tomcat --removed two crawl job folders from /smithsonian-archive and their pointer files in /wayback/index-data/merged --restarted Tomcat I thought the files would be gone completely — but the URLs with the correct dates still show up in searches. When these "ghost links" are clicked, the same error page appears: "temporarily unavailable." I've looked at the only related mailing list thread I could rustle up: http://sourceforge.net/mailarchive/message.php?msg_id=25800307 The problems Jerome Kowalczyk and Mac Kobus were having seem similar to ours. 
Brad suggested stopping Tomcat, removing the WARC files from the /tmp/wayback/ folder, typing the command find /tmp/wayback/ -type f -print0 | xargs -0 -r rm -fv, moving the files back, and restarting Tomcat... But the /tmp/wayback/ directory doesn't seem to exist on our machine (probably /smithsonian-archive for us?), and I'm also not sure what the command is supposed to do and so haven't tried tweaking it. What's in the way of getting our WARC files to display? How can we reindex and/or completely delete crawled sites from Wayback? Any insights are appreciated. For reference: we have around 700 WARCs total in our collection. We're using Wayback 1.6.0 on a Linux machine, set up by a contractor (support expired). All recent crawls were written as WARCs only, and they display without issue in Wayback. Best, Robin -- Robin Camille Davis Smithsonian Institution Archives Intern, Digital Services Division |
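For readers unsure what the quoted command does: `find DIR -type f -print0 | xargs -0 -r rm -fv` lists every regular file under DIR (NUL-separated, so unusual filenames are safe), and hands them to `rm`, which deletes them and prints each name. Directories are left in place. A rough Python equivalent, written only to make the behaviour explicit, is below; the function name is made up, and unlike `rm -f` it will raise an error if a file cannot be deleted.

```python
import os


def remove_all_files(root):
    """Rough equivalent of: find ROOT -type f -print0 | xargs -0 -r rm -fv"""
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # -type f matches regular files only, so symbolic links are skipped.
            if os.path.isfile(path) and not os.path.islink(path):
                os.remove(path)
                print(f"removed '{path}'")  # rm -v reports each deletion
                removed.append(path)
    return removed
```

Because this deletes everything below the given directory, it is worth pointing it at a copy or double-checking the path before running it against real index data.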
From: Gordon M. <go...@ar...> - 2011-07-14 18:39:47
|
Best to ask Heritrix-specific questions on its project list: http://tech.groups.yahoo.com/group/archive-crawler/ But also, a typical pattern for a focused crawl is for it to collect URIs rapidly when there are many different sites to contact. But later, once all URIs from smaller and fast/responsive sites have been collected, only those sites that are large, slow, and/or unresponsive are left. The rate of URI collection thus drops to what can be requested politely from those sites. Also, when a site doesn't respond at all, a 'long retry snooze' (default 15 minutes) occurs before trying that host again, so that all the configured retries (default 30) aren't used up rapidly due to a transient server/network problem. More info is available at the FAQ: https://webarchive.jira.com/wiki/display/Heritrix/unexpectedly+slow+crawling+on+idle+crawler - Gordon @ IA On 7/14/11 6:56 AM, Thakur, Pramila wrote: > Hi Everyone, > > Most of the time when I crawl a site, after some time it snoozes few > urls and the crawling process is like hanging, not terminated just > going on without any activity. > > Has any one of you faced this situation? Is there a work around it > that can solve this issue? > > Thanks, > > --Pramila Thakur |
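The two defaults Gordon quotes explain why a crawl can look idle for a long time. As a back-of-the-envelope sketch (the function name is made up, and it assumes every retry waits out a full snooze, which is a simplification):

```python
RETRY_SNOOZE_MINUTES = 15   # 'long retry snooze' default quoted above
MAX_RETRIES = 30            # default retry count quoted above


def worst_case_wait_hours(retries=MAX_RETRIES, snooze_minutes=RETRY_SNOOZE_MINUTES):
    """Upper bound on snooze time for one unresponsive host, assuming a full snooze per retry."""
    return retries * snooze_minutes / 60


print(worst_case_wait_hours())  # 7.5
```

Under that assumption a single host that never answers can keep a crawl alive, with no visible activity, for several hours before its retries run out.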
From: Thakur, P. <Pra...@on...> - 2011-07-14 13:56:31
|
Hi Everyone, Most of the time when I crawl a site, after a while it snoozes a few URLs and the crawl appears to hang: it is not terminated, but it keeps running without any activity. Has anyone faced this situation? Is there a workaround that solves it? Thanks, --Pramila Thakur |
From: Lyudmila L. B. <lu...@la...> - 2011-07-13 21:26:43
|
I downloaded the new Wayback source from the IA svn today and I am having trouble compiling it. When I try to compile with mvn install, I get checksum errors:
[ERROR] Failed to execute goal on project wayback-core: Could not resolve dependencies for project org.archive.wayback:wayback-core:jar:1.7.0: Failed to collect dependencies for [junit:junit:jar:3.8.1 (test), org.apache.geronimo.specs:geronimo-servlet_2.5_spec:jar:1.2 (provided), org.archive.heritrix:heritrix-commons:jar:3.1.0-SNAPSHOT (compile), org.archive.access-control:access-control:jar:0.0.1-SNAPSHOT (compile), org.mozilla:juniversalchardet:jar:1.0.3 (compile), org.springframework:spring-core:jar:2.5.1 (compile), org.springframework:spring-beans:jar:2.5.1 (compile), org.beanshell:bsh:jar:2.0b4 (compile), org.htmlparser:htmlparser:jar:1.6 (compile), com.flagstone:transform:jar:3.0.2 (compile), org.apache.hadoop:hadoop-core:jar:0.20.2 (compile)]: Failed to read artifact descriptor for org.archive.heritrix:heritrix-commons:jar:3.1.0-SNAPSHOT: Could not transfer artifact org.archive.heritrix:heritrix-commons:pom:3.1.0-SNAPSHOT from/to internetarchive (http://builds.archive.org:8080/maven2): Checksum validation failed, no checksums available from the repository -> [Help 1]
I changed pom.xml to ignore checksum errors and compiled again, and got this error:
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /Users/ludab/projects/wayback1.7/wayback-core/src/main/java/org/archive/wayback/replay/swf/SWFReplayRenderer.java:[49,-1] cannot access com.flagstone.transform.DoAction
bad class file: /Users/ludab/.m2/repository/com/flagstone/transform/3.0.2/transform-3.0.2.jar(com/flagstone/transform/DoAction.class)
class file has wrong version 50.0, should be 49.0
[INFO] 1 error
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Wayback ........................................... SUCCESS [0.630s]
[INFO] Wayback Core Java Classes ......................... FAILURE [5:19.089s]
How can I resolve this issue? Thank you, Lyudmila |
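On the "wrong version 50.0, should be 49.0" line: class-file major version 50 corresponds to Java 6 and 49 to Java 5, so the message suggests the transform-3.0.2 jar was built for Java 6 while the compiler in use only accepts Java 5 class files. Whether switching the build to a Java 6 JDK resolves it is a guess from the log, not something confirmed on the list. The version of any class file can be checked by reading its first eight bytes, as in this small sketch (the function name is made up for illustration):

```python
import struct

# Major class-file versions for the two releases named in the error message.
JAVA_RELEASE = {49: "Java 5", 50: "Java 6"}


def class_file_version(data):
    """Return (major, minor) from the first 8 bytes of a .class file."""
    # Layout: u4 magic, u2 minor_version, u2 major_version, all big-endian.
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor
```

To use it, extract DoAction.class from the jar, read it in binary mode, and pass the bytes to the function; `JAVA_RELEASE.get(major)` then gives the matching release for the two versions in question.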
From: Colin R. <cs...@st...> - 2011-07-04 09:55:16
|
Hi, Does anyone know how to browse harvested https:// sites in wayback proxy mode? -- Colin Rosenthal IT Developer State and University Library Aarhus |
From: Michael N. <mik...@gm...> - 2011-06-28 04:01:53
|
Not sure if I replied correctly, but I got it working. Bradley Tofel pointed me to this link about Access Point Naming: http://archive-access.sourceforge.net/projects/wayback/access_point_naming.html My main mistake was changing the settings in the ComplexAccessPoint.xml file when the change needed to be made in wayback.xml. The final setup has the webapp deployed as ROOT and the bean name set to 8080. It's all working now, brings a tear to my eye.
On Mon, Jun 27, 2011 at 4:01 PM, Michael Nichols <mik...@gm...> wrote:
> I am going nuts trying to get Wayback to work. I have an instance on Tomcat 6 running on openSUSE. I installed the Wayback webapp and everything seems fine, but when I click "Take me back" I get a 404 page saying "The requested resource (/query) is not available."
> I have pored over the config files and cannot figure out what is wrong. My wayback.urlprefix is set correctly. One thing I notice is that if I do not install this webapp as ROOT, images and other resources do not load, as they point to the wrong location. At a glance in the templates it looks like results.getStaticPrefix(); is not returning the correct prefix from the config file. Also, if I do not install as ROOT, clicking the Wayback button goes to localhost:8080/query?... rather than localhost:8080/wayback/query?....
> My main problem is the 404 when you click "Take me back!"; has anyone figured this out? I can see an email here from 2010 that was not resolved: http://sourceforge.net/mailarchive/forum.php?thread_name=4CD0311A.6080308%40cs.stanford.edu&forum_name=archive-access-discuss
> I appreciate the help, |
From: Bradley T. <br...@ar...> - 2011-06-28 01:03:16
|
FTP and HTTP are both protocols for accessing content over TCP/IP connections. Access over HTTP should entail no additional network hops/routing compared to FTP. If there are (highly) unusual routing policies for HTTP on your network, it could conceivably incur more network latency, but I'd check with your network administrators to see how your network is set up. Brad On 6/28/11 3:23 AM, Thakur, Pramila wrote: > > Hi Brad, > > Currently we use a shared drive, which works. > > But trying FTP was one of the option, because everyone has access to > shared drive. > > Using http, I guess will go through internet and will slow down the > retrieval process, that's what I think . > > Thanks, > > --Pramila Thakur > > ------------------------------------------------------------------------ > > *From:*Bradley Tofel [mailto:br...@ar...] > *Sent:* Monday, June 27, 2011 4:29 AM > *To:* arc...@li... > *Subject:* Re: [Archive-access-discuss] FTP URL in base dir > > Hi Pramila, > > While FTP does support the "REST" command, and there's even discussion > of a new "RANG" command, Wayback does not currently support FTP access > to a ResourceStore. > > I'm not familiar with an Java FTP APIs to comment on the difficulty of > creating such an implementation. > > Are you unable to use HTTP? > > Brad > > On 6/17/11 12:32 AM, Thakur, Pramila wrote: > > Hi everyone, > > I was wondering if an ftp url can be in wayback.xml? > > I am having issues with it. I am on windows XP. > > The error is *C:\Program Files\Apache Software Foundation\Tomcat > 6.0\ftp:\user:pass@188.44.99.300\repository\waybacktest\election2011Test > <ftp://user:pass@188.44.99.300/repository/waybacktest/election2011Test> *is > not a directory > > My configuration is wayback.xml is > > *wayback.basedir=ftp://user:pass@188.44.99.300/repository/waybacktest* > > Can someone please give me some ideas? 
> > Thanks, > > --Pramila Thakur > > ------------------------------------------------------------------------ > > *From:*Bradley Tofel [mailto:br...@ar...] > *Sent:* Tuesday, June 07, 2011 4:08 PM > *To:* arc...@li... > <mailto:arc...@li...> > *Subject:* Re: [Archive-access-discuss] Absolute path and relative > path solution > > Hi Pramila, > > I didn't yet have a chance to respond to your previous post about > javascript menus. Is this the situation you're facing? > > Can you provide some example URLs, or HTML + JS snippets to help > clarify the situation? > > Some background - there are 3 types of URLs: > > 1) absolute (ex, "http://www.example.org/images/foo.gif" > <http://www.example.org/images/foo.gif>) > 2) server relative (ex, "/images/foo.gif") > 3) path relative (ex, "../images/foo.gif", "images/foo.gif") > > Wayback attempts to rewrite all URLs found in HTML. It attempts to > rewrite absolute URLs found in javascript and CSS. > > This leaves server-relative and path-relative URLs. > > Path relative URLs should require no rewriting (unless they include > ../../../../ references which back up beyond the actual server root...): > > Page: > http://wayback.example.com/20010101000000/http://www.example.org/ > <http://wayback.example.com/20010101000000/http:/www.example.org/> > Containing: "images/foo.gif" > Resolves to > "http://wayback.example.com/20010101000000/http://www.example.org/images/foo.gif" > <http://wayback.example.com/20010101000000/http:/www.example.org/images/foo.gif> > > which is correct. > > Server-relative links do not go to the right place: > Page: > http://wayback.example.com/20010101000000/http://www.example.org/ > <http://wayback.example.com/20010101000000/http:/www.example.org/> > Containing: "/images/foo.gif" > Resolves to "http://wayback.example.com/images/foo.gif" > <http://wayback.example.com/images/foo.gif> > > which is *not* correct. > > Wayback does include a server-relative redirection handler. 
This > notices incoming URLs like: > > "/images/foo.gif" > > and uses the "Referer" HTTP request header to correctly resolve the > URL, redirecting the request back to the correct: > > "http://wayback.example.com/20010101000000/http://www.example.org/images/foo.gif" > > Note that this requires some specific configuration choices, most > importantly, Wayback must be deployed at the ROOT context. > > Unless Wayback is run at the ROOT context, it will not have an > opportunity to handle the stray server-relative requests to bounce > them back on track. > > I'm guessing this is the issue you're seeing, but I might be > misinterpreting your question. > > Brad > > On 6/7/11 12:09 PM, Pra...@on... > wrote: > > Hi Everyone, > > I am facing an issue with links that have absolute and relative URL's > in their href. > > If a site has absolute path in the links, then wayback is replaying it > fine. > > But if the links have the URL's that are relative to the document, > then wayback does not seem to replay it correctly. > > Has anyone faced the same situation? > > Is there a workaround for this to replay it correctly in wayback? > > Thanks, > > --Pramila |
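Brad's quoted explanation of server-relative links and the Referer-based redirect can be made concrete with a short sketch. This is not Wayback's handler: `fix_server_relative` and its regular expression are assumptions that simply follow the example URLs in the message, with a 14-digit timestamp between the Wayback host and the original URL.

```python
import re
from urllib.parse import urljoin

# Hypothetical replay URL layout taken from the examples above:
#   http://<wayback-host>/<14-digit timestamp>/<original URL>
_REPLAY_RE = re.compile(r"^(https?://[^/]+)/(\d{14})/(https?://[^/]+)")


def fix_server_relative(referer, request_path):
    """Rebuild the replay URL for a stray server-relative request using the Referer."""
    match = _REPLAY_RE.match(referer)
    if match is None:
        return None  # Referer is not a replay URL; nothing to redirect to
    wayback_prefix, timestamp, original_origin = match.groups()
    return f"{wayback_prefix}/{timestamp}/{original_origin}{request_path}"


page = "http://wayback.example.com/20010101000000/http://www.example.org/"
# What a browser does with a server-relative link: the archived context is lost.
print(urljoin(page, "/images/foo.gif"))
# Where a Referer-based redirect would send the browser instead.
print(fix_server_relative(page, "/images/foo.gif"))
```

The first line printed is the misdirected request that lands at the root of the Wayback host, which is why the handler only gets a chance to see it when Wayback is deployed at the ROOT context.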
From: Thakur, P. <Pra...@on...> - 2011-06-27 20:23:31
|
Hi Brad, Currently we use a shared drive, which works. We tried FTP as one option because everyone has access to the shared drive. My guess is that HTTP access would go through the internet and slow down retrieval. Thanks, --Pramila Thakur ________________________________ From: Bradley Tofel [mailto:br...@ar...] Sent: Monday, June 27, 2011 4:29 AM To: arc...@li... Subject: Re: [Archive-access-discuss] FTP URL in base dir Hi Pramila, While FTP does support the "REST" command, and there's even discussion of a new "RANG" command, Wayback does not currently support FTP access to a ResourceStore. I'm not familiar enough with any Java FTP APIs to comment on the difficulty of creating such an implementation. Are you unable to use HTTP? Brad On 6/17/11 12:32 AM, Thakur, Pramila wrote: Hi everyone, I was wondering if an FTP URL can be used in wayback.xml? I am having issues with it. I am on Windows XP. The error is: C:\Program Files\Apache Software Foundation\Tomcat 6.0\ftp:\user:pass@188.44.99.300\repository\waybacktest\election2011Test is not a directory My configuration in wayback.xml is: wayback.basedir=ftp://user:pass@188.44.99.300/repository/waybacktest Can someone please give me some ideas? Thanks, --Pramila Thakur ________________________________ From: Bradley Tofel [mailto:br...@ar...] Sent: Tuesday, June 07, 2011 4:08 PM To: arc...@li... Subject: Re: [Archive-access-discuss] Absolute path and relative path solution Hi Pramila, I didn't yet have a chance to respond to your previous post about javascript menus. Is this the situation you're facing? Can you provide some example URLs, or HTML + JS snippets to help clarify the situation? 
Some background - there are 3 types of URLs: 1) absolute (ex, "http://www.example.org/images/foo.gif"<http://www.example.org/images/foo.gif>) 2) server relative (ex, "/images/foo.gif") 3) path relative (ex, "../images/foo.gif", "images/foo.gif") Wayback attempts to rewrite all URLs found in HTML. It attempts to rewrite absolute URLs found in javascript and CSS. This leaves server-relative and path-relative URLs. Path relative URLs should require no rewriting (unless they include ../../../../ references which back up beyond the actual server root...): Page: http://wayback.example.com/20010101000000/http://www.example.org/<http://wayback.example.com/20010101000000/http:/www.example.org/> Containing: "images/foo.gif" Resolves to "http://wayback.example.com/20010101000000/http://www.example.org/images/foo.gif"<http://wayback.example.com/20010101000000/http:/www.example.org/images/foo.gif> which is correct. Server-relative links do not go to the right place: Page: http://wayback.example.com/20010101000000/http://www.example.org/<http://wayback.example.com/20010101000000/http:/www.example.org/> Containing: "/images/foo.gif" Resolves to "http://wayback.example.com/images/foo.gif"<http://wayback.example.com/images/foo.gif> which is *not* correct. Wayback does include a server-relative redirection handler. This notices incoming URLs like: "/images/foo.gif" and uses the "Referer" HTTP request header to correctly resolve the URL, redirecting the request back to the correct: "http://wayback.example.com/20010101000000/http://www.example.org/images/foo.gif"<http://wayback.example.com/20010101000000/http:/www.example.org/images/foo.gif> Note that this requires some specific configuration choices, most importantly, Wayback must be deployed at the ROOT context. Unless Wayback is run at the ROOT context, it will not have an opportunity to handle the stray server-relative requests to bounce them back on track. 
I'm guessing this is the issue you're seeing, but I might be misinterpreting your question. Brad On 6/7/11 12:09 PM, Pra...@on... wrote: Hi Everyone, I am facing an issue with links that have absolute and relative URL's in their href. If a site has absolute path in the links, then wayback is replaying it fine. But if the links have the URL's that are relative to the document, then wayback does not seem to replay it correctly. Has anyone faced the same situation? Is there a workaround for this to replay it correctly in wayback? Thanks, --Pramila |
From: Michael N. <mik...@gm...> - 2011-06-27 20:01:16
|
I am going nuts trying to get Wayback to work. I have an instance on Tomcat 6 running on openSUSE. I installed the Wayback webapp and everything seems fine, but when I click "Take me back" I get a 404 page saying "The requested resource (/query) is not available." I have pored over the config files and cannot figure out what is wrong. My wayback.urlprefix is set correctly. One thing I notice is that if I do not install this webapp as ROOT, images and other resources do not load, as they point to the wrong location. At a glance in the templates it looks like results.getStaticPrefix(); is not returning the correct prefix from the config file. Also, if I do not install as ROOT, clicking the Wayback button goes to localhost:8080/query?... rather than localhost:8080/wayback/query?.... My main problem is the 404 when you click "Take me back!"; has anyone figured this out? I can see an email here from 2010 that was not resolved: http://sourceforge.net/mailarchive/forum.php?thread_name=4CD0311A.6080308%40cs.stanford.edu&forum_name=archive-access-discuss I appreciate the help, |
From: Bradley T. <br...@ar...> - 2011-06-27 08:27:55
|
Hi Pramila,

While FTP does support the "REST" command, and there's even discussion of a new "RANG" command, Wayback does not currently support FTP access to a ResourceStore. I'm not familiar enough with any Java FTP APIs to comment on the difficulty of creating such an implementation.

Are you unable to use HTTP?

Brad |
From: Bradley T. <br...@ar...> - 2011-06-27 06:44:48
|
We're currently looking into some options for this, using a combination of:

1) generic server side Javascript rewriting
2) site-host-path-date specific custom rules
3) additional client side javascript insertion

The idea is that we're trying to replace all occurrences of things like "window.location.replace()" with "Wayback_window_location_replace()".

For most sites, we insert a new javascript function into the page, called Wayback_window_location_replace(), which inspects the arguments, ensures they have been rewritten correctly as archival URLs (prepending the correct archival url prefix if not) and then calls the real window.location.replace().

For a small number of configured sites, we'll insert a specialized javascript implementation of "Wayback_window_location_replace". In the case of twitter.com, it will be a no-op - leaving the browser on the current page (/BARACKOBAMA).

This is highly experimental, but we have had some preliminary successes with this combination, and hope to be starting production testing in the next few weeks. We also hope to use this framework to assist in replay of other popular and problematic sites.

I'll keep the list posted with the progress.

Brad |
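The two pieces Brad describes, a generic server-side rewrite plus a wrapper that checks its argument, can be sketched roughly as follows. This is an illustrative Python sketch and not Wayback's implementation: the real wrapper is a JavaScript function inserted into the replayed page, and the names `rewrite_javascript` and `ensure_archival_url`, the regular expression, and the example archival prefix are all assumptions made for the illustration.

```python
import re

# Matches "window.location.replace(" with optional whitespace around the dots.
# The \b keeps identifiers such as "mywindow" from matching.
LOCATION_REPLACE = re.compile(r"\bwindow\s*\.\s*location\s*\.\s*replace\s*\(")


def rewrite_javascript(source):
    """Server side: route window.location.replace() calls through a wrapper."""
    return LOCATION_REPLACE.sub("Wayback_window_location_replace(", source)


def ensure_archival_url(url, archival_prefix):
    """Wrapper logic: prepend the archival prefix unless it is already there."""
    if url.startswith(archival_prefix):
        return url
    return archival_prefix + url


js = 'if (old) { window.location.replace("http://twitter.com/#!/BARACKOBAMA"); }'
print(rewrite_javascript(js))

# Hypothetical archival prefix (host and timestamp are made up for the example).
prefix = "http://wayback.example.com/20110617000000/"
print(ensure_archival_url("http://twitter.com/BARACKOBAMA", prefix))
```

A site-specific rule such as the twitter.com no-op would replace the wrapper with one that ignores its argument; that case is not modelled here.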
From: Gerard S. i M. <gs...@ce...> - 2011-06-17 11:00:36
|
Hi all,

Wayback redirects queries for the new Twitter account URLs, which are based on "#!", to the twitter.com domain.

Example:
http://wayback.archive.org/web/*/http://twitter.com/BARACKOBAMA

vs

http://wayback.archive.org/web/*/http://twitter.com/#!/BARACKOBAMA

Any idea on how this could be handled?

Best regards,
-- Gerard

Gerard Suades Méndez
Departament d'Aplicacions i Projectes
Centre de Serveis Científics i Acadèmics de Catalunya (CESCA)
Gran Capità, 2-4 (Edifici Nexus) · 08034 Barcelona
T. 93 551 62 20 · F. 93 205 6979 · gs...@ce... |
From: Thakur, P. <Pra...@on...> - 2011-06-16 17:33:01
|
Hi everyone,

I was wondering if an FTP URL can be used in wayback.xml? I am having issues with it. I am on Windows XP.

The error is:

C:\Program Files\Apache Software Foundation\Tomcat 6.0\ftp:\user:pass@188.44.99.300\repository\waybacktest\election2011Test is not a directory

My configuration in wayback.xml is:

wayback.basedir=ftp://user:pass@188.44.99.300/repository/waybacktest

Can someone please give me some ideas?

Thanks,
--Pramila Thakur |
From: <Pra...@on...> - 2011-06-07 20:43:57
|
Hi Brad,

I don't have Wayback installed as ROOT. That is where I start getting the issues.

One of the sites that uses JavaScript menus is http://www.ipc.on.ca/english/Annual-Report/ The menus are not displayed correctly in Wayback at all.

The other issue is with the Ontario PC party's website, http://www.ontariopc.com/ In this case, replay always brings me back to the landing page and does not allow me to browse other pages.

Thanks,
--Pramila Thakur |
From: Bradley T. <br...@ar...> - 2011-06-07 20:07:48
|
Hi Pramila,

I didn't yet have a chance to respond to your previous post about javascript menus. Is this the situation you're facing? Can you provide some example URLs, or HTML + JS snippets to help clarify the situation?

Some background - there are 3 types of URLs:

1) absolute (e.g. "http://www.example.org/images/foo.gif")
2) server relative (e.g. "/images/foo.gif")
3) path relative (e.g. "../images/foo.gif", "images/foo.gif")

Wayback attempts to rewrite all URLs found in HTML. It attempts to rewrite absolute URLs found in javascript and CSS. This leaves server-relative and path-relative URLs.

Path relative URLs should require no rewriting (unless they include ../../../../ references which back up beyond the actual server root...):

Page: http://wayback.example.com/20010101000000/http://www.example.org/
Containing: "images/foo.gif"
Resolves to "http://wayback.example.com/20010101000000/http://www.example.org/images/foo.gif"

which is correct.

Server-relative links do not go to the right place:

Page: http://wayback.example.com/20010101000000/http://www.example.org/
Containing: "/images/foo.gif"
Resolves to "http://wayback.example.com/images/foo.gif"

which is *not* correct.

Wayback does include a server-relative redirection handler. This notices incoming URLs like:

"/images/foo.gif"

and uses the "Referer" HTTP request header to correctly resolve the URL, redirecting the request back to the correct:

"http://wayback.example.com/20010101000000/http://www.example.org/images/foo.gif"

Note that this requires some specific configuration choices, most importantly, Wayback must be deployed at the ROOT context. Unless Wayback is run at the ROOT context, it will not have an opportunity to handle the stray server-relative requests to bounce them back on track.

I'm guessing this is the issue you're seeing, but I might be misinterpreting your question.

Brad |
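The server-relative case and the Referer-based recovery Brad describes can be illustrated with a short Python sketch. It is not Wayback's actual (Java) handler: the function name `resolve_stray_request` and the assumed archival URL layout (prefix, 14-digit timestamp, original URL) are illustrative choices based only on the example URLs in the message.

```python
import re
from urllib.parse import urljoin

# Assumed layout of an archival URL: <prefix>/<14-digit timestamp>/<original URL>
ARCHIVAL_URL = re.compile(
    r"^(?P<prefix>https?://[^/]+)/(?P<timestamp>\d{14})/(?P<original>https?://.+)$"
)


def resolve_stray_request(request_path, referer):
    """Map a stray server-relative request back to an archival URL.

    Returns the redirect target, or None if the Referer is not an archival URL.
    """
    match = ARCHIVAL_URL.match(referer or "")
    if match is None:
        return None
    # Resolve the path against the *original* site, not the Wayback host.
    original_target = urljoin(match.group("original"), request_path)
    return "{}/{}/{}".format(
        match.group("prefix"), match.group("timestamp"), original_target
    )


page = "http://wayback.example.com/20010101000000/http://www.example.org/"

# What a browser does with a server-relative link on an archived page:
print(urljoin(page, "/images/foo.gif"))

# What a Referer-based handler can redirect that stray request to:
print(resolve_stray_request("/images/foo.gif", page))
```

The first call lands on the Wayback host's own root, which is why the handler only sees the stray request when Wayback is deployed at the ROOT context.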
From: <Pra...@on...> - 2011-06-07 19:09:35
|
Hi Everyone, I am facing an issue with links that have absolute and relative URLs in their href. If a site has absolute paths in its links, then Wayback replays it fine. But if the links have URLs that are relative to the document, then Wayback does not seem to replay it correctly. Has anyone faced the same situation? Is there a workaround to replay it correctly in Wayback? Thanks, --Pramila |
From: <Pra...@on...> - 2011-05-25 15:12:32
|
Hi Everyone,

Does anyone have any idea how to crawl websites that have their menus created with JavaScript? Also, if the links have relative paths as opposed to absolute paths in their href, replay does not work. Right now, if a link has an absolute path, the crawled ARCs work fine, but with relative paths they don't. For example, this site: http://www.ipc.on.ca/english/Annual-Report/

Another problematic site is http://ontariopc.com/ After crawling, replay takes me to the landing page, but I am unable to replay the rest of the pages, because it keeps coming back to the landing page every time.

Has anyone encountered this kind of issue? Any solution for it?

Thanks,
--Pramila Thakur |
From: Bradley T. <br...@ar...> - 2011-05-24 21:09:39
|
That's correct, thanks Mac!

More specifically, the '*' indicates that the SHA1 digest of the HTTP body (the HTTP response not including the HTTP headers) changed between versions.

Brad

On 5/24/11 9:01 AM, Mac...@nb... wrote:
> Hi Pramila,
>
> In the classic version of the wayback-machine it said: "* denotes when site was updated".
>
> Example: http://classic-web.archive.org/web/*/http://www.gemeinde-bauen.ch
>
> Hope this helps.
>
> Best Regards
> Mac
>
> Mac Kobus
> Digitale Archivierung ¦ e-Helvetica
> Eidgenössisches Departement des Innern EDI
> Bundesamt für Kultur BAK
> Schweizerische Nationalbibliothek NB
> Hallwylstrasse 15, 3003 Bern
> tel +41 31 322 89 93
> fax +41 31 322 84 63
> mac...@nb...
> www.nb.admin.ch ¦ http://www.nb.admin.ch/e-helvetica
>
> ------------------------------------------------------------------------
> From: Pra...@on... [mailto:Pra...@on...]
> Sent: Tuesday, 24 May 2011 17:15
> To: arc...@li...
> Subject: [Archive-access-discuss] wayback Calendar jsp
>
> Hi Everyone,
>
> Can someone clarify this for me please? In the Calendar.jsp, sometimes it shows that the site has been crawled with links. Some of the links have a star beside them and some don't. What is the difference between them?
>
> Here is the screen shot that I am talking about.
>
> Thanks,
> --Pramila Thakur |
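The rule Brad gives for the '*' can be sketched as a digest comparison between consecutive captures. This is a minimal Python illustration, not Wayback code: the function name `mark_changed_captures`, the sample captures, and the choice to leave the first capture unstarred are assumptions, and a hex digest is used here purely for the comparison.

```python
import hashlib


def mark_changed_captures(captures):
    """Yield (timestamp, changed) pairs for captures in chronological order.

    `captures` is an iterable of (timestamp, body_bytes) tuples, where
    body_bytes is the HTTP response body without the HTTP headers.
    """
    previous_digest = None
    for timestamp, body in captures:
        digest = hashlib.sha1(body).hexdigest()
        # Assumption: the first capture is not starred, because there is no
        # earlier version for it to differ from.
        changed = previous_digest is not None and digest != previous_digest
        yield timestamp, changed
        previous_digest = digest


# Made-up captures for illustration only.
captures = [
    ("20110501000000", b"<html>v1</html>"),
    ("20110508000000", b"<html>v1</html>"),
    ("20110515000000", b"<html>v2</html>"),
]
for timestamp, changed in mark_changed_captures(captures):
    print(timestamp, "*" if changed else "")
```

Because only the body is hashed, a capture whose headers changed (for example a new Date header) but whose content is byte-identical would not be starred.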
From: Graham, L. <lg...@lo...> - 2011-05-24 13:45:16
|
Hi,

From what we can tell, in order to display warc revisit records in wayback 1.4.2, the warcs have to be indexed with the warc-indexer from 1.4.0.

Cdx files indexed with wayback 1.4.2 warc-indexer have a "warc/revisit" field that wayback 1.4.2 cannot display. We just got a blank screen.

After doing some research, we discovered that if we index the warcs with the wayback 1.4.0 warc-indexer, the same revisit records do display in 1.4.2. The resulting cdx does NOT have the warc/revisit field.

It's quite possible we're missing something, but could it be that 1.4.2 warc indexer was updated but not the wayback itself? We have found that 1.6.0 can display the 1.4.2 warc/revisit cdx's.

Thanks,
Laura Graham
Library of Congress |
From: Sawood A. <ibn...@gm...> - 2011-05-23 22:35:04
|
Hi,

I am trying to set up the Wayback Machine on my server, but I am running into a strange problem that I have been fighting for the last several weeks. I followed the step-by-step guide available at https://webarchive.jira.com/wiki/display/wayback/Wayback+Installation+and+Configuration+Guide

By default, when I restart Tomcat, it shows the initial page. But the moment I make any changes in the XML config files, it stops working and throws the following 404 error:

HTTP Status 404 -
type Status report
message
description The requested resource () is not available.
Apache Tomcat/6.0.24

I tried making the changes in the WAR file in advance and then deploying the modified WAR file, but I get the same issue again. I made sure that the related ARC file directory is readable, the index file directory is writable, and every folder in the path is at least executable by all.

I am using an RHEL6 server with Apache Tomcat 6 installed. I have some other WARs deployed and working fine on the same server.

Thanks,
-- Sawood Alam |