From: Søren V. C. <sv...@kb...> - 2013-02-01 11:31:57
Hi all.

I have installed Wayback 1.7.1-SNAPSHOT, built myself directly from the pom.xml after downloading the code from https://github.com/internetarchive/wayback

I'm using the locationDBResourceStore that the CDXCollection.xml uses, and it can find the correct files from the CDX. However, it fails to extract the record, as it somehow assumes that all files are gzipped, and when one is not, it fails miserably with the following log entries:

Jan 31, 2013 6:49:18 PM org.archive.wayback.resourcestore.resourcefile.ResourceFactory getResource
INFO: Fetching: /home/prod/wayback/arcs/83807-92-0000-1.arc : 39136770
Jan 31, 2013 6:49:18 PM org.archive.wayback.resourcestore.resourcefile.ResourceFactory getResource
WARNING: ResourceNotAvailable for /home/prod/wayback/arcs/83807-92-0000-1.arc Not in GZIP format
Jan 31, 2013 6:49:18 PM org.archive.wayback.resourcestore.LocationDBResourceStore retrieveResource
INFO: Unable to retrieve /home/prod/wayback/arcs/83807-92-0000-1.arc - java.util.zip.ZipException: Not in GZIP format
Jan 31, 2013 6:49:18 PM org.archive.wayback.webapp.AccessPoint handleReplay
WARNING: (1)LOADFAIL: /home/prod/wayback/arcs/83807-92-0000-1.arc - java.util.zip.ZipException: Not in GZIP format /20100107153228/http://www2.kb.dk/elib/mss/skatte/aeldre_danske/ln185.htm

Can anyone help me here?

/Søren

---------------------------------------------------------------------------
Søren Vejrup Carlsen, Department of Digital Preservation, Royal Library, Copenhagen, Denmark
tlf: (+45) 33 47 48 41
email: sv...@kb...
----------------------------------------------------------------------------
Non omnia possumus omnes
--- Macrobius, Saturnalia, VI, 1, 35 -------
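The "Not in GZIP format" message suggests an uncompressed .arc file is being handed to a gzip decoder. Before digging into the Wayback configuration, it may help to confirm which files in the store are really compressed. The sketch below is a standalone check and not part of Wayback; the function names and the sample file contents are made up for illustration. It relies only on the fact that every gzip member starts with the bytes 0x1f 0x8b.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def looks_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def demo():
    """Write one plain and one gzipped sample file and classify both."""
    # Hypothetical one-line payload; a real ARC file holds much more.
    payload = b"filedesc://sample.arc 0.0.0.0 20100107153228 text/plain 0\n"
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "sample.arc")
        packed = os.path.join(tmp, "sample.arc.gz")
        with open(plain, "wb") as handle:
            handle.write(payload)
        with gzip.open(packed, "wb") as handle:
            handle.write(payload)
        return looks_gzipped(plain), looks_gzipped(packed)


if __name__ == "__main__":
    print(demo())
```

Running looks_gzipped over the files named in the CDX would show whether the failing records all live in uncompressed .arc files, which would point at the reader selection rather than at the files themselves.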
From: Aimilia A. <emi...@au...> - 2013-01-25 11:26:23
Hello everybody!

I use Heritrix 3.1.0 to collect some .warc files. The format of these files is WARC 1.0. Then, with Wayback, I index the URLs I have collected. Besides this indexing, I also want full-text indexing, so I tried NutchWAX. First, I installed Hadoop 0.9.2 and then NutchWAX 0.10.0 (according to this documentation http://archive-access.sourceforge.net/projects/nutch/apidocs/overview-summary.html#toc).

When I run the command

${HADOOP_HOME}/bin/hadoop jar ${NUTCHWAX_HOME}/nutchwax-0.10.0.jar all /tmp/inputs /tmp/outputs test

the indexing starts normally, but then I get the following errors:

.....
13/01/25 02:19:35 INFO conf.Configuration: found resource regex-urlfilter.txt at file:/tmp/hadoop-unjar7698782854953199110/regex-urlfilter.txt
13/01/25 02:19:35 INFO nutch.ImportArcs: opening /home/admin/archive/heritrix-3.1.0/jobs/aueb_v2/20120229120116/warcs/AUEB-20120229120127116-00000-6074~localhost.localdomain~8443.warc.gz
13/01/25 02:19:36 INFO conf.Configuration: found resource wax-parse-plugins.xml at file:/tmp/hadoop-unjar7698782854953199110/wax-parse-plugins.xml
13/01/25 02:20:08 INFO nutch.ImportArcs: Error parsing /home/admin/archive/heritrix-3.1.0/jobs/aueb_v2/20120229120116/warcs/AUEB-20120229120127116-00000-6074~localhost.localdomain~8443.warc.gz
13/01/25 02:20:08 INFO mapred.LocalJobRunner: Error parsing /home/admin/archive/heritrix-3.1.0/jobs/aueb_v2/20120229120116/warcs/AUEB-20120229120127116-00000-6074~localhost.localdomain~8443.warc.gz
13/01/25 02:20:08 WARN nutch.ImportArcs: Error parsing /home/admin/archive/heritrix-3.1.0/jobs/aueb_v2/20120229120116/warcs/AUEB-20120229120127116-00000-6074~localhost.localdomain~8443.warc.gz
java.lang.RuntimeException: Retried but no next record (Offset 0)
    at org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:503)
    at org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:449)
    at org.archive.access.nutch.ImportArcs$IndexingThread.run(ImportArcs.java:356)
Caused by: java.io.IOException: Failed parse of Header Line: WARC/1.0
    at org.archive.io.warc.WARCRecord.parseHeaderLine(WARCRecord.java:248)
    at org.archive.io.warc.WARCRecord.parseHeaders(WARCRecord.java:136)
    at org.archive.io.warc.WARCRecord.<init>(WARCRecord.java:112)
    at org.archive.io.warc.WARCReader.createArchiveRecord(WARCReader.java:97)
    at org.archive.io.warc.WARCReaderFactory$CompressedWARCReader$1.innerNext(WARCReaderFactory.java:280)
    at org.archive.io.ArchiveReader$ArchiveRecordIterator.exceptionNext(ArchiveReader.java:532)
    at org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:491)
    ... 2 more
.... (the same error for all warc files) ....
13/01/25 02:21:04 INFO mapred.JobClient: Running job: job_tyetcg
13/01/25 02:21:04 INFO conf.Configuration: parsing file:/home/admin/archive/hadoop/hadoop-0.9.2/conf/hadoop-default.xml
13/01/25 02:21:04 INFO conf.Configuration: parsing file:/home/admin/archive/hadoop/hadoop-0.9.2/conf/mapred-default.xml
13/01/25 02:21:04 INFO conf.Configuration: parsing /tmp/hadoop-root/mapred/local/localRunner/job_tyetcg.xml
13/01/25 02:21:04 INFO conf.Configuration: parsing file:/home/admin/archive/hadoop/hadoop-0.9.2/conf/mapred-default.xml
13/01/25 02:21:04 INFO mapred.MapTask: opened part-0.out
13/01/25 02:21:04 WARN mapred.LocalJobRunner: job_tyetcg
java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:109)
    at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:177)
    at org.apache.hadoop.mapred.MapTask$3.next(MapTask.java:203)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:215)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:109)
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
    at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:433)
    at org.archive.access.nutch.Nutchwax.doDedup(Nutchwax.java:257)
    at org.archive.access.nutch.Nutchwax.doAll(Nutchwax.java:156)
    at org.archive.access.nutch.Nutchwax.doJob(Nutchwax.java:389)
    at org.archive.access.nutch.Nutchwax.main(Nutchwax.java:674)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

So the output files are created (crawldb indexes linkdb segments), but they are empty. I believe that these errors result from the version of the WARC files.

Can you suggest anything (or the correct version of NutchWAX) that would correctly create/update the output files: crawldb index indexes linkdb segments? If NutchWAX is not supported any more, is there another tool for text indexing that can be connected with Wayback?

Thanks in advance,
Emily
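The trace shows the reader rejecting the header line "WARC/1.0", which fits the guess that the WARC version is the issue. One way to confirm which version string the files actually carry is to decompress the start of a .warc.gz and look at its first line. The following is a minimal sketch, not NutchWAX or Heritrix code; the function names are invented, and the in-memory sample record is hypothetical and deliberately incomplete (a real record has more mandatory headers).

```python
import gzip
import io


def first_header_line(stream):
    """Return the first line of a gzip-compressed WARC stream, decoded and stripped."""
    with gzip.GzipFile(fileobj=stream, mode="rb") as reader:
        return reader.readline().decode("ascii", errors="replace").strip()


def sample_warc_gz():
    """Build a tiny in-memory .warc.gz holding one hypothetical WARC/1.0 record."""
    record = (
        b"WARC/1.0\r\n"
        b"WARC-Type: warcinfo\r\n"
        b"Content-Length: 0\r\n"
        b"\r\n"
        b"\r\n\r\n"
    )
    return io.BytesIO(gzip.compress(record))


if __name__ == "__main__":
    print(first_header_line(sample_warc_gz()))
```

For a file on disk, passing open(path, "rb") as the stream should work the same way. If the first line reads WARC/1.0 and the reader in use still fails on it, that would support looking for a tool built against a newer WARC reader.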
From: Kristinn S. <kri...@la...> - 2013-01-18 10:59:23
Yep, clearing out ALL eclipse configuration and then doing "import -> existing maven projects" worked reasonably well.

Thanks to everyone for the assist.

- Kris
From: Nicholas C. <ni...@kb...> - 2013-01-17 22:11:42
Also, it seems the pom defines the source and target as 1.5, but some of the code uses 1.6-only methods! Change the pom source/target to 1.6 and presto, no errors. :)

-Nicholas
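To see what source/target levels a pom declares before changing them, a small script can read the maven-compiler-plugin configuration. This is only a sketch under assumptions: the sample pom below is hypothetical and trimmed down, the real Wayback pom may place these settings elsewhere (for example in properties or pluginManagement), and the function name is invented.

```python
import xml.etree.ElementTree as ET

POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Hypothetical, trimmed-down pom used only for this demonstration.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <build>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.5</source>
          <target>1.5</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>"""


def compiler_levels(pom_text):
    """Return (source, target) from the maven-compiler-plugin configuration, or None."""
    root = ET.fromstring(pom_text)
    for plugin in root.iterfind(".//m:plugin", POM_NS):
        artifact = plugin.findtext("m:artifactId", namespaces=POM_NS)
        if artifact == "maven-compiler-plugin":
            source = plugin.findtext("m:configuration/m:source", namespaces=POM_NS)
            target = plugin.findtext("m:configuration/m:target", namespaces=POM_NS)
            return source, target
    return None  # no compiler plugin block found


if __name__ == "__main__":
    print(compiler_levels(SAMPLE_POM))
```

With the sample above the function reports 1.5 for both values, which is the situation described in the message; after editing the pom to 1.6 the same check should report the new levels.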
From: Nicholas C. <ni...@kb...> - 2013-01-17 21:57:29
Hi y'all,

If you are getting missing Tomcat stuff, it is because you need an EE edition of Eclipse.

As Andy wrote, remove all .classpath, .project and .settings/**. They should not be included in Maven project repos.

Besides this, "import -> existing maven projects" should do the trick.

M2e plugin for Juno:
http://download.eclipse.org/technology/m2e/releases

The project may have com.sun.tools artifacts missing. One solution seems to be to ensure the Java used by the project is a JDK and not a JRE.

However, adding this to the pom worked for me.

<profiles>
  <profile>
    <id>default-tools.jar</id>
    <activation>
      <property>
        <name>java.vendor</name>
        <value>Sun Microsystems Inc.</value>
      </property>
    </activation>
    <dependencies>
      <dependency>
        <groupId>com.sun</groupId>
        <artifactId>tools</artifactId>
        <version>1.6.0</version>
        <scope>system</scope>
        <systemPath>${JAVA_HOME}/../lib/tools.jar</systemPath>
      </dependency>
    </dependencies>
  </profile>
</profiles>

This brings the problems down to 19 missing ConcurrentSkipListSet. I didn't have the patience to install a jdk1.5 and use the requested com.sun.tools:1.5.0.

Best
Nicholas
From: Jackson, A. <And...@bl...> - 2013-01-17 17:10:01
Hi Kris,

I got it working a while ago. My memory is a bit fuzzy, but IIRC the problem is that the repo has the Eclipse project files checked in (.classpath .project .settings/**). I think I had to remove these before importing the project into Eclipse/m2e.

Best wishes,
Andy
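The cleanup described here (removing .classpath, .project and .settings/** before the import) can be scripted when a checkout has several modules. What follows is a sketch under assumptions, not an official procedure: the function names are invented, and the demo builds a throwaway directory whose module name is only a stand-in for a real module of the checkout.

```python
import os
import shutil
import tempfile

ECLIPSE_FILES = (".classpath", ".project")
ECLIPSE_DIRS = (".settings",)


def remove_eclipse_metadata(checkout_root):
    """Delete Eclipse project files under `checkout_root`; return the removed paths."""
    removed = []
    for current, dirnames, filenames in os.walk(checkout_root):
        for name in list(dirnames):
            if name in ECLIPSE_DIRS:
                target = os.path.join(current, name)
                shutil.rmtree(target)
                dirnames.remove(name)  # do not descend into the deleted directory
                removed.append(target)
        for name in filenames:
            if name in ECLIPSE_FILES:
                target = os.path.join(current, name)
                os.remove(target)
                removed.append(target)
    return sorted(removed)


def demo():
    """Build a throwaway tree shaped like one Maven module and clean it."""
    with tempfile.TemporaryDirectory() as root:
        module = os.path.join(root, "wayback-core")  # stand-in module name
        os.makedirs(os.path.join(module, ".settings"))
        for name in (".classpath", ".project", "pom.xml"):
            open(os.path.join(module, name), "w").close()
        removed = remove_eclipse_metadata(root)
        survivors = sorted(os.listdir(module))
        return [os.path.basename(path) for path in removed], survivors


if __name__ == "__main__":
    print(demo())
```

Since the files are tracked in the repository, deleting them will show up as local changes in Git; running this on a scratch clone first is the cautious option.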
From: Kristinn S. <kri...@la...> - 2013-01-17 16:19:11
Hi all,

Are there any instructions available for checking Wayback out of Git and getting it set up in Eclipse?

Checking it out and compiling it with Maven on the command line works fine, but whenever I try to import it into Eclipse (Juno with m2eclipse for Maven support) it all goes haywire and fails to find numerous dependencies.

Best regards,
Kris

-------------------------------------------------------------------------
Landsbókasafn Íslands - Háskólabókasafn | Arngrímsgötu 3 - 107 Reykjavík
Sími/Tel: +354 5255600 | www.landsbokasafn.is
-------------------------------------------------------------------------
fyrirvari/disclaimer - http://fyrirvari.landsbokasafn.is
From: Drazenko C. <dra...@sr...> - 2013-01-03 11:08:09
Hi,

The bug in Java was resolved on 2011-03-08, and Wayback 1.6.0 is older than that. You should probably use a newer Wayback (http://builds.archive.org:8080/maven2/org/archive/wayback/dist/) and a Java version newer than 6u22.

Regards,
Drazenko

On 2.1.2013. 13:24, Henrik Ranthin wrote:
> Hi!
>
> I switched to an old Java version (6u22) and now it seems to work!
> Previously I used Java 1.7.0_03.
>
> For Wayback I use version 1.6.0.
>
> I saw that the JIRA issue HER-1865 was marked as fixed. Maybe the fix is just not included in the Wayback version I'm using?
>
> Thanks for all the help!
>
> Regards, Henrik
From: Drazenko C. <dra...@sr...> - 2012-12-28 20:50:41
Hi,

Which Java and Wayback versions do you use?

I had the same problem when I used an old version of Heritrix (1.14.4) and a Java newer than 6u22. Here was the reason:
https://webarchive.jira.com/browse/HER-1865

Regards,
Drazenko
From: Nicholas C. <ni...@kb...> - 2012-12-28 20:37:58
The archive itself seems fine. Have you tried without ~ in the filename?

#
# Summary of 'C:\Java\workspace\jwat-tools\WEB-20121130122928187-00000-26202~192.168.24.4~8443.warc.gz'
#
GZip.isValid: true
GZip.Entries: 1119
GZip.Errors: 0
GZip.Warnings: 0
Warc.isValid: true
Warc.Records: 1119
Warc.Errors: 0
Warc.Warnings: 0
#
# Job summary
#
GZip files: 0
  + Arc: 0
  + Warc: 1
Arc files: 0
Warc files: 0
Errors: 0
Warnings: 0
RuntimeErr: 0
Skipped: 0
Validation took 5413 ms.

Best
Nicholas
From: Henrik R. <Hen...@ap...> - 2012-12-28 15:05:45
|
Hi!

Nope, it is a warc file. I attached a sample warc file. Maybe you can have a quick look at it and see if there is something strange?

Regards,
Henrik |
From: Erik H. <eri...@uc...> - 2012-12-21 21:43:22
|
Hi Henrik,

At Fri, 21 Dec 2012 10:52:24 +0000,
Henrik Ranthin wrote:
>
> Thanks for the quick reply!
> The warc files I’ve used are created (and compressed) by the Heritrix web crawler (version 3.1.1).
>
> I thought the output from Heritrix should be compatible with Wayback. Maybe I’m missing some setting?

Yes, they should be. Sorry for the distraction, but badly gzipped WARC files are often the problem.

>
> I’ve also tried to compress the file using the scripts from the warc-tools project:
> warc2warc.py -Z my_archive.warc > my_archive.warc.gz
> However, I still get the same result.
>
> From the log it seems like Wayback is treating the name of the compressed warc file as a URL:
>
> Dec 21, 2012 10:57:27 AM org.archive.wayback.resourceindex.bdb.SearchResultToBDBRecordAdapter adapt
> WARNING: FAILED canonicalize(http://filedesc:WEB-20121128091040702-00000-26202~192.168.24.4~8443.warc.gz:WEB-20121128091040702-00000-26202~192.168.24.4~8443.warc.gz)

I’m just guessing here, but the WARC files I see don’t start with filedesc://... records; only the ARC files. Is this an ARC file that was named with .warc.gz rather than .arc.gz?

best, Erik |
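Erik's question (is this really an ARC file with a .warc.gz name?) can be checked by looking at the first line of the first record. The sketch below is only an illustration and is not part of Wayback or warc-tools; the function name `sniff_archive_type` is made up for this example. It relies on two format facts: WARC records begin with a `WARC/` version line, and ARC files begin with a `filedesc://` version block.

```python
import gzip
import io


def sniff_archive_type(data: bytes) -> str:
    """Guess 'warc', 'arc', or 'unknown' from the first line of the first record.

    Hypothetical helper for illustration only.
    """
    # Gzip streams start with the magic bytes 1f 8b; decompress just enough
    # to read the first line of the first record.
    if data[:2] == b"\x1f\x8b":
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as fh:
            first_line = fh.readline()
    else:
        first_line = data.split(b"\n", 1)[0]

    if first_line.startswith(b"WARC/"):
        return "warc"
    if first_line.startswith(b"filedesc://"):
        return "arc"
    return "unknown"
```

To use it on a real file, pass the first few kilobytes read in binary mode, for example `sniff_archive_type(open(path, "rb").read(65536))`.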
From: Henrik R. <Hen...@ap...> - 2012-12-21 10:52:36
|
Thanks for the quick reply!
The warc files I’ve used are created (and compressed) by the Heritrix web crawler (version 3.1.1).

I thought the output from Heritrix should be compatible with Wayback. Maybe I’m missing some setting?

I’ve also tried to compress the file using the scripts from the warc-tools project:
warc2warc.py -Z my_archive.warc > my_archive.warc.gz
However, I still get the same result.

From the log it seems like Wayback is treating the name of the compressed warc file as a URL:

Dec 21, 2012 10:57:27 AM org.archive.wayback.resourceindex.bdb.SearchResultToBDBRecordAdapter adapt
WARNING: FAILED canonicalize(http://filedesc:WEB-20121128091040702-00000-26202~192.168.24.4~8443.warc.gz:WEB-20121128091040702-00000-26202~192.168.24.4~8443.warc.gz)

Regards,
Henrik |
From: Erik H. <eri...@uc...> - 2012-12-20 17:21:10
|
At Thu, 20 Dec 2012 11:57:48 +0000,
Henrik Ranthin wrote:
>
> Hi!
>
> I can't get Wayback to work with compressed warc files. Is it possible to make it work or is it required to uncompress the files?
> (I've tried uncompressing the warc.gz files and then everything works as intended)

Hi Henrik,

How are your WARC files compressed? Do you compress them yourself, or are they compressed using a special (W)ARC tool? (You can’t compress them yourself using a normal gzip tool.)

For more information, see, e.g., http://sourceforge.net/mailarchive/message.php?msg_id=28183532

best, Erik |
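The usual reason a plain gzip tool does not work is that a replay tool seeks to a record's byte offset and expects a separate gzip member to start there, so a .warc.gz needs one gzip member per record rather than one member for the whole file. The sketch below counts the gzip members in a byte string; it is an illustration only, and `count_gzip_members` is a made-up name, not a function from Wayback or warc-tools. A file compressed in one pass with a normal gzip tool would be expected to report 1, however many records it holds.

```python
import gzip
import zlib


def count_gzip_members(data: bytes) -> int:
    """Count the concatenated gzip members in `data`.

    Hypothetical helper for illustration only.
    """
    members = 0
    while data:
        # wbits=31 tells zlib to expect a gzip header and trailer.
        decompressor = zlib.decompressobj(wbits=31)
        decompressor.decompress(data)
        if not decompressor.eof:
            raise ValueError("truncated gzip member")
        members += 1
        # Whatever follows the finished member is the start of the next one.
        data = decompressor.unused_data
    return members
```

For a record-at-a-time compressed WARC, the member count should match the record count, as in a validation report where the gzip entry count and the WARC record count are equal.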
From: Henrik R. <Hen...@ap...> - 2012-12-20 12:10:50
|
Hi!

I can't get Wayback to work with compressed warc files. Is it possible to make it work or is it required to uncompress the files?
(I've tried uncompressing the warc.gz files and then everything works as intended)

I've specified a directory in BDBCollection.xml which contains a single compressed warc file.

<bean id="datadirs" class="org.springframework.beans.factory.config.ListFactoryBean">
  <property name="sourceList">
    <list>
      <bean class="org.archive.wayback.resourcestore.resourcefile.DirectoryResourceFileSource">
        <property name="name" value="files1" />
        <property name="prefix" value="/local/heritrix/heritrix-3.1.1/jobs/testjob2/20121128091005/warcs"/>
        <property name="recurse" value="true" />
      </bean>
    </list>
  </property>
</bean>

From the tomcat log:

INFO: Added WEB-20121128091040702-00000-26202~192.168.24.4~8443.warc.gz /local/heritrix/heritrix-3.1.1/jobs/testjob2/20121128091005/warcs/WEB-20121128091040702-00000-26202~192.168.24.4~8443.warc.gz
Dec 20, 2012 11:22:04 AM org.archive.wayback.resourcestore.indexer.IndexQueueUpdater updateQueue
INFO: Queued WEB-20121128091040702-00000-26202~192.168.24.4~8443.warc.gz for indexing.
Dec 20, 2012 11:22:04 AM org.archive.wayback.resourcestore.indexer.IndexWorker doWork
INFO: Indexing WEB-20121128091040702-00000-26202~192.168.24.4~8443.warc.gz from /local/heritrix/heritrix-3.1.1/jobs/testjob2/20121128091005/warcs/WEB-20121128091040702-00000-26202~192.168.24.4~8443.warc.gz
Dec 20, 2012 11:22:04 AM org.archive.wayback.resourceindex.updater.IndexClient addCDX
INFO: Queued /local/wayback/base/index-data/incoming/WEB-20121128091040702-00000-26202~192.168.24.4~8443.warc.gz for merging.
Dec 20, 2012 11:22:05 AM org.archive.wayback.resourceindex.bdb.SearchResultToBDBRecordAdapter adapt
WARNING: FAILED canonicalize(http://filedesc:WEB-20121128091040702-00000-26202~192.168.24.4~8443.warc.gz:WEB-20121128091040702-00000-26202~192.168.24.4~8443.warc.gz)
Dec 20, 2012 11:22:05 AM org.archive.wayback.resourceindex.updater.LocalResourceIndexUpdater handleMerged
INFO: Renamed merged file /local/wayback/base/index-data/incoming/WEB-20121128091040702-00000-26202~192.168.24.4~8443.warc.gz to /local/wayback/base/index-data/merged/WEB-20121128091040702-00000-26202~192.168.24.4~8443.warc.gz

Any ideas?

Thanks,
Henrik |
From: Adam M. <ad...@ar...> - 2012-12-06 00:45:14
|
I haven't tried this in Wayback yet, but I've had some success with wrapping JavaScript functions in the browser in the past. I tend to use the following pattern:

    window.XMLHttpRequestBase = window.XMLHttpRequest;
    window.XMLHttpRequest = function(){
        var base = new window.XMLHttpRequestBase();
        base.openBase = base.open;
        base.open = function(sMethod, sUrl, bAsync, sUser, sPassword){
            //console.log("XHR "+sMethod+": "+sUrl);
            //manipulate sUrl here
            return base.openBase(sMethod, sUrl, bAsync, sUser, sPassword);
        };
        return base;
    };

This should be adaptable to window.open and the various alternatives for JS redirects. This example would be useful for intercepting ajax requests. I've been meaning to play with this myself. Let us know how this works out.

Adam Miller |
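The "manipulate sUrl here" step comes down to turning a live URL into an archival URL of the form replay prefix, then timestamp, then the original URL. The sketch below shows only that string logic, in Python for illustration; it would need porting to JavaScript before it could be injected into replayed pages. The function name and parameters are assumptions made for this example and are not Wayback APIs.

```python
from urllib.parse import urljoin


def to_archival_url(replay_prefix, timestamp, url, base_url=None):
    """Rewrite `url` so it points into the archive instead of the live web.

    Hypothetical helper: `replay_prefix` is the access point's replay URL
    (ending in '/'), `timestamp` is the capture timestamp of the page being
    replayed, and `base_url` is that page's original URL.
    """
    # Resolve relative URLs against the original URL of the replayed page.
    if base_url is not None:
        url = urljoin(base_url, url)
    # Leave URLs alone that already point into the archive.
    if url.startswith(replay_prefix):
        return url
    return f"{replay_prefix}{timestamp}/{url}"
```

With the prefix and timestamp from the thread, `to_archival_url("http://www.webarchive.org.uk/wayback/archive/", "20081212111012", "http://www.woolworths.co.uk/")` yields a URL that stays inside the archive rather than leaking to the live site.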
From: Jackson, A. <And...@bl...> - 2012-12-05 13:03:22
|
We're using the server-side rewriting engine, and it's not catching this case. We'll probably experiment with injecting JavaScript to override the window.open function so that there's a rewrite in the function invocation chain, but it would be interesting to know if anyone has already tried this or other approaches.

Thanks again,
Andy |
From: Bjarne A. <bj...@st...> - 2012-12-05 12:04:07
|
I have a feeling that this is not possible when using the browser-based rewriting engine.
You would need to use either server-side rewriting (I never tried that) or proxy-based replay (which we always use at Netarchive.dk to get the best possible replay) to catch a thing like this.

Best
Bjarne |
From: Jackson, A. <And...@bl...> - 2012-12-05 11:32:00
|
Hi All,

We're having problems with live-leakage, in that if you go to:

http://www.webarchive.org.uk/wayback/archive/20081212111012/http://www.woolworthsgroupplc.com/site/woolworths.asp

Then some JavaScript immediately redirects you to the live version of the site:

window.open('http://www.woolworths.co.uk/','_self');

Does anyone know if there is any existing code in Wayback one can use to override this behaviour?

Thanks for your time,
Andy Jackson

--
Andrew Jackson
Web Archiving Technical Lead
The British Library

Tel: 01937 546602
Mobile: 07765 897948
Web: www.webarchive.org.uk
Twitter: @UKWebArchive |
From: <squ...@ta...> - 2012-12-04 00:35:31
|
Greetings,

I am perplexed. I am running wayback-1.7.1 on Ubuntu with tomcat6, and everything runs fairly well until I hit a "resource not available" in Wayback. I check the tomcat logs and see "org.archive.wayback.webapp.AccessPoint logNotInArchive". I check the CDX, and the record is there, pointing to the proper warc file. I pull down the warc file and search for the record in it, and the record is there, as it was indexed in the first place (other records in this warc file are rendered just fine in Wayback). However, Wayback says it's not in the archive.

This is not an isolated incident. It happens "seemingly" randomly. Is there something else I should be checking? If it's in the CDX, it should be in the list view.

I also find an odd thing happening where a request is made for a URL of the form wayback/yyyymmdd(etc)/http://url and, even though the requested document is in the CDX, Wayback returns an earlier document or one with a different timestamp.

I know these are two different questions, but they might be related. Could there be something that I am missing?

Thanks! |
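For the second question, it may help to state what a replay request of the form wayback/yyyymmdd(etc)/http://url is normally expected to resolve to: the capture whose timestamp is closest to the requested one, which is an exact match when one exists. The sketch below illustrates that expectation only. It is not Wayback's implementation, `closest_capture` is a made-up name, and it assumes full 14-digit timestamps.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"  # 14-digit yyyymmddhhmmss


def closest_capture(requested, captures):
    """Return the capture timestamp nearest in time to `requested`.

    Hypothetical helper for illustration; `captures` is a list of 14-digit
    timestamps taken from the CDX lines for one URL.
    """
    target = datetime.strptime(requested, TIMESTAMP_FORMAT)
    return min(
        captures,
        key=lambda ts: abs(
            (datetime.strptime(ts, TIMESTAMP_FORMAT) - target).total_seconds()
        ),
    )
```

Comparing the result of such a calculation against what Wayback actually serves, using the timestamps from the CDX lines for the affected URL, would show whether the index lookup or the replay step is picking the unexpected capture.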
From: Coram, R. <Rog...@bl...> - 2012-12-03 11:16:39
|
Thanks, that worked. I'd tried swapping the JSPReplayRenderer for the TransparentReplayRenderer but hadn't thought to remove the bean entirely.

Roger |
From: Ko, L. <Lau...@un...> - 2012-11-28 16:28:08
|
Does commenting out the following bean in ArchivalUrlReplay.xml work for you?

<bean class="org.archive.wayback.replay.selector.RedirectSelector">
  <property name="renderer">
    <bean class="org.archive.wayback.replay.JSPReplayRenderer">
      <property name="targetJsp" value="/WEB-INF/replay/UrlRedirectNotice.jsp" />
    </bean>
  </property>
</bean>

Lauren Ko
Programmer
UNT Libraries |
From: Coram, R. <Rog...@bl...> - 2012-11-28 14:29:35
|
Hi, By default, when Wayback encounters a 302 it displays a page courtesy of the JSPReplayRenderer before redirecting. We'd prefer that Wayback do this transparently without displaying any message. Is there any way to do this? There's the TransparentReplayRenderer but this doesn't modify the data and will redirect to the live site. And setting the timeout in the JSP page to zero still has it briefly appear. Thanks, Roger |
From: Nicolas G. <nik...@gm...> - 2012-11-07 12:17:25
|
Hi,

I have browsed the 1.7.1 code on GitHub, and I've seen quite a few interesting things in the new AccessPoint class hierarchy. However, the code is nearly undocumented, so I'd appreciate some explanations.

First let me explain what I'm trying to do. The infrastructure we use for Wayback at the French National Library requires Wayback to act in proxy mode, but with a single proxy (the proxy is configured once and for all in the browser). We have two additional needs:
- also be able to process ArchivalUrl replay requests (as we can use these as permalinks)
- be able to choose which WaybackCollection we want to search in, without changing the browser's proxy

This worked with Wayback 1.4.1, but the code was quite a hack, since AccessPoint was hardly extensible. I have adapted this code and enhanced it a bit for 1.7.1, but I am still not satisfied with the design.

Now, looking at the AccessPointAdapter and ProxyAccessPoint classes, I believe there might be a clean way to implement those functions. Can someone please provide a bit of background about the new AccessPoint class hierarchy? Configuration examples, maybe?

Thanks in advance,

--
Nicolas Giraud
---------------------------------------------------------------------------------------------
Développeur Archives du Web - Bibliothèque Nationale de France
Web Archiving Developper - National Library of France
--------------------------------------------------------------------------------------------- |
From: Kris C. N. <kca...@ar...> - 2012-11-06 18:07:35
|
Hi all,

We have actually moved the Wayback development tree from SourceForge to GitHub. Sorry that this hasn't yet been updated in the SourceForge tree; that code is no longer in use and is obsolete. The latest Wayback trunk is available at:

https://github.com/internetarchive/wayback

http://builds.archive.org:8080/maven2/ is also currently accessible. Nicolas, did you get a permissions error, a 404, or some other error when you tried to access our Maven directory? We are trying to determine what might have caused it to be unavailable to you earlier today.

Best,
Kris

On Nov 6, 2012, at 4:32 AM, Nicolas Giraud wrote:

> Hi,
>
> I am currently unable to connect to Internet Archive's Maven repo,
> http://builds.archive.org:8080/maven2. Which is bad since I need to
> build the latest Wayback sources.
>
> Anyone has the same issue?
>
> Cheers,
>
> --
> Nicolas Giraud
> ---------------------------------------------------------------------------------------------
> Développeur Archives du Web - Bibliothèque Nationale de France
> Web Archiving Developper - National Library of France
> --------------------------------------------------------------------------------------------- |