From: Drazenko C. <dra...@sr...> - 2013-01-03 11:08:09
Hi,

the bug was resolved on 2011-03-08, and Wayback 1.6.0 is older than that. You should probably use a newer Wayback (http://builds.archive.org:8080/maven2/org/archive/wayback/dist/) and a Java version newer than 6u22.

Regards,
Drazenko

On 2.1.2013. 13:24, Henrik Ranthin wrote:
> Hi!
>
> I switched to an old Java version (6u22) and now it seems to work!
> Previously I used Java 1.7.0_03.
>
> For Wayback I use version 1.6.0.
>
> I saw that the JIRA issue HER-1865 was marked as fixed. Maybe the fix is just not included in the Wayback version I'm using?
>
>
> Thanks for all the help!
>
> Regards, Henrik
>
> -----Original Message-----
> From: Drazenko Celjak [mailto:dra...@sr...]
> Sent: 28 December 2012 21:32
> To: Henrik Ranthin
> Cc: arc...@li...
> Subject: Re: [Archive-access-discuss] Read compressed warc.gz files with Wayback
>
> Hi,
>
> which Java and Wayback versions do you use?
>
> I had the same problem when I used an old version of Heritrix (1.14.4) with Java newer than 6u22. Here was the reason:
> https://webarchive.jira.com/browse/HER-1865
>
> Regards,
> Drazenko
>
>
> On 28.12.2012. 16:04, Henrik Ranthin wrote:
>> Hi!
>>
>> Nope, it is a WARC file. I attached a sample WARC file. Maybe you can have a quick look at it and see if there is something strange?
>>
>> Regards, Henrik
>>
>> -----Original Message-----
>> From: Erik Hetzner [mailto:eri...@uc...]
>> Sent: 21 December 2012 22:43
>> To: Henrik Ranthin
>> Cc: arc...@li...
>> Subject: Re: [Archive-access-discuss] Read compressed warc.gz files with Wayback
>>
>> Hi Henrik,
>>
>> At Fri, 21 Dec 2012 10:52:24 +0000,
>> Henrik Ranthin wrote:
>>>
>>> Thanks for the quick reply!
>>> The WARC files I’ve used are created (and compressed) by the Heritrix web crawler (version 3.1.1).
>>>
>>> I thought the output from Heritrix should be compatible with Wayback. Maybe I’m missing some setting?
>>
>> Yes, they should be. Sorry for the distraction, but badly gzipped WARC files are often the problem.
>>>
>>> I’ve also tried to compress the file using the scripts from the warc-tools project:
>>> warc2warc.py -Z my_archive.warc > my_archive.warc.gz
>>> However, I still get the same result.
>>>
>>> From the log it seems like Wayback is treating the name of the compressed WARC file as a URL:
>>>
>>> Dec 21, 2012 10:57:27 AM org.archive.wayback.resourceindex.bdb.SearchResultToBDBRecordAdapter adapt
>>> WARNING: FAILED canonicalize(http://filedesc:WEB-20121128091040702-00000-26202~192.168.24.4~8443.warc.gz:WEB-20121128091040702-00000-26202~192.168.24.4~8443.warc.gz)
>>
>> I’m just guessing here, but the WARC files I see don’t start with filedesc://... records; only ARC files do. Is this an ARC file that was named with .warc.gz rather than .arc.gz?
>>
>> best, Erik
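For anyone hitting the same symptom, the two suspicions raised in this thread ("badly gzipped" files, and an ARC file named .warc.gz) can be checked locally. Below is a minimal Python 3.8+ sketch, not part of Wayback, Heritrix or warc-tools; the function names `count_gzip_members` and `first_record_kind` are hypothetical. It assumes the usual convention that a .warc.gz is compressed record-at-a-time (one gzip member per record, so a reader can seek to a record's offset), whereas a file gzipped as one stream has a single member. It also assumes that WARC files begin with a `WARC/` version line and ARC files with a `filedesc://` record.

```python
import gzip
import sys
import zlib

# 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
GZIP_WBITS = 16 + zlib.MAX_WBITS


def count_gzip_members(path, chunk_size=1 << 16):
    """Count concatenated gzip members in a file (hypothetical helper).

    Record-at-a-time compression gives one member per record; a file
    compressed as a single stream gives exactly one member.
    Raises zlib.error on non-gzip data, ValueError if the file ends
    in the middle of a member.
    """
    members = 0
    d = zlib.decompressobj(GZIP_WBITS)
    in_member = False
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            while chunk:
                in_member = True
                d.decompress(chunk)  # output discarded; only framing matters
                if d.eof:
                    # Member complete; leftover bytes begin the next member.
                    members += 1
                    chunk = d.unused_data
                    d = zlib.decompressobj(GZIP_WBITS)
                    in_member = False
                else:
                    chunk = b""  # need more input for this member
    if in_member:
        raise ValueError("file ends in the middle of a gzip member")
    return members


def first_record_kind(path):
    """Return 'warc', 'arc' or 'unknown' from the first decompressed line."""
    with gzip.open(path, "rb") as f:
        first_line = f.readline()
    if first_line.startswith(b"WARC/"):
        return "warc"  # WARC records start with a version line, e.g. WARC/1.0
    if first_line.startswith(b"filedesc://"):
        return "arc"  # ARC files start with a filedesc:// record
    return "unknown"


if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(p, first_record_kind(p), count_gzip_members(p), "gzip member(s)")
```

Run it as `python check_warcgz.py my_archive.warc.gz` (the script name is arbitrary). A Heritrix-written WARC would be expected to report `warc` with many members; `arc` would confirm Erik's guess, and a single member would suggest the file was recompressed as one stream.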