From: Drazenko C. <dra...@sr...> - 2012-12-28 20:50:41
Hi,

Which Java and Wayback versions do you use? I had the same problem when I used an old version of Heritrix (1.14.4) with a Java newer than 6u22. The reason is described here:
https://webarchive.jira.com/browse/HER-1865

Regards,
Drazenko

On 28.12.2012. 16:04, Henrik Ranthin wrote:
> Hi!
>
> Nope, it is a warc file. I attached a sample warc file. Maybe you can have a quick look at it and see if there is something strange?
>
> Regards, Henrik
>
> -----Original Message-----
> From: Erik Hetzner [mailto:eri...@uc...]
> Sent: den 21 december 2012 22:43
> To: Henrik Ranthin
> Cc: arc...@li...
> Subject: Re: [Archive-access-discuss] Read compressed warc.gz files with Wayback
>
> Hi Henrik,
>
> At Fri, 21 Dec 2012 10:52:24 +0000,
> Henrik Ranthin wrote:
>>
>> Thanks for the quick reply!
>> The warc files I’ve used are created (and compressed) by the Heritrix web crawler (version 3.1.1).
>>
>> I thought the output from Heritrix should be compatible with Wayback. Maybe I’m missing some setting?
>
> Yes, they should be. Sorry for the distraction, but badly gzipped WARC files are often the problem.
>>
>> I’ve also tried to compress the file using the scripts from the warc-tools project:
>> warc2warc.py –Z my_archive.warc> my_archive.warc.gz
>> However, I still get the same result.
>>
>> From the log it seems like Wayback is treating the name of the compressed warc file as an URL:
>>
>> Dec 21, 2012 10:57:27 AM org.archive.wayback.resourceindex.bdb.SearchResultToBDBRecordAdapter adapt
>> WARNING: FAILED canonicalize(http://filedesc:WEB-20121128091040702-00000-26202~192.168.24.4~8443.warc.gz:WEB-20121128091040702-00000-26202~192.168.24.4~8443.warc.gz)
>
> I’m just guessing here, but the WARC files I see don’t start with filedesc://... records; only the ARC files. Is this an ARC file that was named with .warc.gz rather than .arc.gz?
>
> best, Erik
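
Erik's guess in the quoted thread can be checked without Wayback: an ARC file begins with a filedesc:// record, while a WARC record begins with a WARC/ version line. A minimal sketch in Python, assuming the file is either plain or gzip-compressed; the function name sniff_archive_format is made up for illustration and is not part of Wayback, Heritrix, or warc-tools:

```python
import gzip


def sniff_archive_format(path):
    """Return 'warc', 'arc', or 'unknown' from the first line of the file.

    Hypothetical helper: it only inspects the first line and does not
    validate the rest of the archive.
    """
    with open(path, "rb") as raw:
        magic = raw.read(2)
        raw.seek(0)
        # gzip streams start with the two bytes 0x1f 0x8b
        if magic == b"\x1f\x8b":
            stream = gzip.GzipFile(fileobj=raw)
        else:
            stream = raw
        first_line = stream.readline()

    if first_line.startswith(b"WARC/"):
        return "warc"
    if first_line.startswith(b"filedesc://"):
        return "arc"
    return "unknown"
```

Calling sniff_archive_format on the file named in the log would show whether it is really an ARC file carrying a .warc.gz name. It says nothing about whether the gzip framing of later records is well formed.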