From: Noah L. <nl...@ar...> - 2013-02-06 02:51:18
Hello Søren,

I committed a fix to ARCReaderFactory in Heritrix for the issue you raised. See https://webarchive.jira.com/browse/HER-2032

Not sure how long that will take to appear in a wayback build.

Noah

On 02/05/2013 05:53 AM, Søren Vejrup Carlsen wrote:
> Hi all.
>
> I have found the problem. It was in the wayback-core module, in the class
> org.archive.wayback.resourcestore.resourcefile.ResourceFactory.getResource(File file, long offset)
>
> The method call "ARCReaderFactory.get(path.getName(), is, false);"
> assumes that the file is a gzipped ARC file, even though the
> getResource method should work for both compressed and uncompressed
> ARC files.
>
> The solution is to replace this call with ARCReaderFactory.get(file, offset).
> This makes the method work for both compressed and uncompressed ARC files.
>
> /Søren V. Carlsen (Royal Library, Copenhagen)
>
> *From:* Søren Vejrup Carlsen [mailto:sv...@kb...]
> *Sent:* 1 February 2013 12:32
> *To:* arc...@li...
> *Subject:* [Archive-access-discuss] Workaround for locationDBResourceStore bug in 1.7.1-SNAPSHOT
>
> Hi all.
>
> I have installed wayback 1.7.1-SNAPSHOT, built myself directly from
> the pom.xml after downloading the code from
> https://github.com/internetarchive/wayback
>
> I'm using the locationDBResourceStore that the CDXCollection.xml uses,
> and it can find the correct files from the CDX.
>
> However, it fails to extract the record, as it somehow assumes that
> all files are gzipped, and when a file is not, it fails miserably with
> the following log entries:
>
> Jan 31, 2013 6:49:18 PM org.archive.wayback.resourcestore.resourcefile.ResourceFactory getResource
> INFO: Fetching: /home/prod/wayback/arcs/83807-92-0000-1.arc : 39136770
> Jan 31, 2013 6:49:18 PM org.archive.wayback.resourcestore.resourcefile.ResourceFactory getResource
> WARNING: ResourceNotAvailable for /home/prod/wayback/arcs/83807-92-0000-1.arc Not in GZIP format
> Jan 31, 2013 6:49:18 PM org.archive.wayback.resourcestore.LocationDBResourceStore retrieveResource
> INFO: Unable to retrieve /home/prod/wayback/arcs/83807-92-0000-1.arc - java.util.zip.ZipException: Not in GZIP format
> Jan 31, 2013 6:49:18 PM org.archive.wayback.webapp.AccessPoint handleReplay
> WARNING: (1)LOADFAIL: /home/prod/wayback/arcs/83807-92-0000-1.arc - java.util.zip.ZipException: Not in GZIP format /20100107153228/http://www2.kb.dk/elib/mss/skatte/aeldre_danske/ln185.htm
>
> Can anyone help me here?
>
> /Søren
>
> ---
> Søren Vejrup Carlsen, Department of Digital Preservation, Royal Library, Copenhagen, Denmark
> Tel: (+45) 33 47 48 41
> Email: sv...@kb...
> ---
> Non omnia possumus omnes
> --- Macrobius, Saturnalia, VI, 1, 35
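
For readers following the thread, here is a minimal, self-contained sketch of why a reader that always assumes GZIP fails on an uncompressed file, and how checking the leading bytes avoids that. It is an illustration only, not the code in Wayback or Heritrix: the class `GzipSniffer` and its methods are hypothetical names, and the thread does not say how `ARCReaderFactory.get(file, offset)` decides between compressed and uncompressed input. The only facts relied on are that GZIP data starts with the bytes 0x1f 0x8b and that `java.util.zip.GZIPInputStream` throws a `ZipException` when they are missing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipException;

// Hypothetical helper for illustration; not part of Wayback or Heritrix.
final class GzipSniffer {

    /** Returns true if the data starts with the GZIP magic bytes 0x1f 0x8b. */
    static boolean looksGzipped(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xff) == 0x1f
                && (data[1] & 0xff) == 0x8b;
    }

    /** Opens the data for reading, decompressing only when it is gzipped. */
    static InputStream open(byte[] data) throws IOException {
        InputStream raw = new ByteArrayInputStream(data);
        return looksGzipped(data) ? new GZIPInputStream(raw) : raw;
    }

    /** Compresses the data with GZIP (used here only to build test input). */
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(data);
        }
        return buf.toByteArray();
    }

    /** Reads a stream to the end and returns its bytes. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes standing in for an uncompressed record.
        byte[] plain = "placeholder record bytes".getBytes(StandardCharsets.US_ASCII);
        byte[] zipped = gzip(plain);

        // Both inputs can be read once the format is checked first.
        System.out.println(new String(readAll(open(plain)), StandardCharsets.US_ASCII));
        System.out.println(new String(readAll(open(zipped)), StandardCharsets.US_ASCII));

        // Forcing GZIP on uncompressed bytes raises a ZipException,
        // the same exception type reported in the log above.
        try {
            new GZIPInputStream(new ByteArrayInputStream(plain));
        } catch (ZipException e) {
            System.out.println("ZipException: " + e.getMessage());
        }
    }
}
```

The sketch works on in-memory byte arrays to stay self-contained; a real resource store would apply the same check to the bytes at the record offset in the file.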