|
From: Brad T. <br...@ar...> - 2008-01-30 04:12:31
|
Hi Miguel,

The SVN code was not quite coherent: specifically, the wayback.xml configuration file was not referencing the new implementation classes for the ResourceStore. I'm hoping that this is the issue, but see below for some more notes if you're still having problems.

The major changes you'll need to make in wayback.xml are in the ResourceStore and Replay configurations.

I'm not convinced this will solve the problem, though, since you were able to index the documents OK. With what version of the wayback code did you first index them?

One last question is how the ARCs were compressed. Were they written compressed by Heritrix, or compressed later?

If the new wayback.xml (using different implementations) does not fix the problem, a fragment of one of your ARC files would help me figure out what's going wrong. Can you post part of one of your ARC files somewhere, for example just the first few hundred KB?

    head -c 200000 foo.arc.gz > sample.arc.gz

(I understand that the last record in the ARC fragment will probably be truncated.)

Brad

Miguel Costa wrote:
> Hello,
>
> I installed wayback 1.1.0-SNAPSHOT from svn. When I query the wayback with
> a URL I get a:
>
> org.archive.io.NoGzipMagicException
> org.archive.io.GzipHeader.readHeader(GzipHeader.java:122)
> org.archive.io.GzipHeader.<init>(GzipHeader.java:107)
> org.archive.io.GzippedInputStream.readHeader(GzippedInputStream.java:335)
> org.archive.io.GzippedInputStream.gzipMemberSeek(GzippedInputStream.java:370)
> org.archive.io.arc.ARCReaderFactory$CompressedARCReader.get(ARCReaderFactory.java:383)
> org.archive.io.arc.ARCReaderFactory$CompressedARCReader.get(ARCReaderFactory.java:326)
> org.archive.wayback.resourcestore.LocalARCResourceStore.retrieveResource(LocalARCResourceStore.java:108)
> org.archive.wayback.webapp.AccessPoint.handleReplay(AccessPoint.java:312)
> org.archive.wayback.webapp.AccessPoint.handleRequest(AccessPoint.java:280)
> org.archive.wayback.webapp.RequestFilter.handle(RequestFilter.java:106)
> org.archive.wayback.webapp.RequestFilter.doFilter(RequestFilter.java:90)
>
> Wayback finds the file and then checks that it is OK. This check throws a
> NoGzipMagicException because it does not find the gzip "magic" number.
> The code used is in commons-2.0.0-SNAPSHOT-sources.jar (from Heritrix) for
> both projects, nutchwax and wayback.
>
> I also installed nutchwax 0.11.0-SNAPSHOT from svn (both projects from trunk)
> and indexed the same ARC files. The query results are presented OK.
> Other files present the same symptoms.
> Does anyone have a clue about this problem? Does anyone use this version of
> wayback without problems?
>
> Thanks
> --
> Miguel Costa
>
> _______________________________________________
> Archive-access-discuss mailing list
> Arc...@li...
> https://lists.sourceforge.net/lists/listinfo/archive-access-discuss
|
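The compression question above can be checked directly on a fragment such as sample.arc.gz. The following is a minimal Python sketch, not part of wayback or Heritrix; the helper names has_gzip_magic and member_offsets are hypothetical. It assumes that an ARC written compressed by Heritrix holds one gzip member per record, while a file compressed afterwards in a single pass holds one member for the whole file. In the second case a seek to a record offset would not land on the gzip magic bytes (0x1f 0x8b), which is one plausible way to end up with a NoGzipMagicException.

```python
import gzip
import zlib
from typing import List

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def has_gzip_magic(data: bytes, offset: int = 0) -> bool:
    """Return True if the gzip magic bytes are found at `offset` in `data`."""
    return data[offset:offset + 2] == GZIP_MAGIC


def member_offsets(data: bytes) -> List[int]:
    """Return the byte offsets at which gzip members start in `data`.

    Hypothetical helper for small fragments only: it re-slices the
    buffer for every member, which is wasteful on large files.
    """
    offsets = []
    pos = 0
    while pos < len(data) and has_gzip_magic(data, pos):
        decomp = zlib.decompressobj(wbits=31)  # 31 = expect a gzip wrapper
        try:
            decomp.decompress(data[pos:])
        except zlib.error:
            break  # corrupt member: stop scanning
        offsets.append(pos)
        if not decomp.eof:
            break  # final member is truncated, as expected in a fragment
        # Bytes after the end of this member are left in unused_data.
        pos = len(data) - len(decomp.unused_data)
    return offsets


# Synthetic stand-ins for the two cases (not real ARC records).
rec1 = gzip.compress(b"record one")
rec2 = gzip.compress(b"record two")
per_record = rec1 + rec2                                  # one member per record
whole_file = gzip.compress(b"record one" + b"record two")  # compressed later

print(member_offsets(per_record))
print(member_offsets(whole_file))
```

To try it on a real fragment, read the file in binary mode and pass the bytes to member_offsets. A single offset of 0 for a multi-record ARC would suggest that the file was compressed as a whole.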