|
From: Miguel C. <mig...@fc...> - 2008-01-29 16:06:13
|
Hello,

I installed wayback 1.1.0-SNAPSHOT from svn. When I query the wayback with a URL I get:

org.archive.io.NoGzipMagicException
org.archive.io.GzipHeader.readHeader(GzipHeader.java:122)
org.archive.io.GzipHeader.<init>(GzipHeader.java:107)
org.archive.io.GzippedInputStream.readHeader(GzippedInputStream.java:335)
org.archive.io.GzippedInputStream.gzipMemberSeek(GzippedInputStream.java:370)
org.archive.io.arc.ARCReaderFactory$CompressedARCReader.get(ARCReaderFactory.java:383)
org.archive.io.arc.ARCReaderFactory$CompressedARCReader.get(ARCReaderFactory.java:326)
org.archive.wayback.resourcestore.LocalARCResourceStore.retrieveResource(LocalARCResourceStore.java:108)
org.archive.wayback.webapp.AccessPoint.handleReplay(AccessPoint.java:312)
org.archive.wayback.webapp.AccessPoint.handleRequest(AccessPoint.java:280)
org.archive.wayback.webapp.RequestFilter.handle(RequestFilter.java:106)
org.archive.wayback.webapp.RequestFilter.doFilter(RequestFilter.java:90)

Wayback finds the file and then checks whether it is OK. This check throws a NoGzipMagicException because it doesn't find the "magic" number. The code used is in commons-2.0.0-SNAPSHOT-sources.jar (from Heritrix) for both projects, nutchwax and wayback.

I also installed nutchwax 0.11.0-SNAPSHOT from svn (both projects from trunk) and indexed the same ARC files. The query results are presented OK. Other files show the same symptoms.

Does anyone have a clue about this problem? Does anyone use this version of wayback without problems?

Thanks
--
Miguel Costa |
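For anyone hitting the same exception: the "magic" number in question is the two-byte gzip signature 0x1f 0x8b that opens every gzip member (RFC 1952). The stack trace suggests the reader seeks to a record offset and expects a gzip header there. A minimal diagnostic sketch in Python, outside the Java code path, could test a given offset by hand. The function name and the idea of feeding it an offset from the index are assumptions for illustration, not part of wayback or Heritrix.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def has_gzip_magic(path, offset):
    """Return True if the two bytes at `offset` in `path` are the gzip magic.

    Hypothetical helper for manual diagnosis; `offset` would be the record
    offset that the index reports for the capture being replayed.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(2) == GZIP_MAGIC
```

If this returns False for an offset taken from the index, the index and the file disagree about where records start, which would be consistent with the exception above.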
|
From: Brad T. <br...@ar...> - 2008-01-30 04:12:31
|
Hi Miguel,

The SVN code was not quite coherent: the wayback.xml configuration file, specifically, was not referencing the new implementation classes for the ResourceStore. I'm hoping that this is the issue, but see below for some more notes if you're still having problems.

The major changes you'll need to make in wayback.xml are in the ResourceStore and Replay configurations. I'm not convinced this will solve the problem, though, since you were able to index the documents OK. With what version of the wayback code did you first index them?

One last question is how the ARCs were compressed. Were they written compressed by Heritrix, or compressed later?

If the new wayback.xml (using different implementations) does not fix the problem, one thing that may help me figure out what's going wrong would be a fragment of one of your ARC files. Can you post part of one of your ARC files somewhere, for example, just the first few hundred KB?

(head -c 200000 foo.arc.gz > sample.arc.gz, understanding that the last record in the ARC fragment will probably be truncated.)

Brad

> _______________________________________________
> Archive-access-discuss mailing list
> Arc...@li...
> https://lists.sourceforge.net/lists/listinfo/archive-access-discuss |
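The question about how the ARCs were compressed matters because a compressed ARC written record by record is a concatenation of gzip members, so a reader can seek straight to a record's offset and find a gzip header there. A file gzipped as a whole afterwards is typically a single member, and record offsets inside it no longer land on a header. The following Python sketch lists where each complete gzip member starts. It reads the whole input into memory, so it is only meant for small samples such as the 200 KB fragment above; the function name is an assumption for illustration.

```python
import zlib


def gzip_member_offsets(data):
    """Return the byte offset at which each complete gzip member in `data` starts."""
    offsets = []
    pos = 0
    while pos < len(data):
        d = zlib.decompressobj(31)  # wbits=31: expect a gzip header and trailer
        try:
            d.decompress(data[pos:])
        except zlib.error:
            break  # no valid gzip member at this position
        if not d.eof:
            break  # member is truncated, e.g. the last record after `head -c`
        offsets.append(pos)
        # Bytes after the finished member are left in unused_data.
        pos = len(data) - len(d.unused_data)
    return offsets
```

A per-record ARC sample should yield many offsets; a single entry of 0 (or none, for a truncated sample) would hint that the file was compressed as one stream.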
|
From: Brad T. <br...@ar...> - 2008-02-01 20:13:43
|
Hey Miguel,

I think I just found the problem: I hadn't checked in a small but crucial change to the wayback-code pom.xml, which bumps the dependency on archive-commons from 2.0.0 to 2.0.1. I'm betting this makes all the difference.

Please try updating to the latest HEAD and let me know if that works for you.

Brad |
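After updating to HEAD, one way to confirm that the checkout really carries the newer dependency is to read the version out of pom.xml. The Python sketch below does that with the standard library. It assumes the POM declares the usual Maven 4.0.0 namespace, and the artifactId passed in is an assumption: the message calls the library "archive-commons", but the identifier actually used in the POM may differ.

```python
import xml.etree.ElementTree as ET

# Namespace used by Maven POM files (assumed to be declared in the POM).
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def dependency_versions(pom_xml, artifact_id):
    """Return the <version> text of every dependency whose artifactId matches."""
    root = ET.fromstring(pom_xml)
    versions = []
    for dep in root.iterfind(".//m:dependency", POM_NS):
        if dep.findtext("m:artifactId", namespaces=POM_NS) == artifact_id:
            versions.append(dep.findtext("m:version", namespaces=POM_NS))
    return versions
```

Called with the contents of pom.xml and the artifactId found there, it should report 2.0.1 once the fix is checked out. An empty list means no dependency with that artifactId was found, in which case the name passed in is probably wrong.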
|
From: Miguel C. <mig...@fc...> - 2008-02-04 10:36:14
|
Thank you Brad. This fix solved the problem.

Best regards,
Miguel Costa

-----Original Message-----
From: Brad Tofel [mailto:br...@ar...]
Sent: Friday, 1 February 2008 20:16
To: Miguel Costa
Cc: arc...@li...
Subject: Re: [Archive-access-discuss] org.archive.io.NoGzipMagicException |