From: Erik H. <eri...@uc...> - 2011-10-04 23:47:48
At Tue, 4 Oct 2011 19:16:28 -0400, Mat Kelly wrote:
>
> Hello,
> I have successfully installed an instance of Wayback and am able to
> successfully add the file from
> http://www.archive.org/download/ExampleArcAndWarcFiles/IAH-20080430204825-00000-blackbook.warc.gz
> to my WARCs folder, see the listing appear and access the content of
> the archive. I am investigating how to create WARCs from scratch (e.g.
> without using Heritrix), so wanted to modify this WARC file and see
> the changes reflected in my local Wayback instance after allowing some
> time for re-indexing.
>
> I have decompressed this WARC file:
> gzip -d http://www.archive.org/download/ExampleArcAndWarcFiles/IAH-20080430204825-00000-blackbook.warc.gz
>
> Truncated and/or made a subtle change to the file:
> truncate -s 1500 IAH-20080430204825-00000-blackbook.warc
>
> Re-gzipped the file:
> gzip 1500 IAH-20080430204825-00000-blackbook.warc

This won’t work. You need to compress each WARC record individually
and concatenate the results. See [1]. Unfortunately, this will probably
take some effort.

I have a Perl script that can compress ARC files, but not WARC files,
which I can send to you.

best,
Erik

1. http://crawler.archive.org/articles/developer_manual/arcs.html
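For illustration, the per-record compression described above could be sketched in Python roughly as follows. This is not the Perl script mentioned in the message. It is a minimal sketch that assumes every record starts with a "WARC/" version line, carries a Content-Length header, and is followed by two CRLFs. The function names iter_warc_records and compress_per_record are hypothetical, and header continuation lines are not handled.

```python
import gzip
import io


def iter_warc_records(stream):
    """Yield each raw WARC record (headers + payload + trailing CRLFs) as bytes.

    Assumption: records begin with a "WARC/" version line and declare
    their payload size in a Content-Length header.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of input
        if line in (b"\r\n", b"\n"):
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line[:40])

        header_lines = [line]
        content_length = None
        while True:
            line = stream.readline()
            if not line:
                raise ValueError("truncated record header")
            header_lines.append(line)
            if line in (b"\r\n", b"\n"):
                break  # blank line ends the header block
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                content_length = int(value.strip())

        if content_length is None:
            raise ValueError("record has no Content-Length header")
        body = stream.read(content_length)
        if len(body) != content_length:
            raise ValueError("truncated record body")

        # Re-emit the record with the two CRLFs that terminate it.
        yield b"".join(header_lines) + body + b"\r\n\r\n"


def compress_per_record(src, dst):
    """Write each record from src to dst as its own gzip member.

    Returns the number of records written. The members are simply
    concatenated, so dst is still a valid multi-member gzip stream.
    """
    count = 0
    for record in iter_warc_records(src):
        dst.write(gzip.compress(record))
        count += 1
    return count
```

To use it, open the uncompressed WARC with mode "rb" and the output .warc.gz with mode "wb", then pass both file objects to compress_per_record. A file that has been cut off in the middle of a record makes this sketch raise ValueError rather than write a partial member.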