From: Mat K. <mk...@cs...> - 2011-10-05 00:02:08
Erik,

Thank you for the reply. Please do send your script, as it might be helpful.

From the procedure above, I was hoping to create a base-case WARC. If I am not doing so properly, is there a bare-bones template for creating a WARC file? Once I am familiar enough with the procedure and structure, I plan to write a script to do the work, but I first wanted to understand how to go about constructing a WARC.

Please supply any insight you can, as I am just learning about this system.

-Mat

On Tue, Oct 4, 2011 at 7:47 PM, Erik Hetzner <eri...@uc...> wrote:
> At Tue, 4 Oct 2011 19:16:28 -0400,
> Mat Kelly wrote:
>>
>> Hello,
>> I have successfully installed an instance of Wayback and am able to
>> successfully add the file from
>> http://www.archive.org/download/ExampleArcAndWarcFiles/IAH-20080430204825-00000-blackbook.warc.gz
>> to my WARCs folder, see the listing appear and access the content of
>> the archive. I am investigating how to create WARCs from scratch (e.g.
>> without using Heritrix), so wanted to modify this WARC file and see
>> the change reflected in my local Wayback instance after allowing some
>> time for re-indexing.
>>
>> I have decompressed this WARC file:
>> gzip -d http://www.archive.org/download/ExampleArcAndWarcFiles/IAH-20080430204825-00000-blackbook.warc.gz
>>
>> Truncated and/or made a subtle change to the file:
>> truncate -s 1500 IAH-20080430204825-00000-blackbook.warc
>>
>> Re-gzipped the file:
>> gzip 1500 IAH-20080430204825-00000-blackbook.warc
>
> This won't work. You need to compress each WARC record & concatenate
> the result. See [1]. Unfortunately this will probably be some effort.
> I have a Perl script, which I can send to you, that can compress ARC
> files but not WARC files.
>
> best, Erik
>
> 1. http://crawler.archive.org/articles/developer_manual/arcs.html
>
> Sent from my free software system <http://fsf.org/>.
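
For anyone following the thread, here is a rough sketch of what "compress each WARC record and concatenate the result" could look like in Python. It is not Erik's script, and it is not taken from Wayback or Heritrix. The function names `iter_warc_records` and `compress_warc` are made up for illustration. It assumes the input is an uncompressed WARC in which every record starts with a `WARC/` version line and carries a `Content-Length` header.

```python
import gzip
import io


def iter_warc_records(stream):
    """Yield each raw record (header + block + trailing CRLF CRLF) as bytes.

    Hypothetical helper: assumes an uncompressed WARC whose records each
    start with a "WARC/..." version line and declare a Content-Length.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # clean end of file
        if line in (b"\r\n", b"\n"):
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line[:40])
        header = [line]
        length = None
        while True:
            line = stream.readline()
            if not line:
                raise ValueError("truncated record header")
            header.append(line)
            if line in (b"\r\n", b"\n"):
                break  # blank line ends the header
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        if length is None:
            raise ValueError("record has no Content-Length header")
        block = stream.read(length)
        if len(block) != length:
            raise ValueError("truncated record block")
        # Re-emit the record with the two CRLFs that terminate it.
        yield b"".join(header) + block + b"\r\n\r\n"


def compress_warc(src, dst):
    """Write every record of src to dst as its own gzip member.

    Returns the number of records written. Concatenated gzip members
    still form one valid .gz stream, but each record can also be
    decompressed on its own from its starting offset.
    """
    count = 0
    for record in iter_warc_records(src):
        dst.write(gzip.compress(record))
        count += 1
    return count
```

It could be called with two binary file objects, for example `compress_warc(open("in.warc", "rb"), open("out.warc.gz", "wb"))`, where both file names are placeholders. A file cut at an arbitrary byte offset will most likely end mid-record, in which case this sketch raises `ValueError` rather than writing a partial record.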