From: Mat K. <mk...@cs...> - 2011-10-04 23:28:42
Hello,

I have successfully installed an instance of Wayback. I can add the file http://www.archive.org/download/ExampleArcAndWarcFiles/IAH-20080430204825-00000-blackbook.warc.gz to my WARCs folder, see the listing appear, and access the content of the archive.

I am investigating how to create WARCs from scratch (i.e. without using Heritrix), so I wanted to modify this WARC file and see the change reflected in my local Wayback instance after allowing some time for re-indexing.

I decompressed the downloaded WARC file:

    gzip -d IAH-20080430204825-00000-blackbook.warc.gz

Truncated and/or made a subtle change to the file:

    truncate -s 1500 IAH-20080430204825-00000-blackbook.warc

Re-gzipped the file:

    gzip IAH-20080430204825-00000-blackbook.warc

This again produces IAH-20080430204825-00000-blackbook.warc.gz in my WARCs directory, but even after restarting Tomcat, and even the server, the new file's contents never become accessible. When I click the date link supposedly associated with the modified WARC (I am guessing it belongs to this WARC and is not a stale link left over from the old one), I am simply told: "Resource Not Available".

My ultimate goal is to create a WARC file from a collection of web pages and images that I have archived manually.

Is there something wrong with my procedure above that would prevent the truncated data from showing up in Wayback?

Does a resource exist that would let me create a WARC file manually from a very small collection of data that currently exists as files on a file system?

Any advice or direction would be very helpful.

Thank you,
Mat
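One hedged sketch of the from-scratch route asked about above, in Python. It assumes the WARC/1.0 record layout (a header block, a blank line, a content block whose size is given by Content-Length, then two CRLFs) and assumes Wayback will replay `response` records that carry a synthesized `HTTP/1.1 200 OK` header. Each record is written as a separate gzip member, the record-at-a-time compression the WARC specification recommends so that an index can point at each record's starting offset. By contrast, a single `gzip` pass over a whole decompressed file produces one member, and truncating at byte 1500 probably cuts a record off mid-block so that its Content-Length no longer matches; either may keep Wayback from finding the data. The function names, URLs, and file paths here are hypothetical, the optional `warcinfo` record is omitted, and the output has not been checked against Wayback's indexer.

```python
import gzip
import mimetypes
import uuid
from datetime import datetime, timezone
from typing import Dict, Iterable


def build_response_record(target_uri: str, body: bytes, content_type: str,
                          when: datetime) -> bytes:
    """Return one uncompressed WARC/1.0 'response' record as bytes."""
    # Record block: a synthesized HTTP response (status line, headers, blank line, payload).
    http_block = (
        b"HTTP/1.1 200 OK\r\n"
        + f"Content-Type: {content_type}\r\n".encode("ascii")
        + f"Content-Length: {len(body)}\r\n".encode("ascii")
        + b"\r\n"
        + body
    )
    warc_headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {when.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "Content-Type: application/http; msgtype=response",
        # Content-Length counts the block only, not the WARC header or the trailer.
        f"Content-Length: {len(http_block)}",
    ]
    head = ("\r\n".join(warc_headers) + "\r\n\r\n").encode("utf-8")
    # Every record is terminated by two CRLFs after its block.
    return head + http_block + b"\r\n\r\n"


def write_warc_gz(path: str, records: Iterable[bytes]) -> None:
    """Compress each record as its own gzip member and append it to one file."""
    with open(path, "wb") as out:
        for record in records:
            out.write(gzip.compress(record))


def warc_gz_from_files(out_path: str, files: Dict[str, str]) -> None:
    """Build a .warc.gz from local files (hypothetical helper).

    `files` maps the URL each file should be replayed under (chosen by you)
    to the path of the saved copy on disk.
    """
    when = datetime.now(timezone.utc)
    records = []
    for url, local_path in files.items():
        with open(local_path, "rb") as f:
            body = f.read()
        # Guess the MIME type from the file extension; fall back to a generic type.
        ctype = mimetypes.guess_type(local_path)[0] or "application/octet-stream"
        records.append(build_response_record(url, body, ctype, when))
    write_warc_gz(out_path, records)
```

For example, `warc_gz_from_files("manual.warc.gz", {"http://example.com/": "saved/index.html"})` would write one gzipped record per mapped file; the result could then be placed in the WARCs directory for Wayback to index.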