From: Bradley T. <br...@ar...> - 2011-10-05 01:28:09
Hi Mat,

Another way to side-step the compression complexities while you work on the WARC format issues would be to use uncompressed WARC files: just skip the compression step altogether (be sure to remove the ".gz" suffix). Wayback should handle those fine. Note that you do still need to create WARC records to encapsulate the archived content, but this may lower the bar for some iterative testing.

A couple of questions to help steer you in the right direction:

1) Do you have HTTP response headers for your archived content?
2) Do you know Java?

Brad

On 10/4/11 5:09 PM, Erik Hetzner wrote:
> At Tue, 4 Oct 2011 20:02:01 -0400,
> Mat Kelly wrote:
>> Erik,
>> Thank you for the reply. Please do send your script, as it might be
>> helpful. From the procedure above, I was hoping to create a base case
>> WARC and if I am not doing so properly, is there a bare bones template
>> to create a WARC file? Once I am familiar enough with the
>> procedure/structure, I plan to write a script to do the work but
>> wanted first to understand how I go about constructing a WARC. Please
>> supply any insight you can, as I am just learning about this system.
> Hi Mat,
>
> Attached.
>
> As far as I know there is no template to create a WARC file.
>
> You might want to have a look at the warc-tools project [1] or the
> it.unimi tools [2] as well as the heritrix commons tools [3].
>
> best, Erik
>
> 1. http://code.hanzoarchives.com/
> 2. http://law.dsi.unimi.it/software/docs/it/unimi/dsi/law/warc/io/package-summary.html
> 3. http://builds.archive.org:8080/maven2/org/archive/heritrix/heritrix-commons/
>
> Sent from my free software system <http://fsf.org/>.
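Since the thread asks for a bare-bones WARC template and suggests starting with uncompressed files, here is a minimal sketch in Python that writes a single record. It is not taken from Wayback, Heritrix, or warc-tools. It assumes the WARC/1.0 layout (ISO 28500): a `WARC/1.0` version line, named header fields, a blank line, the content block, and two trailing CRLFs. The helper name `warc_record`, the file name `example.warc`, and the example URL are hypothetical. If you have the original HTTP response headers, the block is the raw HTTP response and the record type is `response` with `Content-Type: application/http; msgtype=response`. If you don't, a `resource` record carrying just the content bytes under their own MIME type is the spec's alternative, though whether a given Wayback version replays `resource` records is worth verifying.

```python
import io
import uuid
from datetime import datetime, timezone

CRLF = b"\r\n"


def warc_record(warc_type, target_uri, block, content_type, date=None):
    """Return one uncompressed WARC/1.0 record as bytes (hypothetical helper).

    `block` is the record payload: for a 'response' record, the raw HTTP
    status line, headers, blank line and body; for a 'resource' record,
    just the content bytes.
    """
    if date is None:
        date = datetime.now(timezone.utc)
    headers = [
        ("WARC-Type", warc_type),
        ("WARC-Target-URI", target_uri),
        ("WARC-Date", date.strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Record-ID", "<urn:uuid:%s>" % uuid.uuid4()),
        ("Content-Type", content_type),
        # Content-Length counts only the block, not the WARC header lines.
        ("Content-Length", str(len(block))),
    ]
    out = io.BytesIO()
    out.write(b"WARC/1.0" + CRLF)
    for name, value in headers:
        out.write(("%s: %s" % (name, value)).encode("utf-8") + CRLF)
    out.write(CRLF)         # blank line ends the WARC header section
    out.write(block)
    out.write(CRLF + CRLF)  # every record is followed by two CRLFs
    return out.getvalue()


if __name__ == "__main__":
    body = b"<html><body>Hello</body></html>"
    # Raw HTTP response: status line, headers, blank line, body.
    http_block = (b"HTTP/1.1 200 OK\r\n"
                  b"Content-Type: text/html\r\n"
                  b"Content-Length: %d\r\n\r\n" % len(body)) + body
    # Plain ".warc" suffix: no gzip step.
    with open("example.warc", "wb") as f:
        f.write(warc_record("response", "http://example.com/", http_block,
                            "application/http; msgtype=response"))
```

Writing several such records back to back into one file gives a multi-record WARC. Once the uncompressed file indexes and replays correctly, compression can be added back by gzipping each record as its own member rather than the file as a whole.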