From: Aaron B. <aa...@ar...> - 2011-12-06 19:15:36
Armin Schleicher <Arm...@ui...> writes:

> Thanks for your reply! I would like to get a list of the urls in my
> local wayback deployment.

The Wayback Machine install package comes with a command-line tool for
generating a CDX file for an ARC or WARC file, e.g.

  ${wayback-install}/bin/cdx-indexer

You can run it on your (w)arc files, one at a time, like this:

  $ cdx-indexer foo.arc.gz foo.cdx

which reads foo.arc.gz and puts the index into foo.cdx.

By default, the first column of the resulting foo.cdx file is the URL of
the record. There is one line in the CDX per record in the (w)arc.

Hope that helps,
Aaron
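To get the list of URLs for a whole deployment rather than a single file, the per-file cdx-indexer run and the first-column observation in the post can be combined. The following is a minimal sketch, not part of the Wayback distribution. It assumes the archives are compressed files ending in .arc.gz or .warc.gz, that cdx-indexer is on the PATH and takes "input output" arguments as shown in the post, and that each CDX file may begin with a header line whose first field is "CDX". The function names index_all and cdx_urls are hypothetical, made up for illustration.

```shell
#!/bin/sh
# Sketch only. Assumes cdx-indexer (from ${wayback-install}/bin) is on
# the PATH and is invoked as: cdx-indexer INPUT.arc.gz OUTPUT.cdx
# index_all and cdx_urls are hypothetical helper names.

# Run cdx-indexer once per compressed (w)arc file in directory $1,
# writing e.g. foo.arc.gz -> foo.arc.cdx next to the archive.
index_all() {
    for arc in "$1"/*.arc.gz "$1"/*.warc.gz; do
        [ -e "$arc" ] || continue   # pattern matched nothing; skip it
        cdx-indexer "$arc" "${arc%.gz}.cdx"
    done
}

# Print each distinct URL (first column) from the given CDX files,
# skipping the " CDX ..." header line and any blank lines.
cdx_urls() {
    awk 'NF > 0 && $1 != "CDX" { print $1 }' "$@" | sort -u
}
```

Usage, after saving the functions to a file such as cdx-urls.sh:

  $ . ./cdx-urls.sh
  $ index_all /path/to/arcs
  $ cdx_urls /path/to/arcs/*.cdx > urls.txt

Because there is one CDX line per record, the same URL can appear several times (once per capture); sort -u collapses those repeats to one line per URL.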