From: Bradley T. <br...@ar...> - 2011-12-09 01:14:39
Hi Armin,

One other possibility, assuming you're using the automatic indexing systems in Wayback (the BDBIndex), is to look in your wayback directory under ".../index-data/merged/", where Wayback keeps a copy of the same CDX files that the "cdx-indexer" tool will create. Column 1 is the "canonicalized" (normalized) URL, and column 3 is the original URL.

Brad

On 12/6/11 11:15 AM, Aaron Binns wrote:
> Armin Schleicher<Arm...@ui...> writes:
>
>> Thanks for your reply! I would like to get a list of the urls in my
>> local wayback deployment.
> The Wayback Machine install package comes with a command-line tool for
> generating a CDX file for an ARC or WARC file, e.g.
>
>   ${wayback-install}/bin/cdx-indexer
>
> You can run it on your (w)arc files, one at a time, like this
>
>   $ cdx-indexer foo.arc.gz foo.cdx
>
> which reads foo.arc.gz and puts the index into foo.cdx.
>
> By default, the first column of the resulting foo.cdx file is the URL of
> the record. There is one line in the CDX per record in the (w)arc.
>
>
> Hope that helps,
>
> Aaron
>
> _______________________________________________
> Archive-access-discuss mailing list
> Arc...@li...
> https://lists.sourceforge.net/lists/listinfo/archive-access-discuss
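
For anyone wanting to turn the CDX files described above into a plain list of URLs, here is a minimal sketch in Python. It relies only on what the thread states: fields are whitespace-separated, column 1 is the canonicalized URL, and column 3 is the original URL. The function names, the script name, and the skipping of a header line beginning with " CDX" are assumptions for illustration, not part of Wayback itself.

```python
import sys


def iter_urls(cdx_lines):
    """Yield (canonical_url, original_url) pairs from CDX lines.

    Assumes whitespace-separated fields with the canonicalized URL in
    column 1 and the original URL in column 3, as described in the thread.
    """
    for line in cdx_lines:
        stripped = line.strip()
        # Assumption: a format header line, if present, starts with "CDX ".
        if not stripped or stripped.startswith("CDX "):
            continue
        fields = stripped.split()
        if len(fields) < 3:
            continue  # too few columns to contain an original URL
        yield fields[0], fields[2]


def unique_original_urls(cdx_lines):
    """Return the original URLs in first-seen order, without duplicates."""
    seen = set()
    urls = []
    for _canonical, original in iter_urls(cdx_lines):
        if original not in seen:
            seen.add(original)
            urls.append(original)
    return urls


def read_lines(paths):
    """Yield lines from each file in turn, tolerating odd byte sequences."""
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            yield from handle


if __name__ == "__main__":
    # Every command-line argument is treated as the path to a CDX file.
    for url in unique_original_urls(read_lines(sys.argv[1:])):
        print(url)
```

Saved under a hypothetical name such as list_urls.py, it could be run against foo.cdx from Aaron's example or against the files under ".../index-data/merged/". Since a CDX has one line per record, a URL captured more than once appears on several lines, which is why the sketch de-duplicates.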