There are a couple of mechanisms, depending on which Replay UI and which
ResourceIndex you're using.
If you're using Archival URL replay mode, then you can use a wildcard '*'
for the datespec, and a trailing '*' to list documents in the index
prefixed with the given url.
Example:
http://wayback.yourhost.org:8080/wayback/*/example.com/*
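If you want to generate these URLs from a script, here is a minimal Python
sketch. The function name and its parameters are my own invention for
illustration, not part of wayback; it only assembles a string in the same
shape as the example above.

```python
def archival_prefix_url(wayback_base, url_prefix, datespec="*"):
    """Build an Archival URL request that lists captures under url_prefix.

    wayback_base: root of the wayback webapp (hypothetical parameter name),
                  e.g. "http://wayback.yourhost.org:8080/wayback"
    url_prefix:   the URL prefix to list, e.g. "example.com/"
    datespec:     "*" matches any capture date
    """
    base = wayback_base.rstrip("/")
    # Drop any '*' the caller already appended so exactly one is emitted.
    prefix = url_prefix.rstrip("*")
    return f"{base}/{datespec}/{prefix}*"


if __name__ == "__main__":
    print(archival_prefix_url("http://wayback.yourhost.org:8080/wayback",
                              "example.com/"))
```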
Also, you can change the query URL's 'type' argument from 'urlquery' to
'urlprefixquery'. Having to edit the URL by hand is clearly a problem
with the current search form, and will hopefully be addressed in the
next release.
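As a rough sketch of that edit, the Python below rewrites the 'type'
argument of a query URL you have copied from your browser. Only the
'type' argument and its two values come from the text above; the
function name is made up, and the sample URL in the usage lines (its
path and its 'url' argument) is a placeholder, so substitute whatever
your own installation produces.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_prefix_query(query_url):
    """Return query_url with type=urlquery switched to type=urlprefixquery."""
    parts = urlsplit(query_url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    # Only touch the 'type' argument; every other argument is passed through.
    rewritten = [
        (key, "urlprefixquery" if key == "type" and value == "urlquery" else value)
        for key, value in params
    ]
    return urlunsplit(parts._replace(query=urlencode(rewritten)))


if __name__ == "__main__":
    # Placeholder URL: the path and the 'url' argument are assumptions.
    sample = "http://wayback.yourhost.org:8080/wayback/query?type=urlquery&url=example.com"
    print(to_prefix_query(sample))
```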
If you want to do further processing on URLs in the index, there are two
tools packaged with wayback called bdb-client and bin-search. They
are command-line tools for dumping URLs with a given prefix from either
a BDB index or from a sorted CDX index. Hopefully the online
documentation for these tools will be enough to get you started with
them, but let me know if it falls short.
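To show the idea behind a prefix lookup in a sorted index, here is a
small Python sketch. It is not how bin-search is implemented and does
not use its options; it assumes the index lines are already sorted in
plain byte order with the URL key first on each line, and the sample
lines are made up.

```python
from bisect import bisect_left


def lines_with_prefix(sorted_lines, prefix):
    """Return every line in sorted_lines that starts with prefix.

    Because the lines are sorted, all matches sit in one contiguous run
    that begins at the first line >= prefix.
    """
    start = bisect_left(sorted_lines, prefix)
    matches = []
    for line in sorted_lines[start:]:
        if not line.startswith(prefix):
            break  # past the end of the matching run
        matches.append(line)
    return matches


if __name__ == "__main__":
    # Made-up index lines: URL key, then a capture timestamp.
    index = [
        "archive.org/ 20060101000000",
        "example.com/ 20060102000000",
        "example.com/about 20060103000000",
        "example.org/ 20060104000000",
    ]
    for line in lines_with_prefix(index, "example.com/"):
        print(line)
```

For a large index you would seek within the file rather than load it
into memory, but the lookup logic is the same.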
Brad
Ignacio Garcia wrote:
> is there any way to list all contents in my archived files using wayback?
>
> I have three different crawls, but one of them is not complete, so I
> don't know exactly what files I actually have archived and I would
> like to know what files is wayback serving.
>
> I don't know if this is possible, since the url field doesn't seem to be
> taking wildcards.
>
> Thank you.
>
> _______________________________________________
> Archive-access-discuss mailing list
> Arc...@li...
> https://lists.sourceforge.net/lists/listinfo/archive-access-discuss
>