Good question. This was a bug (or missing feature) in the software, but
I've just tested a check-in to HEAD, which is the 1.2.0 release
candidate, that addresses this issue. We're still not handling non-HTTP
protocols correctly, but that will wait until 1.4.0, which will have a
new index format that allows better searches and should expose
additional search options via the UI, letting end users relax
canonicalization if they are not finding the documents they want.
So, as of now, the following tar.gz is the release candidate, and should
fix this issue, as well as numerous other bugs.
http://builds.archive.org:8080/maven2/org/archive/wayback/wayback/1.1.0-SNAPSHOT/wayback-1.1.0-20080204.230115-24-1.1.0-SNAPSHOT.tar.gz
Let me know if this works for you, and if you find any other problems
with this version.
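
For anyone following along, here is a rough sketch of the kind of
port-preserving canonicalization being discussed. It is not Wayback's
actual code: the function name lookup_key and the DEFAULT_PORTS table
are made up for illustration, and the only assumption is that an
explicit port is kept in the key unless it is the default for the
scheme.

```python
from urllib.parse import urlsplit

# Hypothetical table of default ports; not taken from Wayback.
DEFAULT_PORTS = {"http": 80, "https": 443}


def lookup_key(url: str) -> str:
    """Build an illustrative index/lookup key for a URL.

    The host is lowercased and an explicit port is kept unless it is
    the default port for the scheme, so ":8080" survives in the key.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port  # None when the URL carries no explicit port

    netloc = host
    if port is not None and port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{port}"

    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{netloc}{path}{query}"


if __name__ == "__main__":
    print(lookup_key("http://Example.org:8080/page.html"))
    print(lookup_key("http://example.org:80/page.html"))
```

With a key like this, a capture made on port 8080 and one made on the
default port end up under different keys, which is why dropping the
port from a result link or a rewritten link makes the lookup miss.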
Brad
Chris Vicary wrote:
> Hi all,
>
> I am having a problem retrieving harvested resources whose URLs include port
> numbers using Wayback 1.0.1. We have a seed that includes a port number that
> was harvested using Heritrix. The resulting ARC files were indexed using
> Wayback, and the URLs stored in the index include the port number. Using the
> Wayback web address search interface, I am able to find the URLs by
> including the port number in the search string (if the port number is not
> included, no results are found, which is expected). The link for the search
> result does not include the port number, however, and clicking it does not
> retrieve the harvested resource. If the port number is inserted into the
> search result link, retrieval works fine. Even so, rewritten links on the
> retrieved page do not include a port number where applicable. So my question
> is, how do I ensure that port numbers are preserved in Wayback search
> results and in rewritten links?
>
> Thanks,
>
> Chris
>
>
> _______________________________________________
> Archive-access-discuss mailing list
> Arc...@li...
> https://lists.sourceforge.net/lists/listinfo/archive-access-discuss
>