From: Kristinn S. <kri...@la...> - 2013-08-09 10:12:55
See my replies below...

> It is possible to disable the interstitial by removing the bean that
> handles that in ArchivalUrlReplay.xml which probably looks like this:
>

I do not want to remove the interstitial. It is useful information in most cases and hiding it causes all sorts of weirdness. I just want to eliminate it in these kinds of pointless instances.

>
>> Also, as a side note, the XML search results used to contain
>> the destination for redirect captures, but no longer. This limits my
>> options of dealing with this in the frontend search results.
>
>
> Hmm.. no changes have been made to the xml search results in a long
> time. (I have been working on an alternative tool for viewing search
> results:
> https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server)
>
> For some old cdxs, we've discovered the cdx field was often
> improperly encoded (or encoding was ambiguous) and simply started
> writing a "-" for new cdxs.
>
> It is not directly useful to wayback replay except to determine if a
> url is a self-redirect.
>
> That said, it should still appear in the search results if it is in
> the cdx.

Yes, the issue is that it is no longer included in the CDX (we were actually not using CDXs in our old installation). This info may not be directly useful to wayback replay, but it can be useful for rendering a results page. I think this merits a closer look (although I'm not exactly eager to rebuild all our CDX files). At minimum, it would be useful if a self-redirect were annotated so we could suppress it in the results page.

I do wonder if it is useful at all to have self-redirects in the CDX? We treat www.example.com and example.com as the same URL anyway. Would there be any harm in eliminating all redirects like this?

> 2. When navigating search results using the "last" arrow on the
> injected Toolbar, if the previous capture was such a redirect, you
> will not get anywhere as the redirect is resolved by sending you back
> to the same capture you are on.
>
> I found an instance of this in Internet Archive's Wayback:
> http://web.archive.org/web/20130602062836/http://timarit.is/
>
> Try clicking the back arrow. You are not going anywhere. It
> doesn't even give you the URL redirect notice.
>
>
> Thanks for pointing out this example! This happens due to a slightly
> complicated combination of different things.
>
> The short answer is that this should be fixable by checking the
> referrer, or better yet, I'd like to propose having a dedicated "prev
> from X" or "next from X" timestamp modifier.
> such as:
> "20130602062836-" - redirect to prev available capture before
> 20130602062836 or back to 20130602062836 if it is the first.
> "20130602062836+" - redirect to next available capture after
> 20130602062836 or back to 20130602062836 if it is the last.
>
> The long answer is [snipped]
>
> By adding an explicit timestamp modifier of + or -, one could request
> /web/16-/ and guarantee to get a previous capture to 16, if at all
> possible.
>
> Such a modifier (or a different one) for specifying prev/next capture
> could be useful for other reasons as well.
>
> Any thoughts on this idea?

I like it. It would definitely address the replay part of this issue. You would, though, need to decide how to handle cases where you specify a TIME- and there are no older captures but there are newer ones. Do you then ignore the "-" or do you return 'nothing found'?

- Kris

-------------------------------------------------------------------------
Landsbókasafn Íslands - Háskólabókasafn | Arngrímsgötu 3 - 107 Reykjavík
Sími/Tel: +354 5255600 | www.landsbokasafn.is
-------------------------------------------------------------------------
fyrirvari/disclaimer - http://fyrirvari.landsbokasafn.is
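
To make the "TIME-" / "TIME+" question concrete, here is a minimal Python sketch of how such a modifier might be resolved against a sorted list of capture timestamps. This is not Wayback code: the function name `resolve_modifier`, the `ignore_on_miss` flag, and the sample timestamps (other than 20130602062836) are hypothetical, and it assumes full 14-digit timestamps so that plain string comparison orders them correctly. The flag selects between the two behaviours discussed above when nothing lies in the requested direction: ignore the modifier and fall back to the nearest capture, or report nothing found.

```python
from bisect import bisect_left, bisect_right
from typing import Optional, Sequence


def resolve_modifier(captures: Sequence[str], request: str,
                     ignore_on_miss: bool = True) -> Optional[str]:
    """Resolve a request such as "20130602062836-" or "20130602062836+".

    captures must be sorted 14-digit timestamp strings (an assumption of
    this sketch). Returns the chosen capture timestamp, or None.
    """
    if not captures or request[-1:] not in ("+", "-"):
        return None  # no captures, or no modifier to resolve

    ts, direction = request[:-1], request[-1]

    if direction == "-":
        # bisect_left counts the captures strictly earlier than ts.
        i = bisect_left(captures, ts)
        if i > 0:
            return captures[i - 1]
        # Nothing older: either ignore the "-" (earliest capture) or give up.
        return captures[0] if ignore_on_miss else None

    # direction == "+": bisect_right is the index of the first capture after ts.
    i = bisect_right(captures, ts)
    if i < len(captures):
        return captures[i]
    # Nothing newer: either ignore the "+" (latest capture) or give up.
    return captures[-1] if ignore_on_miss else None


captures = ["20130101000000", "20130602062836", "20131231000000"]
print(resolve_modifier(captures, "20130602062836-"))
print(resolve_modifier(captures, "20120101000000-", ignore_on_miss=False))
```

With `ignore_on_miss=True`, asking for the capture before the first one returns the first capture itself, which matches the "or back to 20130602062836 if it is the first" wording in the proposal. With `ignore_on_miss=False` the same request yields `None`, i.e. 'nothing found'.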