From: Noah L. <nl...@ar...> - 2013-03-01 03:48:38
Hello Nicholas,

We've had some success with the pages replaying in Wayback archival mode, i.e.
http://example.org:8080/wayback/https://facebook.com/whatever

Presumably you're referring to the fact that Wayback doesn't support https yet in proxy mode. We're planning to add that within the next couple of months.

Unfortunately Twitter and especially Facebook continue to change the way their stuff works, and more variations present themselves. The settings on that wiki page may not work exactly right anymore, and I'm sure they won't handle every case. We've also had to add some custom canonicalization rules for playback to work in some cases. :-\

Some kind of crawling with real JavaScript support looks like it's the only feasible way for the future.

Noah

On Thu, 21 Feb 2013 14:13:35 +0000 Nicholas Clarke <ni...@kb...> wrote:
> Hello people
>
> We have been experimenting with H3 settings based on the following article.
>
> https://webarchive.jira.com/wiki/display/Heritrix/Facebook+and+Twitter+Scroll-down
>
> But now our problem is how to access https content using wayback.
>
> Is there an established way of doing this?
>
> Best
> Nicholas
>
> ------------------------------------------------------------------------------
> Nicholas Clarke, Software Developer
> Department of Digital Preservation, Royal Library, Copenhagen, Denmark
> tlf: (+45) 33 47 48 38
> email: ni...@kb...<mailto:sv...@kb...>
> ------------------------------------------------------------------------------
> Building complex programs one state machine at a time.
> --