From: Kristinn S. <kri...@la...> - 2013-07-30 15:49:53
We've been experimenting with crawling using RSS feeds. This has generally gone well, but it has raised a concern about how well Wayback can handle a URL that has a LOT of snapshots.

In our RSS experiment, the front pages (which are crawled each time an item is added to the feed) are crawled as often as 2,000 times a month (and, yes, those are all unique captures!). Wayback has a default "maxRecords" of 10,000, a value we'll hit after about five months of crawling at that rate.

Interestingly, while I can lower that value in the wayback.xml config file, raising it causes all searches to return a "Bad Query Exception". The 10,000 limit seems to be pretty much hard-wired.

Has anyone looked into how Wayback handles scaling along this axis?

- Kris

-------------------------------------------------------------------------
Landsbókasafn Íslands - Háskólabókasafn | Arngrímsgötu 3 - 107 Reykjavík
Tel: +354 5255600 | www.landsbokasafn.is
-------------------------------------------------------------------------
Disclaimer - http://fyrirvari.landsbokasafn.is