From: Bradley T. <br...@ar...> - 2011-08-29 06:47:36
Hi Mohamed,

I think you're talking about the UDP broadcast location service?

For those not familiar with IA internal systems, the www.archive.org website locates content on the cluster by sending a UDP broadcast packet to all hosts on the network. A special UDP listening server runs on each host and is aware of what content is local. When the UDP server receives a broadcast UDP packet for content that is local, it sends a packet back to the originating server:port, indicating "that content is here!".

The classic Wayback used this service to locate ARC content, up until about 4 years ago. It was discontinued in favor of a static lookup file that maps W/ARC filenames to one or more URLs.

The reason for the change was the inherent unreliability of UDP. We would see constant low-level failures in the location service, which often prompted IA admins and end users to "just try refreshing a few times". The failure levels escalated sharply when internal network usage was near peak, and also increased steadily as our data centers became more separated.

So, the current Wayback has two implementations:

1) local static lookup file (path index) running with org.archive.wayback.resourcestore.LocationDBResourceStore

2) remote HTTP 1.1 directory. In practice, this is likely one of:

   2a) a normal HTTP server fronting a single directory of W/ARC files

   2b) a custom HTTP server fronting some more complex storage network, with site-specific logic to make all W/ARC files appear to be in the top-level directory

   2c) an org.archive.wayback.resourcestore.locationdb.FileProxyServlet instance, backed by either a static path index (flat file) or a BDB.

All of our production Wayback installations at IA use option #1. It's fast and simple, and rebuilding a path index, even with 40M entries, only takes a few minutes. In the mid to long term, we are exploring option 2b.

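To make the idea behind option #1 concrete, here is a rough sketch of what a static path-index lookup amounts to. It is not the LocationDBResourceStore code. It assumes a flat file with one tab-separated "filename, URL" pair per line, where a filename may repeat to list several locations. The function names and the example.org URLs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def load_path_index(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Build an in-memory map from W/ARC filename to its known URLs.

    Assumed (hypothetical) line format: <filename> TAB <url>.
    Blank or malformed lines are skipped.
    """
    index: Dict[str, List[str]] = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        name, sep, url = line.partition("\t")
        if not sep or not name or not url:
            continue  # no tab, or one side is empty
        index[name].append(url.strip())
    return dict(index)


def locate(index: Dict[str, List[str]], filename: str) -> List[str]:
    """Return every known location for a W/ARC file, or an empty list."""
    return index.get(filename, [])


# Hypothetical sample data; a real index would be read from a file.
SAMPLE_LINES = [
    "crawl-0001.warc.gz\thttp://store1.example.org/crawl-0001.warc.gz",
    "crawl-0001.warc.gz\thttp://store2.example.org/crawl-0001.warc.gz",
    "crawl-0002.arc.gz\thttp://store1.example.org/crawl-0002.arc.gz",
]

if __name__ == "__main__":
    idx = load_path_index(SAMPLE_LINES)
    print(locate(idx, "crawl-0001.warc.gz"))
```

Because the whole index is rebuilt from a flat file, a changed set of W/ARC locations only requires regenerating the file and reloading it, with no network round trip at lookup time.
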
So, my short answer would be to advise you to go with option #1 as well: W/ARC files don't move around that much, it will definitely meet your scale needs, and it is the most robust choice.

Brad

On 8/28/11 4:40 PM, Mohamed Elsayed wrote:
> I now have the new Wayback working on a single host. I am currently
> trying to set up something like the "Item Location Server" that used to
> exist in the old system. I guess this should also be possible with the
> new Wayback. Can you provide any pointers for getting started on this? Is
> it a fileproxy?
>
> Thanks in advance.
>