RE: [Planetlab-arch] Proposed Changes for Dynamic Slice API
From: Bowman, M. <mic...@in...> - 2004-01-27 23:01:18
Just a comment on relative scalability... this discussion is very similar to one we had on Ganglia about a year ago. Ganglia was transferring huge amounts of data to PLC (consistently one of the top five users of bandwidth on PlanetLab). To get around the problem, we started compressing the data that was sent from gmond to gmetad. It helped a lot... initially. At 100 nodes, the reduction in bandwidth was sufficient to keep Ganglia running. However, by the time we got to 350 nodes, the amount of data, the time required to process it, the number of connections, and any number of other scalability concerns started to bite us hard.

The point is that simply compressing the data gets us through some of the bandwidth concerns for now, but as the number of nodes and number of users increases, it will quickly fail if it is the only way we address scalability. The size of the file grows with the product of the number of users and the number of nodes, and the total traffic is that size multiplied again by the number of nodes that have to retrieve it.

BTW... on thought experiments... if we assume a PlanetLab with 10,000 nodes and 1,000 slices, that's still a very small database. A simple implementation like an index array of <nodeid,sliceid> tuples could easily fit in about $50 worth of memory, and response time for searching would be measured in milliseconds.

Sorry if I'm rambling...

--Mic

-----Original Message-----
From: pla...@li... [mailto:pla...@li...] On Behalf Of Steve Muir
Sent: Tuesday, January 27, 2004 02:03 PM
To: Timothy Roscoe
Cc: pla...@li...
Subject: RE: [Planetlab-arch] Proposed Changes for Dynamic Slice API

On Tue, 27 Jan 2004, Timothy Roscoe wrote:
>
> Quick response: your answer doesn't address scalability at all (which
> is about how the system grows), though it does point out that the
> current solution is adequate at the moment.

My answer addressed two aspects of scalability:

1) Relative scalability - I claim that providing a (compressed) static file is in general more scalable than supporting a wide range of database queries, particularly if we can use a CDN to distribute that file. Even though the database is optimised for supporting a range of queries, there is a fairly significant overhead in redirecting every XML-RPC request from the HTTP(S) server to the PHP (or whatever) entity that performs the db query.

2) Increase in number of users - for a fixed number of slices and nodes, a static file can be served repeatedly to a large number of users without increasing the load on our server by using some CDN.

> Thought experiment: suppose we increase the number of slices by a
> factor of 10, the number of nodes by a factor of 10, and the number of
> users by a factor of 100. Suppose that the file is downloaded
> several times an hour by a significant fraction of those users
> (because their automated status tools do it). What is the dollar
> cost in bandwidth to Princeton or the Consortium for this traffic?

I don't know how Princeton or the Consortium gets billed for traffic to/from, say, CoDeeN proxies, so I can't answer that question.

> Another question: does XMLRPC have a gzippable transport, or are we
> defining a new RPC protocol by compressing things?

I wasn't thinking of this as RPC, just file download. If you're asking a more general question then I defer to David Anderson's response.

steve
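
A quick sketch of Mic's index-array estimate above, assuming 10,000 nodes, 1,000 slices, an illustrative density of 100 slices per node, and Python's built-in set for lookups (everything beyond the node and slice counts is an assumption for the example, not a figure from the thread):

    # Back-of-envelope check of an in-memory <nodeid,sliceid> index.
    # Node/slice counts come from the thread; the density of 100 slices
    # per node is purely illustrative.
    import random
    import time

    NUM_NODES = 10_000
    NUM_SLICES = 1_000
    AVG_SLICES_PER_NODE = 100      # assumption, not a PlanetLab figure

    # Packed as two 32-bit integers, each tuple is 8 bytes.
    num_tuples = NUM_NODES * AVG_SLICES_PER_NODE
    print("tuples: %d, packed size: %.0f MB" % (num_tuples, num_tuples * 8 / 1e6))

    # A simple in-memory index (a Python set costs more per entry than
    # the packed 8 bytes, but still fits comfortably in commodity RAM).
    index = set()
    for node in range(NUM_NODES):
        for sl in random.sample(range(NUM_SLICES), AVG_SLICES_PER_NODE):
            index.add((node, sl))

    # "Is slice S on node N?" is a hash lookup, i.e. well under a millisecond.
    start = time.time()
    for _ in range(100_000):
        (random.randrange(NUM_NODES), random.randrange(NUM_SLICES)) in index
    print("100k lookups: %.3f s" % (time.time() - start))

Even with Python's per-object overhead, the whole mapping fits in a fraction of a gigabyte, which supports the point that the data itself is small; the scaling pressure is on distribution, not storage or query time.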
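
And a minimal illustration of the compressed-static-file approach Steve describes, using a made-up plain-text nodeid/sliceid listing (the file names and format are assumptions for the example, not the actual slices file):

    # Write a node/slice listing and a gzip-compressed copy, then compare
    # sizes; highly repetitive text like this typically shrinks several-fold,
    # and the .gz file can be served as-is by any HTTP server or CDN.
    import gzip
    import os

    listing = "".join(
        "node%05d slice%04d\n" % (node, sl)
        for node in range(1_000)
        for sl in range(0, 1_000, 10)   # every 10th slice, illustrative only
    )

    with open("slices.txt", "w") as f:
        f.write(listing)
    with gzip.open("slices.txt.gz", "wt") as f:
        f.write(listing)

    print("plain: %d bytes, gzipped: %d bytes"
          % (os.path.getsize("slices.txt"), os.path.getsize("slices.txt.gz")))

Because the compressed file is fetched with a plain HTTP GET, a CDN can cache and fan it out without any per-request work on the central database, which is the crux of the argument for a static file over per-query XML-RPC.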