Re: [Queue-developers] new design details
From: wernerkrebs <wer...@ya...> - 2005-05-12 05:01:44
Programming is as much an art form as a science, and it's usually best to use the most modern techniques that everyone else is using. This way, you can build on what other people in the community are doing and develop synergy with other projects. GQ was modern for its time, but a lot has happened since it came out.

1. In certain modern environments (Java or C#/Mono), using SOAP/XML actually simplifies things. The complexity gets shifted to the libraries. Moreover, it lets GQ speak a "standard" that works together with other environments, which opens the possibility of easily making GQ work in non-homogeneous environments, something that was much requested by the users.

Check out Ganglia, another open source project, originally developed at UCB.

http://ganglia.sourceforge.net/

This is becoming one of the standards for determining load averages on remote machines. Meta-clustering systems, such as APST, recognize this as one of the protocols for querying load averages and other information.

So, it would be good if GQ supported a standard monitoring protocol like Ganglia. You'd have to run Ganglia somewhere anyway if you wanted to use a meta system like APST, so GQ could feed off that, rather than, or in addition to, its own load monitoring code. Or, GQ could continue to use its own monitoring code, but also support the Ganglia protocol (it's open source, after all), so there wouldn't be a need to run two load monitoring daemons.

Ganglia exchanges information using XML, of course.

2. Regarding your concerns about SQL scalability: a lot of work has gone into making SQL environments highly scalable. It's a huge issue for corporations and anyone trying to run a high-traffic, mission-critical website (been there). Remember the old commercials for a certain computer services firm about the startup that didn't consult with them, and therefore its website wasn't scalable?
MySQL (which RMS wouldn't want us to mention here because it is semi-commercial; he wants us to say 'postgresql' because that's completely free) may not be one of them, but the two major commercial SQL databases are designed to be highly scalable in a cluster environment (this is why one pays big bucks in support and fees for them instead of MySQL).

The other solution, commonly used in the website world (J2EE and .NET/Mono), involves so-called 3-tiered architectures. Basically, the idea is to have another set of threads, which can potentially run on another CPU or another machine, handle all the actual communication with the client (web browser), cache some of the SQL queries (pre-compilation), and, in some cases, even cache the results when possible. This takes a significant load off the back-end SQL database, which can now handle thousands more clients.

This approach would work with a GQ manager by creating an intermediary gm daemon that would reply to a large number of clients by caching and periodically refreshing the results of a small number of SQL queries.

SQL databases are at the leading edge of scalability technology (although often in the commercial rather than open source worlds) and have other benefits (standardization, so other clients can interact, and existing SQL database management tools can be used).

Still, I'll let you decide how you think it's best to do this.

--- Koni <mh...@co...> wrote:

> On Tue, 2005-05-10 at 08:56 -0700, wernerkrebs wrote:
> >
> > Two comments.
> >
> > 1. Regarding the protocol, GQ's protocols largely
> > predated modern RPC standards, such as SOAP and XML.
> >
>
> I'm not sure any of these things are worth their weight in a homogeneous
> system. The communication between the GQ system as I have envisioned it
> is pretty lightweight and there is very little structure to the
> information.
> In this case, I think using XML or SOAP for a communication
> layer adds complexity (in my mind) which is contrary to their purpose in
> general.
>
> [snip]
>
> > I would think some of the current features of the GQ
> > TCP/IP protocol would be best done using some sort of
> > SOAP implementation. For example, aspects of the
> > initial authentication, and querying load information
> > would be best done using SOAP.
> >
>
> I don't think SOAP will do much for us regarding authentication. The
> authentication stuff here is really simple (to me). Perhaps for load
> information if a lot of detail is returned (like all the information ps
> would return, say). As for authentication, it's already implemented as a
> simple challenge handshake (initial authentication):
>
>   qd                               qm
>
>   auth/register request
>   (send nonce)         -------->
>
>                                    sign nonce with
>                                    system key,
>                        <--------   reply with our own
>                                    nonce
>
>   verify response      -------->
>   sign qm nonce
>
>                        <--------   verify response,
>                                    send session key
>
>
> If either verification fails, the offended party stops the protocol.
> Receipt of the session key indicates to qd that the challenge handshake
> protocol completed successfully. After that, all communication between
> the qd and qm comes with simple signatures using that key. The complexity
> of the generation of signatures and verification of them is already more
> or less isolated from the logic of handling the message payload.
>
> >
> > Also, since GQ was written, standard protocols for
> > this type of thing have emerged. Look at the Apst/Apstd
> > system at SDSC (where, ironically, I used to work,
> > although not on that project):
> >
> > http://grail.sdsc.edu/projects/apst/
> >
> > Apst is a meta daemon for cluster daemons. It doesn't
> > currently support starting jobs using GQ, but does
> > support starting other (commercial) systems. GQ
> > support would be fairly trivial for them to add, if
> > they wanted to.
> > SDSC (part of UCSD) receives grant
> > money from a firm that makes a GQ-like commercial
> > product, so it's not clear if that's a direction they
> > want to go in. They do support the commercial product.
> > However, the source code is available, so the
> > community is free to add support for GQ as well.
> >
> > Apst will query each cluster manager (this would be
> > similar to the qm program you are proposing) and
> > obtain load information via an XML file returned from
> > the cluster manager. It will then decide how many jobs
> > to start on that particular cluster (which it will
> > start using a crude ssh command-line protocol to
> > submit the jobs and scp to first transfer the relevant
> > files into place). It's up to the cluster manager to
> > then distribute the jobs to the cluster nodes.
> >
> > Apst, which is C/C++ based (Apstd is available in
> > Java), is similar to Nimrod, which is Java-based.
> > Source code for all of these is available.
>
> This sounds interesting. It would be great for GQ: whether GQ becomes my
> new proposed implementation, remains as is, or becomes something else
> altogether, contributing a "driver" (so to speak) so that this meta
> system can work with it would be cool and perhaps broaden the market for
> us.
>
> >
> > 2. Regarding qm, a division of Texas Instruments
> > actually contributed a SQL-based qm in C++. (It would
> > require that an SQL database, preferably Open Source
> > and free such as Postgresql, be running on a server).
> >
>
> Cool. I was first thinking about job information being managed by a
> mysql (or postgres) backend, where the SQL engine would handle things
> like atomicity and persistent state information across failure. Would
> have been cake if I wrote qm in perl (I am very familiar with Perl-DBI).
> The only thing I don't like about this is the potential high latency:
> one (or more) threads insert into the job table (qs) while some other
> thread polls (qm) the table for new rows. Perhaps in postgres there is a
> way to install a trigger or something so polling is unnecessary. I don't
> think there is a way to do that in mysql. qm is actually unnecessary if
> qd's can talk to the SQL engine directly. SQL can handle authentication
> and atomicity and qd's can just compete for jobs. That's kind of nice.
> Not sure it will scale well though. 1000 qd's each with a persistent TCP
> connection to mysql would create 1000 forked processes at the database
> server.
>
> >
> > This is part of the GQ distribution, but is optional
> > and not compiled by default (due to C++ autoconf
> > problems at the time, since resolved. Also, users wrote
> > to me explaining their preference for a small, simple
> > package with peer-to-peer behavior, rather than a
> > centralized package with a manager that might crash,
> > so the original behavior of GQ remained the default.)
> >
> > Before writing a manager from scratch, you might
> > want to look at the manager code and documentation
> > that TI's subsidiary contributed.
>
> OK, I'll try to have a look. The manager is almost already all written
> though in my haste to flesh out ideas rolling around in my head. I shall
> post a tarball of the code shortly. I want to add at least rudimentary
> support for actually submitting a job to the system and having it
> execute. While I'm doing that, we can get a better feel for who is out
> there reading this list and what interest there is.
>
> Thanks for your comments Werner, I appreciate your insights greatly.
>
> Cheers,
> Koni
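
For readers following the thread, the qd/qm challenge handshake quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not GQ's actual code: the message does not say which signature scheme is used, so the sketch assumes the "system key" is a secret shared by qd and qm and that "signing" means HMAC-SHA256. The names `sign`, `verify`, and `handshake` are hypothetical, both sides run in one process, and how the session key is protected in transit is left out because the original does not specify it.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def sign(key: bytes, message: bytes) -> bytes:
    # ASSUMPTION: HMAC-SHA256 stands in for the unspecified signature scheme.
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, signature: bytes) -> bool:
    # Constant-time comparison of the expected and received signatures.
    return hmac.compare_digest(sign(key, message), signature)


def handshake(qd_key: bytes, qm_key: bytes) -> Optional[bytes]:
    """Run the four handshake steps in-process.

    Returns the session key on success, or None if either side's
    verification fails (the offended party stops the protocol).
    """
    # 1. qd -> qm: auth/register request carrying a fresh nonce.
    qd_nonce = secrets.token_bytes(16)

    # 2. qm -> qd: qd's nonce signed with the system key, plus qm's own nonce.
    qm_reply = sign(qm_key, qd_nonce)
    qm_nonce = secrets.token_bytes(16)

    # 3. qd verifies qm's reply, then signs qm's nonce and sends it back.
    if not verify(qd_key, qd_nonce, qm_reply):
        return None
    qd_reply = sign(qd_key, qm_nonce)

    # 4. qm verifies qd's reply and issues a session key.
    if not verify(qm_key, qm_nonce, qd_reply):
        return None
    return secrets.token_bytes(32)
```

After a successful handshake, each later message would carry `sign(session_key, payload)` and the receiver would check it with `verify`, which keeps signature handling separate from the logic that processes the payload, as the quoted text describes.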