[Chromium-environment] n log n Startup Scalability

Regarding the problem of starting huge numbers of nodes (the current code 
requires all nodes to contact the mothership for configuration, and the 
mothership is a single process that handles the connections one at a time)...

"Daughterships" have been suggested as one option for solving this problem; I'm 
ruminating on the concept.  A daughtership is a process that receives 
information from a mothership, and then acts on behalf of the mothership, 
brokering connections and passing configuration information off to connecting 
processes.

Servers should be able to connect to the mothership or any daughtership 
transparently, with equal functionality (in fact, a server should not itself be 
able to determine any difference between the mothership and a daughtership). 
Further, not only should multiple daughterships be possible, but a hierarchy of 
daughterships should be possible; this would theoretically allow n log n 
responsiveness.  (However, I think there must be a top-level mothership, a 
"grandmothership", to which the application connects, which manages sending the 
application the signal to continue; in this case symmetry of motherships and 
daughterships is broken.)
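
As a rough illustration of why a hierarchy helps, here is a small Python 
sketch.  It assumes (and these are only assumptions for the sake of the 
estimate) that every ship serves its connections one at a time, that no ship 
accepts more than "fanout" connections, and that ships on the same level work 
in parallel.  The function names are hypothetical.

```python
def ship_levels(num_servers, fanout):
    """Levels of ships needed so no ship serves more than `fanout` connections."""
    if num_servers < 1 or fanout < 2:
        raise ValueError("need at least 1 server and a fanout of at least 2")
    levels = 1            # a lone mothership serving the servers directly
    capacity = fanout     # servers reachable with that many levels
    while capacity < num_servers:
        capacity *= fanout
        levels += 1
    return levels


def serial_connections(num_servers, fanout):
    """Rough upper estimate of connections handled back to back on one path."""
    return fanout * ship_levels(num_servers, fanout)
```

Under this model, 1000 servers with a fan-out of 10 need three levels of ships 
and roughly 30 back-to-back connections on any one path, compared with 1000 
for a single mothership.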

A daughtership will ask its mothership for the node graph when it starts up; but 
a two-way connection is necessary to maintain state in the distributed system 
(e.g. to propagate "setparam" or "reset" commands to the whole system, or to 
manage dynamic host markers [mentioned in another thread]).  A daughtership that 
receives one of these on any connection but the upstream connection must 
propagate it to its mothership; a mothership that receives one of these must 
propagate it to all its downstream daughterships (except for the one that 
originated the command, if any, although echoing it back probably wouldn't 
do any harm).
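
The propagation rule above can be sketched in a few lines of Python.  This is 
only a minimal in-process model: the Ship class, its attributes, and the 
receive() method are hypothetical names, and real ships would pass these 
commands over their network connections rather than by direct method calls.

```python
class Ship:
    """A mothership or daughtership; `upstream` is None for the top-level ship."""

    def __init__(self, name, upstream=None):
        self.name = name
        self.upstream = upstream
        self.daughters = []
        self.applied = []          # commands this ship has acted on
        if upstream is not None:
            upstream.daughters.append(self)

    def receive(self, command, source=None):
        """Handle a command; `source` is the ship that sent it, or None
        if it arrived from an ordinary client."""
        self.applied.append(command)
        # Anything that did not come from upstream goes up to the mothership.
        if self.upstream is not None and source is not self.upstream:
            self.upstream.receive(command, source=self)
        # Everything goes down to the daughters, except the one that sent it.
        for daughter in self.daughters:
            if daughter is not source:
                daughter.receive(command, source=self)


grand = Ship("grandmothership")
a = Ship("a", upstream=grand)
b = Ship("b", upstream=grand)
a1 = Ship("a1", upstream=a)
a1.receive("setparam")   # arrives at the deepest ship from a plain client
```

Because the ships form a tree and a command is never sent back the way it 
came, each ship sees the command exactly once.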

If there are dynamic markers in the system (again, see the other thread), a 
daughtership must report its connections to its mothership, so that the 
"grandmothership" can track all the dynamic markers, resolve their names, and 
determine when to allow the application to continue.  (Motherships will also 
need a new instruction just for their daughters, to tell them to update their 
own dynamic marker tables, so they can resolve requests as well, and so they 
know when all dynamic markers have been resolved so that they don't need to 
propagate connections upward...)

--------------------------------------------------------------------------

I've thought about having "ungraphed" daughterships, "graphed" daughterships, 
and "noded" daughterships, as three different approaches.

- Ungraphed daughterships: Daughterships are independent processes, with no 
associated node.

In this case, it's up to the user to start as many or as few daughterships as 
desired, and to set up their hierarchy (via each one's understanding of where 
its mothership is).  Chromium has no knowledge of whether daughterships 
are present, or how many.  Any network server can be told to connect to any 
mothership or daughtership equivalently.  Chromium need not know anything else.

- Graphed daughterships: Daughterships are full-fledged Chromium nodes, with a 
new node type created for them.

Servers are associated with particular daughterships, and Chromium is aware of 
which server is associated with which daughtership.  Chromium can auto-start 
daughterships (like any other node).  Chromium also waits for connections from 
daughterships; a daughtership signals its mother when all its connections have 
been made (these can either be simple server connections, or daughterships 
signalling that their connections are all complete).
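
A minimal Python sketch of this "signal when complete" behaviour follows.  The 
GraphedShip class and its connected() method are hypothetical, and the sketch 
assumes each ship is told in advance the names of the servers and 
daughterships it should expect.

```python
class GraphedShip:
    """A ship that knows which children (servers or daughterships) it expects."""

    def __init__(self, name, expected, mother=None):
        self.name = name
        self.pending = set(expected)   # children not yet heard from
        self.mother = mother
        self.complete = False

    def connected(self, child_name):
        """Record a server connection, or a daughtership reporting completion."""
        self.pending.discard(child_name)
        if not self.pending and not self.complete:
            self.complete = True
            # Signal upward exactly once, when the last child has checked in.
            if self.mother is not None:
                self.mother.connected(self.name)


top = GraphedShip("top", {"d1", "d2"})
d1 = GraphedShip("d1", {"s1", "s2"}, mother=top)
d2 = GraphedShip("d2", {"s3"}, mother=top)
```

Once s1, s2 and s3 have all connected, d1 and d2 each signal top, and top 
becomes complete; at that point the top-level mothership could let the 
application continue.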

An advantage is that daughterships "know" which nodes are theirs, and which are 
dynamic; there's no need to propagate connection information, since 
daughterships can resolve dynamic markers themselves.  It may also be an 
advantage that the entire configuration need not be communicated to each 
daughtership - the mothership knows how much of the configuration each 
daughtership requires, so it isn't necessary to send the whole huge 
configuration to every daughtership.  However, the disadvantage is the extra 
complexity of managing this hierarchical data.
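
To illustrate the "send only what each daughtership requires" idea, here is a 
small Python sketch.  The dictionary layout (node name mapped to settings) and 
the assignment table (node name mapped to daughtership name) are assumptions 
made for the example, not Chromium's actual configuration format.

```python
def config_for_daughtership(full_config, assignment, daughtership):
    """Return only the config entries for nodes assigned to `daughtership`."""
    return {
        node: settings
        for node, settings in full_config.items()
        if assignment.get(node) == daughtership
    }


full_config = {
    "server1": {"tile": 0},
    "server2": {"tile": 1},
    "server3": {"tile": 2},
}
assignment = {"server1": "d1", "server2": "d1", "server3": "d2"}
d1_config = config_for_daughtership(full_config, assignment, "d1")
```

Here d1_config holds the entries for server1 and server2 only.  The 
assignment table is exactly the hierarchical data whose upkeep is the 
disadvantage mentioned above.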

- Noded daughterships: Daughterships can have nodes (but these aren't required); 
nodes are only provided for autostart capabilities.

This hybrid solution provides one primary advantage of the graphed daughtership 
case (i.e. the ability to autostart daughterships), without all the complexity 
of managing a full hierarchy in Chromium (although the daughterships can still be 
set up in a hierarchy, Chromium wouldn't be intrinsically aware of this).

--------------------------------------------------------------------------

Thoughts?  Other ideas?

Bob Ellison
Tungsten Graphics, Inc.