From: Randall F. <ran...@co...> - 2002-09-02 03:01:12
Ok, the idea here was to start some discussion about what a "user environment" would be for Chromium, with an eye toward what might be done by the end of Sept and how things might be shaped into the future. I have one perspective on how things might happen with a "production viz cluster"...

So, you have a large (>32 node) cluster that you expect to be able to support multiple simultaneous users running different applications, with different gfx destinations. "Destinations" might include: a monitor attached to one of the nodes sitting in front of the user, a remote desktop with only TCP/IP access to the cluster, or a "PowerWall" that consists of some number of PCs routed through a video switch to some number of projectors which constitute a "wall". Note that not all of the nodes are attached to the video switch (too expensive). Users will include "domain scientists" who just want to run a canned application to see their favorite isosurface, but much faster than they could otherwise.

The mode I have envisioned has users "allocating" nodes for a viz "job". This would be done via a batch queuing system like RMS/DPCS/etc. During this allocation process, the nodes would have a chance to "restart" and configure themselves, based on the job parameters. Chromium would be bootstrapped this way to "see" only a fraction of the nodes. Video switches would be set and other data paths (e.g. TCP/IP back to the desktop) preflighted (e.g. QoS allocations).

(Up to this point, I consider all this stuff to be site-specific and I do not think core Cr should try to standardize any of it.)

At this point (or as part of the job startup), a graphics application would start. Chromium would have a set of predefined and dynamically configured pipelines ready to attach to the application. The next problem is how to have the right pipeline attached to the application. For example, if the application knows it is about to do time-varying volume rendering, it would like to opt for a sort-last pipeline. The big question is: how is this done? The simplest case would be to leverage a simple "name" for a config that would be set via an environment variable and possibly overridden by an application crXX() call (see the sketch below). Should some standards be laid down here? Can the "types" of rendering be characterized, perhaps by a keyword list, that some "AI" in the mothership could use to make the selection?

Anyway, in the usage model I describe, when the "job" finishes, the system is torn down as part of the job handling, so automatic termination of the mothership is not needed. It should, however, be possible for a user to run multiple OGL applications, perhaps even simultaneously, within a single "job". The mothership must be able to at least track resource utilization within the context of a single "job". An aside: what is the mechanism for pipeline "preflighting"? If there is a limited resource involved in the pipeline (assuming it is encapsulated by a SPU), how can the mothership gracefully fall back to a different SPU pipeline should the most desired one fail?

Ok, other perspectives on things, or discussion of some of the points/questions raised? Or was this so long that I bored you to tears? Thanks.

-- rjf.

Randy Frank | ASCI Visualization
Lawrence Livermore National Laboratory | rj...@ll...
B451 Room 2039 L-561 | Voice: (925) 423-9399
Livermore, CA 94550 | Fax: (925) 423-8704
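As a concrete illustration of the "simple name" idea raised above, here is a minimal Python sketch of the selection precedence. The CR_CONFIG variable name, the config directory, and pick_config() are hypothetical assumptions for illustration, not an existing Chromium mechanism:

```python
import os

# Hypothetical sketch of the "simple name" config selection.
# CR_CONFIG, CONFIG_DIR, and pick_config() are illustrative
# assumptions, not an existing Chromium mechanism.
CONFIG_DIR = "/usr/local/cr/configs"      # site-specific, assumed
DEFAULT_CONFIG = "sortfirst.conf"         # assumed site default

def pick_config(app_override=None):
    """Return the config script path the mothership should load.

    Precedence: an application-supplied override (the crXX() call)
    beats the environment variable, which beats the site default.
    """
    name = app_override or os.environ.get("CR_CONFIG") or DEFAULT_CONFIG
    return os.path.join(CONFIG_DIR, name)

print(pick_config())                 # honors CR_CONFIG if the user set it
print(pick_config("sortlast.conf"))  # app override wins
```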
From: Brian P. <br...@tu...> - 2002-09-08 18:20:34
Randall Frank wrote:
> [...full text of the "user environment" message above, snipped...]

I think I understand some of the high-level goals for this, but I'm not 100% sure how to boil them down to concrete items for implementation. I guess I'm picturing at least the following:

1. A node/host allocator which keeps track of the free and in-use nodes in a cluster. A client (like Chromium) would ask the allocator for N nodes and get back a list of N hostnames (or an error code). When Chromium's finished with the nodes, they'd be returned to the free pool. (A sketch of such an allocator follows below.)

2. Dynamic Cr config scripts which know how to work with the allocator in (1). Either the config scripts would ask the allocator for N nodes, or a list of nodes would be passed in as an argument.

3. A set of "standardized" configuration scripts for sort-first and sort-last which can work with a range of applications and satisfy item (2).

4. Some system to automatically choose a standardized config script based on the name of the program being run, or the user's requested rendering style (sort-first/sort-last). Perhaps libcrfaker could look at argc/argv and look up the name of a config script in a resource file, then automatically start the mothership with that config script. Or, look for a special environment variable.

-Brian
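The allocator in item (1) could be as simple as the following in-process Python sketch. A real one would live behind a daemon or the site's batch system (RMS/DPCS/etc.); the class and method names here are illustrative, not an existing Chromium API:

```python
# Minimal sketch of the node/host allocator in item (1).
# Names are illustrative assumptions, not an existing Chromium API.
class NodeAllocator:
    def __init__(self, hostnames):
        self.free = list(hostnames)   # nodes available for new jobs
        self.used = {}                # job id -> hostnames it holds

    def allocate(self, job, n):
        """Return a list of n hostnames, or None if n aren't free."""
        if n > len(self.free):
            return None               # caller can fall back or wait
        nodes = [self.free.pop() for _ in range(n)]
        self.used.setdefault(job, []).extend(nodes)
        return nodes

    def release(self, job):
        """Return all of a job's nodes to the free pool."""
        self.free.extend(self.used.pop(job, []))

# A config script would use it roughly like this:
pool = NodeAllocator(["cr%02d" % i for i in range(32)])
render_nodes = pool.allocate("job42", 4)   # four hostnames, or None
pool.release("job42")                      # back to the free pool
```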
From: Sean A. <sea...@ll...> - 2002-09-11 19:41:50
Brian Paul wrote:
> I guess I'm picturing at least the following:
>
> 1. A node/host allocator which keeps track of the free and in-use nodes
> in a cluster. A client (like Chromium) would ask the allocator for N
> nodes and get back a list of N hostnames (or an error code). When
> Chromium's finished with the nodes, they'd be returned to the free pool.
>
> 2. Dynamic Cr config scripts which know how to work with the allocator
> in (1). Either the config scripts would ask the allocator for N nodes,
> or a list of nodes would be passed in as an argument.

For what it's worth, that's exactly how we were expecting this to happen. I think the first case, working with the allocator directly, would be difficult to make work generally, given the different kinds of node allocators that may be out there. But it should be relatively easy to write the second case, where a list of nodes is passed in somehow. That should probably be possible with any node allocator.

> 3. A set of "standardized" configuration scripts for sort-first and
> sort-last which can work with a range of applications and satisfy
> item (2).

Right. Going further, there may also be a set of configuration scripts which encapsulate various local display devices, tiled displays, SGE/Bertha, etc.

I'm thinking out loud here, but let's see where this goes... A user may launch a parallel app, and the system detects that he's sitting on a Bertha display driven by an SGE. When the user tells the job control system to launch the app, the node allocator allocates nodes for the app, as well as at least 4 nodes for doing rendering to the SGE. A Chromium configuration is chosen to set up the back end of the network so that render SPUs are launched on the 4 nodes hooked up to the SGE. We also have hardcoded that this particular app renders with a sort-last configuration, so another Chromium configuration is chosen to set up the middle of the network using the rest of the render nodes given to us by the allocator. Finally, the app is launched on the remaining nodes. (I know this may be an inefficient way of doing this particular example, but I wanted to think about the general case of Chromium having different "stages" in its network.)

The part that gets a bit fuzzy to me is if the app tells Chromium at runtime (through glChromiumParameterv) that it would like a sort-last setup. If Chromium needed to reconfigure its network and SPU setup, is that something we have the infrastructure to do?

> 4. Some system to automatically choose a standardized config script
> based on the name of the program being run, or the user's requested
> rendering style (sort-first/sort-last). Perhaps libcrfaker could look
> at argc/argv and look up the name of a config script in a resource
> file, then automatically start the mothership with that config script.
> Or, look for a special environment variable.

I think having an environment variable (CR_ENVIRONMENT?) is a good fallback in any case. Using that, we can override any other special logic that we might set up.

-Sean

__
sea...@ll...
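For the "list of nodes passed in" case, a dynamic config script might look like the sketch below, written in the style of the sample configs that ship with Chromium. The mothership calls are from memory and the tile layout is a placeholder, so treat this as a sketch rather than a working config:

```python
# Sketch of item (2): a sort-first config script that takes its node
# list on the command line instead of hardcoding hostnames, so any
# batch system or allocator can drive it. Mothership API from memory;
# option names and the tile layout are placeholders.
import sys
sys.path.append("../server")
from mothership import *

app_host = sys.argv[1]        # node the OpenGL application runs on
server_hosts = sys.argv[2:]   # nodes the allocator handed back

cr = CR()

tilesort_spu = SPU("tilesort")
app_node = CRApplicationNode(app_host)
app_node.AddSPU(tilesort_spu)

for i, host in enumerate(server_hosts):
    render_spu = SPU("render")
    server_node = CRNetworkNode(host)
    server_node.AddSPU(render_spu)
    server_node.AddTile(i * 640, 0, 640, 480)  # one 640x480 tile each
    tilesort_spu.AddServer(server_node, "tcpip")
    cr.AddNode(server_node)

cr.AddNode(app_node)
cr.Go()
```

An allocator or batch script could then invoke it as, say, `python dynamic.conf appnode cr01 cr02 cr03 cr04` (script name hypothetical), passing along whatever hostnames it was granted.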