From: Paul J. <pj...@en...> - 2002-01-19 05:44:30
On Mon, 14 Jan 2002, Paul Dorwin wrote:
> I now have a new document, which can be found at
> http://lse.sourceforge.net/numa/Topology/Topology.html#REVIEW

Ah - you've been busy. Good to see you back on this list.

I will confess to having a little trouble with this design. Since it may well have been in part some of my suggestions from last Nov/Dec that led to this current design, I'm conflicted. I share your concern, expressed in Section 1.2, that the new design is "much larger".

Anyhow ... to specifics:

1) As with the first design, I have trouble with trying to present the same API to both the user level and the kernel. The API in this second design strikes me as likely suitable for use by C code in user land, possibly suitable for passing across the user-kernel boundary, and rather unsuitable for an internal kernel API. You yourself note that this API may "cause unacceptable overhead for the kernel". I am open to a relatively large API for userland (as can be seen from my CpuMemSet work ;). But internal kernel APIs should be, in my view, minimal and more demand driven -- little more than what is needed by the code that depends on them. If you decided that this latest API was just for use by userland code, not also exposed within the kernel, then I would be more comfortable.

2) Do you intend to add a notion of "distance" to your topology? For simpler systems of perhaps 16 to 64 cpus or less, depending on topology, all distances are trivially obvious from the abstract topology. But for the more complex geometries seen in larger systems, distances (cpu to memory and cpu to cpu) can become non-obvious. I appreciate, however, that you might not think distance fits well within your structure. We'll need it somehow. Any suggestions how?
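To make point 2 concrete, here is one *hypothetical* way a distance notion could be expressed, as a simple N x N table of relative memory-access costs with the local (on-node) distance normalized to 10. None of these names or values come from the proposed design; they are purely an illustration of the kind of information that is obvious on small boxes but needs explicit representation on larger ones.

```c
/*
 * Hypothetical sketch only -- not part of the proposed topology API.
 * A per-pair "distance" table: relative cost for a cpu on one node
 * to reach memory on another node.  Local access is normalized to 10,
 * so 20 means "twice the local cost".  Values are made up for a
 * 4-node example with two pairs of closely coupled nodes.
 */
#define MAX_NODES 4

static const unsigned char node_distance_table[MAX_NODES][MAX_NODES] = {
	/* to:  n0   n1   n2   n3 */
	{       10,  20,  40,  40 },	/* from node 0 */
	{       20,  10,  40,  40 },	/* from node 1 */
	{       40,  40,  10,  20 },	/* from node 2 */
	{       40,  40,  20,  10 },	/* from node 3 */
};

/* Relative cost of a cpu on node 'from' touching memory on node 'to'. */
static inline int node_distance(int from, int to)
{
	return node_distance_table[from][to];
}
```

A cpu-to-cpu distance could be a second such table, or be derived from this one; the open question is where such a table would hang off the abstract topology structures.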
3) As a couple of us commented in the lengthy email thread that followed up your first design, perhaps the best way to pass this topology across the kernel-user boundary would be by a /proc-like (or driverfs-like, or whatever) display of files containing ascii text lines, in a directory tree. The C-friendly API in your second design would then be implemented purely in user land, by reading the file-system display and converting it to linked C structures.

Once again, I recommend to your consideration SGI's hwgraph interface, as described on the man page:

    http://www.mcsr.olemiss.edu/cgi-bin/man-cgi?hwgraph+4

4) You state that there are no plans at this time to map hardware beyond the I/O controller. This seems to mean we are headed toward the following collection of display mechanisms in Linux:

  a] Your topology subsystem displays cpu/mem/node/cache, with future
     extensions to the I/O bus and controller level.
  b] Gooch's much "loved" devfs displays block and char devices in a
     namespace that is a modern replacement for /dev.
  c] Mochel's recent driverfs exposure of the device driver structure.

Gooch was motivated by the need to manage a larger /dev directory, where the number of special device files was exceeding both the simple limits of <major, minor> bit fields and the human limits of managing a large list of changing device entries with static links. Mochel was motivated by the need to support Power Management, Plug and Play, and hot plug, which required a better structure for connecting the device drivers in the kernel.

This split is ok by me, more or less. Others are no doubt concerned that there is apparent duplication of similar effort here. But perhaps the focus is sufficiently different in each case, and the likelihood of genuine merger and sharing of effort sufficiently low, that this is how it is, and that's ok.

Still, I don't understand why you would not anticipate adding I/O devices to your topology?
They would seem like a natural addition to me.

5) How would you model a system where, say, each node had two dual-processor chips (4 processors per node), with cache at each level: per processor, per chip and per node?

Thanks for the good work -- hopefully I will get time soon to download your patch and have a look at the real code.

-- 
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj...@sg...> 1.650.933.1373
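[For reference, one way the hierarchy in question 5 could nest as plain C structures -- all names and sizes here are hypothetical, sketched only to pin down what "cache at each level" implies for the data model:]

```c
/*
 * Illustrative sketch only: the system in question 5 -- two
 * dual-processor chips per node (4 processors per node), with cache
 * per processor, per chip and per node -- expressed as nested C
 * structures.  Every identifier is hypothetical.
 */
#define CHIPS_PER_NODE 2
#define CPUS_PER_CHIP  2

struct cache_desc {
	unsigned int level;	/* 1 = per-cpu, 2 = per-chip, 3 = per-node */
	unsigned int size_kb;
};

struct cpu_unit {
	int id;
	struct cache_desc cache;	/* per-processor cache */
};

struct chip_unit {
	struct cpu_unit cpu[CPUS_PER_CHIP];
	struct cache_desc cache;	/* per-chip cache, shared by 2 cpus */
};

struct node_unit {
	struct chip_unit chip[CHIPS_PER_NODE];
	struct cache_desc cache;	/* per-node cache, shared by 4 cpus */
};
```

The interesting design question is whether the topology API treats the chip level as just another (optional) layer, or hard-codes a fixed cpu/node two-level view that cannot express the shared per-chip cache.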