From: Paul E. M. <pmc...@us...> - 2001-10-05 17:30:47
|
Hello! There is a new version of the NUMA description document at:

http://lse.sourceforge.net/numa/numastatusdesc.html

There is a new document describing simple-binding APIs at:

http://lse.sourceforge.net/numa/numa_api.html

And a rationale document at:

http://lse.sourceforge.net/numa/numa_api_rationale.html

Please send questions and comments! There will be a conference call discussing this document in about 30 minutes (Friday Oct 5, 1-888-790-7156, passcode 85875, at 11:00AM Pacific Daylight Time), as advertised in Pat Gaughen's earlier email.

Thanx, Paul |
From: Kimio S. <k-s...@mv...> - 2001-10-05 22:22:31
|
Hello Paul,

I have a question about your document.
# Sorry, I missed today's conference. :(

In your document, there is a description:

-------------
Systems that can dynamically remove CPUs or nodes from a running system may have "holes" in the numbering scheme. However, if new CPUs are introduced, they will appear in the same range as other CPUs on the same node, and if new nodes are introduced, their CPUs will be consecutively numbered. CPUs from different nodes are never interleaved.

This means that if a node has the capacity to have additional CPUs added to it, space must be left in the numbering scheme to accommodate those additional CPUs.
-------------

Is this restriction necessary? I feel there is no API relying on this assumption. If it is, I have to look for a way to get the max number of CPUs per node. :(

Regards,
Kimi

On Fri, 5 Oct 2001 10:29:26 -0700 (PDT) "Paul E. McKenney" <pmc...@us...> wrote:
> Hello!
>
> There is a new version of the NUMA description document at:
>
> http://lse.sourceforge.net/numa/numastatusdesc.html
>
> There is a new document describing simple-binding APIs at:
>
> http://lse.sourceforge.net/numa/numa_api.html
>
> And a rationale document at:
>
> http://lse.sourceforge.net/numa/numa_api_rationale.html
>
> Please send questions and comments! There will be a conference call
> discussing this document in about 30 minutes (Friday Oct 5
> 1-888-790-7156 Passcode 85875 at 11:00AM Pacific Daylight Time), as
> advertised in Pat Gaughen's earlier email.
>
> Thanx, Paul

-- Kimio Suganuma <k-s...@mv...> |
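The "space must be left" scheme quoted above implies a fixed per-node stride in the cpu numbering, so that cpu-to-node lookup is a simple division. A minimal sketch in C, assuming a hypothetical `MAX_CPUS_PER_NODE` constant; obtaining that value from the platform is exactly the problem Kimi raises:

```c
#include <assert.h>

/* Hypothetical constant: would come from firmware or build-time
 * config; the thread notes there is no portable way to query it. */
#define MAX_CPUS_PER_NODE 4

/* With space reserved per node, cpu->node is a simple division.
 * Holes left by offlined cpus do not disturb the mapping, because
 * the ranges are fixed regardless of which cpus are online. */
static inline int cpu_to_node(int cpu)
{
	return cpu / MAX_CPUS_PER_NODE;
}

/* First cpu number in a node's reserved range. */
static inline int node_first_cpu(int node)
{
	return node * MAX_CPUS_PER_NODE;
}
```

With this layout, cpus 4-7 always belong to node 1 whether or not all four are present.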
From: Anton B. <an...@sa...> - 2001-10-05 23:03:31
|
> Systems that can dynamically remove CPUs or nodes from a running
> system may have "holes" in the numbering scheme. However, if new
> CPUs are introduced, they will appear in the same range as other
> CPUs on the same node, and if new nodes are introduced, their CPUs
> will be consecutively numbered. CPUs from different nodes are never
> interleaved.
>
> This means that if a node has the capacity to have additional CPUs
> added to it, space must be left in the numbering scheme to
> accommodate those additional CPUs.

The hotplug cpu patches remove the idea of logical cpu numbers completely, replacing it with a for_each_cpu macro. I assume the hardware groups cpu ids with NUMA domains?

Anton |
From: <k-s...@mv...> - 2001-10-05 23:44:42
|
Hi Anton, you are here! :)

> > Systems that can dynamically remove CPUs or nodes from a running
> > system may have "holes" in the numbering scheme. However, if new
> > CPUs are introduced, they will appear in the same range as other
> > CPUs on the same node, and if new nodes are introduced, their CPUs
> > will be consecutively numbered. CPUs from different nodes are never
> > interleaved.
> >
> > This means that if a node has the capacity to have additional CPUs
> > added to it, space must be left in the numbering scheme to
> > accommodate those additional CPUs.
>
> The hotplug cpu patches remove the idea of logical cpu numbers
> completely, replacing it with a for_each_cpu macro. I assume the
> hardware groups cpu ids with NUMA domains?

The problem is how to determine the CPU number at booting. Currently, there is no way to get the max number of CPUs per node, at least on our system. So, we have to determine the number by config or something. (It's no big deal, though.)

In addition, I'd like to make sure of the meaning of "consecutively." When I want to remove node 1, which consists of cpus 4, 5, 6, and 7, do I have to offline the cpus in number order, like 7-6-5-4, to avoid making a hole in the cpu numbers? And, when cpu 5 reports some HW error, is it possible to offline only cpu 5?

Regards,
Kimi |
From: Anton B. <an...@sa...> - 2001-10-07 10:37:08
|
> Hi Anton, you are here! :)

Hi Kimi, yeah I am lurking :)

> The problem is how to determine the CPU number at booting.
> Currently, there is no way to get the max number of CPUs per node,
> at least on our system. So, we have to determine the number by
> config or something. (It's no big deal, though.)

Urgh, it would be nice if the hardware did this for us, but as you say it can be worked around.

> In addition, I'd like to make sure of the meaning of "consecutively."
> When I want to remove node 1, which consists of cpus 4, 5, 6, and 7,
> do I have to offline the cpus in number order, like 7-6-5-4, to avoid
> making a hole in the cpu numbers? And, when cpu 5 reports some HW
> error, is it possible to offline only cpu 5?

Yes, you can remove any cpu and leave a gap in the ordering. Rusty made the conscious decision in the hotplug cpu patches to remove the distinction between logical and real cpu ids. On many architectures there was no difference, and on the others (like sparc) we found lots of kernel code got it wrong anyway.

A macro was created that would iterate through all online cpus (for_each_cpu or something like that). So only this macro has to handle gaps, and on architectures where this won't happen it can be optimised to go as fast as the old for(i = 0; i < smp_num_cpus; i++).

Anton |
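The macro Anton describes can be sketched with an online-cpu bitmap. The names and helpers below are illustrative guesses at the shape of the hotplug patches, not the actual patch code:

```c
#include <assert.h>

#define NR_CPUS 8

/* Bitmap of online cpus; bit N set means cpu N is online.
 * Offlining a cpu simply clears its bit, leaving a gap.
 * Here cpu 5 has been offlined: 0xDF == 1101 1111. */
static unsigned long cpu_online_map = 0xDFUL;

static inline int cpu_online(int cpu)
{
	return (int)((cpu_online_map >> cpu) & 1UL);
}

/* Iterate over online cpus only, transparently skipping holes.
 * On architectures that never leave gaps, this can be optimised
 * back down to: for (cpu = 0; cpu < smp_num_cpus; cpu++). */
#define for_each_cpu(cpu) \
	for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) \
		if (cpu_online(cpu))

static int count_online(void)
{
	int cpu, n = 0;

	for_each_cpu(cpu)
		n++;
	return n;
}
```

Because callers only ever use the iterator, the kernel code never has to know whether the numbering is dense or holey.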
From: Kimio S. <k-s...@mv...> - 2001-10-08 17:54:19
|
Hi Anton,

On Sun, 7 Oct 2001 20:20:31 +1000 Anton Blanchard <an...@sa...> wrote:
> Yes you can remove any cpu and leave a gap in the ordering. Rusty made
> the conscious decision in the hotplug cpu patches to remove the
> distinction between logical and real cpu ids. On many architectures
> there was no difference and on the others (like sparc) we found lots of
> kernel code got it wrong anyway.
>
> A macro was created that would iterate through all online cpus
> (for_each_cpu or something like that). So only this macro has to
> handle gaps and on architectures where this wont happen it can be
> optimised to go as fast as the old for(i = 0; i < smp_num_cpus; i++)

Great! Could you show me the source, or the latest patch? I'd like to merge it into my patch (hotplug CPU for IA-64).

Regards,
Kimi

-- Kimio Suganuma <k-s...@mv...> |
From: Paul J. <pj...@en...> - 2001-10-11 02:58:17
|
On Sun, 7 Oct 2001, Anton Blanchard wrote:
> Yes you can remove any cpu and leave a gap in the ordering. Rusty made
> the conscious decision in the hotplug cpu patches to remove the
> distinction between logical and real cpu ids. On many architectures
> there was no difference and on the others (like sparc) we found lots of
> kernel code got it wrong anyway.

The CpuMemSets cpu and memory placement proposal that I have recently updated the Design Note for, at:

http://sourceforge.net/docman/display_doc.php?docid=7178&group_id=8875

has a logical versus real cpu id (and memory node id) construct. However, that's fine, and consistent with what it seems Rusty is doing.

The expectation in the CpuMemSet design is that all the existing kernel code uses physical cpu id's, and that just for the purpose of supporting the CpuMemSet interface (and the various interfaces layered on top of it, such as dplace, runon, cpusets, OpenMP, MPI, and what not) there is a logical numbering layer that virtualizes the view seen by applications from the comings and goings and reallocation of cpus and memory at the physical level known to the kernel.

So, in short, it strikes me as good that Rusty has removed the logical/real cpu number distinction. I intend to handle that distinction higher up in the code, unbeknownst to the main scheduling and allocation code, where I think that distinction belongs anyway.

I won't rest till it's the best ...
Manager, Linux Scalability
Paul Jackson <pj...@sg...> 1.650.933.1373 |
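The virtualizing layer Paul describes might look something like the sketch below: a per-set table mapping application (logical) cpu numbers onto the kernel's cpu ids. The struct and function names are hypothetical, not the actual CpuMemSets API:

```c
#include <assert.h>

/* A CpuMemMap-style translation table (names illustrative).
 * Application (logical) cpu numbers 0..ncpus-1 index a table of
 * kernel (physical) cpu ids; the scheduler and allocator only
 * ever see the physical ids. */
struct cpumemmap {
	int ncpus;
	int phys_cpu[8];	/* phys_cpu[logical] == physical cpu id */
};

/* Translate an application cpu number to a kernel cpu id,
 * or return -1 if the number is outside this set's view. */
static int map_to_physical(const struct cpumemmap *m, int logical)
{
	if (logical < 0 || logical >= m->ncpus)
		return -1;
	return m->phys_cpu[logical];
}
```

An application granted physical cpus 4 and 6 would thus see them as cpus 0 and 1, insulated from cpus coming and going underneath.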
From: Jack S. <st...@sg...> - 2001-10-11 15:36:11
|
>
> On Sun, 7 Oct 2001, Anton Blanchard wrote:
> > Yes you can remove any cpu and leave a gap in the ordering. Rusty made
> > the conscious decision in the hotplug cpu patches to remove the
> > distinction between logical and real cpu ids. On many architectures
> > there was no difference and on the others (like sparc) we found lots of
> > kernel code got it wrong anyway.
>
> The CpuMemSets cpu and memory placement proposal that I have
> recently updated the Design Note for, at:
>
> http://sourceforge.net/docman/display_doc.php?docid=7178&group_id=8875
>
> has a logical versus real cpu id (and memory node id) construct.
>
> However that's fine, and consistent with what it seems Rusty is doing.
>
> The expectation in the CpuMemSet design is that all the existing
> kernel code uses physical cpu id's, and that just for the purpose

What is your definition of a "physical cpu id" and a "physical node id"?

For cpu identifiers, most places in the kernel use what is called a "cpuid". It is the value that is in task_struct.processor and is the value returned by smp_processor_id(). I think this is the best "id" to use in the interfaces to identify cpus.

Node identifiers currently come in 3 flavors (this terminology may change but the concepts will be the same):

compact - dense mapping of all nodes to the range <0 .. numnodes-1>. No physical significance to the exact number, although the boot node will probably always be 0.

proximity id - undefined. Currently neither "compact" nor "physical", although this may change. (Currently, on sn2 the proximity id is bits {8:1} of the NASID. Note that bits 0, 9, 10 are ignored.)

physical - on SN platforms, same as the NASID. Undefined on some other platforms.
> of supporting the CpuMemSet interface (and the various interfaces
> layered on top of there, such as dplace, runon, cpusets, OpenMP,
> MPI, and what not) there is a logical numbering layer that
> virtualizes the view seen by applications from the comings and
> goings and reallocation of cpus and memory at the physical level
> known to the kernel.
>
> So, in short, it strikes me as good that Rusty has removed the
> logical/real cpu number distinction. I intend to handle that
> distinction higher up in the code, unbeknownst to the main
> scheduling and allocation code, where I think that distinction
> belongs anyway.

--
Thanks

Jack Steiner (651-683-5302) (vnet 233-5302) st...@sg... |
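Jack's sn2 description of the proximity id corresponds to a mask-and-shift along these lines; the macro name is hypothetical, only the bit layout comes from his message:

```c
#include <assert.h>

/* Per Jack's description: on sn2 the proximity id is bits {8:1}
 * of the NASID, and bits 0, 9, and 10 are ignored. Shifting out
 * bit 0 and masking eight bits extracts exactly that field.
 * (NASID_TO_PROXIMITY is a made-up name for illustration.) */
#define NASID_TO_PROXIMITY(nasid)	(((nasid) >> 1) & 0xff)
```

So two NASIDs differing only in bits 0, 9, or 10 map to the same proximity id.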
From: Paul J. <pj...@en...> - 2001-10-11 19:29:47
|
On Thu, 11 Oct 2001, Jack Steiner wrote:
> ... [pj...@sg... wrote:] ...
> > The CpuMemSets cpu and memory placement proposal that I have
> > recently updated the Design Note for, at:
> >
> > http://sourceforge.net/docman/display_doc.php?docid=7178&group_id=8875
> >
> > has a logical versus real cpu id (and memory node id) construct.
>
> What is your definition of a "physical cpu id" & "physical node id"?
>
> For cpu identifiers, most places in the kernel use what is called
> a "cpuid". It is the value that is in task_struct.processor &
> is the value returned by smp_processor_id(). I think this is
> the best "id" to use in the interfaces to identify cpus.
>
> Node identifiers currently come in 3 flavors (this terminology
> may change but the concepts will be the same):

For the physical cpu id, I would want to use what is in the task struct and used to index the cpus_allowed bit vector in the scheduler. It looks like that is "cpuid".

When I write "node" in the CpuMemSet Design Note, I am referring to a chunk of memory (just the memory), and typically write "memory node". Here, the choice of physical identifier might not matter much. What matters is that the right zone lists be handed to the kernel page allocation code in mm/page_alloc.c. But I am not clear on this detail yet.

CpuMemSets does not have an explicit concept of "node", in the sense of a set of closely placed cpus and their associated memory and cache. The notion of a node is implicit, in two ways:

1) The topology and metrics reported via /proc need to show both:
   a] <cpu, mem> distances, reflecting memory latency and bandwidth from a cpu to a memory, and
   b] <cpu, cpu> distances, reflecting cache affinity -- two cpus sharing a major cache are closer.

2) The cpu and memory lists that are explicit in CpuMemSets will typically be arranged to respect the above distances and node topology, by higher level system management software.
In other words, higher level software will often want to be cognizant of nodes when setting up configurations, but at the level of CpuMemSets it is just cpus and memory.

I won't rest till it's the best ...
Manager, Linux Scalability
Paul Jackson <pj...@sg...> 1.650.933.1373 |
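The two distance relations in Paul's point 1 could be represented as simple tables for higher-level software to consult. All names and values below are purely illustrative, for a toy machine with two nodes of two cpus each:

```c
#include <assert.h>

#define NCPUS 4
#define NMEMS 2

/* 1a] <cpu, mem> distance: memory latency/bandwidth cost from a
 * cpu to a memory node. Smaller means closer; values made up. */
static const int cpu_mem_dist[NCPUS][NMEMS] = {
	{ 10, 20 },	/* cpus 0,1 local to mem 0 */
	{ 10, 20 },
	{ 20, 10 },	/* cpus 2,3 local to mem 1 */
	{ 20, 10 },
};

/* 1b] <cpu, cpu> distance: cache affinity; two cpus sharing a
 * major cache get a small mutual distance. Values made up. */
static const int cpu_cpu_dist[NCPUS][NCPUS] = {
	{  0,  5, 15, 15 },
	{  5,  0, 15, 15 },
	{ 15, 15,  0,  5 },
	{ 15, 15,  5,  0 },
};

/* Pick the closest memory node for a cpu -- the kind of decision
 * placement software would make from the /proc-reported metrics. */
static int nearest_mem(int cpu)
{
	int m, best = 0;

	for (m = 1; m < NMEMS; m++)
		if (cpu_mem_dist[cpu][m] < cpu_mem_dist[cpu][best])
			best = m;
	return best;
}
```

Note that "node" never appears explicitly: it is recoverable from the clustering in the two tables, which is exactly the implicit treatment the message describes.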
From: Martin J. B. <Mar...@us...> - 2001-10-11 20:11:38
|
> For physical cpu id, I would want to use what is in the task
> struct and used to index the cpus_allowed bit vector in the
> scheduler. Looks like that is "cpuid".

If I understand what you're saying correctly, that's not really a physical thing at all - it is a logical numbering handed out at bootstrap time (at least on i386 ... see arch/i386/kernel/smpboot.c do_boot_cpu()). People seem to think these are a 1-1 map with the physical apic id - they're not. If I have a processor that fails to boot, then everybody else's number shuffles.

I suspect this is just a terminology difference - but the term "physical" is most confusing - I would suggest using another term.

M. |
From: Paul J. <pj...@en...> - 2001-10-11 20:21:50
|
On Thu, 11 Oct 2001, Martin J. Bligh wrote:
> I suspect this is just a terminology difference - but the term physical
> is most confusing - I would suggest using another term.

It's all relative. Really.

In other words, it is my impression that whenever two layers of abstraction meet at some interface, and need to map between two ways of naming something known across the interface, the lower layer's naming is called physical and the upper's is called virtual or logical.

And then anyone normally working one level higher or lower in the system finds that confusing: either something that is logical in their normal view is being labeled physical, or something physical in their view is being labeled logical.

I know of no solution to this conundrum.

I won't rest till it's the best ...
Manager, Linux Scalability
Paul Jackson <pj...@sg...> 1.650.933.1373 |
From: Martin J. B. <Mar...@us...> - 2001-10-11 21:20:39
|
> In other words, it is my impression that whenever two layers of
> abstraction meet at some interface, and need to map between two
> ways of naming something known across the interface, the lower
> layer's naming is called physical and the upper's is called
> virtual or logical.
>
> And then anyone normally working one level higher or lower
> in the system finds that confusing: either something that is
> logical in their normal view is being labeled physical, or
> something physical in their view is being labeled logical.

Hmmm ... I'd say that physical numberings are dictated by hardware, and logical numberings by software. Ergo we have 1 physical and 2 logical numbering schemes here.

But my main objection is that switching the terminology halfway up is really confusing, especially for those of us who have to work with hardware-dependent and higher-level code. Thus I have physical apicids, logical apicids, a cpuid that you're calling both physical and logical depending on where I'm standing, and a logical cpuid that's not the same as the other logical cpuid. I fear my brain is too small to cope with such things.

How about you call one of them logical (the existing cpuid?) and the other one virtual (the new thing you are creating)? I beg you, for the sake of my sanity ;-) I think clear, distinct naming is important for discussing such things between people. I don't really care what the naming scheme is, as long as it's singular ;-)

M. |
From: Paul J. <pj...@en...> - 2001-10-28 00:04:30
|
A couple of weeks ago, Martin Bligh objected to the choice of the terms "physical" and "logical" for cpu and memory numbers in the CpuMemSets design note:

On Thu, 11 Oct 2001, Martin J. Bligh wrote:
> But my main objection is that switching the terminology
> half way up is really confusing, especially for those of
> us who have to work with hardware dependant and higer level
> code. Thus I have physical apicids, logical apicids, a cpuid
> that you're calling both physical and logical depending where
> I'm standing, and a logical cpuid that's not the same as the
> other logical cpuid.

I recant. After taking further grief (in his gentle way) from Jack Steiner on this same issue, I have decided to change the terms to "system" and "application".

The system numbering is that used by the kernel in such places as the scheduler and allocator. It typically numbers all the cpus and memory in the system. On our SGI SN hardware, for example, what we call the compact node id will be used as the CpuMemSet system number for memory nodes.

The application numbering includes just the cpus and memory available to the specified application. It is the numbering presented to an application via the CpuMemMap, and used in specifying CpuMemSets.

This is a straight renaming:

    physical ==> system
    logical  ==> application

This change is essentially a change from _relative_ (to where you stand) names such as "physical" and "logical", to more _absolute_ names "system" and "application". This should help both system programmers, such as Martin and Jack, as well as application programmers who might be coding to this API, better understand the distinction.

Does this help, Martin?

Thanks, Martin and Jack.

I won't rest till it's the best ...
Manager, Linux Scalability
Paul Jackson <pj...@sg...> 1.650.933.1373 |
From: Martin J. B. <Mar...@us...> - 2001-10-29 17:37:53
|
> This change is essentially a change from _relative_ (to where
> you stand) names such as "physical" and "logical", to more
> _absolute_ names "system" and "application". This should help
> both system programmers, such as Martin and Jack, as well as
> application programmers who might be coding to this API, better
> understand the distinction.
>
> Does this help, Martin?

Excellent - this will make it much easier both for engineers to think, and to communicate with others. My brain is no longer a small piece of silly putty.

Thanks very much for fixing this,

Martin. |
From: Paul J. <pj...@en...> - 2001-10-29 20:20:56
|
On Mon, 29 Oct 2001, Martin J. Bligh wrote:
> Excellent - ... My brain is no longer a small piece
> of silly putty.
>
> Thanks very much for fixing this,

You're welcome. Now if it were only this easy to fix _my_ brain ;).

I won't rest till it's the best ...
Manager, Linux Scalability
Paul Jackson <pj...@sg...> 1.650.933.1373 |