From: Paul J. <pj...@sg...> - 2004-10-06 02:49:46
Matthew wrote:
> I feel that the actual implementation, however, is taking
> a wrong approach, because it attempts to use the cpus_allowed mask to
> override the scheduler in the general case.  cpus_allowed, in my
> estimation, is meant to be used as the exception, not the rule.

I agree that big chunks of a large system that are marching to the beat of two distinctly different drummers would be better served by schedulers organized along the domains you describe than by brute-force abuse of the cpus_allowed mask.

I look forward to your RFC, Matthew.  Not being a scheduler guru, I will mostly have to rely on the textual commentary to understand what it means.

Finer-grain placement of CPUs (sched_setaffinity) and memory (mbind, set_mempolicy) already exists, and is required by the parallel threaded applications that OpenMP and MPI are commonly used to develop.

Using non-exclusive cpusets at this finer grain, so that workload managers such as PBS and LSF can manage placement on a system (domain) wide basis, should not place any significant further load on the schedulers or resource managers.

The top-level cpusets must provide additional isolation properties so that separate scheduler and resource manager domains can work in relative isolation.  I've tried hard to speculate what these additional isolation properties might be.  I look forward to hearing from the CKRM and scheduler folks on this.

I agree that simple, unconstrained (ab)use of the cpus_allowed and mems_allowed masks, at that scale, places an undue burden on the schedulers, allocators and resource managers.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj...@sg...> 1.650.933.1373