From: Jun N. <ju...@sc...> - 2001-02-09 21:48:23
I agree. I think the scope of the values returned by goodness() is not clear even with the original scheduler - it is neither purely local nor purely global, because of PROC_CHANGE_PENALTY, which really should be platform-dependent. So I don't think we need to stick to the current scheduler semantics for non-realtime tasks, especially when supporting larger SMP machines, including NUMA.

Besides the cpus_allowed mask, one thing I would like to add is a check of whether a task's cache is still warm on its current CPU before stealing it from a remote runqueue. We don't want to migrate a task with a warm cache even if it has a high na_goodness value. One easy and effective way is to record the last time (in jiffies) the task was scheduled. If the elapsed time is larger than some tunable, platform-specific value, we can migrate the task; otherwise we look for another candidate (a rough sketch follows below). I bet this would cause significant scheduling behavior changes.
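Roughly, I picture the steal-time check looking something like this. This is only an untested sketch: p->last_ran and cache_decay_ticks are names I made up for illustration, not fields in the current MQ patch, and the threshold would presumably be a tunable or per-architecture constant.

    /*
     * Sketch only: is task p worth stealing onto this_cpu?
     * "last_ran" and "cache_decay_ticks" are hypothetical names.
     */
    static inline int steal_candidate(struct task_struct *p, int this_cpu)
    {
            /* Never steal a task that is not allowed to run here. */
            if (!(p->cpus_allowed & (1UL << this_cpu)))
                    return 0;

            /*
             * If the task ran on its current CPU very recently, assume
             * its cache there is still warm and look for another task.
             */
            if (jiffies - p->last_ran < cache_decay_ticks)
                    return 0;

            return 1;
    }

The threshold probably wants to be per-architecture, or even per-node on NUMA, since cache sizes and migration costs vary.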
Hubertus Franke wrote:

> Mike, it goes to the heart of the question whether we MUST or NEED to stick to the current scheduler semantics.
>
> For the low-end SMPs (4-way or so) it seems OK to me to simply check, before trying to steal the thread, whether it can be scheduled on our CPU or not. If not, you move on to the next queue. You are afraid that the cpus_allowed mask will be used frequently. Waiting for the lock doesn't happen in the current code; you use spin_trylock. The problem, as you point out, is that a task with sufficiently high goodness to warrant a preemption can be stuck behind this "cpus_allowed"-limited task and is therefore never considered.
>
> Again, at some point one has to do some hand-waving and state "tough luck". As far as I am concerned, the job of the scheduler is to provide high throughput through the system and some degree of fairness. By defining an affinity boost, we are already sacrificing fairness to some degree.
>
> I have taken a slightly different angle. I am willing to relax the current scheduling assumptions in order to increase scalability. In particular, I am willing to run tasks of lower goodness value as long as I adhere to the RT semantics. I think that is OK, because PROC_CHANGE_PENALTY is a first heuristic that might not necessarily be right.
>
> I think when you are approaching larger numbers of CPUs (8+) you need to look at partitioning your system from a scheduling point of view. One should pretty much deal with the system as if it were a NUMA system.
>
> I have taken the MQ scheduler and subdivided it into CPU pools. I have already posted about this in our latest scheduling status report:
> http://lse.sourceforge.net/scheduling/results012501/status.html#Load%20Balancing
>
> Running the chatroom benchmark with 30/300 gives the following results. (I will post these on Monday on our lse.sourceforge.net/scheduling site for general consumption.)
> (See attached file: pre8-chat.pdf)
>
> In this case, splitting the scheduler into multiple pools with occasional trivial load-balancing shows, at the very high end of the chatroom run, a 10% improvement over our current MQ. In this case I am checking in reschedule_idle() whether to preempt or not, but in schedule() I only look within my pool. There are some potential improvements possible, for instance checking all CPUs for idle CPUs but allowing no preemption outside the current pool. I think these mechanisms are worth looking into.
>
> I think that the usage of cpus_allowed can be tied into this. cpus_allowed is only a simple mechanism, not a policy. I think we need to look at the policies that will be built on top of CPU affinities, and I think most of them will originate in the NUMA area. A pool-based approach seems to be OK for that.
>
> Any comments?
>
> Hubertus Franke
> email: fr...@us...
>
> Mike Kravetz <mkr...@se...>@lists.sourceforge.net on 02/09/2001 02:26:08 PM
>
> Sent by: lse...@li...
>
> To: lse...@li...
> cc:
> Subject: [Lse-tech] cpus_allowed in multi-queue scheduler
>
> I'm looking at the cpus_allowed field of the task structure and trying to determine the best way to handle it in the multi-queue scheduler.
>
> As a reminder, the multi-queue scheduler I am working on has one runqueue per CPU. In addition, there is a separate runqueue lock per CPU which synchronizes access to the CPU-specific runqueue. In schedule(), we obtain the runqueue lock associated with the CPU we are currently running on. The CPU-specific runqueue is then scanned looking for the task with the highest 'goodness' value. After scanning the CPU-specific runqueue, we 'take a quick look' at the other runqueues to determine if there is a task with higher goodness that should be scheduled. Of course, this takes CPU affinity into account just like the current scheduler. Each CPU-specific runqueue data structure has a field which contains the maximum 'non-affinity goodness' value of all schedulable tasks on that runqueue. Therefore, when we 'take a quick look' we are really only looking at the task with the maximum 'non-affinity goodness' on a remote CPU's runqueue. See the description of the multi-queue scheduler at 'http://lse.sourceforge.net/scheduling/mq1.html' if you want more details. Now, if we find a task with sufficiently high goodness on another CPU-specific runqueue, we attempt to lock (via spin_trylock) the other runqueue and move the task to our runqueue.
>
> It is during the process of 'stealing' tasks from other runqueues that we must be concerned with the cpus_allowed field of the task we are stealing. We don't want to steal a task if it is not allowed to run on our CPU. However, we don't want to wait until we have obtained the lock on a remote CPU's runqueue to check the cpus_allowed field and determine whether we should steal the task.
>
> My thought is that the runqueue data structure could also contain the cpus_allowed field of the task with the maximum 'non-affinity goodness' value. Hence, we could 'quickly check' this value without obtaining the remote CPU's runqueue lock.
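Interjecting here for concreteness: this is roughly how I picture the 'quick look' described above. It is an untested sketch only; the names max_na_goodness and max_na_cpus_allowed are my guesses, not necessarily what the MQ patch actually uses.

    /*
     * Sketch only; field names are guesses, not the actual MQ patch.
     * One such structure exists per CPU.
     */
    struct runqueue {
            spinlock_t       lock;                  /* per-CPU runqueue lock      */
            struct list_head tasks;                 /* runnable tasks             */
            int              max_na_goodness;       /* best non-affinity goodness */
            unsigned long    max_na_cpus_allowed;   /* cpus_allowed of that task  */
    };

    /*
     * Part of the 'quick look': called while scanning remote runqueues,
     * without holding the remote lock.  Only if this returns 1 do we
     * spin_trylock() the remote runqueue and try to move the task over.
     */
    static inline int worth_stealing(struct runqueue *remote, int this_cpu,
                                     int local_best_goodness)
    {
            if (remote->max_na_goodness <= local_best_goodness)
                    return 0;
            if (!(remote->max_na_cpus_allowed & (1UL << this_cpu)))
                    return 0;
            return 1;
    }

If that is roughly right, the cache-warmth check I mentioned at the top could slot in at the same point, for example by also caching a last-ran timestamp for the max-goodness task in the runqueue structure.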
> However, I have some concerns that this design could cause significant scheduling behavior changes if the cpus_allowed field/feature is used extensively. Right now I don't see much, if any, use of this field, but I expect that may change in the future. Consider the case where the task with the maximum 'non-affinity goodness' value has cpus_allowed set such that it is limited to a small subset of the system's CPUs. Now, CPUs outside this subset will not be able to steal this task. In addition, they will not be able to steal any other tasks on this runqueue which may have a sufficiently high goodness value, because we only keep track of the single task with the highest goodness value, and in this case it can only run on a subset of CPUs. It is obvious that a task which is limited to a single CPU should never be identified as a candidate to be stolen by another CPU, and this is easy to code. However, what about a task limited to 2 CPUs on an 8-CPU system, or 2 CPUs on a 16- or 32-CPU system?
>
> Any comments?
> --
> Mike Kravetz                                 mkr...@se...
> IBM Linux Technology Center
>
> _______________________________________________
> Lse-tech mailing list
> Lse...@li...
> http://lists.sourceforge.net/lists/listinfo/lse-tech
>
> ------------------------------------------------------------------------
>           Name: pre8-chat.pdf
>  pre8-chat.pdf Type: Portable Document Format (application/pdf)
>       Encoding: base64

--
Jun U Nakajima
Core OS Development, SCO/Murray Hill, NJ
Email: ju...@sc..., Phone: 908-790-2352, Fax: 908-790-2426