Re: [perfmon2] [PATCH 1/2] perf_events: add cgroup support (v8)
From: Balbir S. <ba...@li...> - 2011-02-02 19:31:05
* Peter Zijlstra <pe...@in...> [2011-02-02 13:46:32]:

> On Wed, 2011-02-02 at 17:20 +0530, Balbir Singh wrote:
> > * Peter Zijlstra <pe...@in...> [2011-02-02 12:29:20]:
> >
> > > On Thu, 2011-01-20 at 15:39 +0100, Peter Zijlstra wrote:
> > > > On Thu, 2011-01-20 at 15:30 +0200, Stephane Eranian wrote:
> > > > > @@ -4259,8 +4261,20 @@ void cgroup_exit(struct task_struct *tsk, int run_callbacks)
> > > > >
> > > > >  	/* Reassign the task to the init_css_set. */
> > > > >  	task_lock(tsk);
> > > > > +	/*
> > > > > +	 * we mask interrupts to prevent:
> > > > > +	 * - timer tick to cause event rotation which
> > > > > +	 *   could schedule back in cgroup events after
> > > > > +	 *   they were switched out by perf_cgroup_sched_out()
> > > > > +	 *
> > > > > +	 * - preemption which could schedule back in cgroup events
> > > > > +	 */
> > > > > +	local_irq_save(flags);
> > > > > +	perf_cgroup_sched_out(tsk);
> > > > >  	cg = tsk->cgroups;
> > > > >  	tsk->cgroups = &init_css_set;
> > > > > +	perf_cgroup_sched_in(tsk);
> > > > > +	local_irq_restore(flags);
> > > > >  	task_unlock(tsk);
> > > > >  	if (cg)
> > > > >  		put_css_set_taskexit(cg);
> > > >
> > > > So you too need a callback on cgroup change there.. Li, Paul, any chance
> > > > we can fix this cgroup_subsys::exit callback? The scheduler code needs
> > > > to do funny thing because its in the wrong place as well.
> > >
> > > cgroup guys? Shall I just fix this exit thing since the only user seems
> > > to be the scheduler and now perf for both of which its unfortunate at
> > > best?
> >
> > Are you suggesting that the cgroup_exit on task_exit notification should be
> > pulled out?
>
>
> No, just fixed. The callback as it exists isn't useful and leads to
> hacks like the above.
>

OK

>
> > > Balbir, memcontrol.c uses pre_destroy(), I pose that using this method
> > > is broken per definition since it makes the cgroup empty notification
> > > void.
> > >
> >
> > We use pre_destroy() to reclaim, so that delete/rmdir() will be able
> > to clean up the node/group. I am not sure what you mean by it makes
> > the empty notification void and why pre_destroy() is broken?
>
> A quick look at the code looked like it could return -EBUSY (and other
> errors), in that case the rmdir of the empty cgroup will fail.
>
> Therefore it can happen that after the last task is removed, and we get
> the notification that the cgroup is empty, and we attempt the rmdir we
> will fail.
>
> This again means that all such notification handlers must poll state,
> which is ridiculous.

The reason why the failure occurs is because someone has an active
reference to the cgroup structure. In the case of memory, it was every
page_cgroup earlier. The only reason why a notification would have to
poll state is if

1. notification is sent that there are no references, this group can
   be cleaned up
2. A new reference is acquired before the cleanup

1 and 2 are unlikely

-- 
	Three Cheers,
	Balbir
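
The two-step window described above (empty notification, then a new
reference before cleanup) can be illustrated with a minimal user-space
sketch in C. This is not kernel code and not taken from the thread:
struct toy_group, toy_get(), toy_put() and toy_rmdir() are hypothetical
names made up for illustration, and the single counter is an assumed
stand-in for whatever holds a reference to the group.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a cgroup; NOT the kernel's struct cgroup. */
struct toy_group {
	int refs;	/* outstanding references to the group */
	bool removed;	/* set once toy_rmdir() has succeeded */
};

/* Take a reference, as a late user of the group might. */
static void toy_get(struct toy_group *g)
{
	g->refs++;
}

/*
 * Drop a reference. Returns true when the count just reached zero,
 * i.e. the point at which an "empty" notification would be sent.
 */
static bool toy_put(struct toy_group *g)
{
	assert(g->refs > 0);
	g->refs--;
	return g->refs == 0;
}

/* Remove the group; fails with -EBUSY while references remain. */
static int toy_rmdir(struct toy_group *g)
{
	if (g->refs > 0)
		return -EBUSY;
	g->removed = true;
	return 0;
}
```

In this model, if toy_put() reports the group as unreferenced and
toy_get() runs before toy_rmdir(), the removal returns -EBUSY. That is
the only case in which a handler acting on the notification would need
to retry.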