From: Chandra S. <sek...@us...> - 2006-04-28 20:40:20
On Fri, 2006-04-28 at 15:29 +0200, Jes Sorensen wrote:
> Chandra Seetharaman wrote:
> > On Thu, 2006-04-27 at 10:54 -0400, Jes Sorensen wrote:
> >> I am using the raw notifier chain to give the clients the same
> >> sleep/blocking rules as the parents currently have. IMHO this makes
> >> the most sense to keep the overhead at a minimum.
> >>
> >> Comments/opinions/flames?
> >
> > But I have concerns with this implementation:
> >
> > Issue 1: As I said before in this context, IMHO, having a global
> > notifier head for these events is better than having per-task
> > notifier chains, for the following reasons:
> >
> > 1. How does this model make sure that references to all of a
> >    module's notifier blocks are removed when the module unregisters
> >    the notifier? All references (one per task) to the notifier block
> >    (and hence the notifier_call) have to be removed, and the
> >    attached patch doesn't do that. If we want to do it, then we have
> >    to hold a system-wide lock/semaphore and walk through _all_ the
> >    tasks in the system, removing the notifier block from each of
> >    them. Basically, we have to make sure the addition/deletion of a
> >    notifier block to _all_ tasks is done atomically. Note that
> >    changing the notifier to an atomic_notifier alone will not help,
> >    as we still have to remove the reference to the notifier_call in
> >    _all_ the tasks atomically.
>
> Sorry, I simply don't follow you there. If you are talking about
> module unload then it's the responsibility of the module to keep a
> reference count and not unload until all users have exited, which
> should be easy to do with the TN_EXIT handler.

Do you mean that the module will not be allowed to unload as long as
there is even _a single_ task that the module has attached a notifier
block to?
That doesn't sound appealing. Consider a filesystem example:

- Administrator insmods a filesystem module.
- Administrator mounts the filesystem.
- Random user chandra gets onto the filesystem.
- chandra creates a task p1.
- p1 does something which makes the filesystem module interested in p1.
- The filesystem module attaches a notifier block to p1.
- User chandra moves off the filesystem, but p1 remains alive.
- Administrator unmounts the filesystem; this succeeds since nobody is
  using the fs any more.
- Administrator tries to rmmod the filesystem module, and it fails due
  to the logic above.

How can the administrator fix the problem? Also, I do not think it is
advisable to call unregister from the notifier call itself.

> I don't see what lock you need here.

Even though the notifier blocks are per task, the notifier_head in each
task is shared. In other words, many users of the task notify mechanism
can be doing any of the following at the _same time_ on a single
notifier head:

- registering a new notifier block
- unregistering a notifier block
- walking the chain in notifier_call_chain

Unless we have some kind of protection, how can we guarantee that they
will not run over each other, walk off the list, call the notifier_call
of a removed module, etc.?

> > 2. The same issue has to be solved during register also, and the
> >    module has to do the registration for each task it is interested
> >    in (under the same lock as in (1)).
>
> Again, why do you need a lock? Registration is inherited, so something
> from userland creates it first and all children inherit it. Look at
> how this is done with cpuset for instance.

cpuset is not a valid comparison here. cpuset is the _only_ user of the
data structures it plays with, and it knows how to protect them. The
task notify mechanism, on the other hand, will be used by many users,
and hence a common protection mechanism is needed.
> > 3. If a module wants a new task to inherit the module's
> >    notification callback from the task's parent, then the module
> >    has to call register_notifier for the new task during the
> >    TN_FORK event. This is an additional overhead for _each_ module
> >    at fork(). Also, this has to be done with the lock (described in
> >    (1) above) held.
>
> Which makes perfect sense to me. The overhead here is minuscule
> compared to the cost of having a global lock for this.

I am _not_ saying that we need a global lock here; I am just pointing
out another problem with using per-task notifiers.

> > 4. Consider this situation: at fork(), a module tries to allocate a
> >    notifier block to attach to the new task, and the allocation
> >    fails. What does the module do? Fail the callback and hence fail
> >    the fork()?
>
> Yes. Same problem if you are in the fork and there is not enough
> memory to allocate the task_struct or anything else that is allocated
> during the fork.
>
> > 5. Where would we free the notifier blocks that are attached to
> >    _every_ task in the system? (The semundo patch doesn't free the
> >    notifier block it alloc'd in the getundo_list() function.) Note
> >    that it is _illegal_ to do this in the corresponding
> >    notifier_call, which means we would need a garbage collector of
> >    some sort to free up the notifier blocks in _every_ task.
>
> Huh? Beware that semundo is special and requires a semaphore to be
> installed if clone() is called with a specific flag. If the semundo
> patch doesn't free the notifier block in the exit path then I
> introduced a bug. It's the responsibility of the TN_EXIT handler to
> clean up.

As I pointed out above, it is not advisable to unregister from the
callback, and that is why I am raising this as a problem; it is not
specifically about how the semundo patch is written.

> > 6. Wastage of memory (a notifier head and multiple notifier blocks
> >    for every task in the system).
>
> Sorry, this argument is bogus.
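As an aside on point 4 above, the bind the module is in at fork() time
can be sketched in userspace like this. All names are invented for
illustration (in the kernel the allocation would be a kmalloc() and the
error would propagate back out of fork()); the allocator is a swappable
hook only so the failure path can be exercised:

```c
/* Sketch of a TN_FORK-style callback that must allocate a per-child
 * notifier block and has no choice but to fail the whole fork() when
 * that allocation fails. */
#include <errno.h>
#include <stdlib.h>

#define TN_FORK 1   /* illustrative value, not from the patch */

struct child {
    void *module_private;   /* stands in for the per-task notifier block */
};

/* Allocation hook so a failing allocator can be substituted below. */
static void *(*alloc_hook)(size_t) = malloc;

static void *fail_alloc(size_t n)
{
    (void)n;
    return 0;               /* simulated allocation failure */
}

static int module_fork_callback(unsigned long event, struct child *c)
{
    if (event != TN_FORK)
        return 0;
    c->module_private = alloc_hook(64);
    if (!c->module_private)
        return -ENOMEM;     /* propagating this fails the fork() */
    return 0;
}
```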
It is not just the notifier head being wasted; you also need notifier
blocks associated with each user of the task notifier mechanism. It is
certainly not bogus: if a user of the task notify mechanism is
interested in _all_ tasks, then it has to allocate a notifier block for
_every_ task in the system. The process events connector is an example
of a user that would be interested in _all_ tasks.

> Compared to what else is in the task struct this is nothing. In fact,
> one could probably eliminate some elements in the task_struct and tag
> them onto the notifier block so they are only allocated when really
> needed. A notifier block is only allocated if something is using it.

According to your description above, you expect more people to start
using the task notifier, which in turn means a notifier block in _each_
task for every such user of the mechanism.

> > I understand there is "one function call overhead" at fork() for
> > you and the MVFS folks, but having a global notifier block will be
> > a lot simpler.
>
> Having a global notifier block will be a lot costlier because it will
> require a global lock.

My point is that we do not get away from a global lock in the per-task
notifier_head case either.

> > BTW, there is no locking involved during call_chain of atomic
> > notifiers, so the overhead is minimal.
>
> Atomic notifiers make no sense in the fork path. An atomic notifier
> implies that the service function cannot sleep; however, that is not
> the case during fork/exit etc. Either we go the full way with a
> global lock or we do it lock free using raw notifiers as I suggest.

By global lock you mean a semaphore, I presume.

> > Issue 2: I think the task notify mechanism has to provide the
> > necessary protection for the notifier chain, to make sure that
> > notifier_blocks are not unregistered while notifier_call_chain is
> > walking the chain.
> Since a task notifier should only be unregistered in the TN_EXIT case
> this is not a problem.

As I said above, unregistering from the TN_EXIT callback is not
advisable.

> Besides, how many cases do we have where a module is unloaded at
> runtime?

There may not be many, but we don't want to be oopsing when they do
happen, right? As long as the kernel supports unloading a module, we
should consider that a valid usage.

> A classical case for this is for things like accounting, and it's not
> a module that will be loaded and unloaded at random.

I will not draw conclusions about that. We have heard in this thread
that MVFS will use this feature, and I am sure that many more will
follow. They may not do it at random, but they may do it, and IMO we
should work correctly in those cases also.

> > Individual module owners cannot protect the notifier chain while it
> > is walked during call_chain.
>
> Nobody should need to walk the notifier chain outside the specified
> context.

The modules won't be walking the list themselves; notifier_call_chain
will be walking it, and that walk needs protection.

> > We could simply use an atomic notifier. Why did you choose not to
> > use it?
>
> As explained above, atomic notifiers make no sense there.
>
> > If any module wants to block, then we may have to provide blocking
> > task notify call chains also.
>
> The task notification chain is allowed to sleep. We already use
> GFP_KERNEL all over the place there.

We can just go with a blocking semaphore then.

> > IMO, we need a simple wrapper that uses a global atomic notifier
> > head and provides
> > - register_task_notifier
> > - unregister_task_notifier
> > - task_call_chain
> >
> > with task_call_chain called from the appropriate points with the
> > appropriate flags.
>
> Sorry, but I haven't seen any argument for where this would be
> required. If you could provide some examples, please.
>
> Regards,
> Jes

--
----------------------------------------------------------------------
Chandra Seetharaman               | Be careful what you choose....
- sek...@us...
                                  | .......you may get it.
----------------------------------------------------------------------
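For illustration, the wrapper interface proposed above
(register_task_notifier / unregister_task_notifier / task_call_chain)
could look roughly like the following userspace model. Locking is
omitted for brevity, the event values are invented, and everything
except the three proposed entry points is a made-up stand-in; in the
kernel these would wrap a single global atomic notifier head:

```c
/* One global chain; the task is passed to every callback, so each
 * module filters for the tasks it cares about instead of keeping a
 * notifier block in every task_struct. */

enum { TN_FORK_EV = 1, TN_EXEC_EV, TN_EXIT_EV };  /* invented values */

struct task;  /* opaque here; stands in for task_struct */

struct task_notifier {
    int (*notifier_call)(unsigned long event, struct task *tsk);
    struct task_notifier *next;
};

static struct task_notifier *task_chain;

static void register_task_notifier(struct task_notifier *n)
{
    n->next = task_chain;
    task_chain = n;
}

static void unregister_task_notifier(struct task_notifier *n)
{
    for (struct task_notifier **p = &task_chain; *p; p = &(*p)->next) {
        if (*p == n) {
            *p = n->next;
            return;
        }
    }
}

/* Called from fork()/exec()/exit() with the appropriate event flag;
 * returns the number of callbacks invoked. */
static int task_call_chain(unsigned long event, struct task *tsk)
{
    int ran = 0;
    for (struct task_notifier *n = task_chain; n; n = n->next) {
        n->notifier_call(event, tsk);
        ran++;
    }
    return ran;
}

/* Test callback: a module interested only in exits. */
static int exit_seen;
static int watch_exits(unsigned long ev, struct task *t)
{
    (void)t;
    if (ev == TN_EXIT_EV)
        exit_seen++;
    return 0;
}
```

The point of the shape is that registration happens once, not once per
task, so module unload only has to remove a single block.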