From: Patrick M. <mo...@os...> - 2002-03-13 17:12:43
> > You have a non-deterministic number of devices in your system, a
> > device tree with an unknown depth, and a relatively small stack, which
> > itself has an unknown amount of room with which to play. What is not an
> > issue about that?
>
> Well, you can put your data into the device tree. Add 4 bytes to struct
> device, and you can put data related to the tree walk into that. That
> should save you from ugliness.

What are you talking about? I'm talking about eating up the stack because
of parameters and local variables. I have no idea what you're referring to.

I'm looking for a copy of the driver model code that had the
suspend/resume recursive walks in it. In the meantime, I'll wing it:

int device_suspend(struct device * root, u32 state, u32 level)
{
	list_t * node, * next;
	struct device * child;
	int error = 0;

	spin_lock(&device_lock);
	list_for_each_safe(node,next,&root->children) {
		child = list_entry(node,struct device, node);
		get_device(child);
		spin_unlock(&device_lock);

		/* suspend the child's subtree first, then the child itself */
		error = device_suspend(child,state,level);
		if (!error && child->driver && child->driver->suspend)
			error = child->driver->suspend(child,state,level);

		put_device(child);
		spin_lock(&device_lock);
		if (error)
			break;
	}
	spin_unlock(&device_lock);
	return error;
}

For each call (on i386), you're pushing 12 bytes on the stack for the
parameters and 4 for the return address. gcc will likely subtract another
12 for local variables (node, next and child, assuming that error can be
kept in a register). So, that's overhead of 28 bytes per call. You have no
idea how deep your tree is, or what the suspend functions are doing.

> > So, then there is the question of ordering. With the device tree, you will
> > get almost perfect ordering. But, consider system devices which you need
> > to quiesce before you suspend, but must be the last things suspended (IRQ
> > controllers or RTCs maybe). They will be children of the system bus, which
> > will likely come before the PCI bus in the root's list of children. How do
> > you guarantee proper ordering?
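Pavel's suggestion of keeping the walk state in the device structures, rather than on the stack, can be illustrated with a non-recursive post-order walk. This is a minimal userspace sketch under stated assumptions, not kernel code: the hypothetical `struct dev` links each node to its parent, first child and next sibling (the real driver model uses list_head lists and device_lock, both omitted here), and the hypothetical `suspend_tree()` suspends children before their parents, as the recursive device_suspend() in this message does, while using the same small amount of stack however deep the tree is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device node: the tree links live in the struct itself,
 * so walking the tree needs no stack frame per level. */
struct dev {
	const char *name;
	struct dev *parent;
	struct dev *first_child;
	struct dev *next_sibling;
	int (*suspend)(struct dev *d);	/* NULL if the device has no hook */
};

/* Follow first-child links down to a leaf. */
static struct dev *leftmost_leaf(struct dev *d)
{
	assert(d != NULL);
	while (d->first_child)
		d = d->first_child;
	return d;
}

/*
 * Suspend every device below root (root itself excluded), children
 * before parents, with constant stack usage. Stops at the first
 * failing suspend hook and returns its error; returns 0 otherwise.
 */
static int suspend_tree(struct dev *root)
{
	struct dev *d;

	if (!root->first_child)
		return 0;

	d = leftmost_leaf(root->first_child);
	for (;;) {
		/* Where to go next: the leftmost leaf of the next sibling's
		 * subtree if there is one, otherwise the parent, whose
		 * children have all been visited by now. */
		struct dev *next = d->next_sibling ?
			leftmost_leaf(d->next_sibling) : d->parent;
		int err = d->suspend ? d->suspend(d) : 0;

		if (err)
			return err;
		if (next == root)
			return 0;
		d = next;
	}
}

/* Demo hook: append the device's name to a log so the order is visible. */
static char visit_log[64];

static int log_suspend(struct dev *d)
{
	strncat(visit_log, d->name, sizeof(visit_log) - strlen(visit_log) - 1);
	return 0;
}
```

For a tree root → {pci → {eth, usb → {kbd}}, sys}, the hooks run in the order eth, kbd, usb, pci, sys. The walk state is just the current node pointer; the cost is that each node must carry its own links, which is the trade Pavel is describing.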
> Well, by two phases.
>
> First phase, for normal devices, runs with interrupts enabled.
>
> Second one is for special devices, with interrupts disabled. At that
> time, RTC or IRQ controllers can be suspended safely.

Right. But how do you know which devices to handle at which stage?

> > Oh, and what about where you're saving state to? How can you guarantee
> > that there will be enough space on the swap device? What if you run out?
> > What if you have no swap? Does that preclude the user from doing S4?
>
> If I run out of swap space, I abort suspend, wake devices, and
> continue normal operation. [You have to suspend somewhere. If there's
> not enough space, too bad, but I need *some* space. Every approach
> needs that.]

Yes, every approach does need space. Swap is a shared medium. Other things
could be eating it at the same time you're trying to use it. You could have
a variable amount of swap depending on how long the system has been up, the
time of day, or the phase of the moon.

> Okay, I should explain. (BTW, this is mostly working code; swsusp can
> do 20 S4 suspensions during a kernel compile.)
>
> The trick is I need half of memory free [that's very easy to get on
> any system, just swap out almost everything]. When half of memory is
> free, I can suspend all devices, shut off interrupts, and copy used
> memory into free memory ("atomic copy"). That "atomic copy" is a
> complete snapshot of the system.
>
> At that point, I can resume everything, write that "atomic copy" to
> the swap space, and power off.

Ah. Interesting.

> Should I attach that patch? ;-)

Yes, I got it. I'll read it. On the plane.

[...]

> How do you write the image?
>
> You do not have an image to write, right? You'd want to snapshot the
> whole state of the kernel, but alas, your kernel is changing under you?
>
> Or you are doing the "write the image" step in userspace? That is going
> to be awfully complicated, right? And that userspace path will need to
> be pagelocked?
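The "atomic copy" Pavel describes, and why it needs half of memory free, can be illustrated with a toy model in which memory is an array of fixed-size pages: every used page needs its own free page to be copied into, so the copy is possible only when free pages >= used pages, i.e. when at most half of memory is in use. This is a hypothetical sketch; NR_PAGES, PAGE_SZ, struct snapshot and atomic_copy() are invented for illustration and are not swsusp's code, and the real precondition (devices suspended, interrupts off) appears only as a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NR_PAGES 8	/* hypothetical: a tiny "machine" */
#define PAGE_SZ  16

/* Toy physical memory: page contents plus an in-use flag per page. */
static unsigned char mem[NR_PAGES][PAGE_SZ];
static bool in_use[NR_PAGES];

/* Record of the copy: page src[i] was copied into free page dst[i]. */
struct snapshot {
	size_t nr;
	size_t src[NR_PAGES];
	size_t dst[NR_PAGES];
};

/*
 * Copy every used page into a distinct free page. In the scheme described
 * above this runs with all devices suspended and interrupts off, so memory
 * cannot change underneath it. Destination pages are not marked in use in
 * this toy model. Returns -1 without copying anything when there are fewer
 * free pages than used ones (more than half of memory in use).
 */
static int atomic_copy(struct snapshot *snap)
{
	size_t used = 0, free_pages = 0, f = 0;

	for (size_t i = 0; i < NR_PAGES; i++) {
		if (in_use[i])
			used++;
		else
			free_pages++;
	}
	if (free_pages < used)
		return -1;

	snap->nr = 0;
	for (size_t i = 0; i < NR_PAGES; i++) {
		if (!in_use[i])
			continue;
		while (f < NR_PAGES && in_use[f])	/* find the next free page */
			f++;
		assert(f < NR_PAGES);	/* guaranteed by the free >= used check */
		memcpy(mem[f], mem[i], PAGE_SZ);
		snap->src[snap->nr] = i;
		snap->dst[snap->nr] = f;
		snap->nr++;
		f++;
	}
	return 0;
}
```

With pages 0 to 2 in use, the copies land in pages 3 to 5; with five of eight pages in use, atomic_copy() refuses. The copied pages are the snapshot that can then be written out after devices are resumed.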
I like the snapshotting portion of what you're doing. I don't like the
idea of suspending/resuming devices in kernel space. I also don't like
relying on swap space to be free.

I would think that all you need is one file descriptor that userspace
passes to the kernel to use. It could be a swap partition, a dedicated
partition, a regular file...

-pat
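Pat's alternative, where userspace hands the kernel a single file descriptor, rests on the fact that a write path written against a plain fd does not care what is behind it. The sketch below is userspace C and purely illustrative: write_image() is a hypothetical helper, not an existing kernel or libc interface. It loops until the whole buffer is written, because write(2) may return a short count or fail with EINTR, and it behaves the same whether fd refers to a swap partition, a dedicated partition, a regular file, or (as in a quick test) a pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the whole image buffer to fd, whatever fd
 * refers to. write(2) may write fewer bytes than asked or be interrupted
 * by a signal, so loop until everything is out.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int write_image(int fd, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	assert(buf != NULL || len == 0);
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before writing: retry */
			return -1;
		}
		if (n == 0) {
			errno = EIO;	/* no progress: report an error instead of spinning */
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

Before powering off, a caller writing to a file or block device would also want fsync(fd), so the image actually reaches storage rather than sitting in the page cache.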