Re: [Apcupsd-users] System load at 1.00 with 2.6.12
Brought to you by:
adk0212
From: Adam K. <akr...@ro...> - 2005-06-22 17:22:12
|
On Wed, Jun 22, 2005 at 09:34:25AM +0200, Kern Sibbald wrote: > On Wednesday 22 June 2005 04:52, Adam Kropelin wrote: > > I've attached a patch against apcupsd-3.10.17. The actual changes were > > done in the 3.11 tree, but I did a back-port to 3.10.17 so users can try > > it out. I haven't checked it into CVS yet pending resolution of the "big > > downside" discussed below... > > It is a bit hard to see exactly what is going on without having the patched > code, which I didn't do, but it looks OK to me. I have the following > comments: I'll go ahead and check it into CVS (on the 3.11.x branch); that should make it easier for you to review. > 1. I like adding ci to the usb_info structure. It simplifies the code a lot. Indeed. Having a reverse-mapping from USB_INFO to CI is very helpful. > 2. I think it is essential, as you have already noted, to ensure that > the code will work the old way for 2.4 kernels. I suspect that there > are still a *lot* of people on 2.4 kernels. Agreed. I'll try to find the cleanest possible way to maintain backwards compatibility. > 3. I'm not too happy about back porting the fix to 3.10.17. Doing that will > lead you into a spiral of fixing the old version when it is not broken. It > would be much better to make the patch only in 3.11.x (IMO). Actually, it was your comment about a possible 3.10.18 release that led me down this path. Hopefully we can find a simpler solution for 3.10.18. > We should not forget that as far as I can tell, there is nothing wrong with > the current code. There are two problems: 1. Some people have broken > hardware. 2. The 2.6.12 kernel (IMO) is simply broken -- it really pains me > to see the kind of change they made go in a main stream kernel. I'm beginning > to wonder if Linus is losing his good sense. It's a tough call. Like I said before, technically speaking, prior to 2.6.12 hiddev did not adhere to its published API. Now it does. But given how long hiddev has been in existence, I think it would have been better to add a flag so an app that wants the new behavior can ask for it without penalizing existing apps like apcupsd. > The code you have added is really a new feature -- it is new development, and > it belongs in a development version not as a patch to the stable code (IMO). > Item 4 below, is something that could logically go in 3.10.17 because it is a > minor tweak to the code. I understand what you're saying. My changes are significant and may go beyond just being a bugfix for the stable code. If we can come up with a smaller change that solves the 2.6.12 problem I have no problem putting my changes in the devel branch only. > 4. Though it complicates the testing just a bit, I would *much* prefer making > this change and at the same time move the valid = 1 flag into each of the if > statements, so that we break out of the loop only when an important event is > seen. At the same time, it would be very prudent to add the important events > to the if statements that I mentioned in a previous email, even though they > are not currently used. Agreed. > If item 4 is implemented, your fix should not be necessary (I would like to > see your fix go in, but in 3.11.x) because your code is not necessary for any > of the important events as it doesn't matter what report they come from. Keep in mind my patch does two things: (1) Ignore duplicate reports (2) Match reports based on report_id, etc., not just usage_code. (This is REQUIRED for #1 to function reliably.) Removing my fix eliminates both of these changes, and as noted, removing #2 alone is not possible. Moving valid=1 inside the if statements is a good change, but not a sufficient fix on its own. Remember, 2.6.12 will send every interrupt event to us, which means every 3 seconds we'll get a report from the UPS saying "not on battery" (among other things). We'll see this as an "important event", set valid=1, and return. Net result: no change in behavior; we'll still poll every 3 seconds. The correct fix is to filter duplicate events. A simplistic way to do that would be to modify each if() statement to ignore the event if it doesn't represent a change. For example... Before... } else if (ev[i].hid == ups->UPS_Cmd[CI_BelowRemCapLimit]) { if (ev[i].value) ups->set_battlow(); else ups->clear_battlow(); ...After... } else if (ev[i].hid == ups->UPS_Cmd[CI_BelowRemCapLimit]) { if (!!ev[i].value == ups->is_battlow()) continue; valid = 1; if (ev[i].value) ups->set_battlow(); else ups->clear_battlow(); That should solve the 2.6.12 problem, although it does introduce a slight change in behavior since we'll not poll for unrecognized events. > 5. Though you don't like it, I still strongly recommend, along with item 4, > making the insanity time check (throttling code) apply to all states of the > UPS. I'm ok with the concept of throttling, but I don't want to see it used in such a way that it will always be invoked. It should be used to slow down reporting of rapidly changing values, it should NOT be used to solve the fundamental duplicate report problem in 2.6.12. > This with item 4 I consider rather minor tweaks to the current code > (3.10.17). To have apcupsd wake up every three seconds is not too good but > is also not a catastrophy providing the non-important events are ignored. > IMO these two items (4 and 5) are sufficient to resolve the kernel bug, and > possibly the UPS hardware bug. Item #4, with the extension I described above, should be sufficient on its own to solve the 2.6.12 problem. As such I'd like to address #5 in the devel branch only. The hardware problem Martin and Christian have will NOT be solved by #4 or #5. We really do need to introduce inter-urb throttling to solve that one. But that's a completely separate issue and should not confuse our discussion here about 2.6.12. > 6. The set of if statements mentioned in item 4 can, after your ci change, > probably be changed into a switch statement. Ah yes, nice. Thanks for pointing it out; I'll do that. --Adam |