From: Pavel C. <pc...@us...> - 2002-12-17 10:26:46
Hi,

I'd like to summarize the problem and outline the possible paths we can take on this issue.

First, what's the real problem that a transaction timeout should fix? What we want is that runaway transactions and queries do not consume engine resources forever or in an uncontrolled manner. The proposed solution is to impose a limit on how long a transaction can live (we could also impose limits on CPU usage, memory usage etc.). We hope that this would cure many causes of stalled or exhausted engine resources (the OAT problem is the most critical one): dead clients, badly written clients, malicious clients, and something I call self-healing in mission-critical systems. But is this approach really a cure for them?

a) We acknowledged that the engine can detect dead clients now and take the right action. Of course, we can improve the detection system to be more precise, use fewer resources and detect more *dead* conditions. But we definitely don't need to impose a timeout to solve the dead clients problem, because it's already solved in a more or less satisfactory way.

b) Many said that a timeout is probably not a good cure for badly written applications. The engine definitely should help developers identify the problem, and may provide a way for administrators to "fix" the *immediate* problem by killing a runaway transaction, connection or query, but the real cure is to fix the badly behaving application. I liked the temporary system tables approach taken by IB7, which allows you to identify the occurrence of various problems and to kill transactions, connections or statements. I think we should analyze its pros and cons and seriously consider implementing them (or anything with equivalent capabilities) in Firebird.

c) Malicious code, i.e. an *intentionally* badly behaving application, is the real problem that we're not able to solve right now, and that a timeout may solve. But can it really? I have my doubts.
It's clear that applications have different needs for system time and resources, so we must provide a way to fine-tune the timeout or other limits (a user-defined amount at start and at renewal). We have to provide an API for that, but that same API would also be there for malicious code to use. If we don't impose a hard limit (even a user-configurable one) on transaction/connection/query time, we will solve nothing that way, and even with a hard limit we will throw only a small obstacle in the way of malicious code writers. Moreover, whatever we do to solve this problem should not put unneeded obstacles in the way of regular developers, and a timeout would.

d) Another use of a timeout is to help mission-critical systems not fall to their knees when an unexpected problem occurs. These are usually very busy systems, and it is normal for such a system to take some actions on its own to minimize the impact of any failure, or to take an alternate path, because people are too slow to react. A timeout may help there (at least it's normal practice, as David pointed out), but do we *really* need to use timeouts *in the engine*? Is it possible that an independent monitor application (it may use temporary system tables or any other API) that observes and terminates transactions/connections/queries according to user-defined rules would be enough? It would definitely be a more flexible solution, but would it be acceptable for such mission-critical systems?

What we didn't take into account in the recent discussion is the overhead of any timeout or keep-alive solution. A client-controlled approach seems to scale better than a server-controlled one, but a server-controlled one seems to be more precise. Both will impose additional overhead in network traffic and system resources.

Another angle that was mentioned, but not very thoroughly, is the backward compatibility of any timeout solution with current applications.
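To make the independent monitor idea from d) a bit more concrete, here is a minimal sketch of the rule-evaluation part only. Everything in it is an assumption made for illustration: the `TxnInfo` fields, the rule helpers `max_age` and `max_idle`, and `select_victims` are hypothetical names, and how the snapshot is obtained (temporary system tables or any other API) and how a transaction is actually killed are deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class TxnInfo:
    """Hypothetical snapshot row describing one active transaction."""
    txn_id: int
    user: str
    age_seconds: float   # time since the transaction started
    idle_seconds: float  # time since its last activity


# A rule returns a reason string if the transaction should be killed, else None.
Rule = Callable[[TxnInfo], Optional[str]]


def max_age(limit: float, exempt_users: Iterable[str] = ()) -> Rule:
    """Rule: flag transactions older than `limit` seconds, except for exempt users."""
    exempt = frozenset(exempt_users)

    def rule(txn: TxnInfo) -> Optional[str]:
        if txn.user in exempt:
            return None
        if txn.age_seconds > limit:
            return f"older than {limit}s"
        return None

    return rule


def max_idle(limit: float) -> Rule:
    """Rule: flag transactions idle for more than `limit` seconds."""

    def rule(txn: TxnInfo) -> Optional[str]:
        if txn.idle_seconds > limit:
            return f"idle for more than {limit}s"
        return None

    return rule


def select_victims(snapshot: Iterable[TxnInfo],
                   rules: Iterable[Rule]) -> List[Tuple[int, str]]:
    """Return (txn_id, reason) for every transaction matched by some rule.

    The first matching rule supplies the reason; later rules are not consulted.
    """
    rule_list = list(rules)
    victims: List[Tuple[int, str]] = []
    for txn in snapshot:
        for rule in rule_list:
            reason = rule(txn)
            if reason is not None:
                victims.append((txn.txn_id, reason))
                break
    return victims
```

A monitor built this way would poll the engine periodically, pass the snapshot through `select_victims`, and then ask the engine to kill whatever comes back. The rules live entirely outside the engine, so an administrator can exempt a long-running batch user without any change to existing client applications, but the polling itself is part of the overhead mentioned above.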
A solution that uses an extended API would solve d), but not b) and c); other methods would be more or less incompatible.

So, where do we want to go from here?

Best regards
Pavel Cisar
http://www.ibphoenix.com
For all your up-to-date Firebird and InterBase information