From: Leyne, S. <sl...@at...> - 2002-12-11 20:32:14
|
Christian, I have noted this request for addition to the Feature Request list. Please note that this does NOT mean that it will be implemented (he says to protect himself from the volley which might ensue). Sean |
From: Juarez R. <ju...@mp...> - 2002-12-11 21:03:09
|
- Default duration defined at the user level, to be able to override the server's default configuration.

Maybe this will sit on the "feature request" list until someone decides to contribute time, skill or money! But it is a good one.

----- Original Message -----
Sent: Wednesday, December 11, 2002 5:01 PM

I, too, like the idea of a maximum transaction duration. Here's one way of designing it:

- Default duration defined at the server level, so when creating a new database it will use that default duration.
- Default duration defined at the database level, to be able to override the server's default duration.
- Pass the duration upon connection to override the default duration, so that in some circumstances the duration can be extended to perform e.g. the year-end processing/reports.

I know that this might require some changes to the API. Just my 2 cents.

--
Best regards,
Daniel Rail |
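The layered defaults proposed here (server, then database, then connection) amount to a simple override chain. A minimal sketch of how such a resolution could work — the function and parameter names are illustrative, not any Firebird API:

```python
def effective_timeout(server_default, database_default=None, connection_override=None):
    """Resolve a transaction duration limit from layered defaults.

    Each more specific level, when set, overrides the broader one.
    None at every level means "no limit".
    """
    for value in (connection_override, database_default, server_default):
        if value is not None:
            return value
    return None

# A database-level setting overrides the server default, and a
# per-connection value overrides both (e.g. for a year-end run).
assert effective_timeout(3600) == 3600
assert effective_timeout(3600, database_default=600) == 600
assert effective_timeout(3600, 600, connection_override=86400) == 86400
```

The key design point is that "no timeout" stays representable at every level, so enabling the feature server-wide never forces a limit on a connection that explicitly opts out.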
From: Ann W. H. <aha...@ib...> - 2002-12-12 15:57:40
|
At 07:01 PM 12/11/2002 -0300, Juarez Rudsatz wrote:
>- Default duration defined at user level to be able to override the server's
>default configuration.

If the problem is users who discover an interactive query tool and leave transactions open for days, then requiring that the user transaction limit itself is probably not going to work.

>Here's one way of designing it:
>- Default duration defined at the server level, so when creating a new
>database it will use that default duration.

That bothers me somewhat, probably because I'm not entirely happy with the server model starting with server logins. Yes, sometimes a single machine will host a number of related databases. More often, just a single database. And sometimes, a number of unrelated databases. That case really can't be supported now.

>- Default duration defined at the database level
>- Pass the duration upon connection
>
>I know that this might require some changes to the API.

Actually that can all be done with connection and transaction parameters - yes, we'd need to add parameters, but that doesn't fundamentally change the API.

Regards,
Ann
www.ibphoenix.com
We have answers. |
From: Juarez R. <ju...@mp...> - 2002-12-12 17:04:41
|
I see 3 types of problem here:

1) The political: when the user discovers an interactive tool. Solution: just buy a gun and kill him.

2) The most political: you need to give different powers to different users. If Joe works on reception and wants to find 'Mary' in the whole database, he will waste resources of other users. The problem is where the developer cannot guess the time involved, so give him one minute. But if Jim must show Bill the end-of-year report, he cannot simply be stopped. Eventually this discussion will lead to 'kill' and 'nice' features being implemented.

3) The login model. There really is more than one set of needs for a database. Single-user applications and web servers hardly need a login, and in the second case a login can be a performance penalty in a web context.

----- Original Message -----
From: "Ann W. Harrison" <aha...@ib...>
To: "Juarez Rudsatz" <ju...@mp...>; <fir...@li...>
Sent: Thursday, December 12, 2002 12:58 PM
Subject: Re: [Firebird-devel] Re: Max transaction duration |
From: Leyne, S. <sl...@at...> - 2002-12-13 00:51:36
|
Paul et al,

At this point, I think I should remind everyone that the request was for the *ability* to set a transaction timeout value.

Nothing was ever said to suggest that, by enabling this feature, the engine would by 'default' start closing connections.

I think of this as equivalent to Nickolay's new explicit locking feature in v1.5. It's there; if you don't want to use it, fine, no one is going to force you to. But if it helps deal with a bad software problem (maybe you don't have the source), what's the harm?

Sean |
From: Pavel C. <pc...@us...> - 2002-12-13 09:17:08
|
Sean,

On 12 Dec 2002 at 19:48, Leyne, Sean wrote:
> At this point, I think should remind everyone that the request was for
> the *ability* to set a transaction timeout value.
>
> Nothing was ever said to suggest by enabling this feature, that the
> engine would by 'default' start closeing connections.
>
> I think of this as equivalent to Nickolay's new explicit locking feature
> in v1.5. It's there, if you don't want to use, fine no one is going to
> force you to. but if it helps deal with a bad software/problem (maybe
> you don't have the source), what's the harm?

Well, but how about the code bloat and new potential points of failure? The small footprint, compact code and predictable behaviour are among the major FB features that are easy to lose, but hard to achieve.

I have nothing against new features, if they are reasonable. For me, "reasonable" means (at least) one point from:

1) Satisfies a general requirement
2) Satisfies an occasional requirement of many that's hard to achieve effectively by other means
3) Satisfies a general requirement of few that's hard to achieve effectively by other means

The transaction timeout seems to fall in the third category, but I'm not convinced that it's not doable effectively "outside the engine".

Best regards
Pavel Cisar
http://www.ibphoenix.com
For all your up-to-date Firebird and InterBase information |
From: David J. <dav...@di...> - 2002-12-13 13:43:49
|
On 2002.12.13 04:21:59 -0500 Pavel Cisar wrote:
> Well, but how about the code bloat and new potential points of failure ?
> The small footprint and compact code and behaviour is one from major FB
> features that are easy to lose, but hard to achieve.
>
> I have nothing against new features, if they are reasonable. For me,
> "reasonable" means (at least) one point from:
>
> 1) Satisfy a general requirement

Timeouts seem to be a general requirement for reliable distributed system design. I haven't seen any attempt to design a reliable distributed system that doesn't use either an explicit timeout (XA distributed transaction model, JINI) or an implicit timeout (heartbeat, ping). It is certainly an explicit requirement of the XA transaction model.

> 2) Satisfy an occasional requirement of many that's hard to achieve by
> other ways effectively
>
> 3) Satisfy general requirement of few that's hard to achieve by other
> ways effectively
>
> The transaction timeout seems to fall in third category, but I'm not
> convinced that it's not doable effectively "outside the engine".

Since, as noted above, all designers of reliable distributed systems seem to rely on timeouts to determine remote node failures, I think it's up to you to demonstrate that there are other solutions that provide the same degree of automatic recovery from remote failure.

Firebird seems to be the only open source database that supports 2pc and can be maneuvered into supporting almost all of XA: therefore IMO it's the only plausible open source choice to use in J2EE apps with more than one resource manager. I don't really understand the point of arguing against industry standard features for this environment.

I would have asked for this feature a long time ago if I had fully appreciated the role of timeouts in reliability. As it is, I only really started to appreciate it when working on making the JBoss transaction manager recoverable.

thanks
david jencks

-------------------------------------------------------
This sf.net email is sponsored by:
With Great Power, Comes Great Responsibility
Learn to use your power at OSDN's High Performance Computing Channel
http://hpc.devchannel.org/
_______________________________________________
Firebird-devel mailing list
Fir...@li...
https://lists.sourceforge.net/lists/listinfo/firebird-devel |
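In the XA/JTA model referred to above, the timeout is attached to the transaction itself, and the resource manager may act unilaterally once it expires. A minimal sketch of that server-side expiry check — the class and field names are illustrative, not engine code:

```python
import time

class Transaction:
    """A transaction that carries its own timeout, XA-style.

    `now` is injectable so the expiry logic can be tested
    without real waiting; it defaults to a monotonic clock.
    """
    def __init__(self, txn_id, timeout=None, now=time.monotonic):
        self.txn_id = txn_id
        self.timeout = timeout      # seconds; None means no limit
        self._now = now
        self.started = now()

    def expired(self):
        if self.timeout is None:    # default behavior: never expires
            return False
        return self._now() - self.started > self.timeout

# Simulated clock: the transaction expires once 60s have "passed",
# at which point the server is entitled to roll it back on its own.
clock = [0.0]
txn = Transaction(1, timeout=60, now=lambda: clock[0])
assert not txn.expired()
clock[0] = 61.0
assert txn.expired()
```

Note that a transaction created without a timeout never expires, which matches the repeated point in this thread that the default behavior would not change.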
From: Phil S. <ph...@sh...> - 2002-12-13 14:22:56
|
Hi,

I have been following this thread, and I think there is a 'blurring' between Connections and Transactions.

When a 'client' crashes, or the LAN cable is pulled out, etc., Firebird will (eventually) notice, close the connection and roll back any open transactions.

The reason people are so passionate about transactions is that they can be a major cause of performance and other issues with FB. The majority of performance problems on the lists/newsgroups are down to 'incorrect' use of transactions. Long-running transactions are not a good thing in FB, and whilst sometimes they can't be avoided (end-of-month batch runs etc.), the majority can.

The problem is most evident when people move to FB from 'other databases', and the first app they write starts a transaction in the morning, fills an 'editable grid' with data, and then they wonder why the whole system has ground to a halt by lunch.

In my opinion, in a properly designed system, transactions should only be open when reading or writing to the database. So I see no reason at all to have 'transaction timeouts': if a transaction is open, something is being written to or read from the database, and any timeout will abort the operation.

Phil |
From: David J. <dav...@di...> - 2002-12-13 15:17:17
|
On 2002.12.13 07:54:20 -0500 Phil Shrimpton wrote:
> Hi,
>
> I have been following this thread, and I think there is a 'bluring' between
> Connections and Transactions.
>
> When a 'Client' crashes, or the LAN cable is pulled out etc, Firebird will
> (eventually) notice, close the connection and rollback any open
> transactions.

Right, that may have been a bad example. Can you prove that any client-side failure that results in the client ceasing to communicate, leaving its transactions open, will quickly result in connection failure? I can't, so I think the server should be able to act unilaterally when transactions get too old.

> In my opinion, in a properly designed system, transactions should only be
> open when reading or writing to the database. So I see no reason at all to
> have 'transaction timeouts', as if a transaction is open, something is being
> written to, or read from the database, and any time out will abort the
> operation.

The purpose of timeouts is not to interfere with people's work, but to provide a way to automatically recover from remote node failures. Some of these are taken care of by "broken connection detection", but I'm not convinced all possible failures will be detected this way.

Since transaction timeouts appear to be an industry standard (specified in XA) and extremely useful, I'm having trouble understanding why there is so much opposition. I'm certainly not suggesting that the default behavior change, just that it be possible to supply a transaction timeout with transactions, so that, for instance, distributed transaction managers can make use of it. I don't think this would affect Delphi users who wish to allow indefinite-length browsing sessions with open transactions, but it would provide the industry-standard failure recovery for those who need it.

thanks
david jencks |
From: Phil S. <ph...@sh...> - 2002-12-13 16:15:34
|
On Friday 13 December 2002 15:16, David Jencks wrote:

Hi,

> > When a 'Client' crashes, or the LAN cable is pulled out etc, Firebird
> > will (eventually) notice, close the connection and rollback any open
> > transactions.
>
> Can you prove that any client
> side failure that results in the client ceasing to communicate commits to
> its open transactions will quickly result in connection failure?

FB will, eventually, close the connection; whether it is quick enough or not is another story. A quick test just now, removing the network cable and looking at the DB stats, indicates it took just over a minute to close/rollback.

> The purpose of timeouts is not to interfere with peoples work, but to
> provide a way to automatically recover from remote node failures. Some of
> these are taken care of by "broken connection detection" but I'm not
> convinced all possible failures will be detected this way.

I would argue that all of these failures should be handled by "broken connection detection", and if they are not, that area of the code should be improved rather than adding another 'feature' to work around it.

> Since
> transaction timeouts appear to be an industry standard (specified in xa)
> and extremely useful I'm having trouble understanding why there is so much
> opposition.

I don't know the XA spec, but I think the reason for the opposition is the way FB transactions 'should be used'; in my view, any open, idle transaction is a design flaw in the 'client' application. You have a valid comment about 'client' crashes not being detected, but I think for those cases the 'connection detection' should be improved.

> I'm certainly not suggesting that the default behavior change,
> just that it be possible to supply a transaction timeout with transactions

Surely, when you start a transaction, you know that you are going to commit/rollback a 'few lines of code' later, after you have inserted a record or something, so unless you think you might forget to put a 'commit' in the code, I can't see a valid reason for a timeout parameter.

> I don't think this would affect delphi users who wish to allow
> indefinite-length browsing sessions with open transactions,

Anything to prevent this would be a good thing IMO.

Phil |
From: Nickolay S. <sk...@bs...> - 2002-12-13 15:10:48
|
Hello, All!

Friday, December 13, 2002, 3:54:20 PM, you wrote:

Allowing long-running transactions to kill database operation is actually a security problem, because a user essentially without rights can kill database operation. Oracle and MSSQL solve this problem by allowing several limits to be defined per role (with database and server-wide defaults), like maximum transaction duration, CPU time per query, etc. The maximum duration of a serializable transaction in Oracle is limited by the size of the undo tablespace (where record back versions are stored).

The current Firebird implementation is insecure. You can kill the server by passing an invalid handle to it, or execute whatever binary code you need on the server side by tweaking handles of deleted objects or using one of the known buffer overflows.

My opinion on this issue is that it should go into the security issues queue, and we should make a feature to prevent DoS attacks mounted by starting transactions, but its priority in this queue should not be too high.

--
Best regards,
Nickolay Samofatov
mailto:sk...@bs... |
From: Ivan P. <pre...@ms...> - 2002-12-13 10:43:58
|
> From: psc...@in...
> > The main reason for this feature is to cope with software that hasn't
> > been written in a sympathetic way towards sensible transaction length,
> > so implementing in the client seems weaker than having a _database_
> > parameter.
>
> The software should be fixed; a maximum transaction duration cures the
> symptom, and not the overall disease - a badly designed app, which
> probably has other problems as well.

Don't you think that with a "max transaction duration" feature it will be much easier to force people to fix/redesign their apps? (Either they will fix it, or it will not work at all.)

Ivan |
From: <psc...@in...> - 2002-12-13 22:02:41
|
On 13 Dec 2002 at 11:14, Ivan Prenosil wrote:

> > From: psc...@in...
> > > The main reason for this feature is to cope with software that
> > > hasn't been written in a sympathetic way towards sensible
> > > transaction length, so implementing in the client seems weaker
> > > than having a _database_ parameter.
> >
> > The software should be fixed, a maximum transaction duration, cures
> > the symptom, and not the overall disease, a badly designed app,
> > which probably has other problems as well.
>
> Don't you think that with "Max transaction duration" feature it will
> be much easier to force people to fix/redesign their apps ? (Either
> they will fix it, or it will not work at all.)

Unless of course it's a critical month-end application, and it has to be finished by a certain time, and nobody realises the reason it keeps dying is the transaction duration limiter... |
From: Leyne, S. <sl...@at...> - 2002-12-15 15:17:26
|
Pavel,

> Interesting, so server should kill requests in progress when
> time ticket expire ? I'd worry to introduce that.

But if the implementation of this proposed feature had a default of "no timeout", what would be the worry?

It seems that most of the "naysayers" are 'up-in-arms' because they are all working on the assumption that the default timeout will be something like 60 secs.

Sean |
From: Nando D. <na...@de...> - 2002-12-15 16:44:43
|
Sean,

> It seems that most of the "naysayers" are 'up-in-arms' because they are
> all working on the assumption that the default timeout will be something
> like 60 secs.

My impression too. On the other hand, if you look at the situation from a security POV (as Nickolay suggests), then it would make sense to have a restrictive (and thus worrying) default setting.

Ciao
--
____
_/\/ando |
From: <psc...@in...> - 2002-12-16 20:21:05
|
On 15 Dec 2002 at 10:14, Leyne, Sean wrote:

> Pavel,
>
> > Interesting, so server should kill requests in progress when
> > time ticket expire ? I'd worry to introduce that.
>
> But if the implementation, of this proposed feature, had a default of
> "no timeout" what would be the worry?
>
> It seems that most of the "naysayers" are 'up-in-arms' because they
> are all working on the assumption that the default timeout will be
> something like 60 secs.

The issue isn't the timeout, or the timeout period, but whether this fixes the problem or just masks it. It's like a toothache: sure, the aspirin takes away the pain (temporarily), but until you go to the dentist and get the tooth fixed, the problem is still there. In this case the stalling of the OAT is the toothache.

So what can be done about it? First we need to enumerate all of the ways of causing a transaction to fail in the middle:

1) The client machine crashes.
2) The client machine suffers a power-off situation.
3) Network cable or device failure.
4) The application fails to commit/rollback the transaction before ending.
5) The application opens a writable transaction and leaves it open for an extended period of time.

For causes 1-3 the engine should be able to detect the failure and recover from it, by rolling back the transaction and marking it as committed. This may mean engine changes so that only updates that can be reliably rolled back are actually written to the database.

For cause #4, when the application ends without a commit or rollback, we need to roll back the transaction and mark it as committed.

For cause #5, there really isn't much the engine can do about it; applications should not do this, and the solution is to fix the #@!%$ application. I don't care what the application does, it should not behave this way.

I think the solution may be in savepoints and commit retaining, which should both allow the engine to throw away the no-longer-needed back versions, effectively advancing the OAT at least logically, resolving the problem.

A timeout may be helpful in making sure that applications are behaving themselves. Whether it should kill the transaction, or just warn someone about it (so they can fix the app), is the real question. |
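The closing "kill or just warn" question can be made concrete with a small sweeper sketch: a periodic scan over open transactions that either reports offenders or rolls them back. All names here are illustrative; this is not how the engine's transaction bookkeeping actually looks:

```python
import time

def sweep(transactions, limit, on_warn, on_kill, kill=False, now=time.monotonic):
    """Scan open transactions and handle ones older than `limit` seconds.

    With kill=False the sweeper only reports offenders (so the app can
    be fixed); with kill=True it rolls them back, which lets the OAT
    advance. `transactions` maps txn id -> start timestamp.
    """
    for txn_id, started in list(transactions.items()):
        if now() - started > limit:
            if kill:
                on_kill(txn_id)
                del transactions[txn_id]
            else:
                on_warn(txn_id)

warned, killed = [], []
open_txns = {10: 0.0, 11: 950.0}   # txn 10 is 1000s old, txn 11 is 50s old

# Warn-only mode: nothing is touched, the offender is just reported.
sweep(open_txns, limit=900, on_warn=warned.append, on_kill=killed.append,
      now=lambda: 1000.0)
assert warned == [10] and killed == []

# Kill mode: the stale transaction is rolled back and removed.
sweep(open_txns, limit=900, on_warn=warned.append, on_kill=killed.append,
      kill=True, now=lambda: 1000.0)
assert killed == [10] and 10 not in open_txns
```

The two modes correspond to the two answers proposed in the thread: warn-only preserves current behavior while still exposing badly-behaved applications; kill mode is the actual timeout enforcement.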
From: David J. <dav...@di...> - 2002-12-16 20:49:47
|
Umm, I think I've been saying repeatedly that _all_ the sources I have seen on distributed system design say the __only__ reliable way to detect 1-5 is with timeouts on the server. So far no one has directly disputed this, just argued with the conclusion that, to ensure reliability, we should implement transaction timeouts on the server.

Although many cases of 1-4 can be detected by the current connection keep-alive/timeout feature, I think this clearly relies very heavily on unwarranted assumptions about client architecture and good behavior of clients.

david jencks |
From: Roman R. <rro...@ac...> - 2002-12-16 23:24:19
|
Paul,

> The issue isn't the timeout, or timeout period, but whether this fixes the
> problem or just masks the problem. It's like a toothache, sure the
> aspirin takes away the pain (temporarily), but until you go to the dentist
> and get the tooth fixed, the problem is still there. In this case the
> stalling of the OAT is the toothache.

I would say that the toothache is the thing we have to cure, and we have a backup/restore medicine for it. However, the reason for this toothache is that you did not clean your teeth twice a day. And we want to enforce this teeth-cleaning procedure by reliably detecting that the teeth were not yet cleaned.

> So what can be done about it, first we need to enumerate all of the ways
> of causing a transaction to fail in the middle.
>
> 1) The client machine crashes.
> 2) Client machine suffers a power off situation.
> 3) Network cable or device failure.
> 4) Application fails to commit/rollback the transaction before ending.
> 5) Application opens a writable transaction and leaves it open for an
> extended period of time.

In the theory of fault-tolerant systems, people define three classes of failures:

- failures the system can detect and tolerate;
- failures the system can only detect;
- failures the system cannot detect.

Crash of the client machine, physical link failure and ending a tx without commit/rollback seem to be failures the engine should detect and tolerate (by rolling back the tx and performing some additional steps). The question is whether we can reliably detect them. As I understood from the discussion, we have a failure detector (FD) that sends some keep-alive data to the client with the response to the client's query. If the client fails to respond to that keep-alive message, it is suspected (are there any additional checks of the suspected node?). As I understand it, this FD is not able to say whether the client with a long open transaction is still alive or not, because the server never sends a keep-alive message to the client on its own. Am I right?

If yes, then problem 5) belongs to the "undetectable faults". And, if yes, isn't it natural to extend the server FD to be able to "ping" a client (ping interval and max. pong delay specified in the server config) to detect whether the client is alive? This FD would be server-centric and would need a background thread running and pinging connections.

David proposes a different FD scheme where each client has to say to the server "I will be cleaning my teeth at 9:00am and 9:00pm" at the beginning of the transaction. And if the client fails to call the server between 9:00 and 9:05 am/pm and say "teeth are clean", the server suspects that client without any additional checks. This scheme seems more elegant than a busy server asking each client at 9:05 am/pm "hey, did you clean your teeth?"... The difference is like a regime in the army versus in the kindergarten. Also, this scheme tolerates more types of failures (at least problem 5) becomes a "detectable-toleratable fault"). However, what is not clear to me is whether the current FD is so bad that we have to replace it with something new. Can we create a list of failures that the current FD cannot detect? Right now only one type of failure is not detectable:

- a long-running transaction with an open socket.

Are there any others?

Fixing the client in this case will not solve the problem, because you assume that the client software is 100% correct and works as it was intended to work. Unfortunately this is not the case, and will not be the case for a long time. Each system can fail, and fail-stop is the simplest class of failures. Byzantine faults are more severe, and we have to deal with them on the server. But personally I would just create a list of faults the engine cannot tolerate and put it somewhere on the web. As far as I know we are not writing an engine for a nuclear plant, so tolerating fail-stop faults should be enough.

Best regards,
Roman Rokytskyy |
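The lease-style failure detector described above (client promises a renewal time; the server suspects it only when the promise is broken, instead of actively pinging everyone) can be sketched in a few lines. The class and its intervals are hypothetical, not anything in the Firebird wire protocol:

```python
class LeaseDetector:
    """Lease-based failure detector.

    Each client promises to renew its transaction every `interval`
    seconds; `grace` is the extra slack before the server suspects it.
    The server does no active pinging - it only checks deadlines.
    """

    def __init__(self, interval, grace):
        self.interval = interval
        self.grace = grace
        self.deadlines = {}

    def begin(self, txn_id, now):
        self.deadlines[txn_id] = now + self.interval + self.grace

    def renew(self, txn_id, now):
        self.deadlines[txn_id] = now + self.interval + self.grace

    def suspects(self, now):
        return [t for t, d in self.deadlines.items() if now > d]

fd = LeaseDetector(interval=60, grace=5)
fd.begin("joe", now=0)
fd.begin("jim", now=0)
fd.renew("jim", now=60)           # jim kept his promise, joe did not
assert fd.suspects(now=70) == ["joe"]
assert fd.suspects(now=130) == ["joe", "jim"]
```

This shows why the scheme scales better than server-driven pinging: the server's cost per check is a deadline comparison, and a stalled client with a perfectly healthy socket (the one undetectable case listed above) is caught the moment it misses a renewal.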
From: Pavel C. <pc...@us...> - 2002-12-17 10:26:46
|
Hi,

I'd like to summarize the problem and draw the possible paths we can take on this issue.

First, what is the real problem that a transaction timeout should fix? What we want is that runaway transactions and queries do not consume engine resources forever or in an uncontrolled manner. The proposed solution is to impose a limit on how long a transaction can live (we could also impose limits on CPU usage, memory usage, etc.). We hope this would cure many causes of stalled or exhausted engine resources (the OAT problem being the most critical): dead clients, badly written clients, malicious clients, and something I call self-healing in mission-critical systems. But is this approach really a cure for them?

a) We acknowledged that the engine can detect dead clients now and take the right action. Of course, we can improve the detection system to be more precise, take fewer resources, and detect more *dead* conditions. But we definitely don't need to impose a timeout to solve the dead-client problem, because it's already solved, in a more or less satisfactory way.

b) Many said that a timeout is probably not a good cure for badly written applications. The engine definitely should help developers identify the problem, and may provide a way for administrators to "fix" the *immediate* problem by killing a runaway transaction, connection or query, but the real cure is definitely to fix the misbehaving application. I liked the temporary system tables approach taken by IB7, which allows identifying the occurrence of various problems and killing transactions, connections or statements. I think we should analyze its pros and cons and seriously consider implementing them (or anything with equivalent capabilities) in Firebird.

c) Malicious code - i.e. an *intentionally* misbehaving application - is the real problem that we cannot solve right now, and that a timeout may solve. But can it really? I have my doubts. It's clear that applications have different needs for system time and resources, so we must provide a way to fine-tune the timeout or other limits (a user-defined amount at start and on renewal). We have to provide an API for that, but that same API would also be there for malicious code to use. If we don't impose a hard limit (even a user-configurable one) on transaction/connection/query time, we will solve nothing that way, and even with a hard limit we will throw only a small obstacle in the way of malicious code writers. Moreover, anything we do to solve this problem should not put unneeded obstacles in the way of regular developers, and a timeout would.

d) Another use of a timeout is to help mission-critical systems not fall to their knees when an unexpected problem occurs. These systems are usually very busy, and it is normal for such a system to take some actions on its own to minimize the impact of any failure, or take an alternate path, because people are too slow to react. A timeout may help there (at least it's normal practice, as David pointed out), but do we *really* need to use timeouts *in the engine*? Is it possible that an independent monitor application (it may use temporary system tables or any other API) that would observe and rule out transactions/connections/queries according to user-defined rules would be enough? It would definitely be the more flexible solution, but would it be acceptable for such mission-critical systems?

What we didn't take into account in the recent discussion is the overhead of any timeout or keep-alive solution. The client-controlled approach seems to scale better than the server-controlled one, but the server-controlled one seems to be more precise. Both will impose additional overhead in network traffic and system resources.

Another angle that was mentioned, but not very thoroughly, is the backward compatibility of any timeout solution with current applications. A solution that used an extended API would solve d), but not the b) and c) problems; other methods would be more or less incompatible.

So, where do we want to go from here?

Best regards
Pavel Cisar
http://www.ibphoenix.com
For all your up-to-date Firebird and InterBase information |
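Pavel's option d) - an independent monitor application that applies user-defined rules instead of a hard in-engine timeout - can be sketched roughly as follows. This is only an illustration: the rule shape, the data structure, and the idea of ignoring read-only transactions are assumptions drawn from this thread, not an existing Firebird API. In a real deployment the rows would come from the engine's monitoring/temporary system tables.

```python
# Hypothetical sketch of an external transaction monitor: poll the
# engine's monitoring data, apply a user-defined age rule, and report
# which transactions should be warned about or killed.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class TxnInfo:
    txn_id: int
    started_at: datetime
    read_only: bool

def select_stale(txns, now, limit, ignore_read_only=True):
    """Return ids of transactions older than `limit`, optionally
    skipping read-only ones (which need not hold back the OAT)."""
    stale = []
    for t in txns:
        if ignore_read_only and t.read_only:
            continue
        if now - t.started_at > limit:
            stale.append(t.txn_id)
    return stale

# Usage: in practice these rows would be fetched from monitoring
# tables; the monitor would then notify a DBA or kill the offenders.
now = datetime(2002, 12, 17, 12, 0)
txns = [
    TxnInfo(1, now - timedelta(days=2), read_only=False),
    TxnInfo(2, now - timedelta(minutes=5), read_only=False),
    TxnInfo(3, now - timedelta(days=9), read_only=True),
]
print(select_stale(txns, now, timedelta(hours=12)))  # -> [1]
```

The point of keeping this outside the engine, as Pavel notes, is flexibility: the rules live in an ordinary application that an administrator can change without touching the server.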
From: adem <ade...@ex...> - 2002-12-17 14:43:56
|
Hi, I don't know if this is doable, but how about putting in an option something like: "Send an email to <a list of email addresses> if any transaction has been running for longer than <time in seconds>". Preferably, this would be dynamically and remotely alterable. It would be similar to what 3ware IDE RAID cards do: they send an email to a pre-specified address whenever they have something to say about their own operation. Our people just love it - they want all RAID hardware to be 3ware for that (hear that, Adaptec! :-). Then I would next ask whether it would be possible to put a tiny web server into FB so that we could do the admin remotely without any 3rd-party code, but that would be going overboard <G> -- Cheers, Adem ""Pavel Cisar"" <pc...@us...> wrote in message news:3DFF0B1F.21578.767BA1@localhost... > So, where we want to go from here ? |
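Adem's alert idea can be sketched as below. The recipient list, the SMTP host, and the way transaction ages are obtained are illustrative assumptions, not an existing Firebird feature; only the standard-library mail machinery is real.

```python
# Minimal sketch: build and send one alert mail per transaction that
# has been running longer than a configured number of seconds.
import smtplib
from email.message import EmailMessage

def build_alert(txn_id, age_seconds, recipients):
    """Compose the alert for one over-limit transaction."""
    msg = EmailMessage()
    msg["Subject"] = f"Firebird alert: transaction {txn_id} running {age_seconds}s"
    msg["To"] = ", ".join(recipients)
    msg.set_content(
        f"Transaction {txn_id} has been active for {age_seconds} seconds, "
        "which exceeds the configured limit."
    )
    return msg

def send_alerts(txn_ages, limit, recipients, host="localhost"):
    """txn_ages maps transaction id -> age in seconds (hypothetical
    source: the engine's monitoring tables).  Returns alerts sent."""
    alerts = [build_alert(t, age, recipients)
              for t, age in txn_ages.items() if age > limit]
    if alerts:
        with smtplib.SMTP(host) as s:
            for m in alerts:
                s.send_message(m)
    return len(alerts)
```

A monitoring daemon would call `send_alerts` on a timer; making the threshold and the address list remotely alterable, as Adem asks, is then just a matter of where that configuration is stored.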
From: Ann W. H. <aha...@ib...> - 2002-12-17 17:21:49
|
At 11:31 AM 12/17/2002 +0100, Pavel Cisar wrote: >So, where we want to go from here ? My suggestion is that we look at transaction monitoring - see what InterBase has done. It handles both the malicious user and the idiot. Regards, Ann www.ibphoenix.com We have answers. |
From: <psc...@in...> - 2002-12-17 12:39:29
|
On 16 Dec 2002 at 15:48, David Jencks wrote:

> umm, I think I've been saying repeatedly that _all_ the sources I have
> seen on distributed system design say the __only__ reliable way to
> detect 1--5 is with timeouts on the server. So far no one has
> directly disputed this, just argued with the conclusion that to ensure
> reliability we should implement transaction timeouts on the server.
>
> Although many cases of 1--4 can be detected by the current connection
> keep-alive/timeout feature, I think this clearly relies very heavily
> on unwarranted assumptions about client architecture and good behavior
> of clients.

As far as I can tell, there are two methods of determining whether a network transaction is still alive. Either we assume it is alive and ask it, and if we get no response we assume it died; or we assume that if it is really old it has probably died without falling over, and we make sure by driving a stake through its heart - otherwise known as the timeout method. For network design, the second method may seem the better of the two. However, in this case our primary concern isn't the network; it's data reliability and performance. I thought this whole process got started because a stuck OAT affects performance, and the real issue is how to prevent the OAT from getting stuck in the first place. I don't mind having BOTH methods in place, as long as the timeout can be disabled. What I would like to see is two settings. The first is the timeout period; something like a month's worth as a maximum, expressed in seconds, should be sufficient. The second is a setting for what to do about it, with several options:

1) Ignore it
2) Warn - this would send a message of some kind to a DBA
3) Kill
4) Ignore read-only read-committed

These would be additive, so for example if you combine warn, kill, and ignore-RORC, it would warn, wait the same period again, and then kill, ignoring all read-only read-committed transactions. |
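The additive settings proposed above can be modeled as combinable flags. This is a sketch of the semantics described in the post - the names and the warn-then-kill schedule are assumptions drawn from it, not an existing Firebird configuration:

```python
# Additive timeout actions: flags combine, and IGNORE_RORC filters
# which transactions the warn/kill rules even consider.
from enum import Flag, auto

class TxnAction(Flag):
    IGNORE = auto()       # do nothing
    WARN = auto()         # send a message of some kind to a DBA
    KILL = auto()         # roll the transaction back
    IGNORE_RORC = auto()  # skip read-only read-committed transactions

def decide(age, timeout, actions, read_only_rc):
    """Return the action for one transaction under the combined flags."""
    if TxnAction.IGNORE_RORC in actions and read_only_rc:
        return "ignore"
    if age <= timeout:
        return "ok"
    # Warn after one timeout period has elapsed; kill only after a
    # second full period -- the warn-then-wait-then-kill behaviour
    # described in the post.
    if TxnAction.KILL in actions and age > 2 * timeout:
        return "kill"
    if TxnAction.WARN in actions:
        return "warn"
    return "ignore"

flags = TxnAction.WARN | TxnAction.KILL | TxnAction.IGNORE_RORC
print(decide(90, 60, flags, read_only_rc=False))   # -> warn
print(decide(130, 60, flags, read_only_rc=False))  # -> kill
print(decide(130, 60, flags, read_only_rc=True))   # -> ignore
```

Because the flags are additive rather than exclusive, an administrator can pick any escalation policy from "just warn" to "warn, then kill" with a single setting.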
From: Christian P. <c_p...@ya...> - 2002-12-12 01:48:01
|
Thank you Sean,

> Please note that this does NOT mean that it will be implemented (he says
> to protect himself from the volley which might ensue).

It is important to me, at least, that it will be considered. |