From: Leyne, S. <sl...@at...> - 2002-12-11 20:32:14
|
Christian, I have noted this request for addition to the Feature Request list. Please note that this does NOT mean that it will be implemented (he says to protect himself from the volley which might ensue). Sean |
From: Juarez R. <ju...@mp...> - 2002-12-11 21:03:09
|
- Default duration defined at the user level, to be able to override the server's default configuration.

Maybe this will sit on the "feature request" list until someone decides to contribute time, skill or money! But it is a good one.

----- Original Message -----
Sent: Wednesday, December 11, 2002 5:01 PM

I, too, like the idea of a maximum transaction duration. Here's one way of designing it:

- Default duration defined at the server level, so when creating a new database it will use that default duration.
- Default duration defined at the database level, to be able to override the server's default duration.
- Pass the duration upon connection to override the default duration, so that in some circumstances the duration can be extended to perform e.g. the year-end processing/reports.

I know that this might require some changes to the API. Just my 2 cents.

--
Best regards,
Daniel Rail |
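The layered defaults proposed here (server, then database, then connection) amount to a simple override chain. A minimal sketch of how such a resolution could work — the function and parameter names are illustrative, not any Firebird API:

```python
def effective_timeout(server_default, database_default=None, connection_override=None):
    """Resolve a transaction duration limit from layered defaults.

    Each more specific level, when set, overrides the broader one.
    None at every level means "no limit".
    """
    for value in (connection_override, database_default, server_default):
        if value is not None:
            return value
    return None

# A database-level setting overrides the server default, and a
# per-connection value overrides both (e.g. for a year-end run).
assert effective_timeout(3600) == 3600
assert effective_timeout(3600, database_default=600) == 600
assert effective_timeout(3600, 600, connection_override=86400) == 86400
```

The key design point is that "no timeout" stays representable at every level, so enabling the feature server-wide never forces a limit on a connection that explicitly opts out.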
From: Ann W. H. <aha...@ib...> - 2002-12-12 15:57:40
|
At 07:01 PM 12/11/2002 -0300, Juarez Rudsatz wrote:
>- Default duration defined at user level to be able to override the server's
>default configuration.

If the problem is users who discover an interactive query tool and leave transactions open for days, then requiring that the user transaction limit itself is probably not going to work.

>Here's one way of designing it:
>- Default duration defined at the server level, so when creating a new
>database it will use that default duration.

That bothers me somewhat, probably because I'm not entirely happy with the server model starting with server logins. Yes, sometimes a single machine will host a number of related databases. More often, just a single database. And sometimes, a number of unrelated databases. That case really can't be supported now.

>- Default duration defined at the database level
>- Pass the duration upon connection
>
>I know that this might require some changes to the API.

Actually that can all be done with connection and transaction parameters - yes, we'd need to add parameters, but that doesn't fundamentally change the API.

Regards,
Ann
www.ibphoenix.com
We have answers. |
From: Juarez R. <ju...@mp...> - 2002-12-12 17:04:41
|
I see 3 types of problem here:

1) The political: when the user discovers an interactive tool. Solution: just buy a gun and kill him.

2) The most political: you need to give different powers to different users. If Joe works on reception and wants to find 'Mary' in the whole database, he will waste resources of other users. The problem is where the developer cannot guess the time involved, so give him one minute. But if Jim must show Bill the end-of-year report, he cannot simply be stopped. Eventually this discussion will lead to 'kill' and 'nice' features being implemented.

3) The login model. There really is more than one set of needs for a database. Single-user applications and web servers hardly need a login, and in the second case a login can be a performance penalty in a web context.

----- Original Message -----
From: "Ann W. Harrison" <aha...@ib...>
To: "Juarez Rudsatz" <ju...@mp...>; <fir...@li...>
Sent: Thursday, December 12, 2002 12:58 PM
Subject: Re: [Firebird-devel] Re: Max transaction duration |
From: Leyne, S. <sl...@at...> - 2002-12-13 00:51:36
|
Paul et al,

At this point, I think I should remind everyone that the request was for the *ability* to set a transaction timeout value.

Nothing was ever said to suggest that, by enabling this feature, the engine would by 'default' start closing connections.

I think of this as equivalent to Nickolay's new explicit locking feature in v1.5. It's there; if you don't want to use it, fine, no one is going to force you to. But if it helps deal with a bad software problem (maybe you don't have the source), what's the harm?

Sean |
From: Pavel C. <pc...@us...> - 2002-12-13 09:17:08
|
Sean,

On 12 Dec 2002 at 19:48, Leyne, Sean wrote:
> At this point, I think should remind everyone that the request was for
> the *ability* to set a transaction timeout value.
>
> Nothing was ever said to suggest by enabling this feature, that the
> engine would by 'default' start closeing connections.
>
> I think of this as equivalent to Nickolay's new explicit locking feature
> in v1.5. It's there, if you don't want to use, fine no one is going to
> force you to. but if it helps deal with a bad software/problem (maybe
> you don't have the source), what's the harm?

Well, but how about the code bloat and new potential points of failure? The small footprint, compact code and predictable behaviour are among the major FB features that are easy to lose, but hard to achieve.

I have nothing against new features, if they are reasonable. For me, "reasonable" means (at least) one point from:

1) Satisfies a general requirement
2) Satisfies an occasional requirement of many that's hard to achieve effectively by other means
3) Satisfies a general requirement of few that's hard to achieve effectively by other means

The transaction timeout seems to fall in the third category, but I'm not convinced that it's not doable effectively "outside the engine".

Best regards
Pavel Cisar
http://www.ibphoenix.com
For all your up-to-date Firebird and InterBase information |
From: David J. <dav...@di...> - 2002-12-13 13:43:49
|
On 2002.12.13 04:21:59 -0500 Pavel Cisar wrote:
> Well, but how about the code bloat and new potential points of failure ?
> The small footprint and compact code and behaviour is one from major FB
> features that are easy to lose, but hard to achieve.
>
> I have nothing against new features, if they are reasonable. For me,
> "reasonable" means (at least) one point from:
>
> 1) Satisfy a general requirement

Timeouts seem to be a general requirement for reliable distributed system design. I haven't seen any attempt to design a reliable distributed system that doesn't use either an explicit timeout (XA distributed transaction model, JINI) or an implicit timeout (heartbeat, ping). It is certainly an explicit requirement of the XA transaction model.

> 2) Satisfy an occasional requirement of many that's hard to achieve by
> other ways effectively
>
> 3) Satisfy general requirement of few that's hard to achieve by other
> ways effectively
>
> The transaction timeout seems to fall in third category, but I'm not
> convinced that it's not doable effectively "outside the engine".

Since, as noted above, all designers of reliable distributed systems seem to rely on timeouts to determine remote node failures, I think it's up to you to demonstrate that there are other solutions that provide the same degree of automatic recovery from remote failure.

Firebird seems to be the only open source database that supports 2pc and can be maneuvered into supporting almost all of XA: therefore IMO it's the only plausible open source choice to use in J2EE apps with more than one resource manager. I don't really understand the point of arguing against industry standard features for this environment.

I would have asked for this feature a long time ago if I had fully appreciated the role of timeouts in reliability. As it is, I only really started to appreciate it when working on making the JBoss transaction manager recoverable.

thanks
david jencks

-------------------------------------------------------
This sf.net email is sponsored by:
With Great Power, Comes Great Responsibility
Learn to use your power at OSDN's High Performance Computing Channel
http://hpc.devchannel.org/
_______________________________________________
Firebird-devel mailing list
Fir...@li...
https://lists.sourceforge.net/lists/listinfo/firebird-devel |
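In the XA/JTA model referred to above, the timeout is attached to the transaction itself, and the resource manager may act unilaterally once it expires. A minimal sketch of that server-side expiry check — the class and field names are illustrative, not engine code:

```python
import time

class Transaction:
    """A transaction that carries its own timeout, XA-style.

    `now` is injectable so the expiry logic can be tested
    without real waiting; it defaults to a monotonic clock.
    """
    def __init__(self, txn_id, timeout=None, now=time.monotonic):
        self.txn_id = txn_id
        self.timeout = timeout      # seconds; None means no limit
        self._now = now
        self.started = now()

    def expired(self):
        if self.timeout is None:    # default behavior: never expires
            return False
        return self._now() - self.started > self.timeout

# Simulated clock: the transaction expires once 60s have "passed",
# at which point the server is entitled to roll it back on its own.
clock = [0.0]
txn = Transaction(1, timeout=60, now=lambda: clock[0])
assert not txn.expired()
clock[0] = 61.0
assert txn.expired()
```

Note that a transaction created without a timeout never expires, which matches the repeated point in this thread that the default behavior would not change.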
From: Phil S. <ph...@sh...> - 2002-12-13 14:22:56
|
Hi,

I have been following this thread, and I think there is a 'blurring' between Connections and Transactions.

When a 'client' crashes, or the LAN cable is pulled out, etc., Firebird will (eventually) notice, close the connection and roll back any open transactions.

The reason people are so passionate about transactions is that they can be a major cause of performance and other issues with FB. The majority of performance problems on the lists/newsgroups are down to 'incorrect' use of transactions. Long-running transactions are not a good thing in FB, and whilst sometimes they can't be avoided (end-of-month batch runs etc.), the majority can.

The problem is most evident when people move to FB from 'other databases', and the first app they write starts a transaction in the morning, fills an 'editable grid' with data, and then they wonder why the whole system has ground to a halt by lunch.

In my opinion, in a properly designed system, transactions should only be open when reading or writing to the database. So I see no reason at all to have 'transaction timeouts': if a transaction is open, something is being written to or read from the database, and any timeout will abort the operation.

Phil |
From: David J. <dav...@di...> - 2002-12-13 15:17:17
|
On 2002.12.13 07:54:20 -0500 Phil Shrimpton wrote:
> Hi,
>
> I have been following this thread, and I think there is a 'bluring' between
> Connections and Transactions.
>
> When a 'Client' crashes, or the LAN cable is pulled out etc, Firebird will
> (eventually) notice, close the connection and rollback any open
> transactions.

Right, that may have been a bad example. Can you prove that any client-side failure that results in the client ceasing to communicate, leaving its transactions open, will quickly result in connection failure? I can't, so I think the server should be able to act unilaterally when transactions get too old.

> In my opinion, in a properly designed system, transactions should only be
> open when reading or writing to the database. So I see no reason at all to
> have 'transaction timeouts', as if a transaction is open, something is being
> written to, or read from the database, and any time out will abort the
> operation.

The purpose of timeouts is not to interfere with people's work, but to provide a way to automatically recover from remote node failures. Some of these are taken care of by "broken connection detection", but I'm not convinced all possible failures will be detected this way.

Since transaction timeouts appear to be an industry standard (specified in XA) and extremely useful, I'm having trouble understanding why there is so much opposition. I'm certainly not suggesting that the default behavior change, just that it be possible to supply a transaction timeout with transactions, so that, for instance, distributed transaction managers can make use of it. I don't think this would affect Delphi users who wish to allow indefinite-length browsing sessions with open transactions, but it would provide the industry-standard failure recovery for those who need it.

thanks
david jencks |
From: Phil S. <ph...@sh...> - 2002-12-13 16:15:34
|
On Friday 13 December 2002 15:16, David Jencks wrote:

Hi,

> > When a 'Client' crashes, or the LAN cable is pulled out etc, Firebird
> > will (eventually) notice, close the connection and rollback any open
> > transactions.
>
> Can you prove that any client
> side failure that results in the client ceasing to communicate commits to
> its open transactions will quickly result in connection failure?

FB will, eventually, close the connection; whether it is quick enough or not is another story. A quick test just now, removing the network cable and looking at the DB stats, indicates it took just over a minute to close/rollback.

> The purpose of timeouts is not to interfere with peoples work, but to
> provide a way to automatically recover from remote node failures. Some of
> these are taken care of by "broken connection detection" but I'm not
> convinced all possible failures will be detected this way.

I would argue that all of these failures should be handled by "broken connection detection", and if they are not, that area of the code should be improved rather than adding another 'feature' to work around it.

> Since
> transaction timeouts appear to be an industry standard (specified in xa)
> and extremely useful I'm having trouble understanding why there is so much
> opposition.

I don't know the XA spec, but I think the reason for the opposition is the way FB transactions 'should be used'; in my view, any open, idle transaction is a design flaw in the 'client' application. You have a valid comment about 'client' crashes not being detected, but I think for those cases the 'connection detection' should be improved.

> I'm certainly not suggesting that the default behavior change,
> just that it be possible to supply a transaction timeout with transactions

Surely, when you start a transaction, you know that you are going to commit/rollback a 'few lines of code' later, after you have inserted a record or something, so unless you think you might forget to put a 'commit' in the code, I can't see a valid reason for a timeout parameter.

> I don't think this would affect delphi users who wish to allow
> indefinite-length browsing sessions with open transactions,

Anything to prevent this would be a good thing IMO.

Phil |
From: Nickolay S. <sk...@bs...> - 2002-12-13 15:10:48
|
Hello, All!

Friday, December 13, 2002, 3:54:20 PM, you wrote:

Allowing long-running transactions to kill database operation is actually a security problem, because a user essentially without rights can kill database operation. Oracle and MSSQL solve this problem by allowing several limits to be defined per role (with database and server-wide defaults), like maximum transaction duration, CPU time per query, etc. The maximum duration of a serializable transaction in Oracle is limited by the size of the undo tablespace (where record back versions are stored).

The current Firebird implementation is insecure. You can kill the server by passing an invalid handle to it, or execute whatever binary code you need on the server side by tweaking handles of deleted objects or using one of the known buffer overflows.

My opinion on this issue is that it should go into the security issues queue, and we should make a feature to prevent DoS attacks mounted by starting transactions, but its priority in this queue should not be too high.

--
Best regards,
Nickolay Samofatov
mailto:sk...@bs... |
From: Ivan P. <pre...@ms...> - 2002-12-13 10:43:58
|
> From: psc...@in...
> > The main reason for this feature is to cope with software that hasn't
> > been written in a sympathetic way towards sensible transaction length,
> > so implementing in the client seems weaker than having a _database_
> > parameter.
>
> The software should be fixed; a maximum transaction duration cures the
> symptom, and not the overall disease - a badly designed app, which
> probably has other problems as well.

Don't you think that with a "max transaction duration" feature it will be much easier to force people to fix/redesign their apps? (Either they will fix it, or it will not work at all.)

Ivan |
From: <psc...@in...> - 2002-12-13 22:02:41
|
On 13 Dec 2002 at 11:14, Ivan Prenosil wrote:

> > From: psc...@in...
> > > The main reason for this feature is to cope with software that
> > > hasn't been written in a sympathetic way towards sensible
> > > transaction length, so implementing in the client seems weaker
> > > than having a _database_ parameter.
> >
> > The software should be fixed, a maximum transaction duration, cures
> > the symptom, and not the overall disease, a badly designed app,
> > which probably has other problems as well.
>
> Don't you think that with "Max transaction duration" feature it will
> be much easier to force people to fix/redesign their apps ? (Either
> they will fix it, or it will not work at all.)

Unless of course it's a critical month-end application, and it has to be finished by a certain time, and nobody realises the reason it keeps dying is the transaction duration limiter... |
From: Leyne, S. <sl...@at...> - 2002-12-15 15:17:26
|
Pavel,

> Interesting, so server should kill requests in progress when
> time ticket expire ? I'd worry to introduce that.

But if the implementation of this proposed feature had a default of "no timeout", what would be the worry?

It seems that most of the "naysayers" are 'up-in-arms' because they are all working on the assumption that the default timeout will be something like 60 secs.

Sean |
From: Nando D. <na...@de...> - 2002-12-15 16:44:43
|
Sean,

> It seems that most of the "naysayers" are 'up-in-arms' because they are
> all working on the assumption that the default timeout will be something
> like 60 secs.

My impression too. On the other hand, if you look at the situation from a security POV (as Nickolay suggests), then it would make sense to have a restrictive (and thus worrying) default setting.

Ciao
--
____
_/\/ando |
From: <psc...@in...> - 2002-12-16 20:21:05
|
On 15 Dec 2002 at 10:14, Leyne, Sean wrote:

> Pavel,
>
> > Interesting, so server should kill requests in progress when
> > time ticket expire ? I'd worry to introduce that.
>
> But if the implementation, of this proposed feature, had a default of
> "no timeout" what would be the worry?
>
> It seems that most of the "naysayers" are 'up-in-arms' because they
> are all working on the assumption that the default timeout will be
> something like 60 secs.

The issue isn't the timeout, or the timeout period, but whether this fixes the problem or just masks it. It's like a toothache: sure, the aspirin takes away the pain (temporarily), but until you go to the dentist and get the tooth fixed, the problem is still there. In this case the stalling of the OAT is the toothache.

So what can be done about it? First we need to enumerate all of the ways of causing a transaction to fail in the middle:

1) The client machine crashes.
2) The client machine suffers a power-off situation.
3) Network cable or device failure.
4) The application fails to commit/rollback the transaction before ending.
5) The application opens a writable transaction and leaves it open for an extended period of time.

For causes 1-3 the engine should be able to detect the failure and recover from it, by rolling back the transaction and marking it as committed. This may mean engine changes so that only updates that can be reliably rolled back are actually written to the database.

For cause #4, when the application ends without a commit or rollback, we need to roll back the transaction and mark it as committed.

For cause #5, there really isn't much the engine can do about it; applications should not do this, and the solution is to fix the #@!%$ application. I don't care what the application does, it should not behave this way.

I think the solution may be in savepoints and commit retaining, which should both allow the engine to throw away the no-longer-needed back versions, effectively advancing the OAT at least logically, resolving the problem.

A timeout may be helpful in making sure that applications are behaving themselves. Whether it should kill the transaction, or just warn someone about it (so they can fix the app), is the real question. |
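The closing "kill or just warn" question can be made concrete with a small sweeper sketch: a periodic scan over open transactions that either reports offenders or rolls them back. All names here are illustrative; this is not how the engine's transaction bookkeeping actually looks:

```python
import time

def sweep(transactions, limit, on_warn, on_kill, kill=False, now=time.monotonic):
    """Scan open transactions and handle ones older than `limit` seconds.

    With kill=False the sweeper only reports offenders (so the app can
    be fixed); with kill=True it rolls them back, which lets the OAT
    advance. `transactions` maps txn id -> start timestamp.
    """
    for txn_id, started in list(transactions.items()):
        if now() - started > limit:
            if kill:
                on_kill(txn_id)
                del transactions[txn_id]
            else:
                on_warn(txn_id)

warned, killed = [], []
open_txns = {10: 0.0, 11: 950.0}   # txn 10 is 1000s old, txn 11 is 50s old

# Warn-only mode: nothing is touched, the offender is just reported.
sweep(open_txns, limit=900, on_warn=warned.append, on_kill=killed.append,
      now=lambda: 1000.0)
assert warned == [10] and killed == []

# Kill mode: the stale transaction is rolled back and removed.
sweep(open_txns, limit=900, on_warn=warned.append, on_kill=killed.append,
      kill=True, now=lambda: 1000.0)
assert killed == [10] and 10 not in open_txns
```

The two modes correspond to the two answers proposed in the thread: warn-only preserves current behavior while still exposing badly-behaved applications; kill mode is the actual timeout enforcement.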
From: David J. <dav...@di...> - 2002-12-16 20:49:47
|
Umm, I think I've been saying repeatedly that _all_ the sources I have seen on distributed system design say the __only__ reliable way to detect 1-5 is with timeouts on the server. So far no one has directly disputed this, just argued with the conclusion that, to ensure reliability, we should implement transaction timeouts on the server.

Although many cases of 1-4 can be detected by the current connection keep-alive/timeout feature, I think this clearly relies very heavily on unwarranted assumptions about client architecture and good behavior of clients.

david jencks |
From: Roman R. <rro...@ac...> - 2002-12-16 23:24:19
|
Paul,

> The issue isn't the timeout, or timeout period, but whether this fixes the
> problem or just masks the problem. It's like a toothache, sure the
> aspirin takes away the pain (temporarily), but until you go to the dentist
> and get the tooth fixed, the problem is still there. In this case the
> stalling of the OAT is the toothache.

I would say that the toothache is the thing we have to cure, and we have a backup/restore medicine for it. However, the reason for this toothache is that you did not clean your teeth twice a day. And we want to enforce this teeth-cleaning procedure by reliably detecting that the teeth were not yet cleaned.

> So what can be done about it, first we need to enumerate all of the ways
> of causing a transaction to fail in the middle.
>
> 1) The client machine crashes.
> 2) Client machine suffers a power off situation.
> 3) Network cable or device failure.
> 4) Application fails to commit/rollback the transaction before ending.
> 5) Application opens a writable transaction and leaves it open for an
> extended period of time.

In the theory of fault-tolerant systems, people define three classes of failures:

- failures the system can detect and tolerate;
- failures the system can only detect;
- failures the system cannot detect.

Crash of the client machine, physical link failure and ending a tx without commit/rollback seem to be failures the engine should detect and tolerate (by rolling back the tx and performing some additional steps). The question is whether we can reliably detect them. As I understood from the discussion, we have a failure detector (FD) that sends some keep-alive data to the client with the response to the client's query. If the client fails to respond to that keep-alive message, it is suspected (are there any additional checks of the suspected node?). As I understand it, this FD is not able to say whether the client with a long open transaction is still alive or not, because the server never sends a keep-alive message to the client on its own. Am I right?

If yes, then problem 5) belongs to the "undetectable faults". And, if yes, isn't it natural to extend the server FD to be able to "ping" a client (ping interval and max. pong delay specified in the server config) to detect whether the client is alive? This FD would be server-centric and would need a background thread running and pinging connections.

David proposes a different FD scheme where each client has to say to the server "I will be cleaning my teeth at 9:00am and 9:00pm" at the beginning of the transaction. And if the client fails to call the server between 9:00 and 9:05 am/pm and say "teeth are clean", the server suspects that client without any additional checks. This scheme seems more elegant than a busy server asking each client at 9:05 am/pm "hey, did you clean your teeth?"... The difference is like a regime in the army versus in the kindergarten. Also, this scheme tolerates more types of failures (at least problem 5) becomes a "detectable-toleratable fault"). However, what is not clear to me is whether the current FD is so bad that we have to replace it with something new. Can we create a list of failures that the current FD cannot detect? Right now only one type of failure is not detectable:

- a long-running transaction with an open socket.

Are there any others?

Fixing the client in this case will not solve the problem, because you assume that the client software is 100% correct and works as it was intended to work. Unfortunately this is not the case, and will not be the case for a long time. Each system can fail, and fail-stop is the simplest class of failures. Byzantine faults are more severe, and we have to deal with them on the server. But personally I would just create a list of faults the engine cannot tolerate and put it somewhere on the web. As far as I know we are not writing an engine for a nuclear plant, so tolerating fail-stop faults should be enough.

Best regards,
Roman Rokytskyy |
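The lease-style failure detector described above (client promises a renewal time; the server suspects it only when the promise is broken, instead of actively pinging everyone) can be sketched in a few lines. The class and its intervals are hypothetical, not anything in the Firebird wire protocol:

```python
class LeaseDetector:
    """Lease-based failure detector.

    Each client promises to renew its transaction every `interval`
    seconds; `grace` is the extra slack before the server suspects it.
    The server does no active pinging - it only checks deadlines.
    """

    def __init__(self, interval, grace):
        self.interval = interval
        self.grace = grace
        self.deadlines = {}

    def begin(self, txn_id, now):
        self.deadlines[txn_id] = now + self.interval + self.grace

    def renew(self, txn_id, now):
        self.deadlines[txn_id] = now + self.interval + self.grace

    def suspects(self, now):
        return [t for t, d in self.deadlines.items() if now > d]

fd = LeaseDetector(interval=60, grace=5)
fd.begin("joe", now=0)
fd.begin("jim", now=0)
fd.renew("jim", now=60)           # jim kept his promise, joe did not
assert fd.suspects(now=70) == ["joe"]
assert fd.suspects(now=130) == ["joe", "jim"]
```

This shows why the scheme scales better than server-driven pinging: the server's cost per check is a deadline comparison, and a stalled client with a perfectly healthy socket (the one undetectable case listed above) is caught the moment it misses a renewal.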
From: Pavel C. <pc...@us...> - 2002-12-17 10:26:46
|
Hi,

I'd like to summarize the problem and draw the possible paths we can take on this issue.

First, what is the real problem that a transaction timeout should fix? What we want is that runaway transactions and queries do not consume engine resources forever or in an uncontrolled manner. The proposed solution is to impose a limit on how long a transaction can live (we could also impose limits on CPU usage, memory usage, etc.). We hope this would cure many causes of stalled or exhausted engine resources (the OAT problem being the most critical): dead clients, badly written clients, malicious clients, and something I call self-healing in mission-critical systems. But is this approach really a cure for them?

a) We acknowledged that the engine can detect dead clients now and take the right action. Of course, we can improve the detection system to be more precise, take fewer resources, and detect more *dead* conditions. But we definitely don't need to impose a timeout to solve the dead-client problem, because it's already solved, in a more or less satisfactory way.

b) Many said that a timeout is probably not a good cure for badly written applications. The engine definitely should help developers identify the problem, and may provide a way for administrators to "fix" the *immediate* problem by killing a runaway transaction, connection or query, but the real cure is definitely to fix the misbehaving application. I liked the temporary system tables approach taken by IB7, which allows identifying the occurrence of various problems and killing transactions, connections or statements. I think we should analyze its pros and cons and seriously consider implementing them (or anything with equivalent capabilities) in Firebird.

c) Malicious code - i.e. an *intentionally* misbehaving application - is the real problem that we cannot solve right now, and that a timeout may solve. But can it really? I have my doubts. It's clear that applications have different needs for system time and resources, so we must provide a way to fine-tune the timeout or other limits (a user-defined amount at start and on renewal). We have to provide an API for that, but that same API would also be there for malicious code to use. If we don't impose a hard limit (even a user-configurable one) on transaction/connection/query time, we will solve nothing that way, and even with a hard limit we will throw only a small obstacle in the way of malicious code writers. Moreover, anything we do to solve this problem should not put unneeded obstacles in the way of regular developers, and a timeout would.

d) Another use of a timeout is to help mission-critical systems not fall to their knees when an unexpected problem occurs. These systems are usually very busy, and it is normal for such a system to take some actions on its own to minimize the impact of any failure, or take an alternate path, because people are too slow to react. A timeout may help there (at least it's normal practice, as David pointed out), but do we *really* need to use timeouts *in the engine*? Is it possible that an independent monitor application (it may use temporary system tables or any other API) that would observe and rule out transactions/connections/queries according to user-defined rules would be enough? It would definitely be the more flexible solution, but would it be acceptable for such mission-critical systems?

What we didn't take into account in the recent discussion is the overhead of any timeout or keep-alive solution. The client-controlled approach seems to scale better than the server-controlled one, but the server-controlled one seems to be more precise. Both will impose additional overhead in network traffic and system resources.

Another angle that was mentioned, but not very thoroughly, is the backward compatibility of any timeout solution with current applications. A solution that used an extended API would solve d), but not the b) and c) problems; other methods would be more or less incompatible.

So, where do we want to go from here?

Best regards
Pavel Cisar
http://www.ibphoenix.com
For all your up-to-date Firebird and InterBase information |
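Pavel's option d) - an independent monitor application that applies user-defined rules instead of a hard in-engine timeout - can be sketched roughly as follows. This is only an illustration: the rule shape, the data structure, and the idea of ignoring read-only transactions are assumptions drawn from this thread, not an existing Firebird API. In a real deployment the rows would come from the engine's monitoring/temporary system tables.

```python
# Hypothetical sketch of an external transaction monitor: poll the
# engine's monitoring data, apply a user-defined age rule, and report
# which transactions should be warned about or killed.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class TxnInfo:
    txn_id: int
    started_at: datetime
    read_only: bool

def select_stale(txns, now, limit, ignore_read_only=True):
    """Return ids of transactions older than `limit`, optionally
    skipping read-only ones (which need not hold back the OAT)."""
    stale = []
    for t in txns:
        if ignore_read_only and t.read_only:
            continue
        if now - t.started_at > limit:
            stale.append(t.txn_id)
    return stale

# Usage: in practice these rows would be fetched from monitoring
# tables; the monitor would then notify a DBA or kill the offenders.
now = datetime(2002, 12, 17, 12, 0)
txns = [
    TxnInfo(1, now - timedelta(days=2), read_only=False),
    TxnInfo(2, now - timedelta(minutes=5), read_only=False),
    TxnInfo(3, now - timedelta(days=9), read_only=True),
]
print(select_stale(txns, now, timedelta(hours=12)))  # -> [1]
```

The point of keeping this outside the engine, as Pavel notes, is flexibility: the rules live in an ordinary application that an administrator can change without touching the server.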
From: adem <ade...@ex...> - 2002-12-17 14:43:56
|
Hi, I don't know if this is doable, but how about putting in an option something like: "Send an email to <a list of email addresses> if any transaction has been running for longer than <time in seconds>". Preferably, this would be dynamically and remotely alterable. It would be similar to what 3ware IDE RAID cards do: they send an email to a pre-specified address whenever they have something to say about their own operation. Our people just love it - they want all RAID hardware to be 3ware for that (hear that, Adaptec! :-). Then I would next ask whether it would be possible to put a tiny web server into FB so that we could do the admin remotely without any 3rd-party code, but that would be going overboard <G> -- Cheers, Adem ""Pavel Cisar"" <pc...@us...> wrote in message news:3DFF0B1F.21578.767BA1@localhost... > So, where we want to go from here ? |
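Adem's alert idea can be sketched as below. The recipient list, the SMTP host, and the way transaction ages are obtained are illustrative assumptions, not an existing Firebird feature; only the standard-library mail machinery is real.

```python
# Minimal sketch: build and send one alert mail per transaction that
# has been running longer than a configured number of seconds.
import smtplib
from email.message import EmailMessage

def build_alert(txn_id, age_seconds, recipients):
    """Compose the alert for one over-limit transaction."""
    msg = EmailMessage()
    msg["Subject"] = f"Firebird alert: transaction {txn_id} running {age_seconds}s"
    msg["To"] = ", ".join(recipients)
    msg.set_content(
        f"Transaction {txn_id} has been active for {age_seconds} seconds, "
        "which exceeds the configured limit."
    )
    return msg

def send_alerts(txn_ages, limit, recipients, host="localhost"):
    """txn_ages maps transaction id -> age in seconds (hypothetical
    source: the engine's monitoring tables).  Returns alerts sent."""
    alerts = [build_alert(t, age, recipients)
              for t, age in txn_ages.items() if age > limit]
    if alerts:
        with smtplib.SMTP(host) as s:
            for m in alerts:
                s.send_message(m)
    return len(alerts)
```

A monitoring daemon would call `send_alerts` on a timer; making the threshold and the address list remotely alterable, as Adem asks, is then just a matter of where that configuration is stored.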
From: Ann W. H. <aha...@ib...> - 2002-12-17 17:21:49
|
At 11:31 AM 12/17/2002 +0100, Pavel Cisar wrote: >So, where we want to go from here ? My suggestion is that we look at transaction monitoring - see what InterBase has done. It handles both the malicious user and the idiot. Regards, Ann www.ibphoenix.com We have answers. |
From: <psc...@in...> - 2002-12-17 12:39:29
|
On 16 Dec 2002 at 15:48, David Jencks wrote:

> umm, I think I've been saying repeatedly that _all_ the sources I have
> seen on distributed system design say the __only__ reliable way to
> detect 1--5 is with timeouts on the server. So far no one has
> directly disputed this, just argued with the conclusion that to ensure
> reliability we should implement transaction timeouts on the server.
>
> Although many cases of 1--4 can be detected by the current connection
> keep-alive/timeout feature, I think this clearly relies very heavily
> on unwarranted assumptions about client architecture and good behavior
> of clients.

As far as I can tell, there are two methods of determining whether a network transaction is still alive. Either we assume it is alive and ask it, and if we get no response we assume it died; or we assume that if it is really old it has probably died without falling over, and we make sure by driving a stake through its heart - otherwise known as the timeout method. For network design, the second method may seem the better of the two. However, in this case our primary concern isn't the network; it's data reliability and performance. I thought this whole process got started because a stuck OAT affects performance, and the real issue is how to prevent the OAT from getting stuck in the first place. I don't mind having BOTH methods in place, as long as the timeout can be disabled. What I would like to see is two settings. The first is the timeout period; something like a month's worth as a maximum, expressed in seconds, should be sufficient. The second is a setting for what to do about it, with several options:

1) Ignore it
2) Warn - this would send a message of some kind to a DBA
3) Kill
4) Ignore read-only read-committed

These would be additive, so for example if you combine warn, kill, and ignore-RORC, it would warn, wait the same period again, and then kill, ignoring all read-only read-committed transactions. |
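The additive settings proposed above can be modeled as combinable flags. This is a sketch of the semantics described in the post - the names and the warn-then-kill schedule are assumptions drawn from it, not an existing Firebird configuration:

```python
# Additive timeout actions: flags combine, and IGNORE_RORC filters
# which transactions the warn/kill rules even consider.
from enum import Flag, auto

class TxnAction(Flag):
    IGNORE = auto()       # do nothing
    WARN = auto()         # send a message of some kind to a DBA
    KILL = auto()         # roll the transaction back
    IGNORE_RORC = auto()  # skip read-only read-committed transactions

def decide(age, timeout, actions, read_only_rc):
    """Return the action for one transaction under the combined flags."""
    if TxnAction.IGNORE_RORC in actions and read_only_rc:
        return "ignore"
    if age <= timeout:
        return "ok"
    # Warn after one timeout period has elapsed; kill only after a
    # second full period -- the warn-then-wait-then-kill behaviour
    # described in the post.
    if TxnAction.KILL in actions and age > 2 * timeout:
        return "kill"
    if TxnAction.WARN in actions:
        return "warn"
    return "ignore"

flags = TxnAction.WARN | TxnAction.KILL | TxnAction.IGNORE_RORC
print(decide(90, 60, flags, read_only_rc=False))   # -> warn
print(decide(130, 60, flags, read_only_rc=False))  # -> kill
print(decide(130, 60, flags, read_only_rc=True))   # -> ignore
```

Because the flags are additive rather than exclusive, an administrator can pick any escalation policy from "just warn" to "warn, then kill" with a single setting.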
From: Christian P. <c_p...@ya...> - 2002-12-12 01:48:01
|
Thank you Sean,

> Please note that this does NOT mean that it will be implemented (he says
> to protect himself from the volley which might ensue).

It is important to me, at least, that it will be considered. |