Alan Post found a really stupid bug in my code that caused a very bad
stack leak in the Cf_gadget scheduler. It is an interesting variation on
the old "recursion over exception handling" problem, in that the recursion
goes through a lazy indirection. Here are the details:
Every time the scheduler is entered when the ready queue is not empty,
an exception context is pushed onto the stack, and it isn't popped until
after the work unit completes. The problem is that the typical work unit
reenters the scheduler whenever an event is transmitted or received on a
wire. Under fairly common circumstances, these exception contexts can
eat up the stack faster than the scheduler gets a chance to unwind them.
The solution is to pop the exception context after taking a work unit off
the ready queue but before dispatching it. The handler is only needed to
catch the case where the ready queue is empty when the scheduler is entered.
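
The shape of the bug and of the fix can be sketched in a few lines of OCaml. This is a toy model, not the Cf_gadget code: the names ready, emit, run_leaky, run_fixed and peak_handlers are made up for illustration, the lazy indirection is left out, and the live/peak counters only stand in for exception contexts sitting on the stack.

```ocaml
(* Hypothetical stand-in for the ready queue. A work unit receives the
   scheduler entry point so that it can reenter the scheduler. *)
let ready : ((unit -> unit) -> unit) Queue.t = Queue.create ()

(* Instrumentation only: handlers currently installed, and the peak seen. *)
let live = ref 0
let peak = ref 0
let enter () = incr live; if !live > !peak then peak := !live
let leave () = decr live

(* Leaky shape: the Queue.Empty handler stays installed while the work
   unit runs, so every reentry stacks another handler on top of it. *)
let rec run_leaky () =
  enter ();
  (try
     let work = Queue.take ready in
     work run_leaky
   with Queue.Empty -> ());
  leave ()

(* Fixed shape: the handler only covers taking a work unit off the queue
   and is gone before the work unit is dispatched (now a tail call). *)
let rec run_fixed () =
  enter ();
  let next = try Some (Queue.take ready) with Queue.Empty -> None in
  leave ();
  match next with
  | None -> ()
  | Some work -> work run_fixed

(* A work unit that "transmits an event": it queues a follow-up work unit
   (while n > 0) and then reenters the scheduler. *)
let rec emit n resume =
  if n > 0 then Queue.add (emit (n - 1)) ready;
  resume ()

(* Run a chain of n + 1 work units and report the peak handler depth. *)
let peak_handlers scheduler n =
  Queue.clear ready;
  live := 0;
  peak := 0;
  Queue.add (emit n) ready;
  scheduler ();
  !peak
```

In this model, peak_handlers run_leaky 1000 evaluates to 1002 (one handler per work unit, plus one for the final entry that finds the queue empty), while peak_handlers run_fixed 1000 evaluates to 1.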
Oops, this wasn't fixed until my most recent CVS checkin a few minutes
ago. I had to completely rewrite the Cf_gadget scheduler. It's much
better now; you'll like it. I ran all the OCNAE libraries against it,
including the BEEP core, and it works great without eating the whole stack
whenever there's a lot of work to do between Iom runs.
Logged In: YES
user_id=818877
Fixed in CVS. I will be making a cf-0.5 release soon just for this fix.