I've been thinking about Achilleas Margaritis's proposal to use a stack of continuations to avoid blowing the stack ( http://lambda-the-ultimate.org/node/1599 ). I haven't looked at his source code, but it seems to me that there's a real tricky issue regarding how to implement the continuations.
If the continuations are nothing more than objects and if the methods recursively call each other, then the idea won't work. Sure, their data members will be allocated on the heap, but the memory that the methods run in will still be the main program stack. Obj.method() gets compiled to mangled_method_name(*Obj), after all. So you still have functions calling functions.
I asked somebody here for advice, and he suggested bounds checking the stack. If the stack is about to overflow, then return "No Match" and backtrack. It's not "correct" behavior, but it seems like a decent option for paranoid programmers. Or, I guess you could throw an exception, and let the programmer deal with it (after the whole stack's been unwound and all the work has been destroyed, which is why I'm not fond of that idea).
The problem is I don't know of any way to determine the bounds of the stack. But, it turns out, POSIX C has a set of functions declared in <ucontext.h> (but not implemented everywhere; it seems to be in Linux, but I couldn't get it to work on Cygwin or OS X), and Windows has some Fiber functions (CreateFiber, SwitchToFiber, GetFiberData, GetCurrentFiber) that can be used to create new call stacks. All these functions are meant more for co-routines, but they create a new call stack of a determined size, and then use that call stack to run a function.
The problem is that call frames are allowed to be different sizes. So I can't just say "allocate 1 MB of memory, use it as a call stack, and tell me if recursion ever gets more than X calls deep, since each frame is Y bytes, and X * Y = 1 MB." Y is allowed to be a different size for each function within the same program.
But what if we (1) registered the functions/methods that will be called, (2) determined the size of their stack frame(s), and (3) upon entering each function, determine if the next recursive call will overflow the stack and act accordingly?
So my idea is to:
1. require the paranoid programmer (and only the paranoid programmer -- this would be an extension) register each function (probably through some sort of metaprogramming -- "match this [enhanced over the standard Match() signature] signature and do XXX" where XXX will magically register the function),
2. call each registered function twice, recursively, and determine the size of the stack frame for that particular function (I have an idea about how to do this, below)
3. remember the largest stack frame size
4. on entering any function, check the current address position at the beginning of the function, if it is too close to the end of the stack (that is if stack_size < curr_pos + largest_frame + 1) then do whatever you want to do to avoid stack exhaustion
I believe the following code would determine the stack frame size for function foo:
#include <iostream>
namespace {
ptrdiff_t foo(char* a = 0) {
char c;
char* b = &c;
if (0 == a)
return foo(b);
else
return a - b;
};
};
int main() {
std::cout << foo() << '\n';
};
The magic needed would have to do this for all the parser's method calls. Hopefully non-invasively, but I'm not holding my breath on that.
Anyhow, this is the part where you tell me I have a blind spot.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
> The problem is I don't know of any way to determine the bounds of the
> stack. But, it turns out, POSIX C has a set of functions declared in
> <ucontext.h> (but not implemented everywhere; it seems to be in Linux,
> but I couldn't get it to work on Cygwin or OS X), and Windows has some
> Fiber functions (CreateFiber, SwitchToFiber, GetFiberData,
> GetCurrentFiber) that can be used to create new call stacks. All these
> functions are meant more for co-routines, but they create a new call
> stack of a determined size, and then use that call stack to run a
> function.
One of the wild things about this approach (which would be added
later) is that it is (I believe) possible to create stack1, use it up,
and instead of backtracking, create stack2, and continue inside it.
When stack2 underflows, ucontext.h's functions at least, automatically
send you back to stack1. You could have a linked-list (or vlist -- http://www.ootl.org/doc/vlist.html ) of stacks (although after stack2
underflows, you've got to decide how to clean it up, or if you will
simply keep it around in case stack1 overflows again).
And, to make this clear: I'm not going to stop everything and only do
this. I'm planning on creating a few branches. One is try to use
RAII instead of AbandonNode, one will try to avoid stack overflow if
the programmer asks for it, and I may start one to experiment with
packrat parsing (using a hashmap like the upcoming STL unordered_list
should handle some of the unnecessary complexity). I'll then work on
one until my mind goes dull, and switch to another until I'm willing
to come back.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I've been thinking about Achilleas Margaritis's proposal to use a stack of continuations to avoid blowing the stack ( http://lambda-the-ultimate.org/node/1599 ). I haven't looked at his source code, but it seems to me that there's a real tricky issue regarding how to implement the continuations.
If the continuations are nothing more than objects and if the methods recursively call each other, then the idea won't work. Sure, their data members will be allocated on the heap, but the memory that the methods run in will still be the main program stack. Obj.method() gets compiled to mangled_method_name(*Obj), after all. So you still have functions calling functions.
I asked somebody here for advice, and he suggested bounds checking the stack. If the stack is about to overflow, then return "No Match" and backtrack. It's not "correct" behavior, but it seems like a decent option for paranoid programmers. Or, I guess you could throw an exception, and let the programmer deal with it (after the whole stack's been unwound and all the work has been destroyed, which is why I'm not fond of that idea).
The problem is I don't know of any way to determine the bounds of the stack. But, it turns out, POSIX C has a set of functions declared in <ucontext.h> (but not implemented everywhere; it seems to be in Linux, but I couldn't get it to work on Cygwin or OS X), and Windows has some Fiber functions (CreateFiber, SwitchToFiber, GetFiberData, GetCurrentFiber) that can be used to create new call stacks. All these functions are meant more for co-routines, but they create a new call stack of a determined size, and then use that call stack to run a function.
The problem is that call frames are allowed to be different sizes. So I can't just say "allocate 1 MB of memory, use it as a call stack, and tell me if recursion ever gets more than X calls deep, since each frame is Y bytes, and X * Y = 1 MB." Y is allowed to be a different size for each function within the same program.
But what if we (1) registered the functions/methods that will be called, (2) determined the size of their stack frame(s), and (3) upon entering each function, determine if the next recursive call will overflow the stack and act accordingly?
So my idea is to:
1. require the paranoid programmer (and only the paranoid programmer -- this would be an extension) register each function (probably through some sort of metaprogramming -- "match this [enhanced over the standard Match() signature] signature and do XXX" where XXX will magically register the function),
2. call each registered function twice, recursively, and determine the size of the stack frame for that particular function (I have an idea about how to do this, below)
3. remember the largest stack frame size
4. on entering any function, check the current address position at the beginning of the function, if it is too close to the end of the stack (that is if stack_size < curr_pos + largest_frame + 1) then do whatever you want to do to avoid stack exhaustion
I believe the following code would determine the stack frame size for function foo:
#include <iostream>
namespace {
ptrdiff_t foo(char* a = 0) {
char c;
char* b = &c;
if (0 == a)
return foo(b);
else
return a - b;
};
};
int main() {
std::cout << foo() << '\n';
};
The magic needed would have to do this for all the parser's method calls. Hopefully non-invasively, but I'm not holding my breath on that.
Anyhow, this is the part where you tell me I have a blind spot.
That code was supposed to be indented.
> The problem is I don't know of any way to determine the bounds of the
> stack. But, it turns out, POSIX C has a set of functions declared in
> <ucontext.h> (but not implemented everywhere; it seems to be in Linux,
> but I couldn't get it to work on Cygwin or OS X), and Windows has some
> Fiber functions (CreateFiber, SwitchToFiber, GetFiberData,
> GetCurrentFiber) that can be used to create new call stacks. All these
> functions are meant more for co-routines, but they create a new call
> stack of a determined size, and then use that call stack to run a
> function.
One of the wild things about this approach (which would be added
later) is that it is (I believe) possible to create stack1, use it up,
and instead of backtracking, create stack2, and continue inside it.
When stack2 underflows, ucontext.h's functions at least, automatically
send you back to stack1. You could have a linked-list (or vlist --
http://www.ootl.org/doc/vlist.html ) of stacks (although after stack2
underflows, you've got to decide how to clean it up, or if you will
simply keep it around in case stack1 overflows again).
And, to make this clear: I'm not going to stop everything and only do
this. I'm planning on creating a few branches. One is try to use
RAII instead of AbandonNode, one will try to avoid stack overflow if
the programmer asks for it, and I may start one to experiment with
packrat parsing (using a hashmap like the upcoming STL unordered_list
should handle some of the unnecessary complexity). I'll then work on
one until my mind goes dull, and switch to another until I'm willing
to come back.