From: Chris K. <col...@gm...> - 2004-07-31 04:40:16
Moved from c.l.py, as per Armin's request.

Armin Rigo wrote:
> Christopher T King wrote [re: implementing generators in Psyco]:
> > What makes it so difficult?
>
> A lot of things.  But discussing the internals of Psyco on this
> newsgroup isn't a very good idea.  Please come to the psyco-devel
> mailing list.

Done.

> > Implementing generators in C is trivial:
>
> I don't think I agree with this sentence, either.

Well, to try and convince you of my stance, you can download
proof-of-concept C generators from
http://users.wpi.edu/~squirrel/temp/cgen.c.  I did the implementation
using the nifty ucontext mechanism, after attempting a setjmp/longjmp
implementation.  The simplicity of use can be seen near the end of the
file, where the code is nearly as straightforward as the Python
equivalent:

    void mygenfunc(struct generator *_gen, void *n)
    {
        for (; n > 0; n--)
            yield(n);
        GEN_EXIT;
    }

    int main()
    {
        struct generator *gen;
        int i;

        gen = new_generator(mygenfunc, (void *)10);
        for (;;) {
            i = (int)next(gen);
            if (gen->stopped)
                break;
            printf("%d\n", i);
        }
        free_generator(gen);
        return 1;
    }

Of course, I'm not sure if this helps the Psyco case any...

> BTW I just put on http://psyco.sourceforge.net/ the paper I am going to
> present to the ACM SIGPLAN 2004 conference, which gives a (partially
> theoretical) description of how Psyco works.

Looks helpful!  I will give it a read before I continue babbling ;)
From: Armin R. <ar...@tu...> - 2004-08-12 16:46:34
Hello Chris,

On Sat, Jul 31, 2004 at 12:40:14AM -0400, Chris King wrote:
> > > Implementing generators in C is trivial:
> >
> > I don't think I agree with this sentence, either.
>
> Well, to try and convince you of my stance, you can download
> proof-of-concept C generators from
> http://users.wpi.edu/~squirrel/temp/cgen.c. I did the implementation
> using the nifty ucontext mechanism, after attempting a setjmp/longjmp
> implementation.

I see.  Yes, this kind of implementation is close in spirit to how some
versions of Stackless work.  The ucontext mechanism (which I didn't know
about, thanks) appears not to be portable, though, and I don't think you
can do it with setjmp/longjmp.  But it does indeed work more or less like
this in Stackless nowadays.  There is even an off-product of Stackless
called Greenlets that is essentially a Python-oriented ucontext
implementation.  (Both Stackless and Greenlets use a few lines of
assembly code to hack the stack.)

> Of course, I'm not sure if this helps the Psyco case any...

It might do.  Christian "Stackless" Tismer offered a similar way to
support generators on top of Stackless, which would then more or less
automatically work with Psyco.  In the meantime, we drifted over to the
present situation, which is to use bytecode rewriting of generators
instead, to turn them into regular Python functions that save their
state manually.  This will give a different kind of generated code with
Psyco: the state will be saved off and restored to the C stack around a
'yield'.  In some sense it is better because it avoids keeping a lot of
small C stacks around (and Psyco wastes quite a lot of C stack space, so
multiplying it by the number of active generator instances doesn't look
like a good idea).

À bientôt,

Armin.
From: Chris K. <col...@gm...> - 2004-08-12 18:22:40
On Thu, 12 Aug 2004 17:41:52 +0100, Armin Rigo <ar...@tu...> wrote:
> The ucontext mechanism (which I didn't know
> about, thanks) appears not to be portable, though, and I don't think
> you can do it with setjmp/longjmp.

With some gcc trickery (__builtin_return_address,
__builtin_frame_address, and computed gotos), a setjmp/longjmp
implementation should be possible (though messy), but that still
restricts it to gcc.

> There is even an off-product of Stackless called
> Greenlets that is essentially a Python-oriented ucontext
> implementation.  (Both Stackless and Greenlets use a few lines of
> assembly code to hack the stack.)

Quite interesting, that!  Of course, using assembly restricts the
implementation to a specific architecture, but it would make porting to
different architectures easier.

> the state will be saved off and
> restored to the C stack around a 'yield'.  In some sense it is better
> because it avoids keeping a lot of small C stacks around (and Psyco
> wastes quite a lot of C stack space, so multiplying it by the number
> of active generator instances doesn't look like a good idea).

My initial idea was to save only the portion of the stack actually used
into a dynamic buffer, rather than using the multiple fixed stacks of
ucontext.  I decided against this due to the overhead incurred by
copying to/from a stack buffer, opting instead for the increased memory
usage of ucontext.  The bytecode solution seems similar to this -- it
would save memory but cost processor time.  It seems, then, that the
best compromise would be to keep multiple fixed Python stacks around,
à la ucontext, and quickly switch between them, but that will have to
wait for Stackless Psyco ;)

Just to throw in an example use case, the program I'm working on right
now uses dozens (possibly even hundreds) of generators, stored in a
heap, which must all be created and (partially) iterated through in
around 1/10 of a second (for UI purposes).  It runs fine on faster
processors, but below 500 MHz it could use a Psyco boost.  I imagine
memory usage for a ucontext-based implementation would be enormous in
this case, but then, time spent copying state to/from smaller buffers
might overshadow the actual time spent inside the generators themselves
(which is negligible).

Curious: which part of Psyco eats up stack space, the
code-generating/profiling routines, or the generated code itself?
Because if it's the former, a ucontext-style implementation would likely
be best, if the two stacks could be separated (no doubt a daunting task,
though).

Just some thoughts.
From: Christian T. <ti...@st...> - 2004-08-13 00:03:37
Chris King wrote:
> On Thu, 12 Aug 2004 17:41:52 +0100, Armin Rigo <ar...@tu...> wrote:
...
>> the state will be saved off and
>> restored to the C stack around a 'yield'.  In some sense it is better
>> because it avoids keeping a lot of small C stacks around (and Psyco
>> wastes quite a lot of C stack space, so multiplying it by the number
>> of active generator instances doesn't look like a good idea).
>
> My initial idea was to save only what stack was used into a dynamic
> buffer, rather than using the multiple fixed stacks of ucontext.  I
> decided against this due to the overhead incurred by copying to/from a
> stack buffer, opting rather for the increased memory usage of
> ucontext.  The bytecode solution seems similar to this -- it would
> save memory, but waste processor time.  It seems then that the best
> compromise would be to keep multiple fixed Python stacks around, à la
> ucontext, and quickly switch between them, but that will have to wait
> for Stackless Psyco ;)

Yes and no: we are after both solutions, since they both have advantages
which are simply different and orthogonal.

The stack switching approach has the advantage of a speed not reachable
by anything else -- on certain platforms!  There are Stackless-supported
platforms where no single stack switch can *ever* be fast.  Ever used a
SPARC?  You need to flush all the register windows and restore them.
That platform simply cannot switch contexts efficiently.  On x86 et al.,
you are right: stack switching is *very* fast.

This speed comes at some cost as well: without huge efforts, pickling
the program state is impossible.  And if you need very many generators,
this approach doesn't lead anywhere, since the cost of a stack is many
kilobytes, not a few hundred bytes.  By very many I'm talking of 100,000
and more.  If you need hundreds, this is of course trivial.

The bytecodehacks approach is more towards Stackless 3.0: if we can
avoid using the stack, we do it.  This costs a little computation time,
not so much.  We always leave the stack clean.  This gives us a minimum
of memory cost, while we pay for it with a little computation time, of
course.  But this will be optimal on SPARC-like architectures, for
instance.  It also gives us trivial thread pickling, since the state
tuples can be pickled easily.

Another *huge* advantage of the bytecode approach: here we can make full
use of inlining techniques, which I am about to publish.  I have
enhanced the bytecodehacks to do real, completely compatible inlining of
Python functions in all situations.  The advantage of this is: if you
can inline a generator completely, the whole function call melts away,
together with the state store/restore sequence.  Psyco optimizes this
away like a charm!  This is a thing that cannot be matched by any stack
tricks.  Used to the right amount, this feature is unbeatable.

Finally, I want to combine both techniques: inlining up to a bearable
limit of code bloat.  This creates huge, fast code blocks with very many
local variables.  Switching these without inlining would be expensive,
because the state tuples get huge.  Exactly here it makes sense to
switch to the stack switching model.  This combination seems to be close
to the best of what can be done at all, and I think we are very near to
a system with no competitors that I know of.

cheers -- chris

-- 
Christian Tismer             :^)   <mailto:ti...@st...>
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  mobile +49 173 24 18 776
PGP 0x57F3BF04  9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/
From: Chris K. <col...@gm...> - 2004-08-13 12:52:20
On Fri, 13 Aug 2004 02:05:30 +0200, Christian Tismer <ti...@st...> wrote:
> There are Stackless-supported platforms where no single
> stack switch can *ever* be fast.  Ever used a SPARC?
> You need to flush all the register windows and restore them.
> That platform simply cannot switch contexts efficiently.

Hm, this would be problematic.  Am I right to assume that multiple
threads suffer the same performance hit on this architecture?

> The advantage of this is:
> if you can inline a generator completely, the whole function call
> melts away, together with the state store/restore sequence.

If you can't inline a generator, though (e.g. if the generator object is
stored somewhere), you'd have to revert to stack switching, so it would
still be needed in some form (as you later imply).
From: Christian T. <ti...@st...> - 2004-08-13 13:08:59
Chris King wrote:
> On Fri, 13 Aug 2004 02:05:30 +0200, Christian Tismer
> <ti...@st...> wrote:
>
>> There are Stackless-supported platforms where no single
>> stack switch can *ever* be fast.  Ever used a SPARC?
>> You need to flush all the register windows and restore them.
>> That platform simply cannot switch contexts efficiently.
>
> Hm, this would be problematic.  Am I right to assume that multiple
> threads suffer the same performance hit on this architecture?

Yes, I think so.

>> The advantage of this is:
>> if you can inline a generator completely, the whole function call
>> melts away, together with the state store/restore sequence.
>
> If you can't inline a generator, though (e.g. if the generator object
> is stored somewhere), you'd have to revert to stack switching, so it
> would still be needed in some form (as you later imply).

Exactly.  This is a less common situation, and it turns a generator more
into the direction of a real coroutine -- and that's exactly how
coroutines can run the fastest.

cheers - chris