From: Kevin K. <kev...@gm...> - 2018-04-16 02:15:55
|
OOPS - Meant to post this to tcl-quadcode. Memo to self: Don't hit 'send' in the wee hours!

---------- Forwarded message ----------
From: Kevin Kenny <kev...@gm...>
Date: Sun, Apr 15, 2018 at 1:36 AM
Subject: A little more progress on LLVM coroutines
To: Tcl Core Mailing List <tcl...@li...>

I've managed to complete all the code for NRE entry/exit, NRE call/return (functions only, no Tcl commands or 'invokeExpanded' yet), and thunk generation. That's enough that the compiler generates LLVM-IR for everything in 'tester.tcl'. But now, as soon as the optimizer runs, there's a segmentation fault. (The fault occurs only if there is NRE code.) I was able to dump out the unoptimized bitcode, and I can reproduce the crash without llvmtcl in the loop, just running

    opt -S -coro-early test.bc -o test.ll

except that occasionally it's reported as 'Floating point exception.' (WTF?) It would appear that we could work around the issue by modifying the passes that the pass manager executes.

I tried to tidy up the code a little bit by running

    opt -S -mem2reg test.bc -o test.ll

so that a lot of the rubbish is gone from the assembly code. Lo and behold, 'opt -S -enable-coroutines -O3 test.ll -o test2.ll' then runs the optimizer without a hiccup. This indicates to me that whatever is amiss with our code that's causing the segfaults, a very simple cleanup optimization gets rid of it.

I have a sneaking suspicion that it's the unique undefs that are confusing it - that's really all I can see that's different in the failing function when I compare the input and output of 'mem2reg'. Since I want to get rid of those anyway, because that would make 'phi' generation a lot tidier, that may be what I work on next. |
From: Kevin K. <kev...@gm...> - 2018-04-09 20:01:08
|
On Mon, Apr 9, 2018 at 1:30 PM, Kevin Kenny <kev...@gm...> wrote:
> There are a bunch of 'alloca's that are done so that calls to the Tcl
> library have a cell to pass by pointer.
> https://core.tcl-lang.org/tclquadcode/artifact?udc=1&ln=709-719&name=eb5a27fa4b
> is typical of these. They are innocuous, since the allocated object
> does not persist. They should, still, most likely be bracketed with
> 'llvm.lifetime.begin' and 'llvm.lifetime.end' operations, since the LLVM
> optimizer doesn't seem to recognize them for stack coloring. It may be
> that stack coloring also requires 'alloca's to appear in the entry
> block. In this case, I think that we'd be wise to identify all these
> temporary objects and preallocate them at the entry. It appears that
> the LLVM optimizer is capable of removing unused variables.

This is more important than I thought: if code like that appears in a loop, then there's serious trouble. The 'alloca' cannot be simplified to an SSA variable because it's being passed by pointer into the Tcl library. It can't be moved to the entry block because it's being called repeatedly in the loop body. Result will be that, with or without coroutines, a sufficiently long loop calling that inline function will result in a stack overflow. This has to be fixed.

https://stackoverflow.com/questions/21025099/llvm-alloca-causes-stack-overflow-on-while-statement

This may not actually be 100% true - it appears that the inlining pass can move many 'alloca' instructions into the entry block of the caller. But it is definitely true that the 'alloca' instructions in 'buildVector', 'buildBitArray', and possibly 'frame.create', 'uniqueUndef', and the uses in 'macros.tcl' may all be vulnerable. I also think I may have seen other 'alloca' instructions that didn't get hoisted out of loops, but I'd need to investigate more; I'm not seeing that behavior at the moment except with the explicit 'alloca's in builder methods rather than in inlined functions.

Also, I found confirmation that scalar replacement of aggregates and promotion of memory cells to SSA variables happens only for allocations in the entry block: https://llvm.org/docs/Frontend/PerformanceTips.html#use-of-allocas

I can get on this, since I know Donal doesn't have the bandwidth at the moment. |
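To make the loop problem above concrete, here is a minimal LLVM-IR sketch (the @helper callee and all value names are invented; this is not output from the quadcode generator). The first form allocates a fresh cell on every iteration; the second hoists a single cell into the entry block and brackets each use with lifetime markers so that later passes can reuse the slot.

    declare void @helper(i32*)
    declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture)
    declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture)

    ; Problematic form: each trip around the loop performs a dynamic
    ; 'alloca', and none of that stack space is released until the
    ; function returns, so a long enough loop overflows the stack.
    define void @leaky(i32 %n) {
    entry:
      br label %loop
    loop:
      %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
      %cell = alloca i32                     ; grows the stack every iteration
      call void @helper(i32* %cell)          ; callee writes through the pointer
      %i.next = add i32 %i, 1
      %again = icmp slt i32 %i.next, %n
      br i1 %again, label %loop, label %done
    done:
      ret void
    }

    ; Hoisted form: one cell in the entry block, so the frame size is
    ; fixed; the lifetime markers tell the optimizer when the slot is
    ; actually in use.
    define void @hoisted(i32 %n) {
    entry:
      %cell = alloca i32
      %p = bitcast i32* %cell to i8*
      br label %loop
    loop:
      %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
      call void @llvm.lifetime.start.p0i8(i64 4, i8* %p)
      call void @helper(i32* %cell)
      call void @llvm.lifetime.end.p0i8(i64 4, i8* %p)
      %i.next = add i32 %i, 1
      %again = icmp slt i32 %i.next, %n
      br i1 %again, label %loop, label %done
    done:
      ret void
    }

With the hoisted form the stack usage no longer depends on the trip count, and the entry-block placement is what lets stack coloring and the coroutine frame passes reason about the slot at all.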
From: Kevin K. <kev...@gm...> - 2018-04-09 17:31:07
|
I'm continuing to make some progress on implementing NRE procedures as LLVM coroutines, along the lines of what I sketched in https://core.tcl-lang.org/tclquadcode/doc/trunk/doc/20180315-notes-coros.md

It's been a trifle bumpy - I'm still not fluent in LLVM-IR code generation, but I'm getting better at it. 'getElementPtr' is one confusing beast!

In the course of trying to do code generation for invoking and returning from uncompiled Tcl commands using the NRE-llvm.coro bridge, I've stumbled over some further restrictions on what an LLVM coroutine is allowed to do. Essentially, any alloca'ed memory must be statically analyzable if it is to live across suspending a coroutine - so that the LLVM CoroFrame pass can move it from the stack-allocated activation record to the possibly heap-allocated one. 'Statically analyzable' rules out the trick that we use in the 'buildVector' and 'clearVector' methods of Builder - 'stacksave' and 'stackrestore' are Right Out. It turns out that in the LLVM code, the constraint is considerably more restrictive: all 'alloca' operations that produce pointers that are live at 'llvm.coro.suspend' must be in the entry block of the function. I don't think this is an insurmountable obstacle, but it'll be some amount of work to get the code scrubbed out.

Fortunately, the stuff that comes from quadcode itself is already clean in this respect; quadcode only ever generates SSA values directly. There are no quadcode variables that need to be explicitly stack-allocated. There are a handful (more than a handful, actually) of possible 'alloca's that the code setting up a procedure does, but these are innocuous; after LLVM's optimizer has run, all the allocations have moved to the entry block.

There are a bunch of 'alloca's that are done so that calls to the Tcl library have a cell to pass by pointer. https://core.tcl-lang.org/tclquadcode/artifact?udc=1&ln=709-719&name=eb5a27fa4b is typical of these. They are innocuous, since the allocated object does not persist. They should still, most likely, be bracketed with 'llvm.lifetime.begin' and 'llvm.lifetime.end' operations, since the LLVM optimizer doesn't seem to recognize them for stack coloring. It may be that stack coloring also requires 'alloca's to appear in the entry block. In this case, I think that we'd be wise to identify all these temporary objects and preallocate them at the entry. It appears that the LLVM optimizer is capable of removing unused variables.

There are also some 'alloca' operations that are done simply because SSA generation was inconvenient or because we didn't understand how to do it at the time the code was written. It is indeed possible to generate SSA in an alloca-free way, with back-patching. My reading of the code is that 'uniqueUndef' and calls to ReplaceAllUsesWith are also horrible hacks to work around this misunderstanding. A nicer approach would be to create 'phi' operations without forward references, and then backpatch the forward references once the source block and source operand are defined. In working out a version of 'tcl.vector.clear' that does not require the input values to be repeated, I was able to do this using 'AddIncoming' to do the patchwork: https://core.tcl-lang.org/tclquadcode/artifact?udc=1&ln=4582-4592&name=eb5a27fa4b

To do The Right Thing here, we'd probably want to change 'build' to a two-pass process so that all the AddIncoming operations could happen automatically, but even the way I've coded it is superior to the ReplaceAllUsesWith.
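To illustrate the 'phi' point, here is a hypothetical, much-simplified loop in the spirit of 'tcl.vector.clear' (names invented; the real method also manages reference counts). The finished IR is just an ordinary 'phi'; the benefit of the AddIncoming approach is that the builder can emit the node with no operands and attach each value/predecessor pair as the corresponding block is generated, instead of seeding it with a unique 'undef' and rewriting it later with ReplaceAllUsesWith.

    define void @vector.clear(i8** %objv, i32 %objc) {
    entry:
      %empty = icmp eq i32 %objc, 0
      br i1 %empty, label %done, label %loop
    loop:
      ; AddIncoming(0, %entry) when the header is created;
      ; AddIncoming(%i.next, %loop) once the latch has been emitted.
      %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
      %slot = getelementptr i8*, i8** %objv, i32 %i
      store i8* null, i8** %slot
      %i.next = add i32 %i, 1
      %again = icmp slt i32 %i.next, %objc
      br i1 %again, label %loop, label %done
    done:
      ret void
    }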
Once the short-lived objects and the workarounds for 'phi' are addressed, there aren't very many long-lived 'alloca' operations to consider. The chief culprits that remain are 'buildVector', which allocates the 'objv' vector that is needed for Tcl command invocation, list construction, various dictionary operations, and 'foreachStart'; and 'buildBitArray', which builds an array of Boolean flags for 'invokeExpanded'. Clearly, anything that persists across an 'invoke' operation has to persist across coroutine suspension, so we need to figure out how to get this stuff into the LLVM coroutine frame, avoiding the 'stacksave'/'alloca'/'stackrestore' dance.

One possibility would simply be to place the 'alloca' in the entry block. There would need to be some additional bookkeeping in Builder, because I don't think that the entry block reference is available at the time we'd need it. But simply bracketing the 'my alloc' with 'my @end $entryBlock' and 'my @end $current_block' would be a good way to put the allocation where it actually needs to be. Of course, this would lead to an activation record that's in general much, much larger than it needs to be, since all objv vectors would be at unique places in it. I don't know whether the stack coloring pass would help or not, but in general, LLVM considers any 'alloca' that isn't in the entry block as being 'dynamic', and it would not astonish me to learn that moving all these items to the entry block would allow them to be cleaned up.

Nevertheless, for 'buildVector'/'clearVector', I think that a better idea would be to allocate a single objv (and a single bit vector for 'invokeExpanded', if needed), and have the 'invoke' mechanism then use an SSA variable containing a pointer to the first element of the allocated vector. Rather than needing to do this in multiple passes in the code issuer, it would be easy to have the quadcode compiler track what the maximum number of args on any 'invoke' is, and have the code issuer preallocate from that.

Any allocations that remain in our inline functions, we probably ought to bracket with llvm.lifetime.begin and llvm.lifetime.end, to help the address sanitizer pass root out use-after-scope errors and perhaps allow better activation record management.

It's still looking to me as if, even considering all of these issues, LLVM coros might still be less work than making the quadcode middle end figure out the structure of coroutine frames and perform equivalent optimizations to the four LLVM coroutine passes.

It wasn't until writing this message that I actually realized that the 'alloca' operations in question could be moved to the entry block, so this message, which started out intended to be a plea for help, actually turned into a summary of a possible path forward. So that's the next investigation - how to eliminate dynamic alloca of anything that must survive the 'llvm.coro.suspend' transition. I'll get back to plugging away on that. |
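A hypothetical sketch of the single-preallocated-objv idea (symbol names invented; Tcl_Interp* and Tcl_Obj* are shown as i8*, and Tcl_EvalObjv stands in for whatever the issuer actually calls): the buffer is sized for the widest 'invoke' in the procedure, lives in the entry block, and can therefore be spilled into the coroutine frame if it is live across a suspend point.

    declare i32 @Tcl_EvalObjv(i8*, i32, i8**, i32)

    define i32 @proc.body(i8* %interp, i8* %word0, i8* %word1) {
    entry:
      ; one buffer, sized to the maximum arity over every invoke (4 here)
      %objv = alloca [4 x i8*]
      br label %invoke.1
    invoke.1:
      %base  = getelementptr [4 x i8*], [4 x i8*]* %objv, i32 0, i32 0
      %slot1 = getelementptr [4 x i8*], [4 x i8*]* %objv, i32 0, i32 1
      store i8* %word0, i8** %base
      store i8* %word1, i8** %slot1
      %code = call i32 @Tcl_EvalObjv(i8* %interp, i32 2, i8** %base, i32 0)
      ret i32 %code
    }

A single shared buffer keeps the coroutine frame small, at the price of every invoke in the procedure aliasing the same slots.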
From: Donal K. F. <don...@ma...> - 2018-04-03 10:26:38
|
On 02/04/2018 23:15, Kevin Kenny wrote: > Thanks, will do. > > I don't see any obvious further changes at the moment, but I won't be > certain, of course, until and unless I have working NRE code. I've also managed to get automatic building and testing of PRs via Travis CI working. (Fiddly work getting a compiler and set of libraries that would do the right thing together.) Yes, the code is probably under-tested right now, but at least those tests which exist are run. Donal. |
From: Kevin K. <kev...@gm...> - 2018-04-02 22:15:27
|
Thanks, will do. I don't see any obvious further changes at the moment, but I won't be certain, of course, until and unless I have working NRE code. On Mon, Apr 2, 2018, 4:56 PM Donal K. Fellows < don...@ma...> wrote: > On 30/03/2018 21:45, Kevin Kenny wrote: > > In order not to get pointer smashes on trying to load (not even > > execute!) generated code in the 'kbk-nre' branch, I found that I > > needed to > > > > (a) make use of function pass managers as well as module pass managers > in LLVM > > > > (b) make the call 'addCoroutinePassesToExtensionPoints' when setting > > up the pass manager builder. > > Interesting… > > > With those changes in place, I'm making progress again toward > > implementing the sketches in > > > https://core.tcl-lang.org/tclquadcode/doc/trunk/doc/20180315-notes-coros.md > . > > > > Of course, I needed to add the 'addCoroutinePassesToExtensionPoints > > interface to llvm, and so I advanced the minor version to 3.9. I left > > a pull request on github. Until Donal pulls it, you can find my > > version at https://github.com/kennykb/llvmtcl. > > > > I've accepted the PR and fixed a few issues. One was caused by the > subtleties of whether LLVM is configured to be in many libraries or one > monolithic DLL (I needed to add the coroutine support to the list of > component libraries that are required), and the other was due to LLVM's > *lovely* habit of defaulting to killing the process whenever it sees > something it doesn't like. > > I'm planning to close the llvmtcl 3.9 milestone soon and tag for > release. If there's anything that needs doing before then, let me know > in the next couple of weeks. > > Donal. > |
From: Donal K. F. <don...@ma...> - 2018-04-02 20:56:21
|
On 30/03/2018 21:45, Kevin Kenny wrote: > In order not to get pointer smashes on trying to load (not even > execute!) generated code in the 'kbk-nre' branch, I found that I > needed to > > (a) make use of function pass managers as well as module pass managers in LLVM > > (b) make the call 'addCoroutinePassesToExtensionPoints' when setting > up the pass manager builder. Interesting… > With those changes in place, I'm making progress again toward > implementing the sketches in > https://core.tcl-lang.org/tclquadcode/doc/trunk/doc/20180315-notes-coros.md. > > Of course, I needed to add the 'addCoroutinePassesToExtensionPoints > interface to llvm, and so I advanced the minor version to 3.9. I left > a pull request on github. Until Donal pulls it, you can find my > version at https://github.com/kennykb/llvmtcl. > I've accepted the PR and fixed a few issues. One was caused by the subtleties of whether LLVM is configured to be in many libraries or one monolithic DLL (I needed to add the coroutine support to the list of component libraries that are required), and the other was due to LLVM's *lovely* habit of defaulting to killing the process whenever it sees something it doesn't like. I'm planning to close the llvmtcl 3.9 milestone soon and tag for release. If there's anything that needs doing before then, let me know in the next couple of weeks. Donal. |
From: Kevin K. <kev...@gm...> - 2018-03-30 20:45:09
|
In order not to get pointer smashes on trying to load (not even execute!) generated code in the 'kbk-nre' branch, I found that I needed to

(a) make use of function pass managers as well as module pass managers in LLVM

(b) make the call 'addCoroutinePassesToExtensionPoints' when setting up the pass manager builder.

With those changes in place, I'm making progress again toward implementing the sketches in https://core.tcl-lang.org/tclquadcode/doc/trunk/doc/20180315-notes-coros.md.

Of course, I needed to add the 'addCoroutinePassesToExtensionPoints' interface to llvm, and so I advanced the minor version to 3.9. I left a pull request on github. Until Donal pulls it, you can find my version at https://github.com/kennykb/llvmtcl. |
From: Donal K. F. <don...@ma...> - 2018-03-20 14:35:51
|
On 19/03/2018 15:26, Donal K. Fellows wrote: > Looking at https://llvm.org/docs/Coroutines.html, the bit of work needed > is that there's a new basic type ('token') that we need to be able to > generate, and a new special 'none' literal to go with it. Once that's > done, the rest should be an extension of what we've already got. That should now be all done. I've also merged a few other things (module cloning and inlined-at annotation support) and made everything work with a production build of LLVM 6. There's no LLVM 7 production release yet. Use the master of https://github.com/dkfellows/llvmtcl or pester me to do the 3.9 release of llvmtcl. :-) Donal. |
From: Kevin K. <kev...@gm...> - 2018-03-19 17:59:49
|
Oops, meant to copy the list: On Mon, Mar 19, 2018 at 1:59 PM, Kevin Kenny <kev...@gm...> wrote: > On Mon, Mar 19, 2018 at 11:26 AM, Donal K. Fellows > <don...@ma...> wrote: >> On 15/03/2018 23:42, Kevin Kenny wrote: >>> >>> Could you trouble to just shoot me a one-line email reassuring me that >>> it's no worse than that? I begin to wonder whether you're ill, in some >>> other kind of trouble, or somehow offended by what I've been trying to >>> do with the compiler lately - since it's been quite a while since >>> you've answered an email about quadcode. >> >> >> I'd not been planning to touch them because I wasn't entirely sure what >> would happen with multiple independent calls, but if you want to try them >> out then I can add them in. (I've had some health issues — not serious, but >> quite time consuming — and work has been busy.) Right now, I'm a bit more >> focused on trying to sort out an implementation of TIP #500, but I can >> shelve that for a few days to get this sorted. >> >> Looking at https://llvm.org/docs/Coroutines.html, the bit of work needed is >> that there's a new basic type ('token') that we need to be able to generate, >> and a new special 'none' literal to go with it. Once that's done, the rest >> should be an extension of what we've already got. As is depressingly common >> in LLVM, it seems that while we've got the type in the C API (though in an >> unhelpful way), we've not got the associated special constant. :-(All the >> intrinsics should be exported on any system that supports those intrinsics >> at all, and the optimisations to support all this should be enabled by >> default, so fixing the type and constant ought to be all that I need to do >> to enable coroutine support. >> >> And I might also need to port to LLVM 7. That's probably what'll take the >> time. :-) >> >> Also, I also found: >> >> https://llvm.org/devmtg/2016-11/Slides/Nishanov-LLVMCoroutines.pdf >> >> and thought that it's not a bad explanation of what is going on with LLVM's >> coroutines. > > Sorry that you've been ill! If it was the flu that's been going around here, > that knocked me flat in January - and then I got not one, but two > secondary infections on top of it. It's a truly nasty bug! > > No worries about long response, and if you want to keep plugging > on TIP 500, by all means do so! I was simply getting worried about > not hearing from you at all! TclOO is important, too! If there's going > to be a long delay, I can simply work on some of the other stuff, > such as dealing with the errors of things like > set x rubbish; if {$x} { ... } > and getting a start on global value numbering. There's no > shortage of quadcode work that *won't* require your attention! > > You're right that I don't seem to see any > sort of constructor for a constant LLVMValueRef of 'token' > type - so I see what you mean by 'unhelpful'! (But I thought we > already had one or more dependencies on the C++ API, because > something else critical wasn't exported?) In any case, the > missing 'none' isn't a major problem. I use it only in calls to > @llvm.coro.suspend. We can work around it by instead > doing > %save_token = call token @llvm.coro.save(i8* %coro_handle) > immediately before @llvm.coro.suspend, to get a token > to pass to 'suspend'. The CoroCleanup pass should lower that > to the same code that we'd get from passing 'none'. > > The cool thing about the coroutine intrinsics is that they're > not in the library - the lowering passes eliminate them entirely. 
> They certainly resolve when I give them to [$module intrinsic].
>
> When I looked at the slide presentation, the spec, and the actual
> code, it looks as if the transformations that LLVM is doing are
> *extremely* similar to what I'd planned to do at quadcode level. I'm
> not expecting trouble with multiple independent calls, because it
> looks as if, in the worst case, the optimization simply won't be as
> nice as we'd otherwise like.
>
> I did a pretty detailed analysis at
>
> http://core.tcl.tk/tclquadcode/doc/trunk/doc/20180315-notes-coros.md
>
> (with codebursts in LLVM assembly language). I did a few notebook
> sketches of what I'd expect to see from LLVM's optimization, and it's
> actually quite nice.
>
> The only thing that I'd wonder there is whether rather than having
> a single @tcl.coro.runner, it mightn't be better to make a copy per
> NR procedure. It's not obvious that it would provide enough information,
> but if we could manage to get to where the LLVM backend can
> figure out that %coro_handle designates a particular, specific
> coro, then @llvm.coro.resume (which will be lowered in CoroElide
> to a call to the continuation) could actually be inlined. And that's
> the only place that 'resume' is called from.
>
> The really significant thing is that the whole edifice depends
> on inlining the ramp function. We have a very good chance at
> that, since all that the ramp I've sketched does is to
> allocate the promise and the coroutine frame, 'begin' the
> coroutine, and immediately 'suspend' it.
>
> It also occurs to me that we need to mark up the parameter
> transmission with @llvm.coro.param, to avoid extra parameter
> copies if a proc parameter isn't used after the ramp.
> I haven't worried about that yet, because procs with unused
> parameters are a bit of a rarity. |
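For reference, the workaround described above, shown as a bare IR fragment rather than a complete function (the enclosing coro.id/coro.begin/coro.end scaffolding, the block labels, and the %coro_handle value returned by @llvm.coro.begin are assumed, not shown):

    declare token @llvm.coro.save(i8*)
    declare i8   @llvm.coro.suspend(token, i1)

    ; With the 'none' literal available, a suspend point is simply
    ;   %sp = call i8 @llvm.coro.suspend(token none, i1 false)
    ; When 'none' cannot be produced through the C API, materialise an
    ; equivalent token first; the thread expects CoroCleanup to lower
    ; both forms to the same code.
      %save = call token @llvm.coro.save(i8* %coro_handle)
      %sp   = call i8 @llvm.coro.suspend(token %save, i1 false)
      switch i8 %sp, label %suspend [ i8 0, label %resume
                                      i8 1, label %cleanup ]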
From: Donal K. F. <don...@ma...> - 2018-03-19 15:26:59
|
On 15/03/2018 23:42, Kevin Kenny wrote: > Could you trouble to just shoot me a one-line email reassuring me that > it's no worse than that? I begin to wonder whether you're ill, in some > other kind of trouble, or somehow offended by what I've been trying to > do with the compiler lately - since it's been quite a while since > you've answered an email about quadcode. I'd not been planning to touch them because I wasn't entirely sure what would happen with multiple independent calls, but if you want to try them out then I can add them in. (I've had some health issues — not serious, but quite time consuming — and work has been busy.) Right now, I'm a bit more focused on trying to sort out an implementation of TIP #500, but I can shelve that for a few days to get this sorted. Looking at https://llvm.org/docs/Coroutines.html, the bit of work needed is that there's a new basic type ('token') that we need to be able to generate, and a new special 'none' literal to go with it. Once that's done, the rest should be an extension of what we've already got. As is depressingly common in LLVM, it seems that while we've got the type in the C API (though in an unhelpful way), we've not got the associated special constant. :-(All the intrinsics should be exported on any system that supports those intrinsics at all, and the optimisations to support all this should be enabled by default, so fixing the type and constant ought to be all that I need to do to enable coroutine support. And I might also need to port to LLVM 7. That's probably what'll take the time. :-) Also, I also found: https://llvm.org/devmtg/2016-11/Slides/Nishanov-LLVMCoroutines.pdf and thought that it's not a bad explanation of what is going on with LLVM's coroutines. Donal. |
From: Jos D. <jos...@gm...> - 2018-03-17 19:42:09
|
Hi Kevin, Work and real life issues have kept me from doing much hobby programming the last years. I can't offer any help in NRE support for LLVM. Kind regards, Jos. On Fri, Mar 16, 2018 at 12:42 AM Kevin Kenny <kev...@gm...> wrote: > Donal, Jos: > > While looking for something else in the LLVM documentation, I stumbled > upon an interesting set of interfaces: it has coroutines! > > http://releases.llvm.org/5.0.0/docs/Coroutines.html > > These are much more limited than Tcl's coroutines, in that a coroutine > is always a single function: a called function may not yield (or > rather, if it does, it becomes a coroutine separate from its caller). > Nevertheless, even this limited functionality provides a solution to > the hard part: how to create a coroutine frame that provides a > resumable context. Having that, the rest is doable. > > In fact, it looks as if the rest is all a straightforward, if somewhat > lengthy, exercise in code generation. I've worked through most of the > details, except for the four bytecode instructions that manipulate > coroutines. (I need to study the Tcl implementation more to figure > them out.) > > My notes on what I've learnt so far are at > > https://core.tcl-lang.org/tclquadcode/doc/trunk/doc/20180315-notes-coros.md > > I'd love to have a go at the code gen for this myself, but I'm afraid > that I keep getting lost. As I start to write things (and try to test > them by looking at the assembly code, not expecting it to run), I keep > stumbling over what should be basic questions, but turn out to be > really hard to answer by source-diving. I really lack a road map. > > In fact, my first question on this came from trying to implement the > NRE procedure entry, which is where I started. > I managed to define the type - I think - and emit the first few calls, > but I stumbled at trying to call Tcl_Alloc. I see that there are many > Tcl API routines that are imported, but the only places that I see > them used are in files like stdlib.tcl, where they're all wrapped up > in 'build' and 'my closure' methods, which have some pretty mysterious > interfaces. I certainly wasn't able to figure out where to get > Tcl_Alloc from, except deep in that layer, and my 'cargo cult' efforts > to bring it in merely brought about a segfault in the Global_Init > function - whose code gen I hadn't (knowingly) touched. > > Is it possible that one or the other of you could help me get up to > speed on this? (Jos, I realize that these questions may relate to > Donal's code and not yours. I'll not be hurt if your answer is, "I > don't know.") > > Donal, I realize that you are most likely insanely busy at the moment. > Could you trouble to just shoot me a one-line email reassuring me that > it's no worse than that? I begin to wonder whether you're ill, in some > other kind of trouble, or somehow offended by what I've been trying to > do with the compiler lately - since it's been quite a while since > you've answered an email about quadcode. > |
From: Kevin K. <kev...@gm...> - 2018-03-15 23:42:26
|
Donal, Jos:

While looking for something else in the LLVM documentation, I stumbled upon an interesting set of interfaces: it has coroutines!

http://releases.llvm.org/5.0.0/docs/Coroutines.html

These are much more limited than Tcl's coroutines, in that a coroutine is always a single function: a called function may not yield (or rather, if it does, it becomes a coroutine separate from its caller). Nevertheless, even this limited functionality provides a solution to the hard part: how to create a coroutine frame that provides a resumable context. Having that, the rest is doable.

In fact, it looks as if the rest is all a straightforward, if somewhat lengthy, exercise in code generation. I've worked through most of the details, except for the four bytecode instructions that manipulate coroutines. (I need to study the Tcl implementation more to figure them out.) My notes on what I've learnt so far are at

https://core.tcl-lang.org/tclquadcode/doc/trunk/doc/20180315-notes-coros.md

I'd love to have a go at the code gen for this myself, but I'm afraid that I keep getting lost. As I start to write things (and try to test them by looking at the assembly code, not expecting it to run), I keep stumbling over what should be basic questions, but turn out to be really hard to answer by source-diving. I really lack a road map.

In fact, my first question on this came from trying to implement the NRE procedure entry, which is where I started. I managed to define the type - I think - and emit the first few calls, but I stumbled at trying to call Tcl_Alloc. I see that there are many Tcl API routines that are imported, but the only places that I see them used are in files like stdlib.tcl, where they're all wrapped up in 'build' and 'my closure' methods, which have some pretty mysterious interfaces. I certainly wasn't able to figure out where to get Tcl_Alloc from, except deep in that layer, and my 'cargo cult' efforts to bring it in merely brought about a segfault in the Global_Init function - whose code gen I hadn't (knowingly) touched.

Is it possible that one or the other of you could help me get up to speed on this? (Jos, I realize that these questions may relate to Donal's code and not yours. I'll not be hurt if your answer is, "I don't know.")

Donal, I realize that you are most likely insanely busy at the moment. Could you trouble to just shoot me a one-line email reassuring me that it's no worse than that? I begin to wonder whether you're ill, in some other kind of trouble, or somehow offended by what I've been trying to do with the compiler lately - since it's been quite a while since you've answered an email about quadcode. |
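For readers following along, here is a minimal coroutine in roughly the shape the message sketches, adapted from the example in the LLVM Coroutines documentation linked above (the @step helper and all names are invented; the promise argument of llvm.coro.id is left null). The ramp allocates the frame, begins the coroutine, and suspends; the coroutine passes later split everything after the suspend point into a separate resume function.

    declare token @llvm.coro.id(i32, i8*, i8*, i8*)
    declare i32   @llvm.coro.size.i32()
    declare i8*   @llvm.coro.begin(token, i8*)
    declare i8    @llvm.coro.suspend(token, i1)
    declare i8*   @llvm.coro.free(token, i8*)
    declare i1    @llvm.coro.end(i8*, i1)
    declare i8*   @malloc(i32)
    declare void  @free(i8*)
    declare void  @step(i32)

    define i8* @ramp(i32 %arg) {
    entry:
      %id    = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
      %size  = call i32 @llvm.coro.size.i32()
      %alloc = call i8* @malloc(i32 %size)
      %hdl   = call i8* @llvm.coro.begin(token %id, i8* %alloc)
      call void @step(i32 %arg)                  ; work before the first yield
      %sp = call i8 @llvm.coro.suspend(token none, i1 false)
      switch i8 %sp, label %suspend [ i8 0, label %resume
                                      i8 1, label %cleanup ]
    resume:                                      ; runs on @llvm.coro.resume
      call void @step(i32 %arg)                  ; %arg is spilled to the frame
      br label %cleanup
    cleanup:
      %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
      call void @free(i8* %mem)
      br label %suspend
    suspend:
      %done = call i1 @llvm.coro.end(i8* %hdl, i1 false)
      ret i8* %hdl
    }

This only becomes well-formed machine code once the coroutine passes are scheduled, e.g. via 'opt -enable-coroutines'.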
From: Kevin K. <kev...@gm...> - 2018-03-12 04:32:09
|
I'm still sort of stuck on inlining of procedures that return errors, owing to the issues from my last few messages, so I've put the 'inline' branch on hold for now. I'm presuming that Donal will recover from whatever's been keeping him from responding to queries about code generation, and that it will be more effective to forge ahead elsewhere.

I'm working instead on laying the groundwork for NRE (at long last). I believe that doing so will also give us the foundation for dynamic recompilation - or at least detection of spoilt compilation. This will be necessary if we are ever to invoke unknown commands - we need to detect that Core language bindings haven't changed, new global aliases haven't appeared, traces haven't changed, and so on. It turns out that this happens across a call to and return from the command that the compiler doesn't know about, which is precisely where NRE needs to insert itself. (OK, there are other places as well.)

The first step toward NRE is to figure out what variables need to be preserved across an NRE invocation. This set is precisely those variables in the calling procedure that are live on return from the invocation, so accurate computation of live variable sets will be a prerequisite. There was a live-sets computation in quadcode/livevars.tcl, but it was pretty horrible. It was initially designed even before we went to SSA as an intermediate representation, and translated from a pretty bad Datalog prototype to an even worse Tcl implementation. It was also intimately wrapped up with inserting 'free' quadcodes for values that are going out of scope, which happens very late in compilation; much later than NRE needs to be figured out.

My task for the weekend was therefore to refactor the liveness calculation. It turns out that there was a fairly recent paper by Brandner et al., "Computing Liveness Sets for SSA-Form Programs" (https://hal.inria.fr/inria-00558509v2), that had nearly the exact machinery that we need. I was able to read the paper and refactor out the horrible code over the weekend, and have merged it into the quadcode trunk. All the tests continue to pass, and while the 'free' operations do not necessarily appear in precisely the same places, it doesn't appear to be leaking memory.

Next will be to come up with a proof-of-concept slicing up of optimized quadcode into discrete continuations, which may be compiled separately. Continuations will need a new kind of entry, and I'm still working out details. My guess is that except for entry and return, very little of code generation will actually need to be altered, since values can be closed within a continuation either by creating an anonymous structure to hold them (my weak preference) or else by pushing them into the Tcl callframe before the call and retrieving them afterward. (I dislike the latter option because of the additional overhead of repeated boxing and unboxing.)

In any case, there's a fair amount of work before I'll have anything resembling sensible quadcode to generate code from, so there's still time to think about this stuff. (I need liveness for other planned activities, so the merge to the trunk is something I expect to be sound in any case.) |
From: Kevin K. <kev...@gm...> - 2018-02-27 18:33:51
|
At last, I think I have things set up so that the interpreter status, Tcl result, and options dictionary all flow into a catch explicitly as a FAIL value. I managed to make all the changes in the code issuer myself.

This needed the refactoring of a surprising number of quadcodes:

setReturnCode
    Changed to produce an explicit FAIL (but see below!)

throwIfNotExists, throwNotExists, initIfNotExists
    Changed to make an 'exists' check, and then do the right thing with
    combinations of 'initException', 'copy' and conditional jumps. These
    three quadcodes are now eliminated.

throwIfScalar, throwIsScalar, throwIfArray, throwIsArray
    Changed to use appropriate combinations of 'exists', 'arrayExists',
    'initException' and conditional jumps. These four quadcodes are now
    eliminated.

narrowToParamType, narrowToNotParamType
    These quadcodes, it turns out, were never generated, so support for
    them is removed.

checkParamType, throwIfNotParamType
    These quadcodes are refactored into 'instanceOfParamType' (a new
    quadcode), 'initParamTypeException' (another new quadcode), and
    conditional jumps. The new quadcodes do NOT propagate into the code
    issuer, because a later pass replaces them with 'instanceOf' and
    'initException' once the parameter types are known. 'checkParamType'
    and 'throwIfNotParamType' are eliminated.

checkArithDomain, throwArithDomainError
    These quadcodes are refactored into 'instanceOf', 'initException',
    and conditional jumps. They are eliminated.

result, returnCode, returnOptions
    'result' is refactored to accept a value that includes FAIL among
    its types, and returns the result value associated with it.
    'returnCode' is refactored also to accept a value of type
    {FAIL SOMETHING} and return the associated interpreter return code.
    'returnOptions' is likewise refactored. Note that if any of these
    receives a value that is known NOT to be of type fail, then it will
    be optimized away: in the TCL_OK case, the result is simply the
    input value; the return code is zero, and the return options are
    {-code 0 -level 0}.

All the tests that passed before this change continue to pass, so I've merged back into the trunk. The new 'errortest8' test is informative: try running it with

    tclsh8.x demos/perftest/tester.tcl \
        -just errortest8 -quadcode-log widen

You'll see that the fair-weather case (where nothing throws) bypasses all the 'catch' machinery altogether and simply constructs the return value. There is no repeated reaching into the Tcl interpreter.

The next thing that I'd like to do is to modify 'initException' for the special case where the last two args are {literal 1} {literal 0}, which is the common case. This generates ugly code, since it constructs a FAIL STRING with an empty string, only to have the string part discarded immediately because the next thing that happens to the object will always be an 'extractFail'. If we could adjust both quadcode/types.tcl and the code issuer so that the result of 'initException' is a plain FAIL in that case, that would streamline things a bit. It may not be worth worrying about, since we probably don't care as much about the performance of 'catch' when it actually catches an exception.

Also, could you check me out on https://core.tcl-lang.org/tclquadcode/artifact?ln=3555-3567&name=ccad694ee8 ? I thought that FOREACH was a reference type and I'd have to dropReference when no longer using a FAIL FOREACH, but I can't find even a dropReference(FOREACH) to call. I think I'm probably leaking memory, but don't know what to do to plug it. |
From: Kevin K. <kev...@gm...> - 2018-02-22 03:15:47
|
Continuing from my previous message, I've tried simply enabling inlining of procs that might throw, and was pleasantly surprised how few problems remain with that step.

(1) There's some mislinking of the 'moveFromCallFrame' for 'errorInfo' in the 'throwCheck' test case. I'll track this one down; I'm sure that it's simply tickling a previously unexercised bug in callframe.tcl.

(2) 'errortest2-caller', 'errortest4b', 'expandtest::test9', 'expandtest::test10', 'expandtest::test11' and 'expandtest::test12' are all reporting rubbish for -errorline. I've managed to track this down to the fact that IssueInvoke, IssueInvokeCommand, and IssueInvokeExpanded are all calling SetErrorLine. Is there a way that I can get SetErrorLine (and whatever other exit processing is needed: storing errorCode as well?) to be done when I'm taking the error exit from an inlined procedure? (For what it's worth, I will have a place that's known to be on the error exit branch where I can generate arbitrary code if I know what to generate.)

(3) calltest2 has the resolved function name in the backtrace rather than the function name as invoked. This probably isn't all that important initially. In fact, I think we have that problem throughout, and I'll have to consider carrying the unresolved name as yet another argument to 'invoke' and friends just for the error messages!

Beyond that, and the 'initException' and 'FOREACH' issues from my previous email, I think the current tranche of changes will be ready to merge again. They give small but solid gains - the FlightAware benchmark goes from roughly a 1400% speedup on my machine to a 1700% speedup with inlining. Recursion doesn't break it. The 'mrtest' test gives a gain of about 1050% before the changes and 1150% afterwards, and doesn't get into runaway - essentially, what it does is to inline all the switch cases into 'calc', but inlines 'calc' nowhere.

After these minor issues are settled, it'll be on to callframe access inside the inlined procedure, and I'll need guidance about pushing and popping frames. I know when I'm entering and leaving the inline procedure, and so can generate quadcode for the callframe management, but have Absolutely No Idea how to do so. Then will come the task of resolving upvar (and uplevel!) inside the inline code. That's where the real wins will start coming, although the small performance gains so far are probably already worth the work done. I think that in most cases, the simple 'upvar 1' cases will go away, replaced with direct access to the variable in the enclosing scope. I have some ideas how to accomplish this, but need to have the inner and outer callframes both available to me before I can start trying to implement this in earnest.

A side note: when I started tracing through IssueEntry, I noticed that the code appears to ignore the variable list that's on the 'entry' quadcode in favor of the one listed in the bytecode. I'd have to doublecheck, but I think that the 'entry' instruction will, in general, have a shorter list, because it won't contain variables that the compiler can prove won't be accessed by name. (Consider a procedure with lots of variables, but the only callframe access being [scan $s %d x].) Do you think it would be much of a win to leave out the slots for variables that won't be used nonlocally?

Hope you're doing OK; I haven't heard from you in a little while. |
From: Kevin K. <kev...@gm...> - 2018-02-21 18:12:01
|
The following message appears to have vanished into the aether - while I see it in my 'Sent Items' folder, it's not on the tcl-quadcode archives, so I assume some mailer ate it. Resending: On Mon, Feb 19, 2018 at 12:41 AM, Kevin Kenny <kev...@gm...> wrote: > At last, I think I have things set up so that the interpreter status, > Tcl result, and options dictionary all flow into a catch explicitly as > a FAIL value. I managed to make all the changes in the code > issuer myself. > > This needed the refactoring of a surprising number of quadcodes: > > setReturnCode > Changed to produce an explicit FAIL (but see below!) > > throwIfNotExists, throwNotExists, initIfNotExists > Changed to make an 'exists' check, and then do the right > thing with combinations of 'initException', 'copy' and > conditional jumps. These three quadcodes are now eliminated. > > throwIfScalar, throwIsScalar, throwIfArray, throwIsArray > Changed to use appropriate combinations of 'exists', > 'arrayExists', 'initException' and conditional jumps. These > four quadcodes are now eliminated. > > narrowToParamType, narrowToNotParamType > These quadcodes, it turns out, were never generated, so > support for them is removed. > > checkParamType, throwIfNotParamType - > These quadcodes are refactored into 'instanceOfParamType' > (a new quadcode), 'initParamTypeException' (another new > quadcode), and conditional jumps. The new quadcodes do NOT > propagate into the code issuer, because a later pass replaces > them with 'instanceOf' and 'initException' once the parameter > types are known. 'checkParamType' and 'throwIfNotParamType' > are eliminated. > > checkArithDomain, throwArithDomainError - > These quadcodes are refactored into 'instanceOf', > 'initException', and conditional jumps. They are eliminated. > > result, returnCode, returnOptions > 'result' is refactored to accept a value that includes FAIL > among its types, and returns the result value associated with > it. 'returnCode' is refactored also to accept a value of type > {FAIL SOMETHING} and return the associated interpreter return > code. 'returnOptions' is likewise refactored. Note that if > any of these receives a value that is known NOT to be of type > fail, then it will be optimized away: in the TCL_OK case, the > result is simply the input value; the return code is zero, and > the return options are {-code 0 -level 0}. > > All the tests that passed before this change continue to pass, so I've > merged back into the trunk. The new 'errortest8' test is informative: > try running it with > > tclsh8.x demos/perftest/tester.tcl \ > -just errortest8 -quadcode-log widen > > You'll see that the fair-weather case (where nothing throws) bypasses > all the 'catch' machinery altogether and simply constructs the return > value. There is no repeated reaching into the Tcl interpreter. > > > The next thing that I'd like to do is to modify 'initException' for > the special case where the last two args are {literal 1} {literal 0}, > which is the common case. This generates ugly code, since it > constructs a FAIL STRING with an empty string, only to have the string > part discarded immediately because the next thing that happens to the > object will always be an 'extractFail'. If we could adjust both > quadcode/types.tcl and the code issuer so that the result of > 'initException' is a plain FAIL in that case, that would streamline > things a bit. 
It may not be worth worrying about, since we probably > don't care as much about the performance of 'catch' when it actually > catches an exception. > > > Also, could you check me out on > https://core.tcl-lang.org/tclquadcode/artifact?ln=3555-3567&name=ccad694ee8 > ? > I thought that FOREACH was a reference type and I'd have to > dropReference when no longer using a FAIL FOREACH, but I can't find > even a dropReference(FOREACH) to call. I think I'm probably leaking > memory, but don't know what to do to plug it. |
From: Kevin K. <kev...@gm...> - 2018-01-30 04:39:02
|
On Mon, Jan 29, 2018 at 9:22 PM, Kevin Kenny <kev...@gm...> wrote: > On Mon, Jan 29, 2018 at 6:54 PM, Donal K. Fellows < > don...@ma...> wrote: > >> On 29/01/2018 18:14, Kevin Kenny wrote: > > One thing that might be an approach: Any time we generate a 'jumpMaybe' >>> in 'translate.tcl', we're testing a FAIL result, and we uniformly simply >>> discard it. If we moved that result to {temp @fail}, and then had the >>> 'catch' processing examine that temporary, I'm guessing that would close >>> the data flow for most of the cases. It should be, in general, very close >>> to zero cost to do so, since copy propagation, dead value elimination, and >>> suchlike should get rid of excess data motion. >>> >> >> Sounds very possible. Probably ought to be a special variation of copy >> that throws away the non-FAIL type information to get a pure FAIL so that >> we can then put the result through a phi without going crazy. But I've not >> thought about that bit much and I could be dead wrong too. ;-) >> > > Yeah, I was thinking the same thing. We have 'extractMaybe' in the basic > block that follows a 'jumpMaybe', to discard the FAIL information if the > operation doesn't fail. We need the corresponding 'extractFail' to isolate > the FAIL when 'jumpMaybe' is taken. That part's easy. It's the exact same > logic as any other type narrowing after a conditional jump. There's the > logic already in 'narrow.tcl' to do that for EXISTS ('exists' and > 'throwIfNotExists'); various builtin types ('instanceOf' and > 'checkArithDomain'), array and scalar (throwIfArray; throwIfScalar; > arrayExists); IMPURE ('purify') and even the other side of FAIL > ('extractMaybe'). One more switch case is NOT a problem! > > I think I can do this one first. It should be a fairly self-contained > change, since all it does is type narrowing. > Oops. There are four pairs of quadcodes where I don't have the FAIL result available. throwIfNotExists/throwNotExists throwIfArray/throwIsArray throwIfScalar/throwIsScalar checkArithDomain/throwArithDomainError What I would like to do is to refactor all of these so that they are instead: checkExists {temp ...} {var something} ;# Returns FAIL EMPTY jumpMaybe {pc ...} {temp ...} ;# Two-instruction sequence that replaces throwIfNotExists I won't need the 'throwIf' versions of the instructions at all, so I'm happy to call them all 'check' and reuse the 'checkArithDomain' name. In the case of the unconditional throws, I think The Right Thing is to replace them with 'initException' and 'jump' - and then I need to work on making that particular 'initException' generate a pure FAIL. It looks to me in the code generation as if, internally, it's doing almost exactly what I suggest - except that it inlines the jump. Could you help me with code gen for the broken-up versions of the quadcodes? I can get them into translate.tcl, narrow.tcl and types.tcl, I think. I'm guessing that you'll have suggestions for further streamlining the API, and in any case, it's later in the evening than I want to start hacking on them in the 'middle end'. I have generation of the rest of the data flow to get the FAIL into catch blocks in my sandbox, except for the fact that I have a dozen or so test cases that are aborting. These may be the result of the quadcodes above, or may be a separate bug. That'll be the thing to do in the next debugging session. |
From: Kevin K. <kev...@gm...> - 2018-01-30 02:22:59
|
On Mon, Jan 29, 2018 at 6:54 PM, Donal K. Fellows < don...@ma...> wrote: > On 29/01/2018 18:14, Kevin Kenny wrote: > >> So - it seems like the Right Thing at least notionally to attach the >> return information to a value in the quadcode. Two possibilities come to >> mind: >> > > I agree with your analysis… > > (1) The callframe. This is the notional object that we manipulate when >> entering and leaving procedures, and is at hand when we are doing >> operations like 'return'. >> > > When I've coded things with it, I've treated it like a placeholder for all > the implicit state. That's probably not the right thing to do. It isn't very wrong. Since updates to the implicit state consume and produce the callframe, that keeps me from reordering them, which is definitely right. We can get a long way using the callframe that way. > (2) Objects of type FAIL (which will probably need a few more bits of type >> information at least while the quadcode optimizer is running, to track >> "this might be an error", "this might be a break/continue/return," "this >> might return non-locally (nonzero -level)," and maybe one or two other >> things. >> > > Currently, they're a tuple of the non-FAIL value and the current return > code. The original design (a bit to say fail-or-not) had to be extended to > make getting results out of 'invoke' work right. Theoretically, the FAIL > type is the right place to have the extended failure information, but I'm > wary about doing that as the extended failure information is fairly > expensive (a dictionary and some other fields). OK. > So - I think that associating return options, level, and code with the >> FAIL object (maybe not in code generation, but at least notionally) is more >> likely to be the right thing. >> > > It's either that or stashing it in some new “global-ish” entity. Which > seems messy as it interferes with SSA. If we name that global-ish thing something explicit like {temp @fail} in translate.tcl, then I can SSA-ify it. That's what happens with the CALLFRAME - translate.tcl calls it {temp @callframe}, and I know that there's only ever one of it live because the SSA transformation can't introduce another live copy. > What then has to change for me to start tracking return status? I identify >> the following quadcodes that will need to look at it. >> > > You're largely on it. Here's a few notes from what I can remember. > > startCatch - Does the code generator care about startCatch, or is it just >> there in the optimizer to mark a place in the code where ::errorInfo and >> ::errorCode need to be spoilt because the presence of a FAIL may have >> messed them up? (At present, 'startCatch' is overconservative and spoils >> any potential linked variable, but that's because it can't prove that >> ::errorCode or ::errorInfo aren't aliased!) >> > > I believe that if we get the failure code threading right, that'll become > simpler, possibly even non-existent. Good. I didn't think that 'startCatch' did anything - I didn't mean for it to. I simply introduced it because I needed to represent 'Something just threw an exception, make sure ::errorInfo isn't stale". returnException - This is now dead, right? >> > > Yes. Deader than a doornail. And unmourned. > result, returnCode, returnOptions - These are already coded to look at the >> callframe. That may be all right, depending on where they might look at it. >> If we do something like pull the 'returnCode' of a failed 'div', what's >> tracking the dependency? 
>> > > That's why what we're doing now is wrong, and why the information needs to > become associated with a FAIL. ;-) OK, that'll be probaby be the *second* change I make as I'm implementing this, see below for what happens procLeave - This is relatively new. What's it doing? >> >> > From my doc-comments on tcl.procedure.return (that's the implementation of > procLeave) in codegen/stdlib.tcl… > > # Handles the transforms on a result when a procedure returns. See > # InterpProcNR2 in tclProc.c for what is going on; this is the part > # commencing at the 'process' label. > > Basically, for a non-zero exit, there's a bunch of transformations to do > when a procedure exits that correspond to the sequence of instructions that > occur after the bytecode has been evaluated. This is things like saying > what line of the procedure had the error, calling TclUpdateReturnInfo() to > modify the level, claiming that leaking breaks and continues are errors, > that sort of thing. You need to keep that in there in the non-TCL_OK exit > path when things are inlined, and yes, it can touch both the errorinfo and > errorcode (that's its main job). So it should (notionally) take a FAIL, decorate the FAIL with the backtrace and adjusted level, handle uncaught breaks and continues, and yield another FAIL? Where I'm trying to get with all this is that I want to be able to split >> code so that normal anf FAIL paths are separate when coming out of a >> procedure. This will, in turn, let me peel off 'jumpMaybe' and fallthrough >> paths separately and hopefully enable further dataflow-driven optimization. >> It shouldn't be too difficult provided that I can work out the data flows, >> but that's turning out to be quite a challenge because of the implicit >> flows: initException, operations like 'div', and 'return' are all stashing >> code and options into the interpreter state, and >> result/returnCode/returnOptions are retrieving them, without giving me >> any way to trace the flow back to the operation that put them there. >> > > Yes, and I've not been very happy with it. I've just not seen any good way > to get the code out of the translator such that it will work out. But as my > response to your next paragraph indicates, that's a lack of imagination on > my part. ;-) > > One thing that might be an approach: Any time we generate a 'jumpMaybe' >> in 'translate.tcl', we're testing a FAIL result, and we uniformly simply >> discard it. If we moved that result to {temp @fail}, and then had the >> 'catch' processing examine that temporary, I'm guessing that would close >> the data flow for most of the cases. It should be, in general, very close >> to zero cost to do so, since copy propagation, dead value elimination, and >> suchlike should get rid of excess data motion. >> > > Sounds very possible. Probably ought to be a special variation of copy > that throws away the non-FAIL type information to get a pure FAIL so that > we can then put the result through a phi without going crazy. But I've not > thought about that bit much and I could be dead wrong too. ;-) > Yeah, I was thinking the same thing. We have 'extractMaybe' in the basic block that follows a 'jumpMaybe', to discard the FAIL information if the operation doesn't fail. We need the corresponding 'extractFail' to isolate the FAIL when 'jumpMaybe' is taken. That part's easy. It's the exact same logic as any other type narrowing after a conditional jump. 
There's the logic already in 'narrow.tcl' to do that for EXISTS ('exists' and 'throwIfNotExists'); various builtin types ('instanceOf' and 'checkArithDomain'), array and scalar (throwIfArray; throwIfScalar; arrayExists); IMPURE ('purify') and even the other side of FAIL ('extractMaybe'). One more switch case is NOT a problem! I think I can do this one first. It should be a fairly self-contained change, since all it does is type narrowing. Are we intending to have two {temp @fail}-derived values hanging around at > once, or can I do what I do now and keep the data in the Tcl_Interp? > For now, I'm trying to have the FAIL limited to tracking the data flow notionally; I don't see anything that's going to have multiple FAILs in flight. The big issue is to make sure that the FAIL can be followed into 'result', 'returnCode', 'returnOptions', and I think I've got a handle on that now. I'm going to merge the current 'inline' branch into trunk - it doesn't introduce any test failures other than new tests, which are bugs in this exact logic. That will give me a cleanly-anchored place to continue, leaving the 'inline' branch open. I'll follow up with discussions of the havoc that I'll be wreaking on the quadcodes. |
From: Donal K. F. <don...@ma...> - 2018-01-29 23:54:49
|
On 29/01/2018 18:14, Kevin Kenny wrote: > So - it seems like the Right Thing at least notionally to attach the > return information to a value in the quadcode. Two possibilities come to > mind: I agree with your analysis… > (1) The callframe. This is the notional object that we manipulate when > entering and leaving procedures, and is at hand when we are doing > operations like 'return'. When I've coded things with it, I've treated it like a placeholder for all the implicit state. That's probably not the right thing to do. > (2) Objects of type FAIL (which will probably need a few more bits of > type information at least while the quadcode optimizer is running, to > track "this might be an error", "this might be a break/continue/return," > "this might return non-locally (nonzero -level)," and maybe one or two > other things. Currently, they're a tuple of the non-FAIL value and the current return code. The original design (a bit to say fail-or-not) had to be extended to make getting results out of 'invoke' work right. Theoretically, the FAIL type is the right place to have the extended failure information, but I'm wary about doing that as the extended failure information is fairly expensive (a dictionary and some other fields). > So - I think that associating return options, level, and code with the > FAIL object (maybe not in code generation, but at least notionally) is > more likely to be the right thing. It's either that or stashing it in some new “global-ish” entity. Which seems messy as it interferes with SSA. > What then has to change for me to start tracking return status? I > identify the following quadcodes that will need to look at it. You're largely on it. Here's a few notes from what I can remember. > startCatch - Does the code generator care about startCatch, or is it > just there in the optimizer to mark a place in the code where > ::errorInfo and ::errorCode need to be spoilt because the presence of a > FAIL may have messed them up? (At present, 'startCatch' is > overconservative and spoils any potential linked variable, but that's > because it can't prove that ::errorCode or ::errorInfo aren't aliased!) I believe that if we get the failure code threading right, that'll become simpler, possibly even non-existent. > returnException - This is now dead, right? Yes. Deader than a doornail. > result, returnCode, returnOptions - These are already coded to look at > the callframe. That may be all right, depending on where they might look > at it. If we do something like pull the 'returnCode' of a failed 'div', > what's tracking the dependency? That's why what we're doing now is wrong, and why the information needs to become associated with a FAIL. ;-) > procLeave - This is relatively new. What's it doing? > From my doc-comments on tcl.procedure.return (that's the implementation of procLeave) in codegen/stdlib.tcl… # Handles the transforms on a result when a procedure returns. See # InterpProcNR2 in tclProc.c for what is going on; this is the part # commencing at the 'process' label. Basically, for a non-zero exit, there's a bunch of transformations to do when a procedure exits that correspond to the sequence of instructions that occur after the bytecode has been evaluated. This is things like saying what line of the procedure had the error, calling TclUpdateReturnInfo() to modify the level, claiming that leaking breaks and continues are errors, that sort of thing. 
You need to keep that in there in the non-TCL_OK exit path when things are inlined, and yes, it can touch both the errorinfo and errorcode (that's its main job).

> Where I'm trying to get with all this is that I want to be able to split
> code so that normal and FAIL paths are separate when coming out of a
> procedure. This will, in turn, let me peel off 'jumpMaybe' and
> fallthrough paths separately and hopefully enable further
> dataflow-driven optimization. It shouldn't be too difficult provided
> that I can work out the data flows, but that's turning out to be quite a
> challenge because of the implicit flows: initException, operations like
> 'div', and 'return' are all stashing code and options into the
> interpreter state, and result/returnCode/returnOptions are retrieving
> them, without giving me any way to trace the flow back to the operation
> that put them there.

Yes, and I've not been very happy with it. I've just not seen any good way to get the code out of the translator such that it will work out. But as my response to your next paragraph indicates, that's a lack of imagination on my part. ;-)

> One thing that might be an approach: Any time we generate a 'jumpMaybe'
> in 'translate.tcl', we're testing a FAIL result, and we uniformly simply
> discard it. If we moved that result to {temp @fail}, and then had the
> 'catch' processing examine that temporary, I'm guessing that would close
> the data flow for most of the cases. It should be, in general, very
> close to zero cost to do so, since copy propagation, dead value
> elimination, and suchlike should get rid of excess data motion.

Sounds very possible. Probably ought to be a special variation of copy that throws away the non-FAIL type information to get a pure FAIL, so that we can then put the result through a phi without going crazy. But I've not thought about that bit much and I could be dead wrong too. ;-)

Are we intending to have two {temp @fail}-derived values hanging around at once, or can I do what I do now and keep the data in the Tcl_Interp?

Donal. |
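[Editorial note: to make the procedure-exit transforms above concrete, a small illustration in ordinary Tcl. This is stock interpreter behaviour only, not quadcode output, and the proc names are invented.]

    # A TCL_BREAK that leaks out of a procedure body is converted into an
    # error on the way out -- one of the transforms procLeave reproduces.
    proc leaky {} { break }
    puts [catch {leaky} msg]   ;# 1, i.e. the break became a TCL_ERROR
    puts $msg                  ;# invoked "break" outside of a loop

    # TclUpdateReturnInfo() decrements -level at each procedure exit, so a
    # -level 2 return surfaces in the caller of the caller.
    proc inner  {} { return -code error -level 2 "reported one frame up" }
    proc middle {} { inner; puts "never reached" }
    puts [catch {middle} msg]  ;# 1
    puts $msg                  ;# reported one frame up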
From: Kevin K. <kev...@gm...> - 2018-01-29 18:14:44
|
Donal,

(copy to the list so the 'stream of consciousness' doesn't get broken)

As you can see from the weekend commits, I'm getting back into procedure inlining; sorry about the delay, but an illness simply left me with no energy for it. I think I've managed to tie off invocation of compiled procedures with variable args prettily. It's interesting how atrocious the generated code is initially; I'm starting to embrace the idea of 'don't worry about generating disorganized or inefficient code, because the cleanup passes in the optimizer will tidy it up again.'

I started to look once again at handling error returns from inlined procedures, and I'm up against a bit of a design impasse. I think you were right in your observation some time ago that we need to make some of the data dependencies explicit in order to handle things, but I'm now struggling with bounding the issue.

The first thing that I'm tripping over is the quads such as 'result', 'returnCode', and 'returnOptions'. These operate on some sort of abstract interpreter state, rather than having an explicit dependence. They also get their data set as a side effect of returning from a procedure, which means that they won't have the stuff available when exiting from inline code. Not having an explicit dependency is also a lot more work when doing some of the more aggressive optimizations: the optimizer has to analyze up and down the control flow to determine where the data are coming from, and prevent instruction reordering that might spoil the flow.

So - it seems like the Right Thing at least notionally to attach the return information to a value in the quadcode. Two possibilities come to mind:

(1) The callframe. This is the notional object that we manipulate when entering and leaving procedures, and is at hand when we are doing operations like 'return'.

(2) Objects of type FAIL (which will probably need a few more bits of type information at least while the quadcode optimizer is running, to track "this might be an error", "this might be a break/continue/return," "this might return non-locally (nonzero -level)," and maybe one or two other things.)

One drawback to associating with the callframe is that procedures often have callframe elimination - there's no point in pushing a callframe where nobody will ever look up the local variables. The callframe is therefore actually NOT always available when we need it. (This is also true when we consider locally-thrown errors and the fact that [catch] code has to handle them as well. The sequences for quads like 'div' or 'lappend' don't refer to the callframe.)

So - I think that associating return options, level, and code with the FAIL object (maybe not in code generation, but at least notionally) is more likely to be the right thing.

What then has to change for me to start tracking return status? I identify the following quadcodes that will need to look at it.

initException - This is where a lot of FAILs (and a lot of other things) are born. It doesn't ordinarily reference the CALLFRAME, and at present produces a FAIL STRING.

invoke and invokeExpanded - I'll worry about these. They already have the CALLFRAME FAIL WHATEVER object, so no change to their interface.

startCatch - Does the code generator care about startCatch, or is it just there in the optimizer to mark a place in the code where ::errorInfo and ::errorCode need to be spoilt because the presence of a FAIL may have messed them up?
(At present, 'startCatch' is overconservative and spoils any potential linked variable, but that's because it can't prove that ::errorCode or ::errorInfo aren't aliased!)

return - I understand what happens here: a FAIL WHATEVER together with the current CALLFRAME come in, and the procedure returns a CALLFRAME FAIL WHATEVER.

returnException - This is now dead, right?

retrieveResult - Accepts a CALLFRAME FAIL WHATEVER and extracts the FAIL WHATEVER; no problem.

result, returnCode, returnOptions - These are already coded to look at the callframe. That may be all right, depending on where they might look at it. If we do something like pull the 'returnCode' of a failed 'div', what's tracking the dependency?

procLeave - This is relatively new. What's it doing?

Where I'm trying to get with all this is that I want to be able to split code so that normal and FAIL paths are separate when coming out of a procedure. This will, in turn, let me peel off 'jumpMaybe' and fallthrough paths separately and hopefully enable further dataflow-driven optimization. It shouldn't be too difficult provided that I can work out the data flows, but that's turning out to be quite a challenge because of the implicit flows: initException, operations like 'div', and 'return' are all stashing code and options into the interpreter state, and result/returnCode/returnOptions are retrieving them, without giving me any way to trace the flow back to the operation that put them there.

One thing that might be an approach: Any time we generate a 'jumpMaybe' in 'translate.tcl', we're testing a FAIL result, and we uniformly simply discard it. If we moved that result to {temp @fail}, and then had the 'catch' processing examine that temporary, I'm guessing that would close the data flow for most of the cases. It should be, in general, very close to zero cost to do so, since copy propagation, dead value elimination, and suchlike should get rid of excess data motion.

Thoughts?

Kevin |
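[Editorial note: a script-level analogue of that last point, in ordinary Tcl with a made-up proc, not quadcode. The value discarded at the 'jumpMaybe' test is exactly what a surrounding [catch] reconstructs its result and options from.]

    proc f {x} {
        if {[catch {expr {1 / $x}} r opts]} {
            # r and opts are rebuilt from the failure that the arithmetic
            # produced -- the flow the {temp @fail} idea would make explicit.
            return "failed: $r"
        }
        return "ok: $r"
    }
    puts [f 0]   ;# failed: divide by zero
    puts [f 2]   ;# ok: 0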
From: Donal K. F. <don...@ma...> - 2018-01-23 23:17:44
|
On 17/01/2018 20:52, Kevin Kenny wrote:
> I notice, though, that there seems to be trouble in testing the error path.

It's entirely possible that the problems are plain old real bugs that were there before and we simply didn't have tests adequate to detect the problems. Error path code is really quite tricky indeed as there's a lot of critical implicit context at the Tcl level. I'll need to dig into it on trunk...

(I've been — slowly — trying to get improvements done to llvmtcl to support our next phase of changes, and thinking about debugging info. Which is much harder than it appears to be at first glance due to the need for inlined function type signatures, and the fact that simplistic approaches such as a stack of inline contexts in the code generator won't work at all as basic blocks don't work that way. Bleah.)

Donal. |
From: Kevin K. <kev...@gm...> - 2018-01-17 20:52:46
|
I managed today to code up and commit a refactor of varargs handling on the 'inline' branch. With this in place, even an ordinary 'invoke' should never tickle the 'wrong # args' case when not going through a call stub.

I notice, though, that there seems to be trouble in testing the error path. Test cases ::expandtest::test9 through ::expandtest::test12 are all trying to exercise the 'wrong # args' code through different paths. On each of those, there's a [catch] inside the test procedure, which should be swallowing the error information and returning just an edited version of the error message. Nevertheless, when the test harness examines the return options dictionary, it's seeing -errorcode and -errorline, and complaining about them. Any idea what I'm doing wrong here?

Aside from that, I think the branch, while it doesn't do effective inlining, is ready to merge so as to get the varargs problems fixed in trunk. I can continue developing the inline stuff on the branch. |
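[Editorial note: a hypothetical reduction of what those tests expect, in plain Tcl with invented proc names. When the 'wrong # args' error is caught inside the procedure and an edited message is returned normally, the options dictionary seen by the caller should carry no -errorcode or -errorline.]

    proc takesTwo {a b} { return "$a,$b" }
    proc wrapper {} {
        # Swallow the error and return only an edited message.
        catch {takesTwo onlyOne} msg
        return [string map {takesTwo PROC} $msg]
    }
    catch {wrapper} result options
    puts $result                             ;# wrong # args: should be "PROC a b"
    puts [dict exists $options -errorcode]   ;# 0 expected: the error was swallowed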
From: Donal K. F. <don...@ma...> - 2018-01-12 12:28:35
|
On 11/01/2018 15:46, Kevin Kenny wrote:
> :confused_expression: Could an 'initException' with {literal 0} as its
> return code be replaced with a copy? If not, should its return type be
> STRING, FAIL STRING, or the type of the 'result' arg?

If the return code is 0 and the level is 0, it can become a copy. If either is non-zero, we will have a FAIL of some kind. I've not traced what happens in type-terms when we have a code=0/level=1 that passes through a catch context on its way to the procedure exit; it's entirely possible that we end up with things getting boxed as a STRING if we do that.

Donal. |
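[Editorial note: a script-level view of those two cases, using standard Tcl semantics only. Code 0 with level 0 is just the value, while code 0 with level 1 is an ordinary [return], which a surrounding [catch] observes as TCL_RETURN rather than as an error.]

    # code 0 / level 0: behaves like the bare value, so the quad can
    # plausibly become a copy.
    proc plain {} {
        set v [return -level 0 -code ok 42]
        return $v
    }
    puts [plain]    ;# 42

    # code 0 / level 1 inside a catch context: [catch] reports code 2
    # (TCL_RETURN), not an error.
    proc early {} {
        set code [catch {return -level 1 -code ok "done"} res]
        return "catch saw code $code with result '$res'"
    }
    puts [early]    ;# catch saw code 2 with result 'done'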
From: Kevin K. <kev...@gm...> - 2018-01-11 15:47:03
|
On Thu, Jan 11, 2018 at 6:39 AM, Donal K. Fellows <don...@ma...> wrote:

> On 10/01/2018 16:21, Kevin Kenny wrote:
>> Would it cause code generation tremendous heartburn if 'initException'
>> were to return a simple FAIL (rather than a FAIL STRING) if its return
>> code argument is {literal 1}?
>
> I believe I could live with that. Especially if a {literal 0} is also
> special cased and if we can do some node splitting; that would make the
> code out of [try/finally] rather nicer than it is at the moment.

:confused_expression: Could an 'initException' with {literal 0} as its return code be replaced with a copy? If not, should its return type be STRING, FAIL STRING, or the type of the 'result' arg? |
From: Donal K. F. <don...@ma...> - 2018-01-11 11:39:32
|
On 10/01/2018 16:21, Kevin Kenny wrote:
> Would it cause code generation tremendous heartburn if 'initException'
> were to return a simple FAIL (rather than a FAIL STRING) if its return
> code argument is {literal 1}?

I believe I could live with that. Especially if a {literal 0} is also special cased and if we can do some node splitting; that would make the code out of [try/finally] rather nicer than it is at the moment.

Donal. |
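[Editorial note: for context, the ordinary-Tcl shape of the [try]/[finally] pattern whose generated code this special-casing would tidy up; the proc is a made-up example. Both the successful result and any failure are funnelled through the finally clause before being delivered or re-raised, which is where the code-0 and code-1 exception objects meet.]

    proc readFirstLine {path} {
        set f [open $path]
        try {
            return [gets $f]
        } finally {
            close $f
        }
    }
    # Returns the first line on success; errors from [gets] still
    # propagate, but [close] runs on every path out of the try body.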