From: Kevin K. <kev...@gm...> - 2017-05-14 04:22:19
On Sat, Mar 25, 2017 at 11:31 AM, Donal K. Fellows <don...@ma...> wrote:

> I've moved to now running with the full suite of procedures in demo.tcl,
> and I'm now hitting a problem that appears to be caused by wrong input
> to the code issuer. The problem is that there's quite a bit of code that
> is generating 'invoke' opcodes whose result type is plain CALLFRAME. The
> problem is particularly acute (i.e., causing full failures) when
> compiling ::flightawarebench::latlongs_to_distance; it's relying on
> ::flightawarebench::degrees_radians, which has three defined type
> specialisations (that I know of), being effectively STRING->FAIL DOUBLE,
> DOUBLE->DOUBLE and INT->DOUBLE, but the higher-level code is looking for
> IMPURE NUMERIC->NOTHING, and that's just going wonky as the NOTHING then
> spoils later code that calls the cos() builtin, which doesn't support
> that input type at all. (Am I right in saying that the NOTHING type
> indicates that there are no possible values of that type?)

I haven't tried to merge into the 'callframe' branch, but I believe that the
logic that's now on trunk should be generating all the function types that
are actually used.

The NOTHING was a bug; it's simply what appears as the return type of a
function that hasn't had its code generated yet. The specializer has been
reworked to fix the bug that missed generating the IMPURE NUMERIC version,
and the recent reworking of SSA repair actually makes all the versions
compilable.

Kevin
From: Kevin K. <kev...@gm...> - 2017-05-14 04:16:48
I'm vaguely remembering a recent conversation in which dkf mentioned that
errortest2-caller was commented out of demo.tcl because it caused the
compiler to throw a tantrum with a complaint about NOTHING reaching a 'phi'
operation. I couldn't find it in either the mailing list archive or the
tickets, so it was probably in the Tcl'ers Chat or some such. I really don't
like having problem reports only in an informal medium like that; I'm not
organized enough to keep track of them. Once I remembered, I opened a ticket
(e1f697c808) to track the problem.

It turns out that the issue is 'errortest2' once again. This procedure
always throws an error - it never returns any other way. That behaviour was
causing its return type to be a bald FAIL.

Continuing from my comments on the ticket: the code that extracts the result
is unreachable, because the 'jumpMaybe' will always jump. But the compiler
at present doesn't know this (there's a fairly simple patch to cleanupNarrow
that would optimize away the 'success' branch in this case, if we want to
take that route). That causes the result of 'extractMaybe' to be Nothing.
The Nothing happens to end up in a phi. The node splitting has a check that
none of a phi's arguments is Nothing, because at one time Nothing indicated
that the phi had an undefined value for one of its args. The check appears
to serve no useful purpose now.

After simply patching out the check, the code aborted in the code generator,
because there are numerous places where a bald FAIL or a Nothing value is
not acceptable. Adding the change to 'cleanupNarrow' made the code
compilable, but it started to fail at runtime with a nonsense value for the
return code.

I decided to work around the problem instead, and modified the type system
so that a procedure that always throws an exception has a type of FAIL
STRING instead of FAIL. (I'd have preferred FAIL EMPTY, but that isn't
implemented, either.) With that change, the code is compilable, but the test
fails. The problem is now that the '-errorline' is wrong in the options
dictionary when the error is caught. I have absolutely no idea, without
source-diving in the code issuer, where the -errorline might be coming from.

I committed the workaround to a branch, 'kbk-bug-e1f697c808'. If dkf is
willing to tackle this one, I'll postpone delving into whatever logic in the
code issuer generates '-errorline'. I suspect that getting it right will
depend on getting information from the compiler front end that isn't being
passed. I'll gladly try to work on that, if I can get guidance on what the
code issuer needs.

Kevin
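For illustration, a minimal Tcl sketch of the shape of the problem -
hypothetical code, not the actual demo.tcl procedures - assuming only what
the message above says: a procedure that can only leave by throwing, and a
caller that catches the error and looks at -errorline.

    proc errortest2 {} {
        # Never returns normally, so the inferred result type degenerates
        # to a bald FAIL (FAIL STRING under the workaround described above).
        error "errortest2 always fails"
    }

    proc errortest2-caller {} {
        # Catching the error is what forces the compiler to materialize a
        # "success" result that can never exist; it is in the caught
        # options dictionary that the wrong -errorline shows up.
        catch {errortest2} msg opts
        return [dict get $opts -errorline]
    }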
From: Kevin K. <kev...@gm...> - 2017-05-13 22:15:23
Today, I finished adapting the "node splitting" ("loop peeling", "jump
threading", whatever you want to call the optimization) logic to use the
"black box" SSA reconstruction algorithms from Chapter 5 of "SSA-based
Compiler Design" (http://ssabook.gforge.inria.fr/latest/book.pdf). The
adaptation went just as smoothly as I hoped in my messages of yesterday,
with only one "impossible" bug that took more than a few minutes to track
down. The changes feel much more robust than what was in there before. The
code volume is less, and I have the feeling that I can explain what is going
on at each step, rather than dealing with algorithms that have an air of
mystery about them.

I also fixed Donal's problem with widening of procedure return values. There
is now always a temporary generated for the widened result, never the bogus
assertion that a literal is being widened to itself. The 'wideretest' demo
case, I see, widens the string literal in the return value to FAIL STRING.
Am I to assume that this is the last piece of the puzzle for closing out
ticket [ceb218c3d8]?

I've been so mired in SSA reconstruction for the past weeks that I've lost
sight of what we were working on, and I see that Donal has run ahead of me
once again on callframe management. I'm guessing that my next task is going
to be to work out where I can safely declare that a given procedure does not
need a callframe - eliminating the overhead of pushing one. In the meantime,
though, I've a vague recollection that Donal had some other urgent requests
for me that I've mislaid entirely. I'll happily jump in on urgent material
if I can recall what it was.

Kevin
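For illustration, a hypothetical procedure of the same flavour (not the
actual 'wideretest' body from demo.tcl): the success path returns a bare
string literal, but the procedure as a whole can fail, so the literal result
has to be widened to the overall FAIL STRING type - now via a generated
temporary rather than the bogus "literal widened to itself".

    proc widelike {x} {
        if {![string is integer -strict $x]} {
            error "not an integer"      ;# makes the result type FAIL STRING
        }
        return "ok"                     ;# literal widened on the way out
    }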
From: Kevin K. <kev...@gm...> - 2017-05-13 21:48:27
Go ahead and close your branch. Bug [5800594ba4] is fixed in trunk. The fix
consisted mostly of removing code that fell in the general category of
"I don't know WHAT I was thinking."

On Sat, May 13, 2017 at 10:37 AM, Donal K. Fellows <don...@ma...> wrote:

> On 13/05/2017 12:43, Kevin Kenny wrote:
>> widenTo and a literal? Bug! I'll track it down. Did you open a ticket?
>
> Life intruded, but http://core.tcl.tk/tclquadcode/tktview?name=5800594ba4
> now exists.
>
>> Don't do #2 except as a workaround. The intent of the code is that
>> 'return' should always get an instance of the correct type.
>
> That's why the code that does it is on a branch. I'm hoping that it is a
> branch that never gets merged to trunk. :-)
>
> Donal.
From: Donal K. F. <don...@ma...> - 2017-05-13 14:37:29
On 13/05/2017 12:43, Kevin Kenny wrote:
> widenTo and a literal? Bug! I'll track it down. Did you open a ticket?

Life intruded, but http://core.tcl.tk/tclquadcode/tktview?name=5800594ba4
now exists.

> Don't do #2 except as a workaround. The intent of the code is that
> 'return' should always get an instance of the correct type.

That's why the code that does it is on a branch. I'm hoping that it is a
branch that never gets merged to trunk. :-)

Donal.
From: Kevin K. <kk...@ny...> - 2017-05-13 11:44:00
On 05/13/2017 04:23 AM, Donal K. Fellows wrote:

> However, I found a bug in the quadcode generation. If we use the
> sample procedure dictest in demo.tcl, I see the instruction sequence:
>
>   38: {widenTo -1073938433 {FAIL STRING}} {literal {nothing at_all} 0}
>       {literal {nothing at_all}} : STRING ⇐ STRING
>   39: return {} {literal {nothing at_all} 0} : VOID ⇐ STRING
>
> (Output generated with: tclsh8.6 demo.tcl -debug 1 -just dictest
> -iterations 5)
>
> The (large amount of) rest of the output is uninteresting, but this is
> a problem because it effectively specifies a resource leak. (The
> overall function result type is FAIL STRING.) That leak is technically
> because the result of widenTo is not saved in a variable or temporary
> variable, making the transfer to the return just bogus. I'd also
> expect deeper problems if there were multiple paths that produced that
> instruction sequence. The deeper problem (from my perspective) is that
> the literal that was passed to widenTo is being propagated through it —
> generating the nonsense where a widenTo's destination is a literal! —
> and then to the return, and this is in turn generating a memory leak
> (the compiler code handles the type mismatch at the return since
> there's been a history of versions which needed it.)

widenTo and a literal? Bug! I'll track it down. Did you open a ticket?

> The workaround in that branch is two-fold:
> 1: It generates a warning if a weird-looking widenTo occurs (instead
>    of generating the implementation of the widenTo; that code isn't
>    designed to handle that case).
> 2: It teaches return to do the widening correctly, instead of the
>    half-baked job that it previously had (dating back ages now).

Don't do #2 except as a workaround. The intent of the code is that 'return'
should always get an instance of the correct type.

--
73 de ke9tv/2, Kevin
From: Donal K. F. <don...@ma...> - 2017-05-13 08:24:04
On 12/05/2017 19:11, Kevin Kenny wrote:
> I realize that it's been several weeks since I last posted. I most
> likely owe the community a status report on what I've been up to in
> quadcode. It's been a frustrating few weeks, trying to tease out just
> what assumptions are implicit in the various algorithms that I've
> found in papers. The authors of the papers do not always make clear
> what they're assuming.

There are significant parts of the guts of the advanced parts of LLVM that
are just as bad. Everything to do with garbage collection, exception
handling and debugging metadata is just as filled with unstated assumptions
(and getting it wrong just causes a crash where the pieces you're left with
make no sense). I've decoded the bits for debugging (and they now help me a
bit, even if not as much as I'd like yet) and the other two just look like a
dead loss to me.

> I also see, alas, that the code for it isn't on
> core.tcl.tk/tclquadcode <http://core.tcl.tk/tclquadcode>. Apparently
> my Fossil synchronization has failed, possibly as a result of the
> expired certificate. I'll check it out when I get home tonight - I'm
> also having trouble with SSH access to the box that my personal repo
> is on. Chiselapp's auto-sync is also troublesome. It's just not my
> week for computer networking.

I hand-synchronize, so the certificate issues are just annoying. (I've also
built a recent fossil, but that was so that I can pull the sqlite, so it
might be unrelated.)

...

However, I found a bug in the quadcode generation. If we use the sample
procedure dictest in demo.tcl, I see the instruction sequence:

  38: {widenTo -1073938433 {FAIL STRING}} {literal {nothing at_all} 0}
      {literal {nothing at_all}} : STRING ⇐ STRING
  39: return {} {literal {nothing at_all} 0} : VOID ⇐ STRING

(Output generated with: tclsh8.6 demo.tcl -debug 1 -just dictest
-iterations 5)

The (large amount of) rest of the output is uninteresting, but this is a
problem because it effectively specifies a resource leak. (The overall
function result type is FAIL STRING.) That leak is technically because the
result of widenTo is not saved in a variable or temporary variable, making
the transfer to the return just bogus. I'd also expect deeper problems if
there were multiple paths that produced that instruction sequence. The
deeper problem (from my perspective) is that the literal that was passed to
widenTo is being propagated through it — generating the nonsense where a
widenTo's destination is a literal! — and then to the return, and this is in
turn generating a memory leak (the compiler code handles the type mismatch
at the return since there's been a history of versions which needed it.)

The same issue crops up in the compilation of a few other procedures. I have
a workaround in the dkf-workaround-widenTo-literal branch (which is able to
run the full suite of test cases without any obvious resource leaks; the
trunk currently has excessive retention of references to literals) but I'm
not keen on merging that branch, as the actual wrong behaviour is
elsewhere. :-)

The workaround in that branch is two-fold:
1: It generates a warning if a weird-looking widenTo occurs (instead of
   generating the implementation of the widenTo; that code isn't designed
   to handle that case).
2: It teaches return to do the widening correctly, instead of the half-baked
   job that it previously had (dating back ages now).

Donal.
From: Kevin K. <kev...@gm...> - 2017-05-12 18:12:04
I realize that it's been several weeks since I last posted. I most likely
owe the community a status report on what I've been up to in quadcode. It's
been a frustrating few weeks, trying to tease out just what assumptions are
implicit in the various algorithms that I've found in papers. The authors of
the papers do not always make clear what they're assuming.

The deep rabbit hole that I've gone down was the Choi-Sarkar-Schonberg
paper, http://www.research.ibm.com/people/v/vsarkar/ps/ChSS96.ps. There is
fairly little material in the literature about preserving SSA form across
complex program transformations, and I thought that the paper would at least
provide a starting point. At least so far, I've been unable to make any
implementation of their algorithms truly robust. Deciding where to stop when
working forward from inserting a new variable is really tough, the way
they've set things up. I am guessing that they proved out their algorithm on
something like Cytron's original SSA implementation without pruning, in
which all source variables are considered 'live' and redundant φ operations
are everywhere. I've spent many hours (and a fair amount of code committed
to 'mistake' branches) trying to make it go. While I occasionally succeed in
making it pass the demos, I have little confidence in it at this point. It
truly is too brittle to work with.

Fortunately, while browsing the massive work-in-progress "SSA-based Compiler
Design" looking for something else, I stumbled upon the brief Chapter 5,
"SSA Reconstruction":
http://ssabook.gforge.inria.fr/latest/book.pdf#chapter.309

It's surely written by an author who shares my experience:

    Many optimizations perform such program modifications, and maintaining
    SSA is often one of the more complicated and error-prone parts in such
    optimizations, owing to the insertion of additional φ-functions and the
    correct redirection of the uses of the variable. (p. 61, last sentence)

The exposition of the algorithms that it presents is quite lucid indeed, and
the author attributes the first of the two, based on dominance frontiers, to
the Choi paper. To my eye, there's fairly little in common except for the
use of the iterated dominance frontier. Choi works forward from inserted
assignments, while the algorithm as presented works backward from uses of
the assigned variable.

In any case, I was able to implement Algorithm 5.1 and the dominance-based
reconstruction in Section 5.3 in an evening, and apply them to the
optimization pass 'narrow', which inserts 'extractExists', 'unset', and
'narrowToType' operations on flow graph arcs that test the existence and
type of variables. The implementation has a much cleaner feel to it than
anything I've done for SSA repair and reconstruction up to this point. Once
the typos were out, the 'narrow' pass worked on the entire demo suite
without further twiddling - a very different experience from what I had
trying to implement the algorithm in the Choi paper.

The next step will be to try to expand this analysis to encompass the node
splitting (loop peeling, jump threading) pass that allows elimination of
type conversions in inner loops. This analysis is somewhat harder, since the
node splitting wreaks havoc on the control and data flows. It destroys
dominance relations, introduces irreducible flow graphs, and replaces single
assignments to variables with multiple versions (one per split copy of the
node).
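For concreteness, the use-driven reconstruction described above reads
roughly as follows in Tcl-flavoured pseudocode. Every helper named here
(definitionIn, blocksDefining, iteratedDominanceFrontier, immediateDominator,
predecessors, makeEmptyPhi, addPhiArg) is illustrative shorthand rather than
the actual quadcode API, and memoisation of the lazily created phis is
omitted.

    # Find the definition of $var that reaches the *end* of $block.
    proc findDefFromBottom {block var} {
        set def [definitionIn $block $var]
        if {$def ne {}} {
            return $def
        }
        return [findDefFromTop $block $var]
    }

    # Find the definition of $var that reaches the *start* of $block.
    proc findDefFromTop {block var} {
        if {$block in [iteratedDominanceFrontier [blocksDefining $var]]} {
            # A phi may be needed here: create it lazily, then fill in one
            # argument per predecessor by searching from that predecessor.
            set phi [makeEmptyPhi $block $var]
            foreach pred [predecessors $block] {
                addPhiArg $phi $pred [findDefFromBottom $pred $var]
            }
            return $phi
        }
        # Otherwise whatever reaches the end of the immediate dominator
        # also reaches the start of this block.
        return [findDefFromBottom [immediateDominator $block] $var]
    }

Each use of the variable is then rewritten to the definition found by
searching from its point of use.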
Nevertheless, I'm optimistic about tackling node splitting, because in the
transformation I can fairly readily enumerate all the places where a
variable assignment has been replicated. That enumeration (plus ud- and
du-chaining) appears to be all that the algorithm needs to do its job.

It remains to be seen which of the two algorithms in the book I'll use for
the node-splitting pass. I could stick to the version of the algorithm that
uses dominance frontiers (possibly with dynamic updating of the frontiers as
suggested on page 69). That would be the path of least resistance, since
some of the cleanup optimizations may depend on dominance as well. If it
turns out that the cleanup passes can be made to work easily without keeping
the dominance tree up to date, then it would be easier and likely faster to
go with the search-based algorithm of Section 5.4 for this task.

In either case, I have considerably more confidence in the new approach. The
fact that it Just Worked for the pass that inserts narrowing operations
gives me a much better feeling that the algorithm is robust. The caller of
the algorithms need only identify a set of variables that a transformation
has caused to violate SSA, together with their ud- and du-chains. This
identification depends only on information that the compiler has readily at
hand. There is much to be said for the 'black box' approach of having
algorithms that repair the SSA given only this information and that run
independently of whatever transformation caused the code to violate SSA. The
approach will allow for further aggressive transformations such as
loop-invariant code motion without needing to invent other repairs.

Some of the notes at the end of the chapter are interesting. In particular,
the last paragraph, discussing φ-function minimization for irreducible flow
graphs, is something that I discovered independently.

I also see, alas, that the code for it isn't on core.tcl.tk/tclquadcode.
Apparently my Fossil synchronization has failed, possibly as a result of the
expired certificate. I'll check it out when I get home tonight - I'm also
having trouble with SSH access to the box that my personal repo is on.
Chiselapp's auto-sync is also troublesome. It's just not my week for
computer networking.

Kevin
From: Donal K. F. <don...@ma...> - 2017-04-19 19:55:55
On 19/04/2017 15:16, Kevin Kenny wrote:
> There's a test case, 'bug-7c599d4029' in demo.tcl that's commented out
> of 'toCompile,' so we can learn whether the fix works.

I think I've fixed it, or at least the test case now compiles and runs while
producing the right answers. The only tricky bit was that there was a third
place that I needed to update for the change of function naming rules (in
the tcl::mathfunc mapping code in tclapi.tcl). ;-)

Donal.
From: Kevin K. <kev...@gm...> - 2017-04-19 14:16:59
On Wed, Apr 19, 2017 at 6:56 AM, Donal K. Fellows <don...@ma...> wrote:

> On 18/04/2017 21:56, Kevin Kenny wrote:
>> So, is there a way that we can make the Naming of Names not give rise
>> to a lot of conflict?
>
> The simplest technique would be to make the function name use the numeric
> typecode instead of the high-level implementation name, which will involve
> changing IssueInvoke as well as generateDeclaration (they're the things
> that have to agree on the name):
>
> http://core.tcl.tk/tclquadcode/artifact?ln=989&name=cccd13aea88e9f5e
>
> That'll work as long as the type matching is done accurately between the
> instantiations and the calls. OTOH, the thunk generation doesn't care; it
> gets told which function value to bind to a Tcl command, instead of
> looking it up by name (and will always want to produce a command that
> takes otherwise-unrestricted STRING arguments).
>
> Hmm, perhaps we ought to factor out the “make a typed name” code so that
> it can be shared…

That's what I was hoping you'd say. As a favour, I'd ask if you could
possibly have a look if you can get to it before me. I'm still wallowing in
the code-rewriting stuff. Now that I understand Conventional SSA and where
its restrictions apply, it's going a lot better, but there are still a ton
of bugs in the rework that have to be rooted out.

At this point, though, they really feel like 'bugs' and not 'I've
misunderstood the algorithms', which is where I was at for the last month or
two. The last couple, I've said to myself, "oh ***, these symptoms are just
like my earlier major misunderstanding," followed by finding a stupid typo
and not a horrible problem with the logic.

There's a test case, 'bug-7c599d4029', in demo.tcl that's commented out of
'toCompile,' so we can learn whether the fix works.
From: Donal K. F. <don...@ma...> - 2017-04-19 10:56:51
On 18/04/2017 21:56, Kevin Kenny wrote:
> So, is there a way that we can make the Naming of Names not give rise
> to a lot of conflict?

The simplest technique would be to make the function name use the numeric
typecode instead of the high-level implementation name, which will involve
changing IssueInvoke as well as generateDeclaration (they're the things that
have to agree on the name):

http://core.tcl.tk/tclquadcode/artifact?ln=989&name=cccd13aea88e9f5e

That'll work as long as the type matching is done accurately between the
instantiations and the calls. OTOH, the thunk generation doesn't care; it
gets told which function value to bind to a Tcl command, instead of looking
it up by name (and will always want to produce a command that takes
otherwise-unrestricted STRING arguments).

Hmm, perhaps we ought to factor out the “make a typed name” code so that it
can be shared…

Donal.
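A possible shape for that shared helper, as a sketch only - the 'typecode'
accessor and the exact name syntax here are assumptions, not the project's
real API:

    proc typedFunctionName {procName argTypes} {
        # Use the numeric typecode rather than the printable type name, so
        # that types which share a machine representation (e.g. ~INT and
        # STRING, both Tcl_Obj*) still yield distinct function names.
        set parts {}
        foreach t $argTypes {
            lappend parts [typecode $t]    ;# 'typecode' is hypothetical
        }
        return "${procName}([join $parts ,])"
    }

Both IssueInvoke and generateDeclaration would then call the same procedure,
so the two could not drift apart.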
From: Kevin K. <kev...@gm...> - 2017-04-18 20:56:24
In the last week or so, I've been trying to do some work on getting node
splitting better integrated with the interprocedural type analysis. Bug
http://core.tcl.tk/tclquadcode/tktview?name=7c599d4029 has proven to be an
extremely tough nut to crack.

One thing that I now realize was making everything appear cursed is that I
didn't entirely understand the requirements for incremental SSA update using
the algorithms in the Choi-Sarkar-Schonberg paper
(http://www.research.ibm.com/people/v/vsarkar/ps/ChSS96.ps). They don't
discuss the fact, but their analysis requires a stronger condition than SSA:
it needs what Sreedhar, Ju, Gillies and Santhanam
(http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.107.7249) call
'Conventional' SSA. It's for a different reason than the requirement of the
Sreedhar paper. Operations such as copy propagation, constant folding and
code motion break conventional SSA. My misunderstanding had led me to
introduce some pretty questionable hacks in the SSA manipulation, and I'm
working on getting those backed out now that I grasp what's going on.

It appears to me that we can delay or constrain the non-Conventional SSA
optimizations until everything that needs Conventional SSA has had a chance
to run. I've got copy propagation limited to transformations that preserve
Conventional SSA, and I'm working on tracking down where the insertion of
narrowing operations is going astray. I'll get there on that. No problem
there, but it has me still leaving the bug open for the moment. I don't
really need help with this part - I just need more time to work through the
optimization stack and get everything stable again. I have auditing in place
for at least part of the Conventional SSA property, which will bring me up
short if any of the nanopasses breaks it - and that's how I'm finding some
of these issues; we didn't yet have test cases that tickled the bugs.

But now on to my question, the one I do need some help with. There has been
a bit of an impedance mismatch right along between the quadcode front end's
understanding of a data type ("a member of a set of strings") and the back
end's understanding ("requiring a particular machine representation at run
time"). I got far enough into correcting the interprocedural analysis that
now I have a new place where I'm tripping over it - and I'm not sure what to
do about it. Consider:

    namespace eval ::bug-7c599d4029 {
        proc classify x {
            if {[string is integer -strict $x]} {
                return [list I $x]
            } else {
                return [list S $x]
            }
        }
        proc bug {x} {
            if {[string is integer -strict $x]} {
                set y 1
            } else {
                set y 0
            }
            list $y [classify $x]
        }
    }

With the newly-repaired interprocedural type analysis, the analysis of this
code has become Too Clever By Half. It reduces the 'bug' procedure to
something like:

    proc bug {x} {
        if {[string is integer -strict $x]} {
            return [list 1 [classify(INT) $x]]
        } else {
            return [list 0 [classify(~INT) $x]]
        }
    }

And the called procedures are:

    proc classify(INT) x {
        # INT->STRING
        return [list I $x]
    }
    proc classify(~INT) x {
        # ~INT->STRING
        return [list S $x]
    }

For the Tcl thunk, we still need the original:

    proc classify(STRING) x {
        # STRING->FAIL STRING
        if {[string is integer -strict $x]} {
            return [list I $x]
        } else {
            return [list S $x]
        }
    }

In short, it's managed to hoist a redundant type check (and shimmer) out of
the called procedure because the information is known at the call site.
The problem is that ~INT and STRING are represented identically as LLVM data
types (they're both Tcl_Obj*), so 'generateDeclaration'
(http://core.tcl.tk/tclquadcode/artifact?ln=168-209&name=cccd13aea88e9f5e)
generates the same name for both of them. Since they require different code
bodies, this is obviously not going to work. In any case, it trips and
errors out with:

    duplicate ::bug-7c599d4029::classify STRING

I really don't want to give up this sort of analysis, because the
performance impact could be pretty substantial. (Moreover, I'd struggle with
recognizing the case and de-specializing the call site!) I think that in
this particular case, I'd even be able, with more development, to inline the
'classify' operation and eliminate the cost of procedure activation as well.

So, is there a way that we can make the Naming of Names not give rise to a
lot of conflict?

Note that the thunk generator doesn't need to be aware of any of this. It's
always going to be working with arguments of type STRING and results of FAIL
STRING, at least unless we decide that we need pass-by-name information at
that level. If we can get the naming sorted out, both here and in 'invoke',
I think that this should Just Work. It's always going to be in a call out of
compiled code into compiled code.

Ideas?

-- Kevin
From: Kevin K. <kev...@gm...> - 2017-03-26 02:37:31
On Sat, Mar 25, 2017 at 11:31 AM, Donal K. Fellows <don...@ma...> wrote:

> On 19/03/2017 08:40, Donal K. Fellows wrote:
>> Just a quick note to say that I've made some progress with the
>> implementation of callframes. Specifically, commit 250e218e6a managed to
>> run the three tests with “callframe” in the name without crashing. These
>> included saving variables out to the callframe and restoring them, and
>> so on. It's not finished yet (an [info level] in the callee will crash
>> horribly and there's probably a bunch of other problems too) but it
>> should be able to do useful things soon.
>
> Things are progressing, but not smoothly.
>
> I've moved to now running with the full suite of procedures in demo.tcl,
> and I'm now hitting a problem that appears to be caused by wrong input
> to the code issuer. The problem is that there's quite a bit of code that
> is generating 'invoke' opcodes whose result type is plain CALLFRAME. The
> problem is particularly acute (i.e., causing full failures) when
> compiling ::flightawarebench::latlongs_to_distance; it's relying on
> ::flightawarebench::degrees_radians, which has three defined type
> specialisations (that I know of), being effectively STRING->FAIL DOUBLE,
> DOUBLE->DOUBLE and INT->DOUBLE, but the higher-level code is looking for
> IMPURE NUMERIC->NOTHING, and that's just going wonky as the NOTHING then
> spoils later code that calls the cos() builtin, which doesn't support
> that input type at all. (Am I right in saying that the NOTHING type
> indicates that there are no possible values of that type?)
>
> I attach the full output from my testing run (compressed), generated with:
>
> bash$ (tclsh8.6 demo.tcl -debug 1 -iterations 2 2>&1) | gzip >out.txt.gz

Ugh. This is nasty. It's an argument between the two most accursed modules
in the quadcode system: the specializer, and the loop unroller/node
splitter.

The trivial example that tickles the bug - exhibiting different symptoms,
but the same underlying problem - is:

    namespace eval ::splitbug {
        proc I x {return $x}
        proc bug {x} {
            if {[string is integer -strict $x]} {
                set y 1
            } else {
                set y 0
            }
            list $y [I $x]
        }
    }

What's happening is that the splitter is dividing the [list $y [I $x]] into
two pathways - one for when $x is INTEGER IMPURE and the other for when it's
a non-INTEGER string. But the specializer has already committed to that
particular invocation [I $x] being STRING->(FAIL) STRING, and has not
instantiated, nor done type analysis for, the IMPURE INTEGER->(FAIL) IMPURE
INTEGER case. NOTHING is the type that's temporarily returned during type
analysis when a proc has not yet been analyzed, but at this point it's too
late.

I need to rethink the order of operations. Getting enough time to dive into
things at that level is challenging at the moment. I'll get on this, but it
will definitely not be tonight.
From: Donal K. F. <don...@ma...> - 2017-03-25 15:31:17
On 19/03/2017 08:40, Donal K. Fellows wrote:
> Just a quick note to say that I've made some progress with the
> implementation of callframes. Specifically, commit 250e218e6a managed to
> run the three tests with “callframe” in the name without crashing. These
> included saving variables out to the callframe and restoring them, and
> so on. It's not finished yet (an [info level] in the callee will crash
> horribly and there's probably a bunch of other problems too) but it
> should be able to do useful things soon.

Things are progressing, but not smoothly.

I've moved to now running with the full suite of procedures in demo.tcl, and
I'm now hitting a problem that appears to be caused by wrong input to the
code issuer. The problem is that there's quite a bit of code that is
generating 'invoke' opcodes whose result type is plain CALLFRAME. The
problem is particularly acute (i.e., causing full failures) when compiling
::flightawarebench::latlongs_to_distance; it's relying on
::flightawarebench::degrees_radians, which has three defined type
specialisations (that I know of), being effectively STRING->FAIL DOUBLE,
DOUBLE->DOUBLE and INT->DOUBLE, but the higher-level code is looking for
IMPURE NUMERIC->NOTHING, and that's just going wonky as the NOTHING then
spoils later code that calls the cos() builtin, which doesn't support that
input type at all. (Am I right in saying that the NOTHING type indicates
that there are no possible values of that type?)

I attach the full output from my testing run (compressed), generated with:

    bash$ (tclsh8.6 demo.tcl -debug 1 -iterations 2 2>&1) | gzip >out.txt.gz

Donal.
From: Donal K. F. <don...@ma...> - 2017-03-19 08:41:08
Just a quick note to say that I've made some progress with the
implementation of callframes. Specifically, commit 250e218e6a managed to run
the three tests with “callframe” in the name without crashing. These
included saving variables out to the callframe and restoring them, and so
on. It's not finished yet (an [info level] in the callee will crash horribly
and there's probably a bunch of other problems too) but it should be able to
do useful things soon.

Unfortunately, I've yet to make the code work with the merge from the main
callframe branch (and things crash when I try to run some debugging I've
got), so there's problems I've yet to sort out.

And then I (or maybe “we”) need to fix arrays. The hack with pretending that
they are dictionaries won't work any more if they've got to be accessible
from the outside world. Once that's done, we should be able to try some
fairly stern tests like compiling the [clock] innards.

Donal.
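A tiny illustration (hypothetical, not from demo.tcl) of why the dictionary
pretence for arrays breaks down once frames are visible from outside:
[parray] reaches back into the caller's frame with [upvar] and expects a
genuine array variable there, not a dictionary value.

    proc tally {items} {
        foreach i $items {
            incr count($i)      ;# fine as a dict while it stays internal
        }
        parray count            ;# needs a real array in a real callframe
    }
    tally {a b a c a}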
From: Kevin K. <kev...@gm...> - 2017-03-14 04:07:54
I noticed in the copy propagation rewrite that I did over the weekend that
it was still overconservative, and that came from the fact that different
instances of a value called something like {temp 2} were always presumed to
interfere. While this is true for source variables, the temporaries can be
treated more cavalierly. Only temps that are related by appearing as an
argument and the result of the same 'phi' need to be unified.

I implemented a nanopass that does renaming accordingly. The nice thing
about it is that it can let the struct::disjointset module in tcllib do all
the heavy lifting. It simply needs to:

(1) run through the program enumerating the temporaries (which all appear as
    the result of quads);
(2) run through the program again, unifying the result of any phi that
    yields a temp with any temp that appears as its argument;
(3) renumber the temporaries according to which disjoint partition they fall
    in after the unifications at (2);
(4) rewrite all the quads with the new numbers for temporaries.

(A rough sketch of how this might look appears at the end of this message.)

This yields considerably tidier code for 'catch', which was rather a mess,
reflecting the moving about of temporaries from one stack location to
another, and manages to fold away a few more copies elsewhere.

I also found a small bug (a one-character typo) in the check for killing
assignments in the block containing the last use of a copied temporary,
which caused a local violation of Conventional SSA. This wouldn't have
broken any of the algorithms, but it still wasn't good, so I fixed it.

There's one other copy optimization that I really need to chase, and that's
a peephole optimization. Something like

    set x [expr {$a + $b}]

will generate code like

    add {temp 123 0} {var a 0} {var b 0}
    copy {var x 1} {temp 123 0}

after copy propagation has run. The way to detect this is:

(a) a copy from a temp;
(b) the quad defining the temp immediately precedes the copy;
(c) the definition of the temp is not a phi.

If all of these conditions hold, then we want to fuse the copy with the
assignment, and eliminate the temp, producing

    add {var x 1} {var a 0} {var b 0}

This change does not destroy any assignment to a variable (and therefore
can't mess up aliasing), and can't cause a violation of Conventional SSA.

Generalizing the test beyond this is kind of perilous. We do not want to
move the assignment to a variable out of sequence with assignments to or
uses of any other variables until we've accounted for possible aliasing
effects. The benefit of generalizing this optimization is pretty
questionable, anyway, since this copy seems to originate only from copying
the command result off the stack to a variable, immediately after computing
it.
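The sketch promised above. Quads are modelled here as plain lists of the
form {opcode result arg ...} with temporaries written as {temp N}; that
simplified model is mine, not the real quadcode layout - only the
struct::disjointset usage (add-partition, merge, partitions) is intended to
be the actual tcllib API, and phi arguments are simply taken to be any
temp-shaped operand word.

    package require struct::disjointset

    proc renameTemps {quads} {
        ::struct::disjointset ds
        # (1) every temp that appears as a quad result gets its own partition
        foreach q $quads {
            set res [lindex $q 1]
            if {[lindex $res 0] eq "temp"} {
                ds add-partition [list $res]
            }
        }
        # (2) unify the result of each phi with its temp arguments
        foreach q $quads {
            set res [lindex $q 1]
            if {[lindex $q 0] eq "phi" && [lindex $res 0] eq "temp"} {
                foreach arg [lrange $q 2 end] {
                    if {[lindex $arg 0] eq "temp"} {
                        ds merge $res $arg
                    }
                }
            }
        }
        # (3) number the partitions ...
        set n 0
        foreach part [ds partitions] {
            foreach t $part { set new($t) [list temp $n] }
            incr n
        }
        # (4) ... and rewrite every operand accordingly
        set out {}
        foreach q $quads {
            set rq {}
            foreach word $q {
                if {[info exists new($word)]} {
                    lappend rq $new($word)
                } else {
                    lappend rq $word
                }
            }
            lappend out $rq
        }
        ds destroy
        return $out
    }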
From: Kevin K. <kev...@gm...> - 2017-03-13 04:48:05
I realize that it's been a couple of weeks since my last message, and I've
not brought you properly up to date on progress. I've not got all that far
into the "next steps" of my previous message, largely because I became aware
of some refactoring that badly needed to be done, and also because the
callframe handling has run far enough ahead of the implementation that I'm
coding rather speculatively. I'd jump in on the code issuer, I suppose,
except that I think I'd just be in the way!

In any case, what I have done on callframes is:

* Fixed a bug where 'invoke' had STRING as its return type if no definition
  of the command could be found. It should, of course, be CALLFRAME FAIL
  STRING.

* Added [dict keys] and [dict values], which were missing from the table of
  supported Tcl builtins.

* Added the needed special cases for [lsort], [regexp] and [regsub] so that
  they can have their needed callframe access. This is in a new file,
  quadcode/builtin_specials.tcl. We need at some point to document how this
  works, because any C command with complex syntax and a need for callframe
  access will need to augment the compiler with a parser to calculate what
  it does with the frame.

* Actually moved the procedure name into the 'translator' object. I've
  wanted it for error reporting and debug printouts for some time, and now
  it's there.

Then I started looking at alias analysis - and realized that it requires a
specific restricted form of SSA that the authors of
http://ssabook.gforge.inria.fr/latest/book.pdf call 'Conventional SSA'. In
Conventional SSA, only one instance of any value is live at any time. If a
value appears on the right-hand side of a phi, or any other value derived
from the same base name appears on the left-hand side of any quad, the live
range of the value ends right there. The aggressive copy propagation that
was done in SSA construction made a mess - it does not generate Conventional
SSA.

The first step on the road to Conventional SSA was to remove that copy
propagation entirely. Instead, we now wait to propagate copies until after
SSA form is constructed. Cytron's algorithm, even with the modifications
from Cooper-Harvey-Kennedy (2005) and Sreedhar-Gao (1995), produces
Conventional SSA. That, of course, broke everything, because 'invoke' was
now seeing a temporary instead of a literal command name, and could not
analyze the code correctly.

So now came the job of doing copy propagation more carefully, to ensure that
Conventional SSA was preserved through all the rewriting transformations.
That was the reason for peeling off the 'kbk-refactor-copyprop' branch and
developing a new copy propagation pass, to be added to the 'cleanup
optimizations' that are done after major program transformations.

This change turns out to have some side benefits. First, no move to a named
variable in the source code is deleted. This means that even in the presence
of aliasing, we should have all the spots where aliased variables change,
and be able to insert whatever machinations are needed to force values back
into the callframes or namespaces where the variables live. This should also
give us more reliable debug information, because we should no longer have
the problem that a source variable can disappear from the quadcode entirely
if temporaries are aliased with all its uses.

Second, I'd been growing quite frustrated with 'typeunroll' being as fragile
as it appeared. It turns out that the algorithms from Choi-Sarkar-Schonberg
(1996) that I'd been using to update the SSA form require Conventional SSA!
I'd been putting in various sorts of patches to try to identify the correct
places for phi-insertion and for termination of the algorithm, and while I
had it working for all our demo cases so far, I've found that apparently
unrelated changes in other passes were breaking it. I expect things to be
considerably more robust now that Conventional SSA is assured when the node
splitting (and, for that matter, narrowing insertion) runs. I'm going to
want to look through things and take the kludges out, but that can probably
wait for a little while, at least. All the demos run again on the trunk.

Merging all of this into the 'callframe' branch revealed another bug: the
'widen' pass was rewriting 'return' operations and removing the callframe
reference. That's now fixed as well, on the tip of that branch.

I've closed the 'kbk-impure' branch for now. I still want to get back to
IMPURE INT BOOLEAN (the data type that 'jumpTrue', 'jumpFalse', '&&', '||'
and '!' require), but when I do that, I'll split a new branch to chase that
issue.

I'm holding off on [clock], [interp] and [encoding] for the moment. I think
the right place to tackle them first is in the Tcl core: why are these
ensembles not compiled?

I'll be getting back to eliminating callframes altogether in contexts where
they are not required, hopefully once there's more of an implementation in
place, so that I can actually test rather than trying simply to desk-check
the generated code.

Now, I've another set of questions. [upvar], [namespace upvar], [global],
[variable], etc. all create aliases: variables for which TclVarIsLink tests
true. I'm not quite certain what the semantics are supposed to be if a
variable that's already a link has another one of these commands applied to
it. It appears to be legal, for the most part. ([global] checks, but it
appears that not all the commands do.) Is it supposed to operate on the
ultimate link target, making it a further link, or is it supposed to
manipulate the nearest link, simply causing it to point to a different
variable? In either case, am I correct that there is no code path that will
cause a variable to become local again once it's acquired a link?

Obviously, I can get one set of answers by source-diving, but this is an
area where the ice is sufficiently thin that I'm not confident that the
answers I get will actually reflect any sort of design intent. I don't
believe that I've ever coded such a thing deliberately, and I strongly
suspect that anyone who does may well simply be tickling long-buried bugs.
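The re-linking case being asked about, in miniature (hypothetical code; what
actually happens when 'inner' runs is exactly the open question):

    proc outer {} {
        set x 1
        inner
        return $x
    }
    proc inner {} {
        upvar 1 x y        ;# y is now a link to outer's x
        upvar #0 ::g y     ;# y is already a link: does this retarget the
                           ;# nearest link, or act on its ultimate target?
        set y 2
    }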
From: Donal K. F. <don...@ma...> - 2017-03-03 06:32:28
On 28/02/2017 04:48, Kevin Kenny wrote:
> I just committed [72d2924bfd] onto the 'callframe' branch.

Be aware that the following is mostly stream-of-consciousness stuff. Mere
coherence is not guaranteed!

Reification of callframes
-------------------------

Callframes have one mandatory entity, a CallFrame as defined in tclInt.h
(I've already got that declaration mapped into the llvm side). That has to
be wired up in the same way as with a standard procedure for the rest of Tcl
to be able to understand it. Its lifespan can be, for now, that of the C
stack frame; the 'alloc' method of the Builder class does exactly what we
need there. The callframe doesn't have to be explicitly mapped into the
CALLFRAME type, but we need a concrete type there anyway because of the way
that the compiler core works. The actual value can be 'undef'.

The individual variables *that we care about* can be in the LVT, but I'd
like to not give them names in there, and instead have the named variables
all reside in the variable hashtable in the frame. This is because it makes
everything significantly easier to handle in the interaction with the core
of Tcl; the metadata required in those cases is much less complex. Using
links will mean that the Tcl_Var entries in the hash table do not get
destroyed until we delete the call frame itself.

It's possible that code we call will [upvar] variables into us. If they
don't correspond to known names, we ignore them! We'll be able to keep our
own structure of mapping in the compiler and access the variables directly.
Since traces on local variables are explicitly in the not-supported list,
that'll let us be quick. (I'm terrified of what happens with local variables
that are links to the outside world. The [upvar]-into-us case is sketched at
the end of this message.)

The fundamental type of Tcl variables is NEXIST STRING. The non-existent
state is like when they have a NULL in.

Everything gets much more complicated when we tackle [yield] and [yieldto].
At that point, it's my hunch we need to make CALLFRAME be real and allocate
it explicitly. We'll also need to save out the set of other active values at
that point, so that's hardly surprising. But I think we can do something
useful without tackling that; it's a feature required for 1.0, but not for
0.1. :-) In fact, I think even the cheapest of cheap hacks in callframes
might be enough for us to try going for 0.1, since we'll then be able to run
a majority of Tcl code (though sometimes poorly) and that's worth making
available as ahead-of-time compilation.

Donal.
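The [upvar]-into-us case mentioned above, in miniature (hypothetical code):
the callee reaches back into the compiled caller's frame by name, and the
compiled code has to observe the update if - and only if - the name is one
it knows about.

    proc compiledCaller {} {
        set counter 0
        helper              ;# ordinary interpreted Tcl
        return $counter     ;# must see the increment made through the link
    }
    proc helper {} {
        upvar 1 counter c
        incr c
    }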
From: Kevin K. <kev...@gm...> - 2017-02-28 04:49:03
I just committed [72d2924bfd] onto the 'callframe' branch. This change now
actually begins to track the callframe variables that might be consumed in,
and produced by, command invocations.

There's a new file, 'quadcode/builtins.txt', that describes, for most of the
Tcl builtin commands, the following command attributes:

pure:     If a command is pure, it is 'purely functional': it can be
          depended on to produce the same result if called with the same
          arguments, and is free of side effects.

killable: If a command is killable, and the result of a call to it is
          unused, it may be removed from the program without ill effect.
          Any pure command is also killable.

reads:    A list of positions on the command line for arguments that give
          the names of variables that the command will read from the
          callframe. A position of 0 indicates that all variables are
          potentially read. A negative position indicates that the arguments
          from the given index to the end of the command are all names of
          variables that are read. A positive position gives a specific
          command line index.

writes:   A list of positions on the command line for arguments that give
          the names of variables that the command will write into the
          callframe. A position of 0 indicates that all variables are
          potentially changed. A negative position indicates that the
          arguments from the given index to the end of the command are all
          names of variables that are written. A positive position gives a
          specific command line index.

(A purely illustrative sketch of what such attribute entries might look like
follows the list of next steps below.)

A program, 'parseBuiltinsTxt.tcl', parses 'builtins.txt' and writes the
result to 'builtins.tcl' in a form more convenient for loading into the
compiler. Builtin commands that neither read nor write anything are presumed
not to manipulate the callframe at all. This will have to be revisited as we
implement more stuff. I'll most likely keep maintaining 'builtins.tcl' and
let 'builtins.txt' fall by the wayside.

Quadcode-compiled commands are presumed not to use the caller's callframe.
This will change when we implement [upvar], of course! We'll also need to
begin tracking aliasing as we move into [variable] and [global]. That's a
problem for another day.

Because of this change, [join] is no longer a good example for an external
command that uses the callframe. I added an example in 'demo.tcl' that uses
[scan] instead - and it appears to track the output variables correctly.

Next steps will be:

(1) The [clock], [encoding], [interp], [lsort], [regexp] and [regsub]
    commands all need special handling. The first three need it because
    those ensembles are not compiled, so the quadcode engine will have to
    dispatch them by subcommand. [lsort] needs special handling only to
    abort the compilation if the '-command' option appears (at least until
    and unless we can ascertain that the given command cannot affect the
    local callframe). [regexp] and [regsub] need to parse the command line
    to determine which arguments are output variables and set the callframe
    usage accordingly. (Regexp and regsub calls that do not work with
    variables can have callframe usage elided.)

(2) Commands that do not use the callframe should have the {temp callframe}
    variable removed from the 'invoke' quadcode, to tell the code issuer not
    to bother with callframe manipulation when calling them.

(3) If all [return] and [returnOptions] operations use {temp callframe 0},
    then the current procedure can have its callframe suppressed - by
    removing the callframe reference from the 'entry', 'return' and
    'returnOptions' quadcodes throughout.
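The illustrative sketch promised above. This is not the project's actual
builtins.txt or builtins.tcl content: the attribute values are guesses from
the descriptions given, the dict-style layout is assumed, and the word
indices assume 0-based counting with the command name at index 0.

    # hypothetical: command name -> attribute dict, in the spirit of builtins.tcl
    set builtinAttrs {
        llength  {pure 1 killable 1}
        lreverse {pure 1 killable 1}
        puts     {}
        gets     {writes {2}}
        scan     {writes {-3}}
        lassign  {writes {-2}}
    }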
These changes (items 1-3 above) will mean that all of the demos so far, with
the exception of callframe::test2, should continue not to create callframes
(and hence not incur the overhead of doing so).

After that, it will be time to start working on the implications of [upvar],
[namespace upvar], [variable] and [global]. I plan initially to consider
only [upvar 1] and [upvar #0], and only [namespace upvar] where the
namespace path is a constant string. For all of these, I consider only cases
where the variable name is either constant, or else is the value of a
variable that comes in from the command arguments and is constant in the
caller's scope. That should cover the vast majority of sane uses of these
commands! (Some subset of the insane uses might be covered by recording that
a given name might alias anything.)

Background reading for how this might work:

  http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.33.6974
  http://ssabook.gforge.inria.fr/latest/book.pdf -- chapter 15
  https://plg.uwaterloo.ca/~olhotak/pubs/ismm09.pdf

I've not actually finished reading any of these, but the abstracts all look
relevant to the problem at hand.
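For illustration, the aliasing forms that this initial analysis would cover,
in ordinary code (hypothetical example):

    namespace eval ::app { variable count 0 }

    proc bump {varName} {
        upvar 1 $varName v             ;# name arrives as an argument and is
                                       ;# a constant at the call site below
        incr v
    }
    proc resetCount {} {
        namespace upvar ::app count c  ;# constant namespace path and name
        set c 0
    }
    proc readCount {} {
        upvar #0 ::app::count n        ;# constant global name
        return $n
    }
    proc demo {} {
        set total 0
        bump total                     ;# constant variable name passed down
        return $total
    }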