You can subscribe to this list here.
2017 |
Jan
|
Feb
(2) |
Mar
(6) |
Apr
(4) |
May
(20) |
Jun
(15) |
Jul
(4) |
Aug
(2) |
Sep
(6) |
Oct
(6) |
Nov
(20) |
Dec
(3) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2018 |
Jan
(16) |
Feb
(3) |
Mar
(7) |
Apr
(40) |
May
(1) |
Jun
|
Jul
|
Aug
|
Sep
|
Oct
(2) |
Nov
|
Dec
(1) |
2019 |
Jan
(7) |
Feb
(5) |
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
From: Donal K. F. <don...@ma...> - 2017-11-04 12:48:45
|
On 03/11/2017 12:02, Kevin Kenny wrote: > Thanks for 'verifyList'. Expect a LIST data type. I can have a go at the > panoply of 'checkDomain', 'throwIfNotList', 'narrowToType $LIST', > 'throwNonList', 'widenTo $LIST', etc. The numerics are all subtypes of > LIST, so presumably we can work out that bit in the existing auto-widen > mechanism? Feel free to change the name to be something more consistent. Also, I had fretted over whether to roll the functionality directly into 'expand'. Perhaps there was no need for me to worry, but I'd be quite happy with doing that either way. :-) The bigger win would be if we could have unboxed pure LISTs, as they'd be much easier for the optimizer to get smart with. I'm not going to worry about them for now, but that would definitely help for similar reasons to why unboxing numbers was a good thing. We also ought to look out for opportunities to have LISTs of INTs, DOUBLEs or NUMERICs; they'll let us gain even more (and would be tending towards support for VecTcl, which would then just need some extra metadata on top to describe the shape of the tensor). DICTs would be another obvious type, but the opportunities for unboxing there are very limited as the internal complexity means that we'll be sticking with true STRING keys and values. (For now. :-)) > Are you making a strong suggestion that we should tackle LIST as a > data type first, or is duck-taping 'invokeExpanded' together for now > Good Enough? I'm happy to put expansion on hold if that's The Right > Thing from your perspective. Duck taping things for now would be fine. Yes, we want LIST but I'm mindful that getting something partial into an alpha for this year would be a better win for people other than ourselves. The LIST type (and the DICT type too) might make a nice thing for the second alpha. OTOH, a lot of the LIST stuff would actually be effectively a reworking of what I already have to remove error checking. That's pretty easy, especially as I'll be able to use simple pointer comparison to decide if there's value-level aliasing going on. > I have in my notes that 'invokeExpanded' should be reserved for the > pessimistic case. It's MUCH easier to do the analysis on my side than > yours, since I have more information than you do, and more machinery > to analyze data flows and so on. I won't do it at the very first go, > but expect 'invoke if we can, invokeExpanded only if we must'. Sounds great. I'll focus on trying to make invokeExpanded work this weekend, and assume that if it is being issued, it is because it is needed. Donal. |
From: Kevin K. <kev...@gm...> - 2017-11-03 12:02:33
|
On 11/03/2017 02:32 AM, Donal K. Fellows wrote: > On 02/11/2017 17:13, Kevin Kenny wrote: >> I've got as far as generating the code. I still need to handle >> callframe effects and upvar, so that the 'invokeExpanded' will get the >> right frames and the right sequence of 'moveToCallFrame' and >> 'moveFromCallFrame' surrounding it. I also am not doing the type >> analysis of the result of the 'invoke' correctly - which in turn is >> causing me not to trigger compilation of the invoked procedure. > > I've been looking at basic pessimistic code generation. Unfortunately, > the expansion rules are such that it is necessary to verify that we're > expanding a list at the point where the expansion happens. > > I've introduced a 'verifyList' opcode to do that (as it might be > useful elsewhere eventually?). I don't know whether we want to roll > that into the 'expand' opcode, and the type shuffle becomes a bit > complex because pure NUMERICS can be shown to not need expansion. > Properly, we could consider having LIST (and DICT) types to make this > all cleaner, but I'll let you worry about that for now. ;-) > > For the most pessimistic code generation cases, a full invoke through > Tcl is required (via Tcl_EvalObjv) and that requires a correct stack > frame. I'll figure out the code to do the invoke itself; it's a little > tricky because of the need to get compact types, but not crazily so. > > Cases where we can detect that there's significantly more optimal > things we can do without greatly depending on the length of the > expanded list are cases that Tcl ought to bytecode better. To date, > only [list] has been fixed that way, but [lappend] and [concat] would > be useful too. Thanks for 'verifyList'. Expect a LIST data type. I can have a go at the panoply of 'checkDomain', 'throwIfNotList', 'narrowToType $LIST', 'throwNonList', 'widenTo $LIST', etc. The numerics are all subtypes of LIST, so presumably we can work out that bit in the existing auto-widen mechanism? I haven't thought nearly enough about what theorems we might need to prove about list content, but I can certainly protect 'invokeExpanded' from coming in with a non-list. 'List of the wrong length' would also be nice to have, but I need to think more about what other theorems we need about list content. Are you making a strong suggestion that we should tackle LIST as a data type first, or is duck-taping 'invokeExpanded' together for now Good Enough? I'm happy to put expansion on hold if that's The Right Thing from your perspective. I have in my notes that 'invokeExpanded' should be reserved for the pessimistic case. It's MUCH easier to do the analysis on my side than yours, since I have more information than you do, and more machinery to analyze data flows and so on. I won't do it at the very first go, but expect 'invoke if we can, invokeExpanded only if we must'. I'm trying hard to make it easy for you, particularly because if the checks are on my side, I can optimize them away! |
From: Donal K. F. <don...@ma...> - 2017-11-03 06:32:48
|
On 02/11/2017 17:13, Kevin Kenny wrote: > I've got as far as generating the code. I still need to handle > callframe effects and upvar, so that the 'invokeExpanded' will get the > right frames and the right sequence of 'moveToCallFrame' and > 'moveFromCallFrame' surrounding it. I also am not doing the type > analysis of the result of the 'invoke' correctly - which in turn is > causing me not to trigger compilation of the invoked procedure. I've been looking at basic pessimistic code generation. Unfortunately, the expansion rules are such that it is necessary to verify that we're expanding a list at the point where the expansion happens. I've introduced a 'verifyList' opcode to do that (as it might be useful elsewhere eventually?). I don't know whether we want to roll that into the 'expand' opcode, and the type shuffle becomes a bit complex because pure NUMERICS can be shown to not need expansion. Properly, we could consider having LIST (and DICT) types to make this all cleaner, but I'll let you worry about that for now. ;-) For the most pessimistic code generation cases, a full invoke through Tcl is required (via Tcl_EvalObjv) and that requires a correct stack frame. I'll figure out the code to do the inoke itself; it's a little tricky because of the need to get compact types, but not crazily so. Cases where we can detect that there's significantly more optimal things we can do without greatly depending on the length of the expanded list are cases that Tcl ought to bytecode better. To date, only [list] has been fixed that way, but [lappend] and [concat] would be useful too. Donal. |
From: Kevin K. <kev...@gm...> - 2017-11-02 17:13:27
|
I managed to get a partial implementation of argument expansion onto a branch last night. I changed the notation for 'invokeExpanded' to a more compact form. The way it works is that a call like someproc $a {*}$b $c will turn into expand {temp 1} {var b} invokeExpanded callframeOut callframeIn {var a} {temp 1} {var c} The 'expand' quadcode does absolutely nothing other than keep the books about what args are expanded. If its input is a STRING, its output is an EXPANDED STRING. The internal representation for an EXPANDED FOO is the same as for a FOO. I expect that EXPANDED will only ever appear as the result of 'expand' and as the argument to 'invokeExpanded'. I've got as far as generating the code. I still need to handle callframe effects and upvar, so that the 'invokeExpanded' will get the right frames and the right sequence of 'moveToCallFrame' and 'moveFromCallFrame' surrounding it. I also am not doing the type analysis of the result of the 'invoke' correctly - which in turn is causing me not to trigger compilation of the invoked procedure. Once these issues are resolved, we should be able to throw together a minimal mechanism for handling {*}. But now I can see that there are a number of opportunities to improve things here. (1) If an object is known to be a singleton list, remove the 'expand'. I don't expect very many things like {*}[expr {$a+1}] in human-written code, but I do expect that procedure inlining, constant folding, and global variable numbering will introduce them. (2) Similarly, simply remove {EXPANDED EMPTY} args from the call. (3) If the thing we're invoking is compiled, then we know what sort of args it expects. We can at least restrict the types of the args that we have. If we have expand {temp w} {var q} invokeExpanded {temp x} {temp y} {literal p} {literal 1} {temp w} (representing the call [p 1 {*}$q]), I can certainly detect that we need to invoke p(INT ...), where ... is replaced with the correct number of STRINGs to match p's arg list. (Recall that $args is handled on the caller side in quadcode, so if it's present, it's an ordinary arg that takes a list - which today is stored in a STRING.) (4) If {*}$p as an argument is precisely contigous with the 'args' parameter, and is the only expansion, we can go one better and bypass the call thunk entirely - we've already got the list that has to go in 'args'. I think this will mean that 'invokeExpanded' will have to resort to the full-up Tcl_Eval* mechanism only where the called procedure is doing interesting things in the calling frame (including builtins like [scan]). So, I think that 'invokeExpanded' of a compiled command will consist of: (1) verifying that the number of args is correct by adding up list lengths. (2) constructing the named args, if necessary, by taking apart lists. (3) When doing compiled code constructing $args, if necessary, by concatenating the leftovers. (4) invoking the correctly-typed version of the target command (which, as I said, I think I can promise to have compiled). Donal's right that the worst case is "anything can happen" and "has to use Tcl_Eval*" - but there are significant cases that don't need such a big hammer - and that are nearly handled by the existing [invoke] mechanism. I might even be able to make the 'middle end' smart enough that the code issuer will never see an 'invokeExpanded' for a compiled proc, and have to deal with it only for builtins and (if we support them) callouts to interpreted code. Oh, yeah, and I still have to document the 'expand' and 'invokeExpanded' quadcodes, and implement INST_EXPAND_DROP (which I don't think will ever need to escape beyond translate.tcl - I just need to do the bookkeeping between stack levels and quadcode temporaries.) |
From: Donal K. F. <don...@ma...> - 2017-11-01 00:58:08
|
On 31/10/2017 15:50, Kevin Kenny wrote: > The friendly case: > > I've seen a lot of 'friendly' examples, though, where the leading word > of the command is indeed constant. In particular, things like > > lappend list {*}$newElements > > or > > someProc $something {*}$args > > are pretty common. > > I wouldn't imagine that the friendly case would be too rough to handle, > but I'm getting a little bogged down in how we would want to represent > such a thing for the 'invoke' quadcode: perhaps a new data type > 'expanded list' that would represent a series of arguments? The idea is that expansion builds a list of words. Unfortunately, the building of the list is done on the stack, so everything is a bit more complex: > I'm also kind of fuzzy on how INST_EXPAND_START, INST_EXPAND_STKTOP, > INST_EXPAND_DROP and INST_INVOKE_EXPANDED actually work. Expansion starts with EXPAND_START and finishes with either INVOKE_EXPANDED, a thrown exception (it's in the generic part of the exception handler) or EXPAND_DROP; the latter is only used when cleaning up some types of cases (notably [break] and [continue]). It works by using an auxiliary stack: START - pushes a new item on the auxiliary stack; effectively, this auxiliary item will build a list. The items are actually stored on the main stack until INVOKE, and the auxiliary stack instead actually stores a list of offsets relative to the depth that the expansion started at that will need expansion. STKTOP - adds items to the list being built on the auxiliary stack. The topmost item will be marked as one that needs expansion. Tricky point: the immediate argument is the current number of items for the expansion on the stack *ignoring* expansion, not the number since the last expansion. This is because the actual expansion isn't done until INVOKE. Or rather that's the model; the actual expansion *IS* done by this operation and what actually happens is that the stack depth being tracked in the stack is adjusted as elements are copied out of the list at the stack top and the operand is really ignored. UGLY! DROP - pops items until the stack depth of the corresponding START is achieved, and also pops and disposes one item off the auxiliary stack. INVOKE - uses the list of items on the top of the auxiliary stack to build the expanded list, model-wise, and then fires those off through the command invoke engine. Since in reality the expansion is already done, it is a pretty simple operation under the hood; the costs were paid by STKTOP. The aim is that, for example, with [a b {*}$c d {*}$e f g], the aux stack builds up this information: :- a b $c d $e f g ^^ ^^ But actually builds it up using a very simple two-element record, equivalent to: struct AuxRecord { struct AuxRecord *nextPtr; // Maintain stack unsignd int stackDepth; // Depth of stack at START }; By actually building the list of words on the stack. There's another field as well that the current implementation uses to handle more efficient allocation strategies. It's rather ugly because it is grafted into an environment which assumes it can know exactly how much memory to allocate for the stack at the start of the call rather than building piecemeal. It's also hacked on top of Tcl_Objs for memory management reasons. It's all very tricky. :-( I guess that anywhere we see expansion, we will need to have a callframe precisely because we're going to have to call to the Tcl standard command implementation. Cases where we could avoid it probably ought to be taught to the bytecode compiler to be smart with in the first place. (The problem there is that I don't think I ever figured out how to make a command that takes charge of expansion compilation — like [list] — retract the desire to generate code, and most commands have a sequence of words at the front that they don't want expanded if they are to be compileable at all.) > Every time I've attempted to make 'bytecode-stack-state' do the right thing > on those, I get bogged down, and that's a necessary prerequisite even > to making 'bytecode-to-quads' understand these opcodes. I'm going to > need to make a couple more runs at that particular hill in any case, since > I'm seeing that the 'md5' package is somehow triggering endless recursion > in there. Good luck. I recommend doing the slow generic case first this time, alas, but avoid using [list] for testing expansion; it does its own thing. Donal (I hope my explanation is coherent!) |
From: Kevin K. <kev...@gm...> - 2017-10-31 15:50:42
|
On Tue, Oct 31, 2017 at 6:42 AM, Donal K. Fellows <don...@ma...> wrote: > On 30/10/2017 21:41, Kevin Kenny wrote: >>The next feature that I >> think I need to talk to Donal about is {*} - I belated[ly] realise that we >> don't appear to have any compiler support for it. 'beginExpand' is not >> implemented. I'm hoping he has better insight than I do how to >> approach it. > > > That really depends on how smart we can get. (We currently handle expansion > for the [list] command only, because that's the only command that has an > expansion-handling strategy in its bytecode compiler.) Ultimately, if the > first word is known, we can determine if we know what to do or if we're in > the “it could be anything!” case. Alas, I fear that that latter case will be > disappointingly common. > > The simplest strategy would be to convert expansion into building a list of > the arguments — through ordinary appending or via multi-element appending — > and then to [eval] the result. The building would be easily done, but the > eval is a PITA. The friendly case: I've seen a lot of 'friendly' examples, though, where the leading word of the command is indeed constant. In particular, things like lappend list {*}$newElements or someProc $something {*}$args are pretty common. I wouldn't imagine that the friendly case would be too rough to handle, but I'm getting a little bogged down in how we would want to represent such a thing for the 'invoke' quadcode: perhaps a new data type 'expanded list' that would represent a series of arguments? I'm also kind of fuzzy on how INST_EXPAND_START, INST_EXPAND_STKTOP, INST_EXPAND_DROP and INST_INVOKE_EXPANDED actually work. Every time I've attempted to make 'bytecode-stack-state' do the right thing on those, I get bogged down, and that's a necessary prerequisite even to making 'bytecode-to-quads' understand these opcodes. I'm going to need to make a couple more runs at that particular hill in any case, since I'm seeing that the 'md5' package is somehow triggering endless recursion in there. Less-friendly cases: With respect to how smart we want to be for list contents, most questions about the content of values are things that I consider now to be of 'intermediate' difficulty. They don't construct new data flows, they don't do surgery on the control flow graph, they 'merely' track information through existing analyzed paths. That's pretty straightforward: any question like that comes down to: + What is the information that we want to track, and how do we want to represent it? + What is the rule for determining what information attaches to a literal? + What quadcode instructions produce values that we want to attach the information to, and what are the rules for what each one produces? + What happens when two values with the information attached meet at a phi? If "X is a list whose first element is drawn from among the constants {a b c}" is a useful property - which seems to be idea you raise above - I think we can answer all the questions above fairly readily, and then the analysis is quite a straightforward matter of iterating data flow equations. The pattern is well established already - it's used in type analysis, in constant propagation, in optimization of existence checks - a bunch of the micro-passes do just this sort of thing. I don't account it among the "hard" problems; the hard part for this sort of analysis is asking the right question. An additional observation: One reason I like using tcllib here is that I expect it to be representative of the sort of code that we'll see 'in the wild' - a good source of ideas for where we need to improve. The only thing about tcllib is that a lot of the code in it is 'too clever by half': I suspect that there's a lot of much simpler Tcl code in production. (That said, we'll also see a lot more nasties like double-dereference and unbraced expressions among code written by the less-sophisticated.) |
From: Donal K. F. <don...@ma...> - 2017-10-31 10:42:12
|
On 30/10/2017 21:41, Kevin Kenny wrote: > I'm going to take a few more runs at the hill of making sure that > trying to compile 'pki' compiles at least something, and doesn't crash > on what it can't compile. Probably the next bug to tackle is that > there's something in 'md5' that's causing endless recursion in the > stack analysis in the compiler front end. The next feature that I > think I need to talk to Donal about is {*} - I belated realise that we > don't appear to have any compiler support for it. 'beginExpand' is not > implemented. I'm hoping he has better insight than I do how to > approach it. That really depends on how smart we can get. (We currently handle expansion for the [list] command only, because that's the only command that has an expansion-handling strategy in its bytecode compiler.) Ultimately, if the first word is known, we can determine if we know what to do or if we're in the “it could be anything!” case. Alas, I fear that that latter case will be disappointingly common. The simplest strategy would be to convert expansion into building a list of the arguments — through ordinary appending or via multi-element appending — and then to [eval] the result. The building would be easily done, but the eval is a PITA. Also, those [expr] calls that you're complaining about probably ought to have been written: expr {"0x$value"} as that has far fewer hazards. (At the moment, I'm looking at handling [info level], which reads information out of the callframe, and which a number of tcllib packages use for more complex argument parsing.) Donal. |
From: Kevin K. <kev...@gm...> - 2017-10-30 21:41:08
|
Again, this is a delayed posting. Donal's announcement that quadcode is available as a 'pre-alpha' (we KNOW it doesn't work, we need users' help to triage what isn't working) was an exciting development at the conference in Houston a week and a half ago. Since returning, work has resumed at our usual fits-and-starts pace. I've been working on addressing problems that people have found with the quadcode engine. The biggest part of the work this week was devoted to improving error handling in the optimizer. Before, it would usually crash when asked to compile something that was not compilable. Now, it attempts to abandon the procedure in progress and go on to compile others. and in some cases even to find more errors in the current procedure. This also necessitated implementing an error reporting system for the optimizer, which is in quadcode/specializer.tcl, mostly. Having this at hand has allowed quadcode to begin reporting certain types of program warnings that, as far as I know, no other tool for static Tcl analysis has managed: - Known division by zero at compile time - Use of an uninitialized variable - Use of a value that is known to be of the wrong type. In some cases, this error can be reported even when the value is not constant. - Use of [upvar], [namespace upvar], [global] or [variable] to attempt to override an already-defined local variable, or a code path where one of these appears on one control flow path but not another. - Invocation of non-compilable code from within compiled code. In addition, a few minor language features have been implemented: - [info exists] is now implemented for array elements as well as scalars. - [namespace current] is now implemented. - [unset] is now implemented for array elements as well as scalars. - [format], [lrange], [lreplace] and [encoding] were added to the table of Tcl commands with known effects. I don't know how they were missed. At Roy Keene's suggestion, I've started looking at the 'pki' module of tcllib as a source of code to try to compile. (We're surely not handling much of it yet!) Even a cursory look at what's failing reviews a couple of things that may be low-hanging fruit, and suggests reopening a discussion that we abanoned a while back: what to do about evaluation of unknown code? The reason that this question arises is that I've noticed a fair amount of evaluation of 'unknown' code in 'pki'. One recurring instance that I've seen is the use of an unbraced [expr] to convert hexadecimal numbers: expr 0x$value This, of course, is completely unanalyzable. If $value turns out to be the string, {[exec rm -rf *]}, Tcl would happily evaluate that before discovering that '0x' followed by its return value is not numeric. Whoops. It's clear that the tcllib components ought to do [scan $value %x] instead, and I'll set about reporting and/or patching that problem separately. Nevertheless, it's only a symptom of the fact that users may well want to call uncompiled code - and therefore code about which the compiler can prove very few assertions - from compiled code. Previously, I had consoled myself with the belief that this sort of thing is rare in the performance-critical portions of a program, but what I've seen lately is convincing me that it's much commoner than I thought. Most of the problem arises if the called procedure might do something that would invalidate the compilation. Doing something like renaming [set] or establishing traces on variables that are in scope in the command code would be a typical example. It's well-nigh impossible to prove that invoked code doesn't do this sort of thing. There are several approaches to invoking unknown code. (1) Do it permissively. This is what Donal had implemented it early on. We simply trust the invoked code not to do anything that would invalidate the compilation, and let the compiled code continue executing as it stood before the change. (2) Forbid it. This is likely the right thing on the It Has To Work principle: if code is doing something that the compiler doesn't understand, then let it pay for the cost of the interpreter! (3) Do what the bytecode interpreter does: record the continuation at every invocation of uncompiled code. On return, check if recompilation is required, and JIT recompile the continuation if necessary. The third approach is totally compatible with existing Tcl, and is probably what we want to do eventually (with compiler warnings, since it would have really substantial performance implications). But it's pretty difficult to do, so won't be happening in short order. There's still therefore reason to discuss whether (1) or (2) would be the better short-term plan. That still raises the question about what to do with [expr 0x$value]. If it's a common case, we could try to peephole-optimize it (checking, of course, [string is xdigit -strict $value]). Otherwise, I'd say for that specific case that we should emit a warning (if we permit uncontrolled code) or an error (if we don't), and call it a day. I'm going to take a few more runs at the hill of making sure that trying to compile 'pki' compiles at least something, and doesn't crash on what it can't compile. Probably the next bug to tackle is that there's something in 'md5' that's causing endless recursion in the stack analysis in the compiler front end. The next feature that I think I need to talk to Donal about is {*} - I belated realise that we don't appear to have any compiler support for it. 'beginExpand' is not implemented. I'm hoping he has better insight than I do how to approach it. |
From: Donal K. F. <don...@ma...> - 2017-10-16 21:52:20
|
On 15/10/2017 23:41, Kevin Kenny wrote: > I think I need help to get over this particular hump - I've never > learnt how to attach a debugger to the LLVM-generated code to see > where it's going astray. In my experience, with the jitted code you need lldb in order to get anything sensible, as gdb gives up in a bit of a huff. HOWEVER, I've just managed to get ahead-of-time compilation of a Tcl package working, and that generates a much more conventional DLL which will support normal debugging much better. Donal. |
From: Donal K. F. <don...@ma...> - 2017-10-16 21:44:32
|
Hi I've just managed to get tclquadcode to compile a (very very simple) Tcl package to a DLL that I could then load into a bare interpreter. The code to do it is on the dkf-optimization-experiment branch, and is still rather too much in the Works On My Machine™ level of functioning, but it has worked once on a good day with a following wind in my sails. I ran the compilation by doing (i.e., that's got the code to bind the compilation driver to the package): tclsh8.6 pkgdemo.tcl and that printed out (after thinking for a while) the [load] call that I needed to use to load the package. I paste that into an absolutely plain tclsh and everything works and I can call the command. Putting it in a (very long) timing loop lets me see the calls happening in my system stack sampling utility; I can clearly see the function calls themselves. More work is needed (I'd like to avoid having to convert the bitcode to object files using an external program) but it's now work being done on a system that has shown that it can work. :-) :-) :-) Donal. |
From: Kevin K. <kev...@gm...> - 2017-10-16 04:41:15
|
This weekend, I had a few hours free and wanted to get started on support for [uplevel]. I didn't get very far with analyzing the problem (which involves code inlining, among other things), before I realized that in order to have even a shot at making the common cases go, I need to tackle constant folding first. (Otherwise, there are a lot of [uplevel 1 [list ...]] things that won't give me sufficient information to have a prayer of figuring out what's going on.) I started a prototype of constant folding on the 'constfold' branch. I found that in addition to doing the folding, I needed to do a few more things. (1) One of the test cases resulted in an attempt to iterate using [foreach] over a constant, singleton, numeric list. (like [foreach x 0 { ... }]). This caused a type error in the code issuer. I inexpertly patched 'unshareList' in build.tcl and compile.tcl to use standard type coercion to force the argument being unshared to STRING (so that there will be a Tcl_Obj* to work with). This seems to fix the problem, but I don't have a lot of confidence that what I did is right. I find a lot of that code pretty impenetrable. (2) I took the liberty of changing demosupport.tcl so that if the time taken by a case rounds to zero, we don't crash on a divide-by-zero. (3) Doing 'list' and 'dict exists' worked nicely. When I tried some other operations, I started hitting weird things in the code issuer. What appears to be going on is that 'listIndex' (the next thing I'm working on) returns FAIL STRING, but of course once I've done it at compile time, I've just a STRING (or even something more specific). With changes I made, the jumpMaybe and extractMaybe are getting taken out, but then I'm getting a segfault in upvartest0::check1 (I think!) What I suspect still remains, as far as possible causes of trouble, is that the new code has an instruction that looks like: 8: moveToCallFrame {temp 8 8} {temp 0 0} {literal n} {literal 0} {literal sum} {literal 0} This has two differences when we get into upvartest0::init. First, the variable 'sumsq' is now uninitialized, because the logic has actually detected that [scan] will write it before anything ever reads it. The other thing is simply that we've moved from a literal rather than a variable. I think I need help to get over this particular hump - I've never learnt how to attach a debugger to the LLVM-generated code to see where it's going astray. |
From: Donal K. F. <don...@ma...> - 2017-09-19 23:57:16
|
On 19/09/2017 18:14, Kevin Kenny wrote: > How can I help with that? If you look at the generated quadcode for the > *::check2 procs, you'll see that the compiler is already detecting that > 'init', 'accum', 'result' operate on the variables 'n', 'sum' and > 'sumsq', and is listing them in the 'entry' block. I went and > implemented that when I first saw how badly 'check2' was performing. > > I thought that listing them in the entry block was how they got to be in > the stack frame explicitly. Would it help if I explicitly moved > 'Nothing' to them in the post-entry? Those moves would be optimised away > again, except in the case where they aren't initialised before first > being used via 'invoke', but that's the case at hand. The problem is that I use the bytecode metadata about the procedure in the buildProcedureMetadata method in codegen/thunk.tcl. I guess I could make some assumptions about flags for unmentioned variables, assuming that they're not arguments, temporaries or resolved, and inject that sort of thing into the (decompiled) bytecode dictionary in IssueEntry in codegen/compile.tcl. Twiddly, but possible to fix once I'm awake enough. ;-) > Yes, indeed, I was aware of that - but I was intentionally trying to > test performance of [upvar]'ed variable access. It's disturbing that > it's actually slower by about 30% than the interpreted code, and not > clear to me why that is. Things are *incredibly* sensitive (and some older versions of Tcl — e.g., the one I have from ActiveTcl on this OSX machine — make things much worse), but the check2's slowness is because the variables are not allocated LVT slots. A full lookup by name is quite costly. We build linked variables so subsequent uses are reasonably speedy, but there's still a limit to what can be done. > Is the LVT not implemented at all, or are you detecting the variables by > some means other than looking at the entry block? I'm using the entry block *and* the originating bytecode, which has additional information I unfortunately need in order to generate the metadata, such as the contents of the flags field and whether there is a default; there never is for a non-argument. I need to tidy that up. > I anticipate that another thing that would help a procedure like this is > to implement tcl::pragma (which I still need to propose!). Adding > [tcl::pragma::noalias n sum sumsq] would guarantee (and optionally check > at runtime) that the three variables are actually distinct. That would > eliminate a lot of moveFromCallFrame. In the inner loop, every variable > is being refetched, because one of the other [upvar]ed variables might > alias it, so one of the other assignments might have spoilt its value. > > Of course, inlining would be the biggest possible win, but it isn't > always possible (think recursion!), so we still need to make sure that > these operations perform at least adequately. We could also do a noalias guard instruction that leads to an optimal version of the code, and into the current more pessimistic style otherwise. A pairwise alias check is fairly cheap as a post-upvar step; we don't have anything awful like overlapping C arrays to cause us headaches. I'm spitballing here. :-) Donal. |
From: Kevin K. <kev...@gm...> - 2017-09-19 17:14:27
|
On Sat, Sep 16, 2017 at 12:49 PM, Donal K. Fellows < don...@ma...> wrote: > I've been looking at the performance of [upvar] and I think there's a few > things to note. The first is that if we're going to generate variables that > aren't explicitly in the stack frame (the issue that the *::check2 tests > look at), we're going to pay a substantial cost unless we put the LVT > metadata in place. The issue is that the metadata allows Tcl to optimise > its accesses quite a bit, and we're reliant on Tcl code for quite a few > things for now. (Sampling the process while it is running the upvartest > cases indicates that a lot of the gross cost as it currently stands relates > to allocation and deallocation of Tcl_Obj values.) > How can I help with that? If you look at the generated quadcode for the *::check2 procs, you'll see that the compiler is already detecting that 'init', 'accum', 'result' operate on the variables 'n', 'sum' and 'sumsq', and is listing them in the 'entry' block. I went and implemented that when I first saw how badly 'check2' was performing. I thought that listing them in the entry block was how they got to be in the stack frame explicitly. Would it help if I explicitly moved 'Nothing' to them in the post-entry? Those moves would be optimised away again, except in the case where they aren't initialised before first being used via 'invoke', but that's the case at hand. > The second is that we get a small speedup with the current tip of the 8.6 > branch with the *::check1 tests, and we get a much better performance boost > if we avoid writing values back into the call frame just because the local > representatives have been modified. Avoiding that cost, as in the following > code, helps quite a bit: > > proc accum {args} { > upvar 1 n n_ sum sum_ sumsq sumsq_ > set n $n_; set sum $sum_; set sumsq $sumsq_ > foreach x $args { > incr n > set sum [expr {$sum + $x}] > set sumsq [expr {$sumsq + $x * $x}] > } > set n_ $n; set sum_ $sum; set sumsq_ $sumsq > return > } > > (Note the upvar'd variables have underscores.) > Yes, indeed, I was aware of that - but I was intentionally trying to test performance of [upvar]'ed variable access. It's disturbing that it's actually slower by about 30% than the interpreted code, and not clear to me why that is. The big wins will be those that allow inlining of code such as 'accum', > with passing the Tcl_Var references about being another reasonable approach > (though we'd need to bind the variables into the local variable table to > handle calling out to [scan] as in 'init'). An interim win would be for the > detected variables to be put in the LVT; that would at least remove the > cases that currently get slower. > Is the LVT not implemented at all, or are you detecting the variables by some means other than looking at the entry block? I anticipate that another thing that would help a procedure like this is to implement tcl::pragma (which I still need to propose!). Adding [tcl::pragma::noalias n sum sumsq] would guarantee (and optionally check at runtime) that the three variables are actually distinct. That would eliminate a lot of moveFromCallFrame. In the inner loop, every variable is being refetched, because one of the other [upvar]ed variables might alias it, so one of the other assignments might have spoilt its value. Of course, inlining would be the biggest possible win, but it isn't always possible (think recursion!), so we still need to make sure that these operations perform at least adequately. |
From: Donal K. F. <don...@ma...> - 2017-09-19 08:09:47
|
I've been looking at the performance of [upvar] and I think there's a few things to note. The first is that if we're going to generate variables that aren't explicitly in the stack frame (the issue that the *::check2 tests look at), we're going to pay a substantial cost unless we put the LVT metadata in place. The issue is that the metadata allows Tcl to optimise its accesses quite a bit, and we're reliant on Tcl code for quite a few things for now. (Sampling the process while it is running the upvartest cases indicates that a lot of the gross cost as it currently stands relates to allocation and deallocation of Tcl_Obj values.) The second is that we get a small speedup with the current tip of the 8.6 branch with the *::check1 tests, and we get a much better performance boost if we avoid writing values back into the call frame just because the local representatives have been modified. Avoiding that cost, as in the following code, helps quite a bit: proc accum {args} { upvar 1 n n_ sum sum_ sumsq sumsq_ set n $n_; set sum $sum_; set sumsq $sumsq_ foreach x $args { incr n set sum [expr {$sum + $x}] set sumsq [expr {$sumsq + $x * $x}] } set n_ $n; set sum_ $sum; set sumsq_ $sumsq return } (Note the upvar'd variables have underscores.) The big wins will be those that allow inlining of code such as 'accum', with passing the Tcl_Var references about being another reasonable approach (though we'd need to bind the variables into the local variable table to handle calling out to [scan] as in 'init'). An interim win would be for the detected variables to be put in the LVT; that would at least remove the cases that currently get slower. [Also, don't run with Tcl configured with --enable-symbols=all. Takes absolutely *forever* during the compilation phase!] Donal. |
From: Kevin K. <kev...@gm...> - 2017-09-17 23:29:40
|
I managed to make the 'dkf-direct-vars' branch not abort - there was a confusion about the return types of the Tcl API wrappers that implemented direct variable access. We need more checking there - in particular, we need to refuse to compile if the variable name in a direct variable access is unknown, lest it alias a callframe local. Partially qualified variable names are less of a problem, since at the present level of development, we generate code presuming that any global or namespace variable may alias any other (we don't know what code outside our control may have done with [upvar]). With these changes, references to $::variable will now work. This should be another leap forward in what we are able to compile. I'm starting to wonder how much of tcllib can be processed without running into problem areas like [eval], [uplevel], [trace], double-dereference, and non-compilable versions of the commands that we do implement. It might be a worthwhile experiment to attempt compilation of various modules and see how far we get - the nature of the failures would also help prioritize the next steps. I'm going to be away from this now for a little while, barring emergency requests, in order to prepare for the conference next month. If we can get llvmtcl prepped for release, then a release of quadcode should be doable. I've asked Roy Keene to investigate what it would take to have llvmtcl running in places that I haven't tried. I'm happy with how quickly things came together at the end. Handling 'variable', 'namespace upvar' and 'upvar' meant that direct variable access already had all the pieces in place, and what was needed was a fairly small amount of glue. Kevin |
From: Kevin K. <kev...@gm...> - 2017-09-15 04:00:25
|
Tonight's last commit has a few changes: (1) Global variables are refreshed to the callframe after catching an error. This is done so that if the proc contains 'global errorCode' or 'global errorInfo' it will get the correct value. There's a loophole that if a proc catches an error and does not rethrow it, the variable values may still be stale in an enclosing proc, but at the current level of development, anything else has an unacceptable performance impact. (2) A bug has been fixed where [entry] was often not removed because of a missing cleanup optimization. This restores performance to some of the demos. (3) Code has been added so that if a variable is created by being passed by name to a called procedure, it is detected and a callframe slot is created for it in advance. I hoped that this would fix the performance degradation of upvartest[01]::check2 with respect to check1, but no joy. Peculiarly, [check2] does considerbly less work, since [check1] is still doing excess data motion (I just realized this last item: I need to track down why the 'moveToCallFrame's aren't detecting that they are storing vars that were just loaded. A job for another day. Kevin |
From: Kevin K. <kev...@gm...> - 2017-09-14 21:28:18
|
As of last night <https://core.tcl.tk/tclquadcode/info/9bb47425610a813a>, the quadcode compiler should have a reasonably complete implementation of [global], [namespace upvar] and [upvar], except for [upvar 0] and [upvar #N] where N > 0. At present, the code as implemented is still inordinately expensive at runtime, chiefly because: (1) callframe creation/destruction is not yet optimized (2) similarly, looking up outer variable references is not yet optimized (3) alias analysis is extremely over-conservative. I do not plan to address any of these immediately, because I think that my time will be better spent finishing the implementation of direct variable access ($::path::to::name), investigating procedure inlining, so that the common cases of [upvar 1] can be simply replaced with direct access to the variables (which will then have type information, allowing callsite-specific optimizations), and trying to prepare for an alpha release for others to try. |
From: Kevin K. <kev...@gm...> - 2017-08-29 00:10:14
|
I realized yesterday as I was developing my (somewhat infelicitous) workaround for ticket https://core.tcl.tk/tclquadcode/tktview?name=dabd7e27c7 that there was an easy way to "get something working quickly" with respect to [upvar]: simply say that any procedure that contains an [upvar] requires all variables in the caller's callframe to be synchronized before and after invoking the procedure. Performance with this approach will be horrible, of course, and it will generate bad code for such oddities as [upvar 2] or [upvar #1], but it at least passes our initial tests for [upvar] handling. I committed this to the namespace-variables branch as https://core.tcl.tk/tclquadcode/info/0e3ab8a2bdc6ba66 With it in place, I can enable the [upvar] examples, and they all work, although some are now slower than interpreted code. This slowdown will be entirely fixable with better variable-effects analysis, which I'm working on. So, a couple of questions: (1) Should I merge this back into trunk? Of course I'll continue to work on making [upvar] analysis more precise. (2) (Specifically for dkf) What do you want to see in generated code for INST_LOAD_STK, INST_STORE_STK and the zoo of related instructions? These are what gets generated for things like [set path::to::var] and $path::to::var. We don't have any translation for them yet. I think we're at the point where we could do this; if the (first part of the) name is constant, then this reduces to getting/setting a named variable (namespace if qualified, local if not), and otherwise, it can be treated as a symbolic assignment that potentially aliases every other variable. (The latter is a pretty unusual case; it's usually the result of double dereferences like [set $var]. I don't mind killing performance for it, at least initially.) Let me know what you want the quads to look like, and I'll make it so. Kevin |
From: Kevin K. <kev...@gm...> - 2017-08-26 18:48:49
|
I just reported in ticket https://core.tcl.tk/tclquadcode/tktview?name=dabd7e27c7 that the 'optimise' phase is failing in recent versions of the code issuer. LLVM's complaint is that the code appears to be falling off the end of a basic block in tcl.{read,write,unset}.ptr.var. I think I can make progress by branching from before the problem, without introducing too many conflicts when we re-merge. Any ideas? |
From: Donal K. F. <don...@ma...> - 2017-07-03 08:19:56
|
On 02/07/2017 19:36, Kevin Kenny wrote: > I don't expect [upvar] to be working yet. I'm not yet generating > correct code for [upvar] into an outer callframe when one compiled > procedure calls another. That requires some bookkeeping that I > haven't finished designing. I wanted to have [variable], [global] > and [namespace upvar] working first, because I don't do well > with speculative coding. "Implement the next feature, doing > the simplest thing that could possibly work" is more the way > I attack it. (Sometimes the simplest thing is anything but > simple, because simpler things couldn't possibly work.) Well, it's all there now on code generation side for when you do tackle it, as it wasn't complicated to extend what I already had (I'd already split the binding operation up). The upvar quadcode is this: upvar $cffailbit $callframe $localName $levelDescriptor $otherName The basic type is expected to be: CALLFRAME FAIL BOOL <- CALLFRAME x STRING x STRING x STRING with usual type promotions, and the value of the boolean result is entirely unimportant, much as with 'nsupvar' and 'variable'. The semantics of the the level descriptor are exactly those supported by TclObjGetFrame; guess what is used to parse the value. ;-) The consequences are also exactly those of [upvar]. I could see extending the supported possible types with this to also allowing: CALLFRAME FAIL BOOL <- CALLFRAME x STRING x CALLFRAME x STRING but that would require code issuer changes. If we had a NAMESPACE type, representing a direct handle to a Tcl_Namespace*, I could see doing a similar thing with 'nsupvar', but that's even more speculative as it involves an entirely new type on the analysis side, and I'd rather have lists and dicts as (reference) types first, so allowing the discarding of quite a few repeated type checks (and maybe even admitting a more efficient implementation on my side). I'll land the branch back to trunk as it works (without new failures) on the full set of demos, but I'll leave the branch open as there's still work to do. Donal. |
From: Kevin K. <kev...@gm...> - 2017-07-02 18:36:57
|
On Jul 2, 2017 12:21 PM, "Donal K. Fellows" <donal.k.fellows@manchester. ac.uk> wrote: [regarding [namespace upvar], [global], [variable]: Branch works, but doesn't have 'upvar' fully implemented yet; the code required for it is absent in codegen/compile.tcl and codegen/stdlib.tcl. I don't expect [upvar] to be working yet. I'm not yet generating correct code for [upvar] into an outer callframe when one compiled procedure calls another. That requires some bookkeeping that I haven't finished designing. I wanted to have [variable], [global] and [namespace upvar] working first, because I don't do well with speculative coding. "Implement the next feature, doing the simplest thing that could possibly work" is more the way I attack it. (Sometimes the simplest thing is anything but simple, because simpler things couldn't possibly work.) Kevin |
From: Donal K. F. <don...@ma...> - 2017-07-02 16:20:58
|
On 01/07/2017 20:57, Kevin Kenny wrote: > I see that in http://core.tcl.tk/tclquadcode/info/2dd141efab9, Donal > changed the type to {CALLFRAME FAIL BOOL_INT} and I'm guessing that > he did so because EMPTY requires a Tcl_Obj* in the current code, > while BOOL_INT is simply a 1-bit value. I adjusted the result type of > 'variable' and 'nsupvar' accordingly. That was exactly it. By doing that, I get much cheaper handling of values whose formal value part is of no actual interest. Maybe we ought to revisit the type that we use in the implementation for EMPTY given that we know it is a fixed value, but it isn't too critical for now. > Everything appears to compile all right, but it appears that variable > bindings aren't actually happening at run time. The call to > [nsvartest::init] is successful, but the first [nsvartest::apply] > finds no value for $n at the first [incr n]. More peculiarly, it > throws an error for this case, even though there's an 'exists' check > guarding the access to $n. Yeah, the problem was that I wasn't actually following the links in the code for reading and writing. (D'oh! It also took me far too much of the week to realise that this was the problem.) It didn't matter before because I was always able to use pointers to the real storage variables, but now that we can point those off to other places, I have to do it right. Branch works, but doesn't have 'upvar' fully implemented yet; the code required for it is absent in codegen/compile.tcl and codegen/stdlib.tcl. Donal. |
From: Kevin K. <kev...@gm...> - 2017-07-01 19:57:58
|
Donal recently asked the question in a commit log message, (http://core.tcl.tk/tclquadcode/info/d817e69c7ac) 'can varname lookup fail?' It surely can if it's designating a nonexistent namespace, or if [upvar], [namespace upvar] or [global] is attempting to install a link in the callframe for a variable that already is defined locally. Accordingly, Donal and I decided in the chat that the variable resolution instructions in quadcode need to return {CALLFRAME FAIL EMPTY} rather than {CALLFRAME}, with the appropriate gubbins around them to check for and report errors. I see that in http://core.tcl.tk/tclquadcode/info/2dd141efab9, Donal changed the type to {CALLFRAME FAIL BOOL_INT} and I'm guessing that he did so because EMPTY requires a Tcl_Obj* in the current code, while BOOL_INT is simply a 1-bit value. I adjusted the result type of 'variable' and 'nsupvar' accordingly. I had to/was able to make the corresponding change in compile.tcl to have StoreResult put the correct value of the result into the LLVM intermediate code. Everything appears to compile all right, but it appears that variable bindings aren't actually happening at run time. The call to [nsvartest::init] is successful, but the first [nsvartest::apply] finds no value for $n at the first [incr n]. More peculiarly, it throws an error for this case, even though there's an 'exists' check guarding the access to $n. Anyway, since nothing on the 'namespace-variables' branch works yet, I've committed what I have. http://core.tcl.tk/tclquadcode/info/f58eccaca7 Kevin |
From: Kevin K. <kev...@gm...> - 2017-06-27 02:40:20
|
I think we'll be ready to begin the initial testing of variables brought into the callframe by means of [variable], [global] and [namespace upvar] as soon as the code issuer is prepared to deal with them. Accordingly, I've removed the 'kbk' prefix from the 'kbk-namespace-variables' branch to indicate that I consider it open for shared development. There are two new quadcode instructions that will be passed to the code issuer as part of this change: nsupvar callframeOut callframeIn localVarName namespaceName varName Asks to create a link in the callframe with 'localVarName' as the name of the local variable, 'namespaceName' as the name of the namespace where the link target resides and 'varName' as the name of the link target. 'namespaceName' may or may not be fully qualified. This instruction does NOT return the value of the linked variable. The front end will emit a 'moveFromCallFrame' at the appropriate time to retrieve the value. variable callframeOut callframeIn localVarName varName Asks to create a link in the callframe with 'localVarName' as the name of the local variable, and 'varName' as the name of the link target. 'varName' may or may not be fully qualified. As with 'nsupvar', the instruction does NOT return the value of the linked variable; 'moveFromCallFrame' will be emitted at the appropriate time. At present, the code being generated is horribly conservative, but I want to see how awful it is for various practical cases to get a feel for how to optimize it effectively. It's also probably horribly buggy, since I can't test without code generation. I'm going to need help with this one. Variable resolution frightens me a little. I'm still making no attempt to manage $path::to::var or $::path::to::var. Those will have to wait for another round of changes after this one settles. There's a little bit of code in there that purports to recognize [upvar], but that code is unquestionably NOT working. If we get to a point where we want to merge partial functionality to the trunk, we can just patch it out of translate.tcl. Kevin |
From: Donal K. F. <don...@ma...> - 2017-06-24 13:50:18
|
On 24/06/2017 14:40, Donal K. Fellows wrote: > The only test failures we've got at the moment are where we don't handle > some exception cases quite the same, and they're not actually related to > callframes. We've a few slowdowns too, but they don't look serious. Here's the analysis of the current issues, which I've broken out of the previous message precisely because I don't think these are really all that much to do with callframes. ;-) The failures are: -------- errortest2-caller pqr -------- Computed results differ: expected "1 pqrpqrpqr {-code 1 -level 0 -errorcode NONE -errorline 3}" but got "153289488 pqrpqrpqr {-code 158817696 -level 0 -errorcode NONE}" -------- cleanopt {errortest4 pqr} -------- Computed results differ: expected "1 {error occurred: pqrpqrpqr} {-code 1 -during {-code 1 -level 0 -errorstack {INNER {returnImm pqrpqrpqr {}} CALL {errortest4 pqr}} -errorcode NONE -errorinfo {pqrpqrpqr while executing "error $x$x$x"} -errorline 5} -errorcode NONE -errorline 1 -level 0}" but got "1 {error occurred: pqrpqrpqr} {-code 1 -during {-code 1 -level 0 -errorstack {INNER {invokeStk1 errortest2 pqr} UP 1} -errorcode NONE -errorinfo pqrpqrpqr -errorline 1} -errorcode NONE -errorline 1 -level 0}" -------- cleanopt {calltest2 1 5 ,} -------- Computed results differ: expected "1 {wrong # args: should be "join list ?joinString?"} {-code 1 -errorcode {TCL WRONGARGS} -errorline 1 -level 0}" but got "1 {wrong # args: should be "::join list ?joinString?"} {-code 1 -errorcode {TCL WRONGARGS} -errorline 1 -level 0}" The first is a failure to pass some parts of the exception through correctly (the actual error code and the error line); that'll take some tweakery on my side to fix, but shouldn't impact the analysis. The second is related to the first, and also mixes in the handling of error stack info (which I think we should explicitly not guarantee to match). This probably requires some alterations to the [cleanopt] helper to make it strip this extra stuff, but we might want to think what we want to say at that point instead. The third is because we're converting to fully-qualified command names inside the 'invoke' instruction. Now that we have a proper callframe, we can maybe stop that. It'd be awesome to add an API to Tcl 8.7 to let callers pass in a name resolution context explicitly when evaluating a command. Current slowdown cases are: -------- nstest::nstest5 -------- Acceleration -0.72% -------- comps 1 2 -------- Acceleration -13.18% -------- comps 2 1 -------- Acceleration -16.54% -------- comps 3 0x10 -------- Acceleration -18.44% Of these, I wouldn't expect [nstest5] to be significantly different in speed anyway (I'm not sure why it is slower, but it is) and [comps] is really a correctness test and is unlikely to benefit much from advanced analysis anyway. |