From: Andy G. <and...@gm...> - 2017-09-18 03:33:34
|
Please review the amg-string-insert branch. I have implemented bytecoding, written test cases, written documentation, and created a common backend function shared between both the bytecoded and non-bytecoded [string insert] and [string replace] commands. This backend function is publicly exported via the stubs interface and is also documented. No TIP is written yet, as I didn't have access to the tips repository until just a little while ago. First I'd like some review and comments because one of the TIP requirements is some level of community interest. It appears bytecoding has changed since the creation of stringComp.test. Previously, only proc bodies were bytecoded, so string.test tested the non-bytecode string commands and stringComp.test tested the bytecoded commands. Now, string.test and stringComp.test both test the bytecoded commands, and the non-bytecoded commands are only tested by accident if ever. For [string insert] and [string replace], I added tests for both bytecoded and non-bytecoded commands, though we may wish to go about testing the two flavors in a different way. I used gdb breakpoints to confirm I was exercising all my new and modified code paths, though I suppose if we really wanted to get fancy, we could employ gcov. This is how I first discovered string.test doesn't actually test the non-bytecoded commands. I also created an amg-string-insert branch in the [incr Tcl] repository to contain the necessary updates to its test suite. Otherwise adding a [string insert] command causes several of its tests to fail. Potential incompatibility: code abbreviating [string index] as [string in] will now fail due to ambiguity with [string insert]. The real reason I'm adding [string insert] is as a learning exercise. I find it very odd that [string insert] doesn't already exist, but we've lived without it this long, so it's not the biggest deal. Yet I see it as an opportunity to dig deeper into the process of adding new commands to Tcl. I would like to return to my [array] C API project which has languished since January or so, and I see [string insert] as a stepping stone toward that work. -- Andy Goth | <andrew.m.goth/at/gmail/dot/com> |
From: Donald G P. <don...@ni...> - 2017-09-18 14:03:33
|
On 09/17/2017 11:32 PM, Andy Goth wrote: > Please review the amg-string-insert branch. ... > The real reason I'm adding [string insert] is as a learning exercise. I'm happy you're going through this exercise, getting experience on making improvements and getting access to the various systems. This is going to be valuable. On the actual proposed [string insert] command, there is one clear alternative that I like better. Make an incompatible revision to the [string replace] command so that it stops refusing to replace empty strings. Make that change and you no longer need a [string insert]. And if you just like the name, you could build it on top of a [string replace] with proper capabilities. We approved exactly this as part of TIP 323 "Do Nothing Gracefully" but then after implementation, the incompatibility was found to actually break something somewhere out there. The change was reverted to be re-considered later. Some damn fool neglected to actually write down what package or app it was out there that demands [string replace] refuse to replace empty strings. It would be helpful to recover that information to know how best to proceed. > It appears bytecoding has changed since the creation of stringComp.test. I believe the reliable way to exercise direct evaluation pathways is to use the [testevalex] command. See for example how the [run] alias is redirected in basic.test. > I also created an amg-string-insert branch in the [incr Tcl] repository > to contain the necessary updates to its test suite. Otherwise adding a > [string insert] command causes several of its tests to fail. Thank you! for actually checking how a change influences bundled packages. That said, the better fix is to get the Itcl test suite out of the business of checking literal Tcl error messages. That's an encapsulation mistake. Don't make Itcl start demanding to see "insert" in the error message (it will start failing when built against Tcl 8.6), but make the test stop caring about parts of the error messages that are outside Itcl control and are not part of the functioning under test. -- | Don Porter Applied and Computational Mathematics Division | | don...@ni... Information Technology Laboratory | | http://math.nist.gov/~DPorter/ NIST | |______________________________________________________________________| |
From: Andy G. <and...@gm...> - 2017-09-18 15:25:18
|
On 09/18/17 09:01, Donald G Porter wrote: > On 09/17/2017 11:32 PM, Andy Goth wrote: >> Please review the amg-string-insert branch. > ... >> The real reason I'm adding [string insert] is as a learning exercise. > > I'm happy you're going through this exercise, getting experience > on making improvements and getting access to the various systems. > This is going to be valuable. I'm honored that you would say this! Thank you. Also I hope this experience is useful to others as a worked example. > On the actual proposed [string insert] command, there is one clear > alternative that I like better. Make an incompatible revision to > the [string replace] command so that it stops refusing to replace > empty strings. Make that change and you no longer need a [string > insert]. And if you just like the name, you could build it on top > of a [string replace] with proper capabilities. I agree, though I wish to keep [string insert] as well because it is clearer to use. The problem with using [string replace] to implement [string insert] is the way one must specify indexes: proc ::tcl::string::insert {string index insertString} { replace $string $index [expr {$index - 1}] $insertString } Heaven help you if you use end-relative indexes... STR_INSERT exists to bypass the empty substring replacement refusal logic in STR_REPLACE, so if we made your suggested change to [string replace], could we remove STR_INSERT? First answer this question: if STR_INSERT didn't exist, how would the [string insert] bytecode compiler direct STR_REPLACE to pass zero removeCount to Tcl_ReplaceObj()? The lack of a clear answer becomes another reason to keep STR_INSERT. > We approved exactly this as part of TIP 323 "Do Nothing Gracefully" > but then after implementation, the incompatibility was found to > actually break something somewhere out there. The change was reverted > to be re-considered later. I saw this documented in the TIP history. https://core.tcl.tk/tips/fdiff?v1=6e0ba0ee9838accc&v2=34809f1432fc528f&sbs=0 > Some damn fool neglected to actually write down what package or app it > was out there that demands [string replace] refuse to replace empty > strings. It would be helpful to recover that information to know how > best to proceed. Well hey, we just broke ground on a new version of Tcl, so why not? >> It appears bytecoding has changed since the creation of stringComp.test. > > I believe the reliable way to exercise direct evaluation pathways is to > use the [testevalex] command. See for example how the [run] alias is > redirected in basic.test. Oh cool! Would you like the same mechanism implemented in string.test with stringComp.test deleted? >> I also created an amg-string-insert branch in the [incr Tcl] >> repository to contain the necessary updates to its test suite. >> Otherwise adding a [string insert] command causes several of its tests >> to fail. > > Thank you! for actually checking how a change influences bundled > packages. That said, the better fix is to get the Itcl test suite out > of the business of checking literal Tcl error messages. That's an > encapsulation mistake. Don't make Itcl start demanding to see "insert" > in the error message (it will start failing when built against Tcl 8.6), > but make the test stop caring about parts of the error messages that > are outside Itcl control and are not part of the functioning under test. Yeah, I was wondering how the versions are intended to match up between Tcl core and extensions. You've answered it by saying they're not. -- Andy Goth | <andrew.m.goth/at/gmail/dot/com> |
From: Andy G. <and...@gm...> - 2017-09-18 18:00:08
|
On 09/18/17 10:24, Andy Goth wrote: > I agree, though I wish to keep [string insert] as well because it is > clearer to use. The problem with using [string replace] to implement > [string insert] is the way one must specify indexes: > > proc ::tcl::string::insert {string index insertString} { > replace $string $index [expr {$index - 1}] $insertString > } > > Heaven help you if you use end-relative indexes... It turns out only the start index for [string replace] is limited to be nonnegative. The second is allowed to go negative, so if negative, it is guaranteed to be less than the start index. The question is now whether the following always works: proc ::tcl::string::insert {string index insertString} { replace $string $index -1 $insertString } If so, then STR_INSERT can be removed, and [string insert] can emit a STR_REPLACE opcode with the toIdx always being -1. (ROT-13 spoiler: vg qbrfa'g jbex) Let's give it a try: --- generic/tclCompCmdsSZ.c +++ generic/tclCompCmdsSZ.c @@ -529,8 +529,9 @@ CompileWord(envPtr, valueTokenPtr, interp, 1); CompileWord(envPtr, indexTokenPtr, interp, 2); + PUSH("-1"); CompileWord(envPtr, insertionTokenPtr, interp, 3); - OP(STR_INSERT); + OP(STR_REPLACE); } else { /* * Special case: prepending and appending. --- generic/tclExecute.c +++ generic/tclExecute.c @@ -5553,7 +5553,6 @@ fromIdx = 0; } - if (fromIdx <= toIdx && fromIdx <= length) { value2Ptr = Tcl_ReplaceObj(interp, valuePtr, fromIdx, toIdx - fromIdx + 1, value3Ptr); if (!value2Ptr) { @@ -5564,7 +5563,6 @@ PUSH_OBJECT(value2Ptr); TclDecrRefCount(valuePtr); } - } TRACE_APPEND(("\"%.30s\"\n", O2S(value2Ptr))); TclDecrRefCount(value3Ptr); This mostly works but results in the following two test failures: ==== string-14.18 string replace FAILED ==== Contents of test case: string replace abcdefghijklmnop 10 9 XXX ---- Result was: abcdefghijXXXklmnop ---- Result should have been (exact matching): abcdefghijklmnop ==== string-14.18 FAILED ==== string-30.5 string insert, middle of string, end-relative FAILED ==== Contents of test case: tcl::string::insert 0123 end-2 _ ---- Result was: 0_123 ---- Result should have been (exact matching): 01_23 ==== string-30.5 FAILED string-14.18 I expect to fail, and the corrective action would be to remove the test. By the way, this test exists only in amg-string-insert, and I had to add this test because there was no baseline test confirming [string replace] does nothing when asked to replace an empty string. string-30.5 exposes another difference between [string replace] and [string insert], something I've forgotten about. End-relative indexing doesn't behave quite the same way. For [string replace], end refers to the final character. For [string insert], end refers to the airy void following the final character. [string replace abc end end x] should return abx. [string insert abc end x] should return abcx. How do we deal with this? The bytecode compiler for [string insert] would have to tell STR_REPLACE not to decrement the return value of Tcl_GetCharLength(). The only way I see to do this would be changing STR_REPLACE to accept an operand flag. That flag would only be set by [string insert], meaning it would always be accompanied by a toIdx of -1, with the effect of the removeCount argument to Tcl_ReplaceObj() being computed as -fromIdx which will always be less than or equal to zero. So STR_REPLACE could also just use that flag to force removeCount to be zero. Furthermore, we could avoid even pushing the -1 on the stack because it doesn't say anything the operand flag doesn't already say. This seems overcomplicated. I vote to keep STR_INSERT. -- Andy Goth | <andrew.m.goth/at/gmail/dot/com> |
From: Andy G. <and...@gm...> - 2017-09-18 15:49:52
|
> On 09/17/2017 11:32 PM, Andy Goth wrote: >> I also created an amg-string-insert branch in [incr Tcl] > > Thank you! for actually checking how a change influences bundled > packages. That said, the better fix is to get the Itcl test suite out > of the business of checking literal Tcl error messages. Please review amg-string-insert in [incr Tcl] and merge to trunk if satisfied. -- Andy Goth | <andrew.m.goth/at/gmail/dot/com> |
From: Kevin K. <kev...@gm...> - 2017-09-18 17:18:35
|
On Mon, Sep 18, 2017 at 11:48 AM, Andy Goth <and...@gm...> wrote: > On 09/17/2017 11:32 PM, Andy Goth wrote: >> >>> I also created an amg-string-insert branch in [incr Tcl] >>> >> >> Thank you! for actually checking how a change influences bundled >> packages. That said, the better fix is to get the Itcl test suite out >> of the business of checking literal Tcl error messages. >> > > Please review amg-string-insert in [incr Tcl] and merge to trunk if > satisfied. > I certainly will review it sooner or later - but I'm swamped at the moment. Can you give me an idea of how much it changes existing bytecode? If INST_STRING_REPLACE behaves differently, then quadcode will have to be modified to match. If there's a new instruction then quadcode will have to implement it as well. (The path by which Tcl turns into quadcode depends on tcl::unsupported::getbytecode) |
From: Andy G. <and...@gm...> - 2017-09-18 17:30:15
|
On 09/18/17 12:18, Kevin Kenny wrote: > On Mon, Sep 18, 2017 at 11:48 AM, Andy Goth <and...@gm...> > wrote: >> Please review amg-string-insert in [incr Tcl] and merge to trunk if >> satisfied. > > I certainly will review it sooner or later - but I'm swamped at the > moment. In the email to which you replied I asked only about [incr Tcl], in which my entire change is to fix a few test files to not care about the exact wording of the [string foo] command, where foo is an invalid subcommand. That should only take a moment to look at, and that's the only thing I'm asking to be merged to [incr Tcl] trunk. My changes to the Tcl core repository will take more time, and I am not asking anyone to merge them. A TIP is needed, for example. > Can you give me an idea of how much it changes existing bytecode? The only change to existing bytecode is making STR_REPLACE be implemented using a common, stubs-exported function shared between bytecoded and non-bytecoded [string replace] and the new [string insert] command. > If INST_STRING_REPLACE behaves differently, then quadcode will have to > be modified to match. STR_REPLACE does not behave differently. The new backend is designed to perform the same as the existing implementation. Any deviation is a bug. > If there's a new instruction then quadcode will have to implement it > as well. Yes, there is a new instruction: STR_INSERT. It's basically STR_REPLACE but severely chopped down. It takes three stack arguments rather than four, and the removeCount argument to Tcl_ReplaceObj() is always zero. > (The path by which Tcl turns into quadcode depends on > tcl::unsupported::getbytecode) Quadcode is wholly unknown to me. Please tell me more, or at least where I can read about it. Since my primary goal is learning, it would be good to study quadcode as well as bytecode. -- Andy Goth | <andrew.m.goth/at/gmail/dot/com> |
From: Kevin K. <kev...@gm...> - 2017-09-18 18:13:05
|
On Mon, Sep 18, 2017 at 1:29 PM, Andy Goth <and...@gm...> wrote: > Quadcode is wholly unknown to me. Please tell me more, or at least where > I can read about it. Since my primary goal is learning, it would be good > to study quadcode as well as bytecode. > > Work in progress for a couple or three years now, toward developing a path to compile Tcl to machine code using LLVM. The project was announced at Tcl 2015, https://www.tcl.tk/community/tcl2015/assets/talk14/TheTclQuadcodeCompiler.pdf and there have been regular status reports at the conferences since then. The code is over at https://core.tcl.tk/tclquadcode . Donal's most recent status report was https://www.youtube.com/watch?v=NCNfXNEcXSw - if you want just the slides, they're at http://www.eurotcl.tcl3d.org/presentations/EuroTcl2017-Fellows-TclQuadcodeStatus.pptx - and I'm on the hook to give the status talk next month. It's finally getting to the point where it can handle a reasonable subset of the language - enough that package authors may be able to consider coding in that subset and compiling their code to machine code - which sometimes can run at speeds comparable with C. Numeric-intensive code runs 10-100x faster; general simple code is typically in the 2-6x faster range. String and I/O bound code, alas, sees little improvement, simply because Tcl's string implementation is already pretty good. It's been a long slog, partly because dkf and I both have very limited time to work on it, and partly because this stuff simply is insanely difficult. We have, to some extent, hamstrung ourselves, because we're trying to face the challenge of how to do this without turning Tcl into a different language; forcing the user to give copious hints about what the code is doing (e.g., by declaring data types) is a road we don't want to follow very far. |
From: Andy G. <and...@gm...> - 2017-09-18 18:29:29
|
On 09/18/17 13:12, Kevin Kenny wrote: > On Mon, Sep 18, 2017 at 1:29 PM, Andy Goth <and...@gm... > <mailto:and...@gm...>> wrote: >> Quadcode is wholly unknown to me. > > Work in progress for a couple or three years now, toward developing a > path to compile Tcl to machine code using LLVM. > > String and I/O bound code, alas, sees little improvement, simply > because Tcl's string implementation is already pretty good. That last part is actually good news. Since STR_INSERT is essentially a simplified STR_REPLACE, and since quadcode has little effect on strings, it should be easy to plop STR_INSERT into quadcode, especially if the STR_REPLACE implementation is already done. -- Andy Goth | <andrew.m.goth/at/gmail/dot/com> |
From: Andy G. <and...@gm...> - 2017-09-18 22:14:52
|
The functionality of STR_REPLACE is not changed, though on the mailing list we're discussing the possibility of changing it. (I'm addressing my reply to the list so it can be archived.) The implementation of STR_REPLACE is changed to call Tcl_ReplaceObj(), but this is optional since said function behaves the same as the code it replaces. The translation from [string insert] to [string replace] is harder than anticipated due to the nature of end-relative indexing. At this time I feel the best solution to this problem is to have separate opcodes implemented in terms of a common function. In other words: keep the solution I already have. I plan to review the quadcode implementation for STR_REPLACE to see how or if it can be adapted to do an insert, also if it's appropriate to make it call Tcl_ReplaceObj(). On Sep 18, 2017 16:43, "Kevin Kenny" <kev...@gm...> wrote: > > On Mon, Sep 18, 2017 at 2:28 PM, Andy Goth <and...@gm...> > wrote: > >> On 09/18/17 13:12, Kevin Kenny wrote: >> >>> On Mon, Sep 18, 2017 at 1:29 PM, Andy Goth <and...@gm... >>> <mailto:and...@gm...>> wrote: >>> >>>> Quadcode is wholly unknown to me. >>>> >>> >>> Work in progress for a couple or three years now, toward developing a >>> path to compile Tcl to machine code using LLVM. >>> >>> String and I/O bound code, alas, sees little improvement, simply >>> because Tcl's string implementation is already pretty good. >>> >> >> That last part is actually good news. Since STR_INSERT is essentially a >> simplified STR_REPLACE, and since quadcode has little effect on strings, it >> should be easy to plop STR_INSERT into quadcode, especially if the >> STR_REPLACE implementation is already done. >> > > Yeah, INST_STR_REPLACE is already done. But if its functionality has > changed, then Donal's version (written in LLVM-IR assembly language, more > or less) will have to follow. Not too big a deal, I think. At quadcode > level, we might be able to translate INST_STR_INSERT to call the > STR_REPLACE implementation and then let LLVM inline it. > |
From: Andy G. <and...@gm...> - 2017-09-19 15:36:30
|
On 09/18/17 17:14, Andy Goth wrote: > I plan to review the quadcode implementation for STR_REPLACE to see > how or if it can be adapted to do an insert, Here's how I would add support for STR_INSERT. http://core.tcl.tk/tclquadcode/artifact/287b756c65a7843d?ln=136-139 Add a new section describing strinsert: strinsert [string insert] code: inserts a substring into a string Stack: ... string index insertString => ... newString http://core.tcl.tk/tclquadcode/artifact/b1a5190236f6779e?ln=5332-5382 Duplicate for strinsert. Replace "from" and "to" with "index", and update comments. http://core.tcl.tk/tclquadcode/artifact/ad030c24b96b163b?ln=513-522 Duplicate for strinsert. Replace "srcs" with a single index. http://core.tcl.tk/tclquadcode/artifact/e6111dcf163c26f4?ln=28+1400-1552 Duplicate and significantly chop down for strinsert. strinsert has far fewer cases to consider than strreplace. Concatenate string before index with substring with string after index, be happy! If the index is hard-coded 0 or end, skip the first or the last step, respectively. http://core.tcl.tk/tclquadcode/artifact?name=7c86ea91a8bc42a7&ln=272-283 Add strinsert to list of bytecodes handled by this [switch] case. http://core.tcl.tk/tclquadcode/artifact/47490ac3ae3458a5?ln=284-290 Duplicate for strinsert. Replace "i2" and "i1" with a single "i" index. http://core.tcl.tk/tclquadcode/artifact/654c015594155f5a?ln=664-669 Add strinsert to list of bytecodes handled by this [switch] case. http://core.tcl.tk/tclquadcode/artifact/acb3d16e75b8a528?ln=86-88 Add a new section describing strinsert: strinsert TGT SRC1 SRC2 SRC3 Let TGT be the string SRC1 with the content of the string SRC3 inserted at index SRC2. If SRC2 is start-/end-relative, the first/last inserted character in TGT will have index SRC2. > also if it's appropriate to make it call Tcl_ReplaceObj(). Clearly not. -- Andy Goth | <andrew.m.goth/at/gmail/dot/com> |
From: Andreas L. <av...@lo...> - 2017-09-20 10:35:22
|
Andy Goth <and...@gm...> wrote: > http://core.tcl.tk/tclquadcode/artifact/acb3d16e75b8a528?ln=86-88 > Add a new section describing strinsert: > strinsert TGT SRC1 SRC2 SRC3 > Let TGT be the string SRC1 with the content of the string SRC3 inserted > at index SRC2. If SRC2 is start-/end-relative, the first/last inserted > character in TGT will have index SRC2. This is surely correct, but in a slightly confusing way... With an end-relative index, [string insert abc end-1 xy] would yield abxyc, and [string index abxyc end-1] is indeed "y", but only because string index has a different semantics for end-relative indices than the proposed string insert has. In this context, the meaning of an end-relative index is a bit ambiguous. Maybe it could be reworded as: If SRC2 is start-/end-relative, then string index SRC1 SRC2 will be the first/last inserted character. to make clear, by which semantics indexation is meant. |