tcl-quadcode Mailing List for Tcl (Page 2)

On Sun, Apr 22, 2018 at 9:42 PM, Kevin Kenny <kev...@gm...> wrote:

> I see what's going on with the crash in 'mrtest::calc' - and why we've
> gone so far astray in trying to diagnose it!
>
> I got a disassembly from the start of thunk::mrtest::calc(STRING), which
> inlines the 'on-ramp' of the coroutine.  It looks like:
>
> JIT(0x5555583b75e0)`cmd.thunk::mrtest::calc:
>     0x7ffff7fef8e0 <+0>:  pushq  %r15
>     0x7ffff7fef8e2 <+2>:  pushq  %r14
>     0x7ffff7fef8e4 <+4>:  pushq  %rbx
>     0x7ffff7fef8e5 <+5>:  movq   %rsi, %rax
>
> -> 1    # demo.tcl --
>
> ->  0x7ffff7fef8e8 <+8>:   cmpl   $0x2, %edx
>     0x7ffff7fef8eb <+11>:  jne    0x7ffff7fef97d            ; <+157>
>
> ** 2    #
>    3    # Code that demonstrates the Tcl-to-LLVM compilation capabilities of
>    4    # tclquadcode. Most of this file is demonstration procedures that we can
>
>     0x7ffff7fef8f1 <+17>:  movq   0x8(%rcx), %r14
>
>    4595 params size
>    4596 build {
> ** 4597     my ret [$api Tcl_Alloc $size]
>    4598 }
>    4599
>
>     0x7ffff7fef8f5 <+21>:  movabsq $0x7ffff7a651f0, %rax     ; imm = 0x7FFFF7A651F0
>     0x7ffff7fef8ff <+31>:  movl   $0x160, %edi              ; imm = 0x160
>     0x7ffff7fef904 <+36>:  callq  *%rax
>     0x7ffff7fef906 <+38>:  movq   %rax, %rbx
>     0x7ffff7fef909 <+41>:  movabsq $0x7ffff7ff1fa0, %rax     ; imm = 0x7FFFF7FF1FA0
>     0x7ffff7fef913 <+51>:  movq   %rax, (%rbx)
>     0x7ffff7fef916 <+54>:  movabsq $0x7ffff7ff3800, %rax     ; imm = 0x7FFFF7FF3800
>     0x7ffff7fef920 <+64>:  movq   %rax, 0x8(%rbx)
>     0x7ffff7fef924 <+68>:  movq   %r14, 0x30(%rbx)
>     0x7ffff7fef928 <+72>:  movb   $0x5, 0x28(%rbx)
>
> <+8> through <+11> are checking objc == 2 and jumping off to an error
> routine for 'wrong # args'.
>
> <+21> through <+38> are allocating the coroutine activation record, which
> is 0x160 bytes in size.
>
> <+41> through <+72> are creating the initial content of the activation
> record. The layout of the activation record is:
>
> +0 void (*resume)(FrameType*)  The function that resumes the coroutine
> +8 void (*destroy)(FrameType*)  The function that destroys the coroutine
> +16 ... The coroutine promise.  The promise type is:
>         struct {
>             int32 status;      Tcl result code from the last NR callback
>             struct {              MAYBE INT return value from mrtest::calc
>                 int1 flag;
>                 int32 shortword;
>                 int64 longword;
>            } retval;
> +40? char resumePointIndex   The index of the next resume point
> +48? Tcl_Obj* x                        The 'x' parameter to the procedure
>             .... more stuff, not initialized.....
>
> The initialization code correctly initializes the resume pointer
> <+41>-<+51>, and the destroy pointer <+54>-<+64>. Note the two addresses
> that I have marked with ? in the layout of the activation record. The code
> is confused about the size and alignment of the promise. The code from
> 'coro.begin' appears to be presuming that its size is at most 24 bytes, and
> has stored the index of the resume point at offset 40 (0x28) from the start
> of the frame <+72>. The resume index is followed with the variable 'x' from
> objv[1], extracted at <+17> and saved at offset 48 (0x30) at <+68>.
>
> The rest of the call thunk does two Tcl_NRAddCallback calls, one for the
> return thunk and one for tcl.coro.runner, and returns to the trampoline.
>
> This all looks OK so far.
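To make the offsets concrete, here is a rough C model of the head of the activation record as the call thunk writes it. It is a sketch under stated assumptions, not the generated code: the promise is modelled as an opaque 24-byte blob (the size coro.begin appears to assume), `Tcl_Obj` is an opaque stand-in rather than the real tcl.h type, `malloc` stands in for `Tcl_Alloc`, and the names `ThunkFrame` and `onRamp` are made up for illustration. It also assumes an LP64 target such as x86-64, with 8-byte pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Opaque stand-in for Tcl's Tcl_Obj; only pointers to it are used. */
typedef struct Tcl_Obj Tcl_Obj;

typedef struct ThunkFrame ThunkFrame;
typedef void (*FrameFn)(ThunkFrame *);

/* Head of the activation record as the call thunk lays it out. The
 * promise is an opaque 24-byte blob, the size coro.begin appears to
 * assume; on an LP64 target that puts the resume index at +0x28 and
 * 'x' at +0x30 (after padding up to pointer alignment). */
struct ThunkFrame {
    FrameFn resume;             /* +0x00, written at <+41>-<+51> */
    FrameFn destroy;            /* +0x08, written at <+54>-<+64> */
    unsigned char promise[24];  /* +0x10, left uninitialized     */
    unsigned char resumeIndex;  /* +0x28, written at <+72>       */
    Tcl_Obj *x;                 /* +0x30, written at <+68>       */
};

/* Size passed to the allocator at <+31>. */
enum { FRAME_ALLOC_SIZE = 0x160 };

/* Hypothetical C rendering of the on-ramp: allocate the whole frame and
 * fill in only the fields the disassembly shows being written. */
ThunkFrame *onRamp(FrameFn resume, FrameFn destroy, Tcl_Obj *x)
{
    ThunkFrame *f = malloc(FRAME_ALLOC_SIZE);  /* stands in for Tcl_Alloc */
    if (f == NULL) {
        return NULL;
    }
    f->resume = resume;
    f->destroy = destroy;
    f->x = x;              /* objv[1], loaded at <+17> */
    f->resumeIndex = 5;    /* the immediate in 'movb $0x5' */
    return f;
}
```

On such a target, `offsetof(ThunkFrame, resumeIndex)` is 0x28 and `offsetof(ThunkFrame, x)` is 0x30, the two offsets written at <+72> and <+68>.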
>
> I then see a normal call to tcl.coro.runner. The lead-in of that function
> looks like:
>
>     frame #0: 0x00007ffff7fef1b5 JIT(0x5555583b75e0)`tcl.coro.runner at coro.tcl:57
>    54      build {
>    55
>    56  # Get the coroutine handle from client data
> -> 57  set coro.handle [my load $clientDataArray "coro.handle"]
>    58
>    59  # First, has the NRE proc finished execution? If so, we simply
>    60  # want to return back to the trampoline and execute the next
> (lldb) di -m -f -c 20
> JIT(0x5555583b75e0)`tcl.coro.runner:
>     0x7ffff7fef1b0 <+0>:  pushq  %rbp
>     0x7ffff7fef1b1 <+1>:  pushq  %rbx
>     0x7ffff7fef1b2 <+2>:  pushq  %rax
>     0x7ffff7fef1b3 <+3>:  movl   %edx, %ebp
>
> -> 55
> -> 56  # Get the coroutine handle from client data
> -> 57  set coro.handle [my load $clientDataArray "coro.handle"]
> -> 58
> -> 59  # First, has the NRE proc finished execution? If so, we simply
>
> ->  0x7ffff7fef1b5 <+5>:  movq   (%rdi), %rbx
>
>    62
>    63  set llvm.coro.done [$m intrinsic coro.done]
> ** 64  set done [my call ${llvm.coro.done} [list ${coro.handle}] "doneFlag"]
>
>     0x7ffff7fef1b8 <+8>:  cmpq   $0x0, (%rbx)
>
> ** 65  my condBr $done $finished $needResume
>    66
>    67      label needResume:
>
>     0x7ffff7fef1bc <+12>: je     0x7ffff7fef1ee            ; <+62>
>
>    70  # that the next time it suspends, we'll loop back to here.
>    71
> ** 72  $api Tcl_NRAddCallback $interp ${tcl.coro.runner} ${coro.handle} \
>    73      [my null char*] [my null char*] [my null char*]
>    74
>
>     0x7ffff7fef1be <+14>: movabsq $0x7ffff7fef1b0, %rax     ; imm = 0x7FFFF7FEF1B0
>     0x7ffff7fef1c8 <+24>: movabsq $0x7ffff7a612f0, %r10     ; imm = 0x7FFFF7A612F0
>     0x7ffff7fef1d2 <+34>: xorl   %ecx, %ecx
>     0x7ffff7fef1d4 <+36>: xorl   %r8d, %r8d
>     0x7ffff7fef1d7 <+39>: xorl   %r9d, %r9d
>     0x7ffff7fef1da <+42>: movq   %rsi, %rdi
>     0x7ffff7fef1dd <+45>: movq   %rax, %rsi
>     0x7ffff7fef1e0 <+48>: movq   %rbx, %rdx
>     0x7ffff7fef1e3 <+51>: callq  *%r10
>
>    79  set llvm.coro.promise [$m intrinsic coro.promise]
>    80  set promise.addr.raw \
> ** 81      [my call ${llvm.coro.promise} \
>    82  [list ${coro.handle} \
>    83        [Const $alignment int32] \
>
>     0x7ffff7fef1e6 <+54>: movl   %ebp, 0x10(%rbx)
>
>    85  "promise.addr.raw"]
>    86  set promise.addr [my cast(ptr) ${promise.addr.raw} int32 "promise.addr"]
> ** 87  my store $result ${promise.addr}
>    88
>    89  # Resume the coroutine, and return to the trampoline to await
>
>     0x7ffff7fef1e9 <+57>: movq   %rbx, %rdi
>     0x7ffff7fef1ec <+60>: callq  *(%rbx)
>
> The entry sequence <+0>-<+2>, the copy of the Tcl status from the third
> arg to the callback <+3>, the extraction of the frame pointer from the
> client data array <+5>, the check for 'done' <+8>-<+12>, and the
> rescheduling of Tcl_NRAddCallback <+14>-<+51> all happen normally. The Tcl
> status gets moved to the start of the promise <+54>, and then <+57>-<+60>
> the actual body of 'mrtest::calc' is invoked.
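The control flow of tcl.coro.runner, as read from the disassembly above, can be sketched in C roughly as follows. This is a hypothetical model: `Frame`, `coroRunner`, `addCallback` and `demoResume` are invented names, `addCallback` merely records its arguments instead of calling `Tcl_NRAddCallback`, and the 'done' test is modelled as a null resume pointer because that is what `cmpq $0x0, (%rbx)` compares. What the real runner returns is not visible in this excerpt, so returning `result` is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Frame Frame;
typedef void (*FrameFn)(Frame *);

/* Only the part of the frame the runner touches. */
struct Frame {
    FrameFn resume;    /* +0x00: NULL once the coroutine has finished */
    FrameFn destroy;   /* +0x08 */
    int32_t status;    /* +0x10: first word of the promise */
};

/* Same shape as Tcl's NR post-processing callbacks (data[], interp,
 * result), with void* standing in for ClientData and Tcl_Interp*. */
typedef int (*PostProc)(void *data[], void *interp, int result);

/* Hypothetical stand-in for Tcl_NRAddCallback: it only remembers the
 * last registration so the control flow can be observed. */
static PostProc pendingProc;
static void *pendingData;

static void addCallback(void *interp, PostProc proc, void *data0)
{
    (void)interp;
    pendingProc = proc;
    pendingData = data0;
}

/* Sketch of tcl.coro.runner as read from the disassembly. */
int coroRunner(void *data[], void *interp, int result)
{
    Frame *frame = data[0];                 /* <+5>: coro.handle from client data */
    if (frame->resume == NULL) {            /* <+8>-<+12>: coro.done */
        return result;                      /* assumption: finished path not shown */
    }
    addCallback(interp, coroRunner, frame); /* <+14>-<+51>: re-arm the runner */
    frame->status = result;                 /* <+54>: Tcl status into the promise */
    frame->resume(frame);                   /* <+57>-<+60>: resume the body */
    return result;                          /* assumption: not shown in the dump */
}

/* Demo coroutine body: records the status it was resumed with, then
 * marks itself finished the way coro.done would observe it. */
static int32_t lastSeenStatus = -1;

void demoResume(Frame *frame)
{
    lastSeenStatus = frame->status;
    frame->resume = NULL;
}
```

Running the runner once on a frame whose resume function is `demoResume` re-registers the runner, hands the Tcl status to the body through the promise, and leaves the frame marked as done, so a second call takes the early-return path.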
>
> The lead-in of that procedure looks like:
>
> JIT(0x5555583b75e0)`tcl ::mrtest::calc -1929838593.resume:
> ->  0x7ffff7ff1fa0 <+0>:  pushq  %rbp
>     0x7ffff7ff1fa1 <+1>:  pushq  %r15
>     0x7ffff7ff1fa3 <+3>:  pushq  %r14
>     0x7ffff7ff1fa5 <+5>:  pushq  %r13
>     0x7ffff7ff1fa7 <+7>:  pushq  %r12
>     0x7ffff7ff1fa9 <+9>:  pushq  %rbx
>     0x7ffff7ff1faa <+10>: subq   $0x38, %rsp
>     0x7ffff7ff1fae <+14>: movq   %rdi, %r12
>     0x7ffff7ff1fb1 <+17>: movb   0x30(%r12), %al
>     0x7ffff7ff1fb6 <+22>: decb   %al
>     0x7ffff7ff1fb8 <+24>: cmpb   $0x4, %al
>     0x7ffff7ff1fba <+26>: ja     0x7ffff7ff20c8            ; <+296>
>     0x7ffff7ff1fc0 <+32>: movzbl %al, %eax
>     0x7ffff7ff1fc3 <+35>: movabsq $0x7ffff7fe0c60, %rcx     ; imm = 0x7FFFF7FE0C60
>     0x7ffff7ff1fcd <+45>: jmpq   *(%rcx,%rax,8)
>
> <+0>-<+10> are creating a stack frame to hold temporaries that do not live
> across coroutine suspension.
> <+14> is recovering the frame pointer that was created in the call thunk,
> and it's indeed at the same address and has the same content.
> Now <+17> is recovering the resume index from the coroutine frame. THIS
> INDEX WAS SAVED AT OFFSET +0x28 IN THE FRAME, BUT IS BEING RECOVERED FROM
> OFFSET +0x30.
>
>

Continuing: The code is written to expect the index initially to be 0x5, or
that's at least what the thunk put there. Instead, it's getting the least
significant byte of the Tcl_Obj* that represents the 'x' argument, which is
what is actually stored at offset 0x30 in the coroutine frame. This is a
pretty arbitrary value. On my most recent test run, it happened to come
out as '0xf0'. This is decremented to '0xef', which is above 4, so
instead of jumping to the start of the procedure, it is jumping to the case
for the value 5, which is label %AfterCoroSuspend892 in the dump of the
optimized code. I haven't followed the logic very far from there, but it's
the wrong switch case, and it's starting out by loading from uninitialized
memory in the coroutine frame, so Here Be Nasal Daemons.
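The compare-and-branch at <+17>-<+26> can be reproduced in a few lines of C, which shows why a stray pointer byte lands on the out-of-range path. This sketch only models the arithmetic (`dispatchSlot` is a made-up name); which IR label sits behind each jump-table slot, and behind <+296>, is not derived here.

```c
#include <assert.h>
#include <stdint.h>

/* Models the prologue of the resume function:
 *     movb 0x30(%r12), %al
 *     decb %al
 *     cmpb $0x4, %al
 *     ja   <+296>
 * Returns -1 when the 'ja' branch is taken, otherwise the slot (0..4)
 * used to index the jump table loaded at <+35>. */
int dispatchSlot(uint8_t loadedByte)
{
    uint8_t al = (uint8_t)(loadedByte - 1);  /* decb wraps modulo 256 */
    if (al > 4) {                            /* ja is an unsigned comparison */
        return -1;
    }
    return al;
}
```

With the byte from this run, `dispatchSlot(0xf0)` returns -1, i.e. the `ja` to <+296> is taken.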

So -- how does this come about? What I see in the optimized LLVM assembly
is:

define internal hidden fastcc void @"tcl ::mrtest::calc -1929838593.resume"(%"tcl ::mrtest::calc -1929838593.Frame"* %FramePtr) !dbg !12517 {
entry.resume:
  %index.addr = getelementptr inbounds %"tcl ::mrtest::calc -1929838593.Frame", %"tcl ::mrtest::calc -1929838593.Frame"* %FramePtr, i64 0, i32 3
  %index = load i3, i3* %index.addr, align 1
  switch i3 %index, label %unreachable [
    i3 0, label %pc.98
    i3 1, label %pc.169
    i3 2, label %pc.240
    i3 3, label %pc.302
    i3 -4, label %pc.408
    i3 -3, label %AfterCoroSuspend892
  ]

Instead of getting the index by character offset, as the thunk did, it's
getting it from the %"tcl ::mrtest::calc -1929838593.Frame" type. The
layout of this is

%"tcl ::mrtest::calc -1929838593.Frame" = type {
    void (%"tcl ::mrtest::calc -1929838593.Frame"*)*,    ; resume address
    void (%"tcl ::mrtest::calc -1929838593.Frame"*)*,    ; destroy address
    %"tcl ::mrtest::calc -1929838593.promise",     ; promise
    i3,      ; resume index
;  ...followed by lots more stuff:
    %Tcl_Obj*, i32, %Tcl_Obj**, i32, %Tcl_Obj**, i32, %Tcl_Obj**, i32,
%Tcl_Obj**, i32, i32, %Tcl_Obj**, i32, %Tcl_Obj**, i32, %Tcl_Obj**, i32,
%Tcl_Obj**, i32, %Tcl_Obj**, i32, %Tcl_Obj**, i32, %Tcl_Obj**, %Interp*,
%Tcl_Obj*, %Tcl_Obj*, i8*, %Tcl_Obj*, %Tcl_Obj*, i8*, %Tcl_Obj*, %Tcl_Obj*,
i8*, %Tcl_Obj*, i8*, %Tcl_Obj*, %Tcl_Obj*, i8* }

The promise is:

%"tcl ::mrtest::calc -1929838593.promise" = type { i32, { i32, %INT } }

And of course %INT is:

%INT = type { i1, i32, i64 }

It would appear to me that the coro.begin logic in the thunk and the
'getelementptr' logic within the coroutine differ on the size and alignment
of the promise, causing all the trouble.
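One way to see how the two sides can disagree is to lay the types out by hand. The C sketch below applies the usual non-packed rule (each member at the next multiple of its alignment, the total padded to the largest member alignment) to the %INT, promise and frame types above, treating i1 and i3 as one byte each and pointers as 8 bytes. The i64 alignment is a parameter: with 8 bytes, as on x86-64, the promise is 32 bytes and the resume index lands at 0x30, where the resume function reads it; with 4 bytes the promise shrinks to 24 bytes and the index lands at 0x28, where the thunk wrote it. That a 4-byte i64 alignment is what the coro.begin side actually used is only a hypothesis that happens to match both observed offsets; the helper names are invented.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { size_t size; size_t align; } Layout;

static size_t alignUp(size_t n, size_t a)
{
    return (n + a - 1) / a * a;
}

/* Lays out a non-packed struct: each member at the next multiple of its
 * alignment, total size padded to the largest member alignment.
 * If 'offsets' is non-NULL, each member's offset is stored there. */
static Layout structLayout(const Layout *members, size_t n, size_t *offsets)
{
    size_t off = 0, align = 1;
    for (size_t i = 0; i < n; i++) {
        off = alignUp(off, members[i].align);
        if (offsets != NULL) {
            offsets[i] = off;
        }
        off += members[i].size;
        if (members[i].align > align) {
            align = members[i].align;
        }
    }
    Layout result = { alignUp(off, align), align };
    return result;
}

/* Layout of { i32, { i32, %INT } } with %INT = { i1, i32, i64 },
 * for a given ABI alignment of i64. i1 is stored in one byte. */
Layout promiseLayout(size_t i64Align)
{
    Layout i1 = { 1, 1 }, i32 = { 4, 4 }, i64 = { 8, i64Align };
    Layout intMembers[] = { i1, i32, i64 };
    Layout intType = structLayout(intMembers, 3, NULL);
    Layout retMembers[] = { i32, intType };
    Layout retType = structLayout(retMembers, 2, NULL);
    Layout promiseMembers[] = { i32, retType };
    return structLayout(promiseMembers, 2, NULL);
}

/* Offset of the i3 resume index in { resume fn*, destroy fn*, promise, i3 },
 * assuming 8-byte pointers. */
size_t resumeIndexOffset(size_t i64Align)
{
    Layout ptr = { 8, 8 }, i3 = { 1, 1 };
    Layout frameMembers[] = { ptr, ptr, promiseLayout(i64Align), i3 };
    size_t offsets[4];
    structLayout(frameMembers, 4, offsets);
    return offsets[3];
}
```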

Unfortunately, it's not clear to me how to get out of this gracefully.  I
don't know whether it would help to allocate the promise simply as a
character array of the requisite size (rounded to a multiple of
2*sizeof(pointer)) - but how to determine what that size needs to be?
Alternatively, we'd need to figure out how the size computation is going
astray. One thing that's suggestive is that

https://llvm.org/docs/Coroutines.html#areas-requiring-attention

shows

    6. Alignment is ignored by coro.begin and coro.free intrinsics.

which seems to be what's happening here.

I suppose that the next thing to try would be to overallocate a character
array for the promise. It appears that the only types that are legitimate
for the result are:

    CALLFRAME?  FAIL?  IMPURE? ( ZEROONE | INT | DOUBLE | NUMERIC | STRING )

The worst case should be FAIL IMPURE NUMERIC, which would need:

- 8 bytes (int32 + structure alignment padding) for FAIL
- 16 bytes (int1 + Tcl_Obj* + structure alignment padding) for IMPURE
- 16 bytes for the INT inside the NUMERIC (int1 + int32 + int64 + structure
  alignment padding)
- 32 bytes, therefore, for the NUMERIC (int1 + INT + double + structure
  alignment padding)

In addition, there's the 'int32' at the start that carries the Tcl result
that arrives at NR callbacks, again with padding for structure alignment.
So 8 (Tcl status) + 8 (FAIL) + 16 (IMPURE) + 32 (NUMERIC) would give us 64
bytes for the largest promise. If we allocated this much always (16-byte
aligned), carried it around as a character array for LLVM's purposes and
bitcast it to the correct structure only in our own code, it might start
working. But it's too late tonight to try that change.
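A minimal C sketch of that idea, assuming the 64-byte budget worked out above: the storage is an opaque, 16-byte-aligned byte array, and a typed view (here one mirroring `{ i32, { i32, %INT } }`) is used only inside our own code. In C the reinterpretation is done with `memcpy`, which sidesteps strict-aliasing problems; in the generated LLVM IR it would presumably be a pointer bitcast instead. `PromiseStorage`, `CalcPromise`, `IntResult` and the two accessors are hypothetical names, and the field names inside them are guesses.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Worst-case promise budget from the estimate above, in bytes. */
enum {
    STATUS_BYTES  = 8,   /* Tcl status word plus padding */
    FAIL_BYTES    = 8,
    IMPURE_BYTES  = 16,
    NUMERIC_BYTES = 32,
    PROMISE_BYTES = STATUS_BYTES + FAIL_BYTES + IMPURE_BYTES + NUMERIC_BYTES
};

/* What LLVM would see: an opaque, 16-byte-aligned run of bytes. */
typedef struct {
    alignas(16) unsigned char bytes[PROMISE_BYTES];
} PromiseStorage;

/* One typed view, mirroring { i32, { i32, %INT } } from the IR dump. */
typedef struct {
    uint8_t flag;       /* i1  */
    int32_t shortword;  /* i32 */
    int64_t longword;   /* i64 */
} IntResult;

typedef struct {
    int32_t status;     /* Tcl result code from the last NR callback */
    struct {
        int32_t code;
        IntResult value;
    } retval;
} CalcPromise;

/* The typed view must fit in, and be no more aligned than, the storage. */
static_assert(sizeof(CalcPromise) <= sizeof(PromiseStorage), "promise too big");
static_assert(alignof(CalcPromise) <= alignof(PromiseStorage), "promise overaligned");

/* "Bitcast" only in our own code; memcpy keeps C's aliasing rules happy. */
void putCalcPromise(PromiseStorage *p, const CalcPromise *v)
{
    memcpy(p->bytes, v, sizeof *v);
}

CalcPromise getCalcPromise(const PromiseStorage *p)
{
    CalcPromise v;
    memcpy(&v, p->bytes, sizeof v);
    return v;
}
```

Because every result type would share the same fixed-size, fixed-alignment storage, the offset of everything after the promise no longer depends on which typed view a given procedure uses.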

I welcome other suggestions!


The Tool Command Language implementation

tcl-quadcode — Development of the tclquadcode Tcl Native Compilation System