You can subscribe to this list here.
| 2017 |
Jan
|
Feb
(2) |
Mar
(6) |
Apr
(4) |
May
(20) |
Jun
(15) |
Jul
(4) |
Aug
(2) |
Sep
(6) |
Oct
(6) |
Nov
(20) |
Dec
(3) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2018 |
Jan
(16) |
Feb
(3) |
Mar
(7) |
Apr
(40) |
May
(1) |
Jun
|
Jul
|
Aug
|
Sep
|
Oct
(2) |
Nov
|
Dec
(1) |
| 2019 |
Jan
(7) |
Feb
(5) |
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
|
From: Kevin K. <kev...@gm...> - 2018-04-23 02:56:14
|
On Sun, Apr 22, 2018 at 9:42 PM, Kevin Kenny <kev...@gm...>
wrote:
> I see what's going on with the crash in 'mrtest::calc' - and why we've
> gone so far astray in trying to diagnose it!
>
> I got a disassembly from the start of thunk::mrtest::calc(STRING), which
> inlines the 'on-ramp' of the coroutine. It looks like:
>
> JIT(0x5555583b75e0)`cmd.thunk::mrtest::calc:
> 0x7ffff7fef8e0 <+0>: pushq %r15
> 0x7ffff7fef8e2 <+2>: pushq %r14
> 0x7ffff7fef8e4 <+4>: pushq %rbx
> 0x7ffff7fef8e5 <+5>: movq %rsi, %rax
>
> -> 1 # demo.tcl --
>
> -> 0x7ffff7fef8e8 <+8>: cmpl $0x2, %edx
> 0x7ffff7fef8eb <+11>: jne 0x7ffff7fef97d ; <+157>
>
> ** 2 #
> 3 # Code that demonstrates the Tcl-to-LLVM compilation capabilities
> of
> 4 # tclquadcode. Most of this file is demonstration procedures that
> we can
>
> 0x7ffff7fef8f1 <+17>: movq 0x8(%rcx), %r14
>
> 4595 params size
> 4596 build {
> ** 4597 my ret [$api Tcl_Alloc $size]
> 4598 }
> 4599
>
> 0x7ffff7fef8f5 <+21>: movabsq $0x7ffff7a651f0, %rax ; imm =
> 0x7FFFF7A651F0
> 0x7ffff7fef8ff <+31>: movl $0x160, %edi ; imm = 0x160
> 0x7ffff7fef904 <+36>: callq *%rax
> 0x7ffff7fef906 <+38>: movq %rax, %rbx
> 0x7ffff7fef909 <+41>: movabsq $0x7ffff7ff1fa0, %rax ; imm =
> 0x7FFFF7FF1FA0
> 0x7ffff7fef913 <+51>: movq %rax, (%rbx)
> 0x7ffff7fef916 <+54>: movabsq $0x7ffff7ff3800, %rax ; imm =
> 0x7FFFF7FF3800
> 0x7ffff7fef920 <+64>: movq %rax, 0x8(%rbx)
> 0x7ffff7fef924 <+68>: movq %r14, 0x30(%rbx)
> 0x7ffff7fef928 <+72>: movb $0x5, 0x28(%rbx)
>
> <+8> through <+11> are checking objc == 2 and jumping off to an error
> routine for 'wrong # args'.
>
> <+21> through <+38> are allocating the coroutine activation record, which
> is 0x160 bytes in size.
>
> <+41> through <+72> are creating the initial content of the activation
> record, The layout of the
> activation record is:
>
> +0 void (*resume)(FrameType*) The function that resumes the coroutine
> +8 void (*destroy)(FrameType*) The function that destroys the coroutine
> +16 ... The coroutine promise. The promise type is:
> struct {
> int32 status; Tcl result code from the last NR callback
> struct { MAYBE INT return value from mrtest::calc
> int1 flag;
> int32 shortword;
> int64 longword;
> } retval;
> +40? char resumePointIndex The index of the next resume point
> +48? Tcl_Obj* x The 'x' parameter to the procedure
> .... more stuff, not initialized.....
>
> The initialization code correctly initializes the resume pointer
> <+41>-<+51>, and the destroy pointer <+54>-<+64>. Note the two addresses
> that I have marked with ? in the layout of the activation record. The code
> is confused about the size and alignment of the promise. The code from
> 'coro.begin' appears to be presuming that its size is at most 24 bytes, and
> has stored the index of the resume point at offset 40 (0x28) from the start
> of the frame <+72>. The resume index is followed with the variable 'x' from
> objv[1], extracted at <+17> and saved at offset 48 (0x30) at <+68>,
>
> The rest of the call thunk does two Tcl_NRAddCallback calls, one for the
> return thunk and one for tcl.coro.runner, and returns to the trampoline.
>
> This all looks OK so far.
>
> I then see a normal call to tcl.coro.runner. The lead-in of that function
> looks like:
>
> frame #0: 0x00007ffff7fef1b5 JIT(0x5555583b75e0)`tcl.coro.runner at
> coro.tcl:57
> 54 build {
> 55
> 56 # Get the coroutine handle from client data
> -> 57 set coro.handle [my load $clientDataArray "coro.handle"]
> 58
> 59 # First, has the NRE proc finished execution? If so, we simply
> 60 # want to return back to the trampoline and execute the next
> (lldb) di -m -f -c 20
> JIT(0x5555583b75e0)`tcl.coro.runner:
> 0x7ffff7fef1b0 <+0>: pushq %rbp
> 0x7ffff7fef1b1 <+1>: pushq %rbx
> 0x7ffff7fef1b2 <+2>: pushq %rax
> 0x7ffff7fef1b3 <+3>: movl %edx, %ebp
>
> -> 55
> -> 56 # Get the coroutine handle from client data
> -> 57 set coro.handle [my load $clientDataArray "coro.handle"]
> -> 58
> -> 59 # First, has the NRE proc finished execution? If so, we simply
>
> -> 0x7ffff7fef1b5 <+5>: movq (%rdi), %rbx
>
> 62
> 63 set llvm.coro.done [$m intrinsic coro.done]
> ** 64 set done [my call ${llvm.coro.done} [list ${coro.handle}]
> "doneFlag"]
>
> 0x7ffff7fef1b8 <+8>: cmpq $0x0, (%rbx)
>
> ** 65 my condBr $done $finished $needResume
> 66
> 67 label needResume:
>
> 0x7ffff7fef1bc <+12>: je 0x7ffff7fef1ee ; <+62>
>
> 70 # that the next time it suspends, we'll loop back to here.
> 71
> ** 72 $api Tcl_NRAddCallback $interp ${tcl.coro.runner} ${coro.handle} \
> 73 [my null char*] [my null char*] [my null char*]
> 74
>
> 0x7ffff7fef1be <+14>: movabsq $0x7ffff7fef1b0, %rax ; imm =
> 0x7FFFF7FEF1B0
> 0x7ffff7fef1c8 <+24>: movabsq $0x7ffff7a612f0, %r10 ; imm =
> 0x7FFFF7A612F0
> 0x7ffff7fef1d2 <+34>: xorl %ecx, %ecx
> 0x7ffff7fef1d4 <+36>: xorl %r8d, %r8d
> 0x7ffff7fef1d7 <+39>: xorl %r9d, %r9d
> 0x7ffff7fef1da <+42>: movq %rsi, %rdi
> 0x7ffff7fef1dd <+45>: movq %rax, %rsi
> 0x7ffff7fef1e0 <+48>: movq %rbx, %rdx
> 0x7ffff7fef1e3 <+51>: callq *%r10
>
> 79 set llvm.coro.promise [$m intrinsic coro.promise]
> 80 set promise.addr.raw \
> ** 81 [my call ${llvm.coro.promise} \
> 82 [list ${coro.handle} \
> 83 [Const $alignment int32] \
>
> 0x7ffff7fef1e6 <+54>: movl %ebp, 0x10(%rbx)
>
> 85 "promise.addr.raw"]
> 86 set promise.addr [my cast(ptr) ${promise.addr.raw} int32
> "promise.addr"]
> ** 87 my store $result ${promise.addr}
> 88
> 89 # Resume the coroutine, and return to the trampoline to await
>
> 0x7ffff7fef1e9 <+57>: movq %rbx, %rdi
> 0x7ffff7fef1ec <+60>: callq *(%rbx)
>
> The entry sequence <+0>-<+2>, the copy of the Tcl status from the third
> arg to the callback <+3>, the extraction of the frame pointer from the
> client data array <+5>, the check for 'done' <+8>-<+12>, and the
> rescheduling of Tcl_NRAddCallback <+14>-<+51> all happen normally. The Tcl
> status gets moved to the start of the promise <+54>, and then <+57>-<+60>
> the actual body of 'mrtest::calc' is invoked.
>
> The lead-in of that procedure looks like:
>
> JIT(0x5555583b75e0)`tcl ::mrtest::calc -1929838593.resume:
> -> 0x7ffff7ff1fa0 <+0>: pushq %rbp
> 0x7ffff7ff1fa1 <+1>: pushq %r15
> 0x7ffff7ff1fa3 <+3>: pushq %r14
> 0x7ffff7ff1fa5 <+5>: pushq %r13
> 0x7ffff7ff1fa7 <+7>: pushq %r12
> 0x7ffff7ff1fa9 <+9>: pushq %rbx
> 0x7ffff7ff1faa <+10>: subq $0x38, %rsp
> 0x7ffff7ff1fae <+14>: movq %rdi, %r12
> 0x7ffff7ff1fb1 <+17>: movb 0x30(%r12), %al
> 0x7ffff7ff1fb6 <+22>: decb %al
> 0x7ffff7ff1fb8 <+24>: cmpb $0x4, %al
> 0x7ffff7ff1fba <+26>: ja 0x7ffff7ff20c8 ; <+296>
> 0x7ffff7ff1fc0 <+32>: movzbl %al, %eax
> 0x7ffff7ff1fc3 <+35>: movabsq $0x7ffff7fe0c60, %rcx ; imm =
> 0x7FFFF7FE0C60
> 0x7ffff7ff1fcd <+45>: jmpq *(%rcx,%rax,8)
>
> <+0>-<+10> are creating a stack frame to hold temporaries that do not live
> across coroutine suspension.
> <+14> is recovering the frame pointer that was created in the call thunk,
> and it's indeed at the same address and has the same content.
> Now <+17> is recovering the resume index from the coroutine frame. THIS
> INDEX WAS SAVED AT OFFSET +0X28 IN THE FRAME, BUT IS BEING RECOVERED FROM
> OFFSET 0X30
>
>
Continuing: The code is written to expect the index initially to be 0x5, or
that's at least what the thunk put there. Instead, it's getting the least
significant byte of the Tcl_Obj* that represents the 'x' argument, which is
what is actually stored at offset 0x30 in the coroutine frame. This is a
pretty arbitrary value, On my most recent test run, it happens to have come
out to be '0xf0'. This is decremented to '0xef', which is above 4, so
instead of jumping to the start of the procedure, it is jumping to the case
for the value 5, which is label %AfterCoroSuspend892 in the dump of the
optimized code. I haven't followed the logic very far from there, but it's
the wrong switch case, and it's starting out by loading from uninitialized
memory in the coroutine frame, so Here Be Nasal Daemons.
So -- how does this come about? What I see in the optimized LLVM assembly
is:
define internal hidden fastcc void @"tcl ::mrtest::calc
-1929838593.resume"(%"tcl ::mrtest::calc -1929838593.Frame"* %FramePtr)
!dbg !12517 {
entry.resume:
%index.addr = getelementptr inbounds %"tcl ::mrtest::calc
-1929838593.Frame", %"tcl ::mrtest::calc -1929838593.Frame"* %FramePtr, i64
0, i32 3
%index = load i3, i3* %index.addr, align 1
switch i3 %index, label %unreachable [
i3 0, label %pc.98
i3 1, label %pc.169
i3 2, label %pc.240
i3 3, label %pc.302
i3 -4, label %pc.408
i3 -3, label %AfterCoroSuspend892
]
Instead of getting the index by character offset, as the thunk did, it's
getting it from the %"tcl ::mrtest::calc -1929838593.Frame" type. The
layout of this is
%"tcl ::mrtest::calc -1929838593.Frame" = type {
void (%"tcl ::mrtest::calc -1929838593.Frame"*)*, ; resume address
void (%"tcl ::mrtest::calc -1929838593.Frame"*)*, ; destroy address
%"tcl ::mrtest::calc -1929838593.promise", ; promise
i3, ; resume index
; ...followed by lots more stuff:
%Tcl_Obj*, i32, %Tcl_Obj**, i32, %Tcl_Obj**, i32, %Tcl_Obj**, i32,
%Tcl_Obj**, i32, i32, %Tcl_Obj**, i32, %Tcl_Obj**, i32, %Tcl_Obj**, i32,
%Tcl_Obj**, i32, %Tcl_Obj**, i32, %Tcl_Obj**, i32, %Tcl_Obj**, %Interp*,
%Tcl_Obj*, %Tcl_Obj*, i8*, %Tcl_Obj*, %Tcl_Obj*, i8*, %Tcl_Obj*, %Tcl_Obj*,
i8*, %Tcl_Obj*, i8*, %Tcl_Obj*, %Tcl_Obj*, i8* }
The promise is:
%"tcl ::mrtest::calc -1929838593.promise" = type { i32, { i32, %INT } }
And of course %INT is %INT = type { i1, i32, i64 }
It would appear to me that the coro.begin logic in the thunk and the
'getelementptr' logic within the coroutine differ on the size and alignment
of the promise, causing all the trouble.
Unfortunately, it's not clear to me how to get out of this gracefully. I
don't know whether it would help to allocate the promise simply as a
character array of the requisite size (rounded to a multiple of
2*sizeof(pointer)) - but how to determine what that size needs to be?
Alternatively, we'd need to figure out how the size computation is going
astray. One thing that's suggestive is that
https://llvm.org/docs/Coroutines.html#areas-requiring-attention
shows
6. Alignment is ignored by coro.begin and coro.free intrinsics.
which seems to be what's happening here.
I suppose that the next thing to try would be to overallocate a character
array for the promise. It appears that the only types that are legitimate
for the result are:
CALLFRAME? FAIL? IMPURE? ( ZEROONE | INT | DOUBLE | NUMERIC | STRING )
The worst case should be FAIL IMPURE NUMERIC, which would need:
- 8 bytes (int32 + structure alignment padding) for FAIL
- 16 bytes (int1 + Tcl_Obj* + structure alignment padding) for IMPURE
- 16 bytes for the INT inside the NUMERIC (int1 + int32 + int64 + structure
alignment padding
- 32 bytes, therefore, for the NUMERIC (int1 + INT + double + structure
alignment padding)
In addition, there's the 'int32' at the start that carries the Tcl result
that arrives at NR callbacks, again with padding for structure alignment.
So 8 (Tcl status) + 8 (FAIL) + 16 (IMPURE) + 32 (NUMERIC) would give us 64
bytes for the largest promise. If we allocated this much always (16-byte
aligned), carried it around as a character array for LLVM's purposes and
bitcast it to the correct structure only in our own code, it might start
working. But it's too late tonight to try that change.
I welcome other suggestions!
|
|
From: Kevin K. <kev...@gm...> - 2018-04-23 01:43:56
|
Sorry about the truncated message - some key combination sent it
prematurely, I'll retransmit once I've finished typing it.
On Sun, Apr 22, 2018 at 9:42 PM, Kevin Kenny <kev...@gm...>
wrote:
> I see what's going on with the crash in 'mrtest::calc' - and why we've
> gone so far astray in trying to diagnose it!
>
> I got a disassembly from the start of thunk::mrtest::calc(STRING), which
> inlines the 'on-ramp' of the coroutine. It looks like:
>
> JIT(0x5555583b75e0)`cmd.thunk::mrtest::calc:
> 0x7ffff7fef8e0 <+0>: pushq %r15
> 0x7ffff7fef8e2 <+2>: pushq %r14
> 0x7ffff7fef8e4 <+4>: pushq %rbx
> 0x7ffff7fef8e5 <+5>: movq %rsi, %rax
>
> -> 1 # demo.tcl --
>
> -> 0x7ffff7fef8e8 <+8>: cmpl $0x2, %edx
> 0x7ffff7fef8eb <+11>: jne 0x7ffff7fef97d ; <+157>
>
> ** 2 #
> 3 # Code that demonstrates the Tcl-to-LLVM compilation capabilities
> of
> 4 # tclquadcode. Most of this file is demonstration procedures that
> we can
>
> 0x7ffff7fef8f1 <+17>: movq 0x8(%rcx), %r14
>
> 4595 params size
> 4596 build {
> ** 4597 my ret [$api Tcl_Alloc $size]
> 4598 }
> 4599
>
> 0x7ffff7fef8f5 <+21>: movabsq $0x7ffff7a651f0, %rax ; imm =
> 0x7FFFF7A651F0
> 0x7ffff7fef8ff <+31>: movl $0x160, %edi ; imm = 0x160
> 0x7ffff7fef904 <+36>: callq *%rax
> 0x7ffff7fef906 <+38>: movq %rax, %rbx
> 0x7ffff7fef909 <+41>: movabsq $0x7ffff7ff1fa0, %rax ; imm =
> 0x7FFFF7FF1FA0
> 0x7ffff7fef913 <+51>: movq %rax, (%rbx)
> 0x7ffff7fef916 <+54>: movabsq $0x7ffff7ff3800, %rax ; imm =
> 0x7FFFF7FF3800
> 0x7ffff7fef920 <+64>: movq %rax, 0x8(%rbx)
> 0x7ffff7fef924 <+68>: movq %r14, 0x30(%rbx)
> 0x7ffff7fef928 <+72>: movb $0x5, 0x28(%rbx)
>
> <+8> through <+11> are checking objc == 2 and jumping off to an error
> routine for 'wrong # args'.
>
> <+21> through <+38> are allocating the coroutine activation record, which
> is 0x160 bytes in size.
>
> <+41> through <+72> are creating the initial content of the activation
> record, The layout of the
> activation record is:
>
> +0 void (*resume)(FrameType*) The function that resumes the coroutine
> +8 void (*destroy)(FrameType*) The function that destroys the coroutine
> +16 ... The coroutine promise. The promise type is:
> struct {
> int32 status; Tcl result code from the last NR callback
> struct { MAYBE INT return value from mrtest::calc
> int1 flag;
> int32 shortword;
> int64 longword;
> } retval;
> +40? char resumePointIndex The index of the next resume point
> +48? Tcl_Obj* x The 'x' parameter to the procedure
> .... more stuff, not initialized.....
>
> The initialization code correctly initializes the resume pointer
> <+41>-<+51>, and the destroy pointer <+54>-<+64>. Note the two addresses
> that I have marked with ? in the layout of the activation record. The code
> is confused about the size and alignment of the promise. The code from
> 'coro.begin' appears to be presuming that its size is at most 24 bytes, and
> has stored the index of the resume point at offset 40 (0x28) from the start
> of the frame <+72>. The resume index is followed with the variable 'x' from
> objv[1], extracted at <+17> and saved at offset 48 (0x30) at <+68>,
>
> The rest of the call thunk does two Tcl_NRAddCallback calls, one for the
> return thunk and one for tcl.coro.runner, and returns to the trampoline.
>
> This all looks OK so far.
>
> I then see a normal call to tcl.coro.runner. The lead-in of that function
> looks like:
>
> frame #0: 0x00007ffff7fef1b5 JIT(0x5555583b75e0)`tcl.coro.runner at
> coro.tcl:57
> 54 build {
> 55
> 56 # Get the coroutine handle from client data
> -> 57 set coro.handle [my load $clientDataArray "coro.handle"]
> 58
> 59 # First, has the NRE proc finished execution? If so, we simply
> 60 # want to return back to the trampoline and execute the next
> (lldb) di -m -f -c 20
> JIT(0x5555583b75e0)`tcl.coro.runner:
> 0x7ffff7fef1b0 <+0>: pushq %rbp
> 0x7ffff7fef1b1 <+1>: pushq %rbx
> 0x7ffff7fef1b2 <+2>: pushq %rax
> 0x7ffff7fef1b3 <+3>: movl %edx, %ebp
>
> -> 55
> -> 56 # Get the coroutine handle from client data
> -> 57 set coro.handle [my load $clientDataArray "coro.handle"]
> -> 58
> -> 59 # First, has the NRE proc finished execution? If so, we simply
>
> -> 0x7ffff7fef1b5 <+5>: movq (%rdi), %rbx
>
> 62
> 63 set llvm.coro.done [$m intrinsic coro.done]
> ** 64 set done [my call ${llvm.coro.done} [list ${coro.handle}]
> "doneFlag"]
>
> 0x7ffff7fef1b8 <+8>: cmpq $0x0, (%rbx)
>
> ** 65 my condBr $done $finished $needResume
> 66
> 67 label needResume:
>
> 0x7ffff7fef1bc <+12>: je 0x7ffff7fef1ee ; <+62>
>
> 70 # that the next time it suspends, we'll loop back to here.
> 71
> ** 72 $api Tcl_NRAddCallback $interp ${tcl.coro.runner} ${coro.handle} \
> 73 [my null char*] [my null char*] [my null char*]
> 74
>
> 0x7ffff7fef1be <+14>: movabsq $0x7ffff7fef1b0, %rax ; imm =
> 0x7FFFF7FEF1B0
> 0x7ffff7fef1c8 <+24>: movabsq $0x7ffff7a612f0, %r10 ; imm =
> 0x7FFFF7A612F0
> 0x7ffff7fef1d2 <+34>: xorl %ecx, %ecx
> 0x7ffff7fef1d4 <+36>: xorl %r8d, %r8d
> 0x7ffff7fef1d7 <+39>: xorl %r9d, %r9d
> 0x7ffff7fef1da <+42>: movq %rsi, %rdi
> 0x7ffff7fef1dd <+45>: movq %rax, %rsi
> 0x7ffff7fef1e0 <+48>: movq %rbx, %rdx
> 0x7ffff7fef1e3 <+51>: callq *%r10
>
> 79 set llvm.coro.promise [$m intrinsic coro.promise]
> 80 set promise.addr.raw \
> ** 81 [my call ${llvm.coro.promise} \
> 82 [list ${coro.handle} \
> 83 [Const $alignment int32] \
>
> 0x7ffff7fef1e6 <+54>: movl %ebp, 0x10(%rbx)
>
> 85 "promise.addr.raw"]
> 86 set promise.addr [my cast(ptr) ${promise.addr.raw} int32
> "promise.addr"]
> ** 87 my store $result ${promise.addr}
> 88
> 89 # Resume the coroutine, and return to the trampoline to await
>
> 0x7ffff7fef1e9 <+57>: movq %rbx, %rdi
> 0x7ffff7fef1ec <+60>: callq *(%rbx)
>
> The entry sequence <+0>-<+2>, the copy of the Tcl status from the third
> arg to the callback <+3>, the extraction of the frame pointer from the
> client data array <+5>, the check for 'done' <+8>-<+12>, and the
> rescheduling of Tcl_NRAddCallback <+14>-<+51> all happen normally. The Tcl
> status gets moved to the start of the promise <+54>, and then <+57>-<+60>
> the actual body of 'mrtest::calc' is invoked.
>
> The lead-in of that procedure looks like:
>
> JIT(0x5555583b75e0)`tcl ::mrtest::calc -1929838593.resume:
> -> 0x7ffff7ff1fa0 <+0>: pushq %rbp
> 0x7ffff7ff1fa1 <+1>: pushq %r15
> 0x7ffff7ff1fa3 <+3>: pushq %r14
> 0x7ffff7ff1fa5 <+5>: pushq %r13
> 0x7ffff7ff1fa7 <+7>: pushq %r12
> 0x7ffff7ff1fa9 <+9>: pushq %rbx
> 0x7ffff7ff1faa <+10>: subq $0x38, %rsp
> 0x7ffff7ff1fae <+14>: movq %rdi, %r12
> 0x7ffff7ff1fb1 <+17>: movb 0x30(%r12), %al
> 0x7ffff7ff1fb6 <+22>: decb %al
> 0x7ffff7ff1fb8 <+24>: cmpb $0x4, %al
> 0x7ffff7ff1fba <+26>: ja 0x7ffff7ff20c8 ; <+296>
> 0x7ffff7ff1fc0 <+32>: movzbl %al, %eax
> 0x7ffff7ff1fc3 <+35>: movabsq $0x7ffff7fe0c60, %rcx ; imm =
> 0x7FFFF7FE0C60
> 0x7ffff7ff1fcd <+45>: jmpq *(%rcx,%rax,8)
>
> <+0>-<+10> are creating a stack frame to hold temporaries that do not live
> across coroutine suspension.
> <+14> is recovering the frame pointer that was created in the call thunk,
> and it's indeed at the same address and has the same content.
> Now <+17> is recovering the resume index from the coroutine frame. THIS
> INDEX WAS SAVED AT OFFSET +0X28 IN THE FRAME, BUT IS BEING RECOVERED FROM
> OFFSET 0X30
>
>
|
|
From: Kevin K. <kev...@gm...> - 2018-04-23 01:42:59
|
I see what's going on with the crash in 'mrtest::calc' - and why we've gone
so far astray in trying to diagnose it!
I got a disassembly from the start of thunk::mrtest::calc(STRING), which
inlines the 'on-ramp' of the coroutine. It looks like:
JIT(0x5555583b75e0)`cmd.thunk::mrtest::calc:
0x7ffff7fef8e0 <+0>: pushq %r15
0x7ffff7fef8e2 <+2>: pushq %r14
0x7ffff7fef8e4 <+4>: pushq %rbx
0x7ffff7fef8e5 <+5>: movq %rsi, %rax
-> 1 # demo.tcl --
-> 0x7ffff7fef8e8 <+8>: cmpl $0x2, %edx
0x7ffff7fef8eb <+11>: jne 0x7ffff7fef97d ; <+157>
** 2 #
3 # Code that demonstrates the Tcl-to-LLVM compilation capabilities of
4 # tclquadcode. Most of this file is demonstration procedures that
we can
0x7ffff7fef8f1 <+17>: movq 0x8(%rcx), %r14
4595 params size
4596 build {
** 4597 my ret [$api Tcl_Alloc $size]
4598 }
4599
0x7ffff7fef8f5 <+21>: movabsq $0x7ffff7a651f0, %rax ; imm =
0x7FFFF7A651F0
0x7ffff7fef8ff <+31>: movl $0x160, %edi ; imm = 0x160
0x7ffff7fef904 <+36>: callq *%rax
0x7ffff7fef906 <+38>: movq %rax, %rbx
0x7ffff7fef909 <+41>: movabsq $0x7ffff7ff1fa0, %rax ; imm =
0x7FFFF7FF1FA0
0x7ffff7fef913 <+51>: movq %rax, (%rbx)
0x7ffff7fef916 <+54>: movabsq $0x7ffff7ff3800, %rax ; imm =
0x7FFFF7FF3800
0x7ffff7fef920 <+64>: movq %rax, 0x8(%rbx)
0x7ffff7fef924 <+68>: movq %r14, 0x30(%rbx)
0x7ffff7fef928 <+72>: movb $0x5, 0x28(%rbx)
<+8> through <+11> are checking objc == 2 and jumping off to an error
routine for 'wrong # args'.
<+21> through <+38> are allocating the coroutine activation record, which
is 0x160 bytes in size.
<+41> through <+72> are creating the initial content of the activation
record, The layout of the
activation record is:
+0 void (*resume)(FrameType*) The function that resumes the coroutine
+8 void (*destroy)(FrameType*) The function that destroys the coroutine
+16 ... The coroutine promise. The promise type is:
struct {
int32 status; Tcl result code from the last NR callback
struct { MAYBE INT return value from mrtest::calc
int1 flag;
int32 shortword;
int64 longword;
} retval;
+40? char resumePointIndex The index of the next resume point
+48? Tcl_Obj* x The 'x' parameter to the procedure
.... more stuff, not initialized.....
The initialization code correctly initializes the resume pointer
<+41>-<+51>, and the destroy pointer <+54>-<+64>. Note the two addresses
that I have marked with ? in the layout of the activation record. The code
is confused about the size and alignment of the promise. The code from
'coro.begin' appears to be presuming that its size is at most 24 bytes, and
has stored the index of the resume point at offset 40 (0x28) from the start
of the frame <+72>. The resume index is followed with the variable 'x' from
objv[1], extracted at <+17> and saved at offset 48 (0x30) at <+68>,
The rest of the call thunk does two Tcl_NRAddCallback calls, one for the
return thunk and one for tcl.coro.runner, and returns to the trampoline.
This all looks OK so far.
I then see a normal call to tcl.coro.runner. The lead-in of that function
looks like:
frame #0: 0x00007ffff7fef1b5 JIT(0x5555583b75e0)`tcl.coro.runner at
coro.tcl:57
54 build {
55
56 # Get the coroutine handle from client data
-> 57 set coro.handle [my load $clientDataArray "coro.handle"]
58
59 # First, has the NRE proc finished execution? If so, we simply
60 # want to return back to the trampoline and execute the next
(lldb) di -m -f -c 20
JIT(0x5555583b75e0)`tcl.coro.runner:
0x7ffff7fef1b0 <+0>: pushq %rbp
0x7ffff7fef1b1 <+1>: pushq %rbx
0x7ffff7fef1b2 <+2>: pushq %rax
0x7ffff7fef1b3 <+3>: movl %edx, %ebp
-> 55
-> 56 # Get the coroutine handle from client data
-> 57 set coro.handle [my load $clientDataArray "coro.handle"]
-> 58
-> 59 # First, has the NRE proc finished execution? If so, we simply
-> 0x7ffff7fef1b5 <+5>: movq (%rdi), %rbx
62
63 set llvm.coro.done [$m intrinsic coro.done]
** 64 set done [my call ${llvm.coro.done} [list ${coro.handle}] "doneFlag"]
0x7ffff7fef1b8 <+8>: cmpq $0x0, (%rbx)
** 65 my condBr $done $finished $needResume
66
67 label needResume:
0x7ffff7fef1bc <+12>: je 0x7ffff7fef1ee ; <+62>
70 # that the next time it suspends, we'll loop back to here.
71
** 72 $api Tcl_NRAddCallback $interp ${tcl.coro.runner} ${coro.handle} \
73 [my null char*] [my null char*] [my null char*]
74
0x7ffff7fef1be <+14>: movabsq $0x7ffff7fef1b0, %rax ; imm =
0x7FFFF7FEF1B0
0x7ffff7fef1c8 <+24>: movabsq $0x7ffff7a612f0, %r10 ; imm =
0x7FFFF7A612F0
0x7ffff7fef1d2 <+34>: xorl %ecx, %ecx
0x7ffff7fef1d4 <+36>: xorl %r8d, %r8d
0x7ffff7fef1d7 <+39>: xorl %r9d, %r9d
0x7ffff7fef1da <+42>: movq %rsi, %rdi
0x7ffff7fef1dd <+45>: movq %rax, %rsi
0x7ffff7fef1e0 <+48>: movq %rbx, %rdx
0x7ffff7fef1e3 <+51>: callq *%r10
79 set llvm.coro.promise [$m intrinsic coro.promise]
80 set promise.addr.raw \
** 81 [my call ${llvm.coro.promise} \
82 [list ${coro.handle} \
83 [Const $alignment int32] \
0x7ffff7fef1e6 <+54>: movl %ebp, 0x10(%rbx)
85 "promise.addr.raw"]
86 set promise.addr [my cast(ptr) ${promise.addr.raw} int32
"promise.addr"]
** 87 my store $result ${promise.addr}
88
89 # Resume the coroutine, and return to the trampoline to await
0x7ffff7fef1e9 <+57>: movq %rbx, %rdi
0x7ffff7fef1ec <+60>: callq *(%rbx)
The entry sequence <+0>-<+2>, the copy of the Tcl status from the third arg
to the callback <+3>, the extraction of the frame pointer from the client
data array <+5>, the check for 'done' <+8>-<+12>, and the rescheduling of
Tcl_NRAddCallback <+14>-<+51> all happen normally. The Tcl status gets
moved to the start of the promise <+54>, and then <+57>-<+60> the actual
body of 'mrtest::calc' is invoked.
The lead-in of that procedure looks like:
JIT(0x5555583b75e0)`tcl ::mrtest::calc -1929838593.resume:
-> 0x7ffff7ff1fa0 <+0>: pushq %rbp
0x7ffff7ff1fa1 <+1>: pushq %r15
0x7ffff7ff1fa3 <+3>: pushq %r14
0x7ffff7ff1fa5 <+5>: pushq %r13
0x7ffff7ff1fa7 <+7>: pushq %r12
0x7ffff7ff1fa9 <+9>: pushq %rbx
0x7ffff7ff1faa <+10>: subq $0x38, %rsp
0x7ffff7ff1fae <+14>: movq %rdi, %r12
0x7ffff7ff1fb1 <+17>: movb 0x30(%r12), %al
0x7ffff7ff1fb6 <+22>: decb %al
0x7ffff7ff1fb8 <+24>: cmpb $0x4, %al
0x7ffff7ff1fba <+26>: ja 0x7ffff7ff20c8 ; <+296>
0x7ffff7ff1fc0 <+32>: movzbl %al, %eax
0x7ffff7ff1fc3 <+35>: movabsq $0x7ffff7fe0c60, %rcx ; imm =
0x7FFFF7FE0C60
0x7ffff7ff1fcd <+45>: jmpq *(%rcx,%rax,8)
<+0>-<+10> are creating a stack frame to hold temporaries that do not live
across coroutine suspension.
<+14> is recovering the frame pointer that was created in the call thunk,
and it's indeed at the same address and has the same content.
Now <+17> is recovering the resume index from the coroutine frame. THIS
INDEX WAS SAVED AT OFFSET +0X28 IN THE FRAME, BUT IS BEING RECOVERED FROM
OFFSET 0X30
|
|
From: Kevin K. <kev...@gm...> - 2018-04-22 04:51:41
|
I found one case where the offending 'llvm.trap' is being generated on
procedure exit.
Even though the value coming into 'procReturn' (or ${tcl.procedure.return})
is a FAIL, that does not guarantee that the Tcl result code coming out of
it is still nonzero!
Look at
https://core.tcl-lang.org/tclquadcode/artifact?udc=1&ln=4237-4240&name=a73528563f
If some inner context has executed [return -level N somevalue], and we're
now N levels out from that context, the call to TclUpdateReturnInfo in
'handleReturn' will return TCL_OK, and that code will now leave us with a
FAIL object and an OK status code.
That looks to be the only path that falls into the one particular trap that
I traced back from. Clearly, nothing in the procs is doing [return -level],
so that particular trap is likely not the issue, but we surely need to
address it anyway. Any ideas what you want to do about it?
It appears to me that if a procedure might possibly have a [return -level
n] escape from it, then its nth-level caller needs to have the possibility
of the [return -level] accounted for. Since it's a fairly unusual
operation, I'd be comfortable with saying that if 'return -level' is
present at a level of other than 0 or 1, treat the proc and all callers as
returning FAIL STRING.
We could, of course, attempt bookkeeping for other 'return -level'
operations, but that's for when we really start worrying about compilation
of user-defined control structures - in other words, after we've worked out
inlining to the level where we can do simple things like [uplevel 1 $arg].
That's not the immediate issue at hand.
A naked FAIL that is actually a 'success' is not much of a problem until it
comes time to widen it to a FAIL SOMETHING - which, in the idea that I'm
thinking of, would always be FAIL STRING. At the time we're raising FAIL to
FAIL STRING, we should be processing the result of an operation that always
throws, and so we should still have the interpreter's result available.
That would mean that widening from FAIL to FAIL STRING will need to fetch
the interpreter result, bump the ref count and stash it in the FAIL STRING
value, along with the TCL_OK status.
Am I totally mad here? (Wait. I know I'm totally mad. Is the *idea* totally
mad?)
|
|
From: Donal K. F. <don...@ma...> - 2018-04-21 13:27:11
|
On 20/04/2018 15:49, Kevin Kenny wrote:
> Also the header of ${tcl.procedure.return} surely is describing its
> return value incorrectly.
The mapping of a pure FAIL is a plain int32 containing just the Tcl
result code. The main complication at that point is that the result in
the Tcl_Interp may be updated due to processing a [return -level $x $y]
and… oh, yes, that could be very buggy. :-)
The documentation of procLeave(INT,STRING) in build.tcl is more
thoroughly misleading.
Donal.
|
|
From: Kevin K. <kev...@gm...> - 2018-04-20 14:49:26
|
meant to copy list, to capture 'developer stream of consciousness'
---------- Forwarded message ----------
From: Kevin Kenny <kev...@gm...>
Date: Fri, Apr 20, 2018 at 10:42 AM
Subject: Re: [tclquadcode] Still groping forward on the crash in CoroEarly
To: "Donal K. Fellows" <don...@ma...>
On Fri, Apr 20, 2018 at 9:43 AM, Donal K. Fellows <
don...@ma...> wrote:
Poking at a core dump with lldb after building the offending code as a
> separate DLL (as that gives me some debugging info!) I see this instruction
> sequence leading up the point the crash happened (+276) inside 'tcl
> ::mrtest::calc -1929838593.resume':
>
> 0x1049aeb9c <+204>: callq 0x1049b0e04 ; symbol stub
> for: Tcl_AppendObjToErrorInfo
> 0x1049aeba1 <+209>: andl $-0x5, %ebx
> 0x1049aeba4 <+212>: movl %ebx, 0x1a8(%r13)
> 0x1049aebab <+219>: movl 0x4(%rsp), %ebx
> 0x1049aebaf <+223>: movq 0x110(%rbp), %rdi
> 0x1049aebb6 <+230>: movl (%rdi), %eax
> 0x1049aebb8 <+232>: cmpl $0x1, %eax
> 0x1049aebbb <+235>: jg 0x1049af08b ; <+1467>
> 0x1049aebc1 <+241>: callq 0x1049b0df2 ; symbol stub
> for: TclFreeObj
> 0x1049aebc6 <+246>: movq 0x118(%rbp), %rdi
> 0x1049aebcd <+253>: movl (%rdi), %eax
> 0x1049aebcf <+255>: cmpl $0x1, %eax
> 0x1049aebd2 <+258>: jle 0x1049aee57 ; <+903>
> 0x1049aebd8 <+264>: jmp 0x1049aef77 ; <+1191>
> 0x1049aebdd <+269>: movq 0x108(%rbp), %rdi
> -> 0x1049aebe4 <+276>: movl 0x14(%rdi), %r12d
> 0x1049aebe8 <+280>: movq 0x20(%rdi), %r15
>
> I'm guessing that the crash area correlates to the optimised IR code:
>
> action.required.i162: ; preds =
> %tcl.list.index1.exit
> %tmp.1074 = extractvalue { i32, %Tcl_Obj* } %tmp.107130, 1, !dbg !13055
> %refCount.i.i157 = getelementptr %Tcl_Obj, %Tcl_Obj* %tmp.1074, i64 0,
> i32 0, !dbg !13080
> %refCount1.i.i158 = load i32, i32* %refCount.i.i157, align 4, !dbg
> !13083
> %15 = add nsw i32 %refCount1.i.i158, 1, !dbg !13083
> store i32 %15, i32* %refCount.i.i157, align 4, !dbg !13083
> %16 = icmp slt i32 %refCount1.i.i158, 1, !dbg !13084
> br i1 %16, label %freeObject.i.i163, label %leave.i.i164, !dbg !13084
>
> Decoding the debug metadata, this appears to be part of the inlining of
> tcl.procedure.return within tcl ::mrtest::rfib 252, and I've got lost after
> that point. :-)
>
Me, too! Some weird things happen when LLVM inlines coroutine calls.
It looks very much like there's a problem with the preconditions for the
> calling of tcl.procedure.return. I've also no idea why we're using an NRE
> call for mrtest in the first place; that code doesn't have a 'yield' or
> 'coroutine' anywhere near it.
>
I've been using NRE at the moment for all recursive procedures - that
follows Tcl's practice, so as not to blow the C stack. Supporting tailcall
and coroutines was Miguel's offer to offset the cost of allocating
activation records on the heap, but he told me that his primary motivation
was to get Tcl's recursion off the C stack so that large numbers of threads
become manageable.
Another thing that will require NRE (although that's not quite right in the
implementation yet) is any of the I/O calls. This is done specifically
because of the existence of the 'coroutine' package in tcllib and of
stacked channels. In the presence of channel stacking, it's highly likely
that a call such as 'read' or 'write' will yield until completion, and I
surely can't tell what the underlying I/O system will be at compile time!
I think you're getting farther than I am. I may be deceiving myself, but at
least I *was* fairly certain that I was not successfully resuming 'calc'
after the initial suspend.
With you building a DLL while I'm running JIT, I'm also not sure that we're
seeing entirely the same set of passes. Are rectest* and qsort now working
for you? They also appear to have the offending return sequences, but
don't fail for me now.
Anyway, this is indeed a place to look further. I'm trying to look
upstream in quadcode, too, for where I might be violating a precondition of
'procReturn' - what are its preconditions, anyway?
Also the header of ${tcl.procedure.return} surely is describing its return
value incorrectly.
|
|
From: Donal K. F. <don...@ma...> - 2018-04-20 13:43:42
|
On 20/04/2018 05:09, Kevin Kenny wrote:
> The resume proc, mrtest::calc(STRING).resume, begins with a short
> codeburst that LLVM inserts; I can't instrument this readily, or at
> least I don't know how to do it.
>
> I had a printf inserted immediately following the call to
> llvm.coro.suspend in the entry block of mrtest::calc. That printf was
> never reached.
>
> And that's as far as I can get tonight.
Poking at a core dump with lldb after building the offending code as a
separate DLL (as that gives me some debugging info!) I see this
instruction sequence leading up the point the crash happened (+276)
inside 'tcl ::mrtest::calc -1929838593.resume':
0x1049aeb9c <+204>: callq 0x1049b0e04 ; symbol stub
for: Tcl_AppendObjToErrorInfo
0x1049aeba1 <+209>: andl $-0x5, %ebx
0x1049aeba4 <+212>: movl %ebx, 0x1a8(%r13)
0x1049aebab <+219>: movl 0x4(%rsp), %ebx
0x1049aebaf <+223>: movq 0x110(%rbp), %rdi
0x1049aebb6 <+230>: movl (%rdi), %eax
0x1049aebb8 <+232>: cmpl $0x1, %eax
0x1049aebbb <+235>: jg 0x1049af08b ; <+1467>
0x1049aebc1 <+241>: callq 0x1049b0df2 ; symbol stub
for: TclFreeObj
0x1049aebc6 <+246>: movq 0x118(%rbp), %rdi
0x1049aebcd <+253>: movl (%rdi), %eax
0x1049aebcf <+255>: cmpl $0x1, %eax
0x1049aebd2 <+258>: jle 0x1049aee57 ; <+903>
0x1049aebd8 <+264>: jmp 0x1049aef77 ; <+1191>
0x1049aebdd <+269>: movq 0x108(%rbp), %rdi
-> 0x1049aebe4 <+276>: movl 0x14(%rdi), %r12d
0x1049aebe8 <+280>: movq 0x20(%rdi), %r15
I'm guessing that the crash area correlates to the optimised IR code:
action.required.i162: ; preds =
%tcl.list.index1.exit
%tmp.1074 = extractvalue { i32, %Tcl_Obj* } %tmp.107130, 1, !dbg !13055
%refCount.i.i157 = getelementptr %Tcl_Obj, %Tcl_Obj* %tmp.1074, i64
0, i32 0, !dbg !13080
%refCount1.i.i158 = load i32, i32* %refCount.i.i157, align 4, !dbg
!13083
%15 = add nsw i32 %refCount1.i.i158, 1, !dbg !13083
store i32 %15, i32* %refCount.i.i157, align 4, !dbg !13083
%16 = icmp slt i32 %refCount1.i.i158, 1, !dbg !13084
br i1 %16, label %freeObject.i.i163, label %leave.i.i164, !dbg !13084
Decoding the debug metadata, this appears to be part of the inlining of
tcl.procedure.return within tcl ::mrtest::rfib 252, and I've got lost
after that point. :-)
It looks very much like there's a problem with the preconditions for the
calling of tcl.procedure.return. I've also no idea why we're using an
NRE call for mrtest in the first place; that code doesn't have a 'yield'
or 'coroutine' anywhere near it.
Donal.
|
|
From: Kevin K. <kev...@gm...> - 2018-04-20 04:09:24
|
What's going on with mrtest::* really puzzles me, and I'd really appreciate another pair of eyeballs. The tester calls 'mrtest::calc' from Tcl, which enters the call thunk. As far as I can tell, the failing sequence looks to be the same code as the three test cases that pass. By inserting strategic printf's, what I discovered: The initial call from the thunk to 'mrtest::calc(STRING) succeeds. mrtest::calc Tcl_Alloc's the coroutine frame, fills the initial values in, and suspends, which returns the coro frame to the thunk. (This 'on ramp' portion of the proc is actually inlined into the thunk.) The coro frame starts with the resume address (I don't have any idea how to get a link map, so no idea if it's the RIIGHT resume address!), the destroy address (ditto!), the promise (correct at this point; TCL_OK for the Tcl status and an uninitialized INT return value), the resume index (-3), and the parameter 'x'. The call to Tcl_NRAddCallback for the return thunk is made, and returns. The call to Tcl_NRAddCallback for tcl.coro.runner is made, and returns. The thunk returns to the trampoline, and the callback into tcl.coro.runner fires. tcl.coro.runner sees the same pointer to the coro frame that mrtest::calc allocated. It calls whatever the resume address in the coro frame points to. That address is the same address that mrtest::calc put there. As I said, I have no idea whether it's the right address, but the code follows the same pattern as the procedures that work. The resume proc, mrtest::calc(STRING).resume, begins with a short codeburst that LLVM inserts; I can't instrument this readily, or at least I don't know how to do it. I had a printf inserted immediately following the call to llvm.coro.suspend in the entry block of mrtest::calc. That printf was never reached. And that's as far as I can get tonight. |
|
From: Kevin K. <kev...@gm...> - 2018-04-20 03:47:22
|
On Tue, Apr 17, 2018 at 6:27 PM, Donal K. Fellows <
don...@ma...> wrote:
> I'm getting reliable SIGSEGVs instead of SIGFPEs; those appear to emanate
> from 'llvm.trap' intrinsics, which are the sort of things that turn up when
> the issued code is actively wrong in some subtle way. I don't know which
> one we might be hitting, but they shouldn't be turning up (assuming we're
> not in the nasal-demon case of coro.trap).
>
> Places to look for trouble:
>
> * tcl ::rectest1 252.resume (in relation to label action.required.i58)
> * tcl ::rectest1 -1929838593.resume (in relation to label
> action.required.i145)
> * tcl ::rectest1 -1929838593 (in relation to label action.required.i119)
> * tcl ::rectest1 252 (in relation to label action.required.i44)
>
> The 'action.required' labels come from the code to add and drop reference
> counts in various scenarios; there's ten of them in stdlib.tcl, all in
> trivial functions. It looks (with very little analysis) like something
> feeding a NULL into a STRING argument; the code issuer *REALLY* assumes
> that that's impossible. Or maybe it is an undef that is leaking through;
> I'm not sure, but those are the usual causes.
>
> Interestingly, the predecessor blocks of all of those labels with problems
> have the same label: tcl.procedure.return.exit (a generated label resulting
> from the exit of tcl.procedure.return, which is what applies the tricks
> that happen to transform exception codes when a procedure terminates). That
> hints that there might be problems with marshalling the result across the
> boundary of the procedure now that the extra coroutine machinery is in
> place? I don't know, but at least that points the finger at one place than
> many, even if where that place is remains uncertain. In any case, we're now
> able to reliably spit out intelligible optimised IR. It shows that there's
> a bug.
I have things to the point where rectest1, rectest2 and qsort all run
perfectly - and still contain these questionable traps. I've figured out
what's generating them, and it isn't NRE-specific.
(mrtest::* is still failing - I'll discuss that in a separate message.)
The offending sequence appears to be:
110: procLeave {temp 28 110} {temp 27 108} {literal rectest1} : FAIL ⇐ INT
STRING
111: free {} {temp 27 108} : VOID ⇐ INT
112: {widenTo -1392967681 {FAIL STRING}} {temp 0 112} {temp 28 110} : {FAIL
STRING} ⇐ FAIL
113: free {} {temp 28 110} : VOID ⇐ FAIL
114: NRE.return {} Nothing {temp 0 112} : VOID ⇐ NEXIST {FAIL STRING}
In the initial translation, we get:
; procLeave - translates to a call to tcl.procedure.return
%tmp.28 = call i32 @tcl.procedure.return(%INT %13, %Tcl_Obj* %str.18,
i32* %tcl.errorCode), !dbg !4394
; freeing an INT does nothing
; widening a pure FAIL to a FAIL STRING - the STRING is properly an UNDEF
; since we started with a pure FAIL
%tmp.014 = insertvalue { i32, %Tcl_Obj* } undef, i32 %tmp.28, 0, !dbg
!4394
; add to the refcount of the FAIL STRING
call void @tcl.addFailReference({ i32, %Tcl_Obj* } %tmp.014), !dbg !4394
; gubbins to return from an NRE procedure
%15 = getelementptr %"tcl ::rectest1 252.promise", %"tcl ::rectest1
252.promise"* %"coro,promise", i32 0, i32 0, !dbg !4394
store i32 0, i32* %15, !dbg !4394
%16 = getelementptr %"tcl ::rectest1 252.promise", %"tcl ::rectest1
252.promise"* %"coro,promise", i32 0, i32 1, !dbg !4394
store { i32, %Tcl_Obj* } %tmp.014, { i32, %Tcl_Obj* }* %16, !dbg !4394
br label %coro.exit, !dbg !4394
What the optimizer does with this:
; before this point is the inlined code from tcl.procedure.return, which
; isn't actually relevant
tcl.procedure.return.exit: ; preds = %done.i,
%handle.return.i
; this phi is collecting the Tcl status code from 'tcl.procedure.return'
%tmp.2837 = phi i32 [ %36, %handle.return.i ], [ %19, %done.i ]
; now we make the FAIL STRING from the pure FAIL
%tmp.014 = insertvalue { i32, %Tcl_Obj* } undef, i32 %tmp.2837, 0, !dbg
!10673
; The inlined 'addFailReference' apparently cannot prove that
tcl.procedure.return
; always stored a nonzero value in the FAIL, so it has this test:
%40 = icmp eq i32 %tmp.2837, 0, !dbg !10716
br i1 %40, label %action.required.i41, label %coro.exit, !dbg !10716
; But if the value is zero, it can prove that promoting the FAIL to a
; FAIL STRING will cause reference of an 'undef':
action.required.i41: ; preds =
%tcl.procedure.return.exit
call void @llvm.trap(), !dbg !10718
unreachable
My guess is that we have a code path that will get into tcl.procedure.return
with a zero code - or at least LLVM is unable to prove that we don't.
Should I be worried?
|
|
From: Kevin K. <kev...@gm...> - 2018-04-18 23:34:55
|
Good news: with the change to make the alignment constraint on llvm.coro.promise be [Const $::tcl_platform(pointerSize) int32], everything compiles, and the rectest1 and qsort cases even work. I'm pretty sure that I have a bug in returning error from an NRE proc, and that's what you're seeing. That 'trap; unreachable" sequence is what I expect if an LLVM coro is resumed after a Tcl proc returns. It's precisely the nasal daemons code from coro.trap, and I expect to see it soon after procedure return. I'll add a test case that throws an error from a recursive nest. The good news is that rectest1 and qsort now are working! mrtest::* still isn't, probably owing to some problem related to what you turned up. I fully expected to be generating bad code at this point. I was mildly surprised that anything worked! Onward to debugging, now that the optimizer no longer crashes and burns! |
|
From: Donal K. F. <don...@ma...> - 2018-04-18 14:13:32
|
On 18/04/2018 15:09, Kevin Kenny wrote: > LLVM gave up a long time ago on portability of bitcode among machines > with different DataLayouts. It's not-so-awful assembler, nothing more. :-) Donal. |
|
From: Kevin K. <kev...@gm...> - 2018-04-18 14:10:01
|
On Wed, Apr 18, 2018 at 10:02 AM, Donal K. Fellows < don...@ma...> wrote: > On 18/04/2018 13:16, Kevin Kenny wrote: > >> The 'alignment' arg to @llvm.coro.promise must be an integer constant. >> AlignOf doesn't return an integer constant, it returns a complex expression. >> > > It's a constant expression (involving shenanigans with GEP)… so if the > promise builder really wants an integer constant, we're unable to use > alignof. > > What's worse, the documentation for all this says just: > > The second argument is an alignment requirements of the promise. If a >> frontend designated %promise = alloca i32 as a promise, the alignment >> argument to coro.promise should be the alignment of i32 on the target >> platform. If a frontend designated %promise = alloca i32, align 16 as a >> promise, the alignment argument should be 16. This argument only accepts >> constants. >> > > Yes, that it's got to be an actual genuine constant integer (as opposed to > just being a constant with integer value) is entirely undocumented. > > You can probably just use $tcl_platform(wordSize) or > $tcl_platform(pointerSize) as alignments. (After converting to LLVM > constants.) > > I might give that a try. Swift (or was it Rust?) reaches into the DataLayout to construct sizeof and alignof, rather than using the SizeOf and AlignOf calls in the C API. Since we gave up on avoiding C++ quite some time ago, I was thinking of doing the same, but you're right that tcl_platform would be a quick and dirty fix. LLVM gave up a long time ago on portability of bitcode among machines with different DataLayouts. |
|
From: Donal K. F. <don...@ma...> - 2018-04-18 14:02:39
|
On 18/04/2018 13:16, Kevin Kenny wrote: > The 'alignment' arg to @llvm.coro.promise must be an integer constant. > AlignOf doesn't return an integer constant, it returns a complex expression. It's a constant expression (involving shenanigans with GEP)… so if the promise builder really wants an integer constant, we're unable to use alignof. What's worse, the documentation for all this says just: > The second argument is an alignment requirements of the promise. If a frontend designated %promise = alloca i32 as a promise, the alignment argument to coro.promise should be the alignment of i32 on the target platform. If a frontend designated %promise = alloca i32, align 16 as a promise, the alignment argument should be 16. This argument only accepts constants. > Yes, that it's got to be an actual genuine constant integer (as opposed to just being a constant with integer value) is entirely undocumented. You can probably just use $tcl_platform(wordSize) or $tcl_platform(pointerSize) as alignments. (After converting to LLVM constants.) Donal. |
|
From: Kevin K. <kev...@gm...> - 2018-04-18 12:16:22
|
On Tue, Apr 17, 2018 at 8:01 PM, Kevin Kenny <kev...@gm...> wrote: > > Wait, you can dump post-optimization code? > > I cannot - I'm getting the SEGV or FPE inside the optimizer. I'm not > entering the generated code at all; [$m optimize] does not return. > > If you're getting through the optimizer from tester.tcl, I'd very much > like to know how! > > I wasn't even expecting the code to *work* at this stage! I haven't been > able even to attempt to run it because of the compiler crashes. > > I'm slogging through the steps in https://llvm.org/docs/ > GettingStarted.html - in hopes that making a debugging build of a more > current version will shed light on the subject. > Having a debug build did indeed shed some light. The 'alignment' arg to @llvm.coro.promise must be an integer constant. AlignOf doesn't return an integer constant, it returns a complex expression. This particular crash should go away if I implement the calls into the DataLayout that I discussed a few days ago, to get sizeof and alignof from there rather than using pointer arithmetic. opt: /storage/kennykb/llvm.org/llvm/include/llvm/Support/Casting.h:255: typename llvm::cast_retty<X, Y*>::ret_type llvm::cast(Y*) [with X = llvm::ConstantInt; Y = llvm::Value; typename llvm::cast_retty<X, Y*>::ret_type = llvm::ConstantInt*]: Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed. Stack dump: 0. Program arguments: /opt/llvm/bin/opt -S -O3 -enable-coroutines -o test1.ll test.bc 1. Running pass 'Lower early coroutine intrinsics' on function '@"tcl ::rectest1 -1929838593"' #0 0x000055d5cf30d66d llvm::sys::PrintStackTrace(llvm::raw_ostream&) /storage/kennykb/llvm.org/llvm/lib/Support/Unix/Signals.inc:398:0 #1 0x000055d5cf30d700 PrintStackTraceSignalHandler(void*) /storage/kennykb/ llvm.org/llvm/lib/Support/Unix/Signals.inc:462:0 #2 0x000055d5cf30b8d5 llvm::sys::RunSignalHandlers() /storage/kennykb/ llvm.org/llvm/lib/Support/Signals.cpp:49:0 #3 0x000055d5cf30ced9 SignalHandler(int) /storage/kennykb/ llvm.org/llvm/lib/Support/Unix/Signals.inc:252:0 #4 0x00007f39da6b9150 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x13150) #5 0x00007f39d93c00bb gsignal /build/glibc-itYbWN/glibc-2.26/signal/../sysdeps/unix/sysv/linux/raise.c:51:0 #6 0x00007f39d93c1f5d abort /build/glibc-itYbWN/glibc-2.26/stdlib/abort.c:92:0 #7 0x00007f39d93b7f17 __assert_fail_base /build/glibc-itYbWN/glibc-2.26/assert/assert.c:92:0 #8 0x00007f39d93b7fc2 (/lib/x86_64-linux-gnu/libc.so.6+0x2efc2) #9 0x000055d5ccea8e1d llvm::cast_retty<llvm::ConstantInt, llvm::Value*>::ret_type llvm::cast<llvm::ConstantInt, llvm::Value>(llvm::Value*) /storage/kennykb/ llvm.org/llvm/include/llvm/Support/Casting.h:255:0 #10 0x000055d5ceba52b5 llvm::CoroPromiseInst::getAlignment() const /storage/kennykb/llvm.org/llvm/lib/Transforms/Coroutines/CoroInstr.h:253:0 #11 0x000055d5ceba39d5 (anonymous namespace)::Lowerer::lowerCoroPromise(llvm::CoroPromiseInst*) /storage/kennykb/llvm.org/llvm/lib/Transforms/Coroutines/CoroEarly.cpp:68:0 #12 0x000055d5ceba44d8 (anonymous namespace)::Lowerer::lowerEarlyIntrinsics(llvm::Function&) /storage/kennykb/ llvm.org/llvm/lib/Transforms/Coroutines/CoroEarly.cpp:201:0 #13 0x000055d5ceba4922 (anonymous namespace)::CoroEarly::runOnFunction(llvm::Function&) /storage/kennykb/ llvm.org/llvm/lib/Transforms/Coroutines/CoroEarly.cpp:248:0 #14 0x000055d5cead5430 llvm::FPPassManager::runOnFunction(llvm::Function&) /storage/kennykb/llvm.org/llvm/lib/IR/LegacyPassManager.cpp:1520:0 #15 0x000055d5cead51c1 llvm::legacy::FunctionPassManagerImpl::run(llvm::Function&) /storage/kennykb/llvm.org/llvm/lib/IR/LegacyPassManager.cpp:1469:0 #16 0x000055d5cead4d96 llvm::legacy::FunctionPassManager::run(llvm::Function&) /storage/kennykb/ llvm.org/llvm/lib/IR/LegacyPassManager.cpp:1394:0 #17 0x000055d5cce6745a main /storage/kennykb/ llvm.org/llvm/tools/opt/opt.cpp:712:0 #18 0x00007f39d93aa1c1 __libc_start_main /build/glibc-itYbWN/glibc-2.26/csu/../csu/libc-start.c:342:0 #19 0x000055d5cce309aa _start (/opt/llvm/bin/opt+0x14db9aa) |
|
From: Kevin K. <kev...@gm...> - 2018-04-18 00:01:50
|
On Tue, Apr 17, 2018 at 6:27 PM, Donal K. Fellows < don...@ma...> wrote: > On 17/04/2018 02:06, Kevin Kenny wrote: > >> WARNING:struct.tcl:144:no debugging type for i32 (i8**, %Interp*, i32)* >> in 'f351(tcl.nr.add.callback)' >> WARNING:struct.tcl:144:no debugging type for i32 (i8**, %Interp*, i32)* >> in '0' >> > > I've added in the (definitely weird and wonderful) incantation to make > these warnings go away and I can dump post-optimization code (which is > sometimes easier to read than pre-optimization; sometimes…). With that > done, I'm getting reliable SIGSEGVs instead of SIGFPEs; those appear to > emanate from 'llvm.trap' intrinsics, which are the sort of things that turn > up when the issued code is actively wrong in some subtle way. I don't know > which one we might be hitting, but they shouldn't be turning up (assuming > we're not in the nasal-demon case of coro.trap). > > Places to look for trouble: > > * tcl ::rectest1 252.resume (in relation to label action.required.i58) > * tcl ::rectest1 -1929838593.resume (in relation to label > action.required.i145) > * tcl ::rectest1 -1929838593 (in relation to label action.required.i119) > * tcl ::rectest1 252 (in relation to label action.required.i44) > > The 'action.required' labels come from the code to add and drop reference > counts in various scenarios; there's ten of them in stdlib.tcl, all in > trivial functions. It looks (with very little analysis) like something > feeding a NULL into a STRING argument; the code issuer *REALLY* assumes > that that's impossible. Or maybe it is an undef that is leaking through; > I'm not sure, but those are the usual causes. > > Interestingly, the predecessor blocks of all of those labels with problems > have the same label: tcl.procedure.return.exit (a generated label resulting > from the exit of tcl.procedure.return, which is what applies the tricks > that happen to transform exception codes when a procedure terminates). That > hints that there might be problems with marshalling the result across the > boundary of the procedure now that the extra coroutine machinery is in > place? I don't know, but at least that points the finger at one place than > many, even if where that place is remains uncertain. In any case, we're now > able to reliably spit out intelligible optimised IR. It shows that there's > a bug. :-) Wait, you can dump post-optimization code? I cannot - I'm getting the SEGV or FPE inside the optimizer. I'm not entering the generated code at all; [$m optimize] does not return. If you're getting through the optimizer from tester.tcl, I'd very much like to know how! I wasn't even expecting the code to *work* at this stage! I haven't been able even to attempt to run it because of the compiler crashes. I'm slogging through the steps in https://llvm.org/docs/GettingStarted.html - in hopes that making a debugging build of a more current version will shed light on the subject. |
|
From: Donal K. F. <don...@ma...> - 2018-04-17 22:27:59
|
On 17/04/2018 02:06, Kevin Kenny wrote: > WARNING:struct.tcl:144:no debugging type for i32 (i8**, %Interp*, i32)* > in 'f351(tcl.nr.add.callback)' > WARNING:struct.tcl:144:no debugging type for i32 (i8**, %Interp*, i32)* > in '0' I've added in the (definitely weird and wonderful) incantation to make these warnings go away and I can dump post-optimization code (which is sometimes easier to read than pre-optimization; sometimes…). With that done, I'm getting reliable SIGSEGVs instead of SIGFPEs; those appear to emanate from 'llvm.trap' intrinsics, which are the sort of things that turn up when the issued code is actively wrong in some subtle way. I don't know which one we might be hitting, but they shouldn't be turning up (assuming we're not in the nasal-demon case of coro.trap). Places to look for trouble: * tcl ::rectest1 252.resume (in relation to label action.required.i58) * tcl ::rectest1 -1929838593.resume (in relation to label action.required.i145) * tcl ::rectest1 -1929838593 (in relation to label action.required.i119) * tcl ::rectest1 252 (in relation to label action.required.i44) The 'action.required' labels come from the code to add and drop reference counts in various scenarios; there's ten of them in stdlib.tcl, all in trivial functions. It looks (with very little analysis) like something feeding a NULL into a STRING argument; the code issuer *REALLY* assumes that that's impossible. Or maybe it is an undef that is leaking through; I'm not sure, but those are the usual causes. Interestingly, the predecessor blocks of all of those labels with problems have the same label: tcl.procedure.return.exit (a generated label resulting from the exit of tcl.procedure.return, which is what applies the tricks that happen to transform exception codes when a procedure terminates). That hints that there might be problems with marshalling the result across the boundary of the procedure now that the extra coroutine machinery is in place? I don't know, but at least that points the finger at one place than many, even if where that place is remains uncertain. In any case, we're now able to reliably spit out intelligible optimised IR. It shows that there's a bug. :-) Donal. |
|
From: Kevin K. <kev...@gm...> - 2018-04-17 01:07:44
|
On Mon, Apr 16, 2018 at 12:50 PM, Kevin Kenny <kev...@gm...>
wrote:
> You're right, I was siphoning off the bitcode in LLVM::optimise (jit.tcl)
> in the 'dump-pre' step, which comes just before verifying the module.
> Once I'm in a place to do it, I'll move that after the 'verify' and report
> back. (It surely doesn't explain the crash in '$module optimize', which
> comes after verification and in any case finalizes the debug information,
> but getting the assembly and bitcode after the info is finalized might
> offer more insight.)
OK, I've done this.
(Also, uniqueUndef is exonerated - by virtue of the fact that it's no longer
in the codebase. All of our variable references are now handled without
having to call ReplaceAllUsesWith for any of them.)
Still, as expected, running the optimizer in process crashes:
$ tclsh8.7 demos/perftest/tester.tcl -just rectest1
Linker: /usr/lib/llvm-5.0/bin/clang
restrict <- rectest1
CONFIGURE: -quadcode-log {}
==================TIME (ORIGINAL)===================
eval rectest1: "..."
time rectest1: 1.0779985283296543 microseconds per iteration
======================COMPILING=====================
get-command-list: 37 microseconds
WARNING:varframe.tcl:2972:callframe.store.array:make.array should trigger
traces
WARNING:varframe.tcl:783:tcl.callframe.load of array should trigger traces
WARNING:struct.tcl:144:no debugging type for i32 (i8**, %Interp*, i32)* in
'f351(tcl.nr.add.callback)'
WARNING:struct.tcl:144:no debugging type for i32 (i8**, %Interp*, i32)* in
'0'
init-module: 773642 microseconds
register: 4928 microseconds
specialise: 79990 microseconds
generate: 36118 microseconds
dump-pre: 17040 microseconds
Segmentation fault (core dumped)
I added a dump of the code after verification but before optimization, and
it produces both bitcode and LLVM assembly. The debug information is
finalized, and those 'temporary' items are gone.
$ grep temporary test.ll
(no output)
Trying to translate either crashes 'opt'. The crashes are subtly different.
$ opt -S test.bc -O3 -enable-coroutines -o optimized.ll
#0 0x00007f8c9ee73dfa llvm::sys::PrintStackTrace(llvm::raw_ostream&)
(/usr/lib/llvm-5.0/bin/../lib/libLLVM-5.0.so.1+0x7aadfa)
#1 0x00007f8c9ee72016 llvm::sys::RunSignalHandlers()
(/usr/lib/llvm-5.0/bin/../lib/libLLVM-5.0.so.1+0x7a9016)
#2 0x00007f8c9ee7214b (/usr/lib/llvm-5.0/bin/../lib/
libLLVM-5.0.so.1+0x7a914b)
#3 0x00007f8ca1ce4150 __restore_rt (/lib/x86_64-linux-gnu/
libpthread.so.0+0x13150)
#4 0x00007f8c9faba6ea (/usr/lib/llvm-5.0/bin/../lib/
libLLVM-5.0.so.1+0x13f16ea)
#5 0x00007f8c9ef4ca28 llvm::FPPassManager::runOnFunction(llvm::Function&)
(/usr/lib/llvm-5.0/bin/../lib/libLLVM-5.0.so.1+0x883a28)
#6 0x00007f8c9ef4cb1e
llvm::legacy::FunctionPassManagerImpl::run(llvm::Function&)
(/usr/lib/llvm-5.0/bin/../lib/libLLVM-5.0.so.1+0x883b1e)
#7 0x00007f8c9ef4cf26 llvm::legacy::FunctionPassManager::run(llvm::Function&)
(/usr/lib/llvm-5.0/bin/../lib/libLLVM-5.0.so.1+0x883f26)
#8 0x00005614bea0f346 (opt+0x9e346)
#9 0x00007f8c9d9fc1c1 __libc_start_main /build/glibc-itYbWN/glibc-2.
26/csu/../csu/libc-start.c:342:0
#10 0x00005614bea1d44a (opt+0xac44a)
Stack dump:
0. Program arguments: opt -S test.bc -O3 -enable-coroutines -o optimized.ll
1. Running pass 'Lower early coroutine intrinsics' on function '@"tcl
::rectest1 -1929838593"'
Segmentation fault (core dumped)
$ opt -S test.ll -O3 -enable-coroutines -o optimized.ll
#0 0x00007fa40ff53dfa llvm::sys::PrintStackTrace(llvm::raw_ostream&)
(/usr/lib/llvm-5.0/bin/../lib/libLLVM-5.0.so.1+0x7aadfa)
#1 0x00007fa40ff52016 llvm::sys::RunSignalHandlers()
(/usr/lib/llvm-5.0/bin/../lib/libLLVM-5.0.so.1+0x7a9016)
#2 0x00007fa40ff5214b (/usr/lib/llvm-5.0/bin/../lib/
libLLVM-5.0.so.1+0x7a914b)
#3 0x00007fa412dc4150 __restore_rt (/lib/x86_64-linux-gnu/
libpthread.so.0+0x13150)
#4 0x00007fa410b99ca7 (/usr/lib/llvm-5.0/bin/../lib/
libLLVM-5.0.so.1+0x13f0ca7)
#5 0x00007fa41002ca28 llvm::FPPassManager::runOnFunction(llvm::Function&)
(/usr/lib/llvm-5.0/bin/../lib/libLLVM-5.0.so.1+0x883a28)
#6 0x00007fa41002cb1e
llvm::legacy::FunctionPassManagerImpl::run(llvm::Function&)
(/usr/lib/llvm-5.0/bin/../lib/libLLVM-5.0.so.1+0x883b1e)
#7 0x00007fa41002cf26 llvm::legacy::FunctionPassManager::run(llvm::Function&)
(/usr/lib/llvm-5.0/bin/../lib/libLLVM-5.0.so.1+0x883f26)
#8 0x000055a45a879346 (opt+0x9e346)
#9 0x00007fa40eadc1c1 __libc_start_main /build/glibc-itYbWN/glibc-2.
26/csu/../csu/libc-start.c:342:0
#10 0x000055a45a88744a (opt+0xac44a)
Stack dump:
0. Program arguments: opt -S test.ll -O3 -enable-coroutines -o optimized.ll
1. Running pass 'Lower early coroutine intrinsics' on function '@"tcl
::rectest1 -1929838593"'
Floating point exception (core dumped)
If I strip the debug information and then try to run the optimizer from
generated assembly language, things are happier. Coroutine operations have
been lowered. There are references to coroutine intrinsic functions, but no
calls to them.
$ opt -strip-debug test.bc -S -o test1.ll
$ opt -O3 -enable-coroutines -S test1.ll -o optimized.ll
$ grep llvm\\\.coro optimized.ll
declare token @llvm.coro.id(i32, i8* readnone, i8* nocapture readonly, i8*)
#17
declare i1 @llvm.coro.alloc(token) #4
declare i8* @llvm.coro.begin(token, i8* writeonly) #4
declare i8* @llvm.coro.subfn.addr(i8* nocapture readonly, i8) #17
Somewhat weirdly, I still get the crash if instead I work from bitcode with
the debug information stripped:
$ opt -strip-debug test.bc -o test1.bc
$ opt -O3 -enable-coroutines -S test1.bc -o optimized.ll
... stack trace snipped ....
Floating point exception (core dumped)
Damned if I know! It's particularly puzzling in light of the fact that
CoroEarly
doesn't actually do very much - and none of it appears to be floating point.
http://llvm.org/doxygen/CoroEarly_8cpp_source.html
For what it's worth:
$ fossil stat
repository: /home/kennykb/FOSSIL/tclquadcode.fossil
local-root: /home/kennykb/tcl-core/tclquadcode-nre/
config-db: /home/kennykb/.fossil
checkout: 1855f5dc7a0840f77943c5952cadcdc8aafd5192 2018-04-17 00:11:41
UTC
parent: 504ceb334bd134dbe77d61b508917bfec264bd37 2018-04-16 23:46:34
UTC
merged-from: f71cdfbdaecc0247e5991602514c58ba795a1ecb 2018-04-17 00:09:09
UTC
tags: kbk-nre
comment: Merge trunk: expunge last uses of 'uniqueUndef' in
constructing
phi's (user: kbk)
|
|
From: Kevin K. <kev...@gm...> - 2018-04-16 16:50:25
|
On Mon, Apr 16, 2018 at 8:24 AM, Donal K. Fellows <don...@ma...> wrote: >> I've probably overlooked something. This is where a second pair of >> eyeballs might come in handy. > > Indeed. (I'm guessing that the coro passes care because they're doing > surgery to the debug info. Which seems reasonable.) You're right, I was siphoning off the bitcode in LLVM::optimise (jit.tcl) in the 'dump-pre' step, which comes just before verifying the module. Once I'm in a place to do it, I'll move that after the 'verify' and report back. (It surely doesn't explain the crash in '$module optimize', which comes after verification and in any case finalizes the debug information, but getting the assembly and bitcode after the info is finalized might offer more insight.) |
|
From: Donal K. F. <don...@ma...> - 2018-04-16 12:24:50
|
On 16/04/2018 13:04, Kevin Kenny wrote: > CoroEarly gets the same crash from the public 'optimize' method, which > supposedly does the finalization, so I'm still confused. I knew about > the finalization requirement, and stuck a 'puts' next to the call, and > it's definitely being done. I'm not aware of changing anything in > 'jit.tcl' that would affect the finalization sequence. I'm pretty sure > that I'm seeing unfinished temporaries on the trunk as well. Unfinished debug temporaries are definitely a mark of the debug issuer still being alive; it's supposed to resolve them when it is destroyed. (The issue is that we're ultimately building circular references; the functions refer to the (list of) variables they contain, and the variables refer to their defining function.) > Peculiarly, only the coroutine passes seem to care. If I patch out > 'qsort', 'rectest' and 'mrtest' from tester.tcl, everything compiles in > spite of them. (The .ll code is still unusable, though.) > > I've probably overlooked something. This is where a second pair of > eyeballs might come in handy. Indeed. (I'm guessing that the coro passes care because they're doing surgery to the debug info. Which seems reasonable.) Donal. |
|
From: Kevin K. <kev...@gm...> - 2018-04-16 12:04:37
|
On 04/16/2018 04:31 AM, Donal K. Fellows wrote: > The variable list starts out as a temporary. It's probably[†] one of > the things that is only converted to a real value during finalisation > of the debug builder. That's done by calling the > *FinalizeDebuggingMetadata* method of module; after that, you can't > write more out. That in turn is called from the public *verify* and > *optimize* methods. I strongly suggest calling *verify* before getting > the *bitcode* of a module; it also catches quite a few other basic > errors and is pretty worthwhile. I'm not going to force finalization > before producing the LLVM IR (via *dump*) since that's a debugging > tool as well. > > Note that verifying a function is less effective and doesn't finalize > anything. (In particular, it doesn't say what's wrong; that's a fault > of the C API. I ought to fix that as there's never a good reason from > Tcl for a verifier to do anything other than produce a proper Tcl > error on failures. https://github.com/dkfellows/llvmtcl/issues/15 made > to track as the C++ API has no such restriction… Yet another example > of a poor-cousin partially-maintained C API causing us trouble. It's > not Jos's fault either; I get the impression that the LLVM C API > really doesn't get much love at all outside the most basic use cases.) CoroEarly gets the same crash from the public 'optimize' method, which supposedly does the finalization, so I'm still confused. I knew about the finalization requirement, and stuck a 'puts' next to the call, and it's definitely being done. I'm not aware of changing anything in 'jit.tcl' that would affect the finalization sequence. I'm pretty sure that I'm seeing unfinished temporaries on the trunk as well. Peculiarly, only the coroutine passes seem to care. If I patch out 'qsort', 'rectest' and 'mrtest' from tester.tcl, everything compiles in spite of them. (The .ll code is still unusable, though.) I've probably overlooked something. This is where a second pair of eyeballs might come in handy. |
|
From: Kevin K. <kev...@gm...> - 2018-04-16 11:49:20
|
On 04/16/2018 07:21 AM, Donal K. Fellows wrote:
> On 16/04/2018 03:15, Kevin Kenny wrote:
>> I have a sneaking suspicion that it's the unique undefs that are
>> confusing it - this is because that's really all that I see that's
>> different in the function that fails when I compare the input and
>> output of 'mem2reg'. Since I want to get rid of those anyway, because
>> that would make 'phi' generation a lot tidier, that may be what I work
>> on next.
>
> I commented earlier, but the issue there was simply that I needed a
> unique value (i.e., non-constant) that I could pass into
> ReplaceAllUsesWith to have substituted with something real. (Replacing
> all uses of a constant would “work”, but is a Really Bad Idea.) If
> we're going to put the variables into a coroutine state presentation,
> we don't actually need to do that; the location in the state
> presentation would be entirely adequate (and LLVM is pretty good about
> optimising those things given that it's a key for getting performance
> from C and C++ code).
>
> It was an awful hack to make things work. If you've found a way to do
> a better thing, that's nice.
The trick is that the basic blocks are presented to the code issuer in
topological order. There's a nice theorem that states that in SSA form,
all definitions of variables dominate their uses, except for phis. So
everything BUT a phi operation will have its inputs defined and won't
need ReplaceAllUsesWith. For phi operations, LLVM provides AddIncoming
to patch a value and source block into a phi after creating it. What the
fix does is to create the phi nodes without any data sources at all, but
just keep a list of {phi sources values}. Then, after processing all the
quadcode, the issuer goes back and patches in all the connections.
After I sent the emails last night, I added procedures called PHI and
FixupPhis to the build helper, so that the assembly code in the library
can just write
[PHI [Type int] {$value1 $value2} {$source1 $source2} "varName"]
without needing the sources and values defined at the time PHI is
executed. It's the same trick: issue an empty 'phi' node and patch the
data in at the end.
|
|
From: Donal K. F. <don...@ma...> - 2018-04-16 11:22:03
|
On 16/04/2018 03:15, Kevin Kenny wrote: > I have a sneaking suspicion that it's the unique undefs that are > confusing it - this is because that's really all that I see that's > different in the function that fails when I compare the input and > output of 'mem2reg'. Since I want to get rid of those anyway, because > that would make 'phi' generation a lot tidier, that may be what I work > on next. I commented earlier, but the issue there was simply that I needed a unique value (i.e., non-constant) that I could pass into ReplaceAllUsesWith to have substituted with something real. (Replacing all uses of a constant would “work”, but is a Really Bad Idea.) If we're going to put the variables into a coroutine state presentation, we don't actually need to do that; the location in the state presentation would be entirely adequate (and LLVM is pretty good about optimising those things given that it's a key for getting performance from C and C++ code). It was an awful hack to make things work. If you've found a way to do a better thing, that's nice. Donal. |
|
From: Donal K. F. <don...@ma...> - 2018-04-16 08:32:07
|
On 16/04/2018 03:20, Kevin Kenny wrote: > (a) The variables of "tcl.int.32" - which should be the single > parameter 'x' - are not encoded. > > (b) There is a temporary object at !15, in place of what should be the > variable collection. The variable list starts out as a temporary. It's probably[†] one of the things that is only converted to a real value during finalisation of the debug builder. That's done by calling the *FinalizeDebuggingMetadata* method of module; after that, you can't write more out. That in turn is called from the public *verify* and *optimize* methods. I strongly suggest calling *verify* before getting the *bitcode* of a module; it also catches quite a few other basic errors and is pretty worthwhile. I'm not going to force finalization before producing the LLVM IR (via *dump*) since that's a debugging tool as well. Note that verifying a function is less effective and doesn't finalize anything. (In particular, it doesn't say what's wrong; that's a fault of the C API. I ought to fix that as there's never a good reason from Tcl for a verifier to do anything other than produce a proper Tcl error on failures. https://github.com/dkfellows/llvmtcl/issues/15 made to track as the C++ API has no such restriction… Yet another example of a poor-cousin partially-maintained C API causing us trouble. It's not Jos's fault either; I get the impression that the LLVM C API really doesn't get much love at all outside the most basic use cases.) > (c) Attempting to parse this with > opt -S test.ll > yields: > /usr/lib/llvm-5.0/bin/opt: test.ll:12038:7: error: Expected '!' here > !15 = <temporary!> !{} Yes. If you're wanting to pass things to opt, try using the bitcode method of the module. Though in this case it probably still wouldn't work because the debug info builder wasn't finalized. :-) > (d) 'opt' does not object to reading the bitcode file and can produce > a .ll that parses correctly. The resulting .ll crashes in > CoroEarly with a FloatingPointException. > > $ opt -S test.bc > test2.ll > $ opt -S -O3 -enable-coroutines test2.ll > test3.ll > <crash!> > > (e) If all debug info is stripped from the bitcode in 'opt', then > there is no crash when attempting to optimize the module. > > $ opt -S -strip-debug test.bc -o test2.ll > $ opt -S -O3 -enable-coroutines test2.ll > test3.ll OK, then that's definitely a failure to finalize the debug info. I've seen that sort of thing before and it was much more of a PITA to debug then. :-) Donal. [† I forget exactly what there was that was doing this, but it was something like this. ] |
|
From: Donal K. F. <don...@ma...> - 2018-04-16 05:06:15
|
On 16/04/2018 03:20, Kevin Kenny wrote: > I've managed on trunk to get rid of all the initialization of quadcode > variables with unique undefs (I still need to work on this in the > libraries). That's curious, because it means that you have a different strategy for pinning variable uses and definitions together. The original issue was that we didn't guarantee that definitions would precede uses in the order of processing of basic blocks during code issue (as opposed to on the code path taken) and this would make the main compiler code as it always assumed that variable references would exist at the point of use. I solved it by actually initially setting each variable to undef, and I obtained a unique one of those by reading it from an undefined unique memory location (as using the undef constant resulted in catastrophic excessive unification). Donal. |
|
From: Kevin K. <kev...@gm...> - 2018-04-16 02:20:39
|
I've managed on trunk to get rid of all the initialization of quadcode
variables with unique undefs (I still need to work on this in the
libraries).
Unfortunately, it did nothing for buildability - although now mem2reg
doesn't alter the code of a simple NRE test proc. There's still the
segfault.
But now I've got another suspect: the debug information.
When I try simply to parse and reformat an LLVM assembly language
source dumped from llvmtcl, I'm seeing:
define hidden i32 @tcl.int.32(%INT %x) #0 !dbg !3 { ... }
And then the debug info has:
!3 = distinct !DISubprogram(name: "tcl.int.32", scope: !5, file: !4,
type: !6, isLocal: true, isDefinition: true, isOptimized: true, unit:
!1, variables: !15)
!4 = !DIFile(filename: "mathlib.tcl", directory:
"/home/kennykb/tcl-core/tclquadcode-nre/codegen")
!5 = !DINamespace(name: "tcl.support", scope: null)
!6 = !DISubroutineType(types: !7)
!7 = !{!8, !9}
!8 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!9 = !DICompositeType(tag: DW_TAG_structure_type, name: "INT", scope:
!11, file: !10, line: 1545, size: 24, elements: !12)
!10 = !DIFile(filename: "struct.tcl", directory:
"/home/kennykb/tcl-core/tclquadcode-nre/codegen")
!11 = !DINamespace(name: "types", scope: !5)
!12 = !{!13, !8, !14}
!13 = !DIBasicType(name: "bool", size: 1, encoding: DW_ATE_boolean)
!14 = !DIBasicType(name: "int64", size: 64, encoding: DW_ATE_signed)
!15 = <temporary!> !{}
Observations:
(a) The variables of "tcl.int.32" - which should be the single
parameter 'x' - are not encoded.
(b) There is a temporary object at !15, in place of what should be the
variable collection.
(c) Attempting to parse this with
opt -S test.ll
yields:
/usr/lib/llvm-5.0/bin/opt: test.ll:12038:7: error: Expected '!' here
!15 = <temporary!> !{}
(d) 'opt' does not object to reading the bitcode file and can produce
a .ll that parses correctly. The resulting .ll crashes in
CoroEarly with a FloatingPointException.
$ opt -S test.bc > test2.ll
$ opt -S -O3 -enable-coroutines test2.ll > test3.ll
<crash!>
(e) If all debug info is stripped from the bitcode in 'opt', then
there is no crash when attempting to optimize the module.
$ opt -S -strip-debug test.bc -o test2.ll
$ opt -S -O3 -enable-coroutines test2.ll > test3.ll
I have absolutely no idea why the variable list is showing up as
<temporary!>, or how to get it to be resolved to the actual list of
variables. So far, my Google-fu has been failing me as well.
I suspect that the <temporary!> thing is a bug in llvmtcl, but
the state of documentation of the debug info API is such that
tracking this down is going to be challenging.
In the course of attempting to track this down, I've discovered that
the inlining pass of LLVM inserts correct llvm.lifetime.start/end
intrinsic calls surrounding the bodies of inlined functions, so that
project is off my list. I still want to get rid of the rest of the
'unique undef' stuff, since that gives rise to extraordinarily
unreadable assembly code. I think I'll tackle that next.
|