You can subscribe to this list here.
2017 |
Jan
|
Feb
(2) |
Mar
(6) |
Apr
(4) |
May
(20) |
Jun
(15) |
Jul
(4) |
Aug
(2) |
Sep
(6) |
Oct
(6) |
Nov
(20) |
Dec
(3) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2018 |
Jan
(16) |
Feb
(3) |
Mar
(7) |
Apr
(40) |
May
(1) |
Jun
|
Jul
|
Aug
|
Sep
|
Oct
(2) |
Nov
|
Dec
(1) |
2019 |
Jan
(7) |
Feb
(5) |
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
From: Kevin K. <kev...@gm...> - 2018-04-23 02:56:14
|
On Sun, Apr 22, 2018 at 9:42 PM, Kevin Kenny <kev...@gm...> wrote: > I see what's going on with the crash in 'mrtest::calc' - and why we've > gone so far astray in trying to diagnose it! > > I got a disassembly from the start of thunk::mrtest::calc(STRING), which > inlines the 'on-ramp' of the coroutine. It looks like: > > JIT(0x5555583b75e0)`cmd.thunk::mrtest::calc: > 0x7ffff7fef8e0 <+0>: pushq %r15 > 0x7ffff7fef8e2 <+2>: pushq %r14 > 0x7ffff7fef8e4 <+4>: pushq %rbx > 0x7ffff7fef8e5 <+5>: movq %rsi, %rax > > -> 1 # demo.tcl -- > > -> 0x7ffff7fef8e8 <+8>: cmpl $0x2, %edx > 0x7ffff7fef8eb <+11>: jne 0x7ffff7fef97d ; <+157> > > ** 2 # > 3 # Code that demonstrates the Tcl-to-LLVM compilation capabilities > of > 4 # tclquadcode. Most of this file is demonstration procedures that > we can > > 0x7ffff7fef8f1 <+17>: movq 0x8(%rcx), %r14 > > 4595 params size > 4596 build { > ** 4597 my ret [$api Tcl_Alloc $size] > 4598 } > 4599 > > 0x7ffff7fef8f5 <+21>: movabsq $0x7ffff7a651f0, %rax ; imm = > 0x7FFFF7A651F0 > 0x7ffff7fef8ff <+31>: movl $0x160, %edi ; imm = 0x160 > 0x7ffff7fef904 <+36>: callq *%rax > 0x7ffff7fef906 <+38>: movq %rax, %rbx > 0x7ffff7fef909 <+41>: movabsq $0x7ffff7ff1fa0, %rax ; imm = > 0x7FFFF7FF1FA0 > 0x7ffff7fef913 <+51>: movq %rax, (%rbx) > 0x7ffff7fef916 <+54>: movabsq $0x7ffff7ff3800, %rax ; imm = > 0x7FFFF7FF3800 > 0x7ffff7fef920 <+64>: movq %rax, 0x8(%rbx) > 0x7ffff7fef924 <+68>: movq %r14, 0x30(%rbx) > 0x7ffff7fef928 <+72>: movb $0x5, 0x28(%rbx) > > <+8> through <+11> are checking objc == 2 and jumping off to an error > routine for 'wrong # args'. > > <+21> through <+38> are allocating the coroutine activation record, which > is 0x160 bytes in size. > > <+41> through <+72> are creating the initial content of the activation > record, The layout of the > activation record is: > > +0 void (*resume)(FrameType*) The function that resumes the coroutine > +8 void (*destroy)(FrameType*) The function that destroys the coroutine > +16 ... The coroutine promise. The promise type is: > struct { > int32 status; Tcl result code from the last NR callback > struct { MAYBE INT return value from mrtest::calc > int1 flag; > int32 shortword; > int64 longword; > } retval; > +40? char resumePointIndex The index of the next resume point > +48? Tcl_Obj* x The 'x' parameter to the procedure > .... more stuff, not initialized..... > > The initialization code correctly initializes the resume pointer > <+41>-<+51>, and the destroy pointer <+54>-<+64>. Note the two addresses > that I have marked with ? in the layout of the activation record. The code > is confused about the size and alignment of the promise. The code from > 'coro.begin' appears to be presuming that its size is at most 24 bytes, and > has stored the index of the resume point at offset 40 (0x28) from the start > of the frame <+72>. The resume index is followed with the variable 'x' from > objv[1], extracted at <+17> and saved at offset 48 (0x30) at <+68>, > > The rest of the call thunk does two Tcl_NRAddCallback calls, one for the > return thunk and one for tcl.coro.runner, and returns to the trampoline. > > This all looks OK so far. > > I then see a normal call to tcl.coro.runner. The lead-in of that function > looks like: > > frame #0: 0x00007ffff7fef1b5 JIT(0x5555583b75e0)`tcl.coro.runner at > coro.tcl:57 > 54 build { > 55 > 56 # Get the coroutine handle from client data > -> 57 set coro.handle [my load $clientDataArray "coro.handle"] > 58 > 59 # First, has the NRE proc finished execution? If so, we simply > 60 # want to return back to the trampoline and execute the next > (lldb) di -m -f -c 20 > JIT(0x5555583b75e0)`tcl.coro.runner: > 0x7ffff7fef1b0 <+0>: pushq %rbp > 0x7ffff7fef1b1 <+1>: pushq %rbx > 0x7ffff7fef1b2 <+2>: pushq %rax > 0x7ffff7fef1b3 <+3>: movl %edx, %ebp > > -> 55 > -> 56 # Get the coroutine handle from client data > -> 57 set coro.handle [my load $clientDataArray "coro.handle"] > -> 58 > -> 59 # First, has the NRE proc finished execution? If so, we simply > > -> 0x7ffff7fef1b5 <+5>: movq (%rdi), %rbx > > 62 > 63 set llvm.coro.done [$m intrinsic coro.done] > ** 64 set done [my call ${llvm.coro.done} [list ${coro.handle}] > "doneFlag"] > > 0x7ffff7fef1b8 <+8>: cmpq $0x0, (%rbx) > > ** 65 my condBr $done $finished $needResume > 66 > 67 label needResume: > > 0x7ffff7fef1bc <+12>: je 0x7ffff7fef1ee ; <+62> > > 70 # that the next time it suspends, we'll loop back to here. > 71 > ** 72 $api Tcl_NRAddCallback $interp ${tcl.coro.runner} ${coro.handle} \ > 73 [my null char*] [my null char*] [my null char*] > 74 > > 0x7ffff7fef1be <+14>: movabsq $0x7ffff7fef1b0, %rax ; imm = > 0x7FFFF7FEF1B0 > 0x7ffff7fef1c8 <+24>: movabsq $0x7ffff7a612f0, %r10 ; imm = > 0x7FFFF7A612F0 > 0x7ffff7fef1d2 <+34>: xorl %ecx, %ecx > 0x7ffff7fef1d4 <+36>: xorl %r8d, %r8d > 0x7ffff7fef1d7 <+39>: xorl %r9d, %r9d > 0x7ffff7fef1da <+42>: movq %rsi, %rdi > 0x7ffff7fef1dd <+45>: movq %rax, %rsi > 0x7ffff7fef1e0 <+48>: movq %rbx, %rdx > 0x7ffff7fef1e3 <+51>: callq *%r10 > > 79 set llvm.coro.promise [$m intrinsic coro.promise] > 80 set promise.addr.raw \ > ** 81 [my call ${llvm.coro.promise} \ > 82 [list ${coro.handle} \ > 83 [Const $alignment int32] \ > > 0x7ffff7fef1e6 <+54>: movl %ebp, 0x10(%rbx) > > 85 "promise.addr.raw"] > 86 set promise.addr [my cast(ptr) ${promise.addr.raw} int32 > "promise.addr"] > ** 87 my store $result ${promise.addr} > 88 > 89 # Resume the coroutine, and return to the trampoline to await > > 0x7ffff7fef1e9 <+57>: movq %rbx, %rdi > 0x7ffff7fef1ec <+60>: callq *(%rbx) > > The entry sequence <+0>-<+2>, the copy of the Tcl status from the third > arg to the callback <+3>, the extraction of the frame pointer from the > client data array <+5>, the check for 'done' <+8>-<+12>, and the > rescheduling of Tcl_NRAddCallback <+14>-<+51> all happen normally. The Tcl > status gets moved to the start of the promise <+54>, and then <+57>-<+60> > the actual body of 'mrtest::calc' is invoked. > > The lead-in of that procedure looks like: > > JIT(0x5555583b75e0)`tcl ::mrtest::calc -1929838593.resume: > -> 0x7ffff7ff1fa0 <+0>: pushq %rbp > 0x7ffff7ff1fa1 <+1>: pushq %r15 > 0x7ffff7ff1fa3 <+3>: pushq %r14 > 0x7ffff7ff1fa5 <+5>: pushq %r13 > 0x7ffff7ff1fa7 <+7>: pushq %r12 > 0x7ffff7ff1fa9 <+9>: pushq %rbx > 0x7ffff7ff1faa <+10>: subq $0x38, %rsp > 0x7ffff7ff1fae <+14>: movq %rdi, %r12 > 0x7ffff7ff1fb1 <+17>: movb 0x30(%r12), %al > 0x7ffff7ff1fb6 <+22>: decb %al > 0x7ffff7ff1fb8 <+24>: cmpb $0x4, %al > 0x7ffff7ff1fba <+26>: ja 0x7ffff7ff20c8 ; <+296> > 0x7ffff7ff1fc0 <+32>: movzbl %al, %eax > 0x7ffff7ff1fc3 <+35>: movabsq $0x7ffff7fe0c60, %rcx ; imm = > 0x7FFFF7FE0C60 > 0x7ffff7ff1fcd <+45>: jmpq *(%rcx,%rax,8) > > <+0>-<+10> are creating a stack frame to hold temporaries that do not live > across coroutine suspension. > <+14> is recovering the frame pointer that was created in the call thunk, > and it's indeed at the same address and has the same content. > Now <+17> is recovering the resume index from the coroutine frame. THIS > INDEX WAS SAVED AT OFFSET +0X28 IN THE FRAME, BUT IS BEING RECOVERED FROM > OFFSET 0X30 > > Continuing: The code is written to expect the index initially to be 0x5, or that's at least what the thunk put there. Instead, it's getting the least significant byte of the Tcl_Obj* that represents the 'x' argument, which is what is actually stored at offset 0x30 in the coroutine frame. This is a pretty arbitrary value, On my most recent test run, it happens to have come out to be '0xf0'. This is decremented to '0xef', which is above 4, so instead of jumping to the start of the procedure, it is jumping to the case for the value 5, which is label %AfterCoroSuspend892 in the dump of the optimized code. I haven't followed the logic very far from there, but it's the wrong switch case, and it's starting out by loading from uninitialized memory in the coroutine frame, so Here Be Nasal Daemons. So -- how does this come about? What I see in the optimized LLVM assembly is: define internal hidden fastcc void @"tcl ::mrtest::calc -1929838593.resume"(%"tcl ::mrtest::calc -1929838593.Frame"* %FramePtr) !dbg !12517 { entry.resume: %index.addr = getelementptr inbounds %"tcl ::mrtest::calc -1929838593.Frame", %"tcl ::mrtest::calc -1929838593.Frame"* %FramePtr, i64 0, i32 3 %index = load i3, i3* %index.addr, align 1 switch i3 %index, label %unreachable [ i3 0, label %pc.98 i3 1, label %pc.169 i3 2, label %pc.240 i3 3, label %pc.302 i3 -4, label %pc.408 i3 -3, label %AfterCoroSuspend892 ] Instead of getting the index by character offset, as the thunk did, it's getting it from the %"tcl ::mrtest::calc -1929838593.Frame" type. The layout of this is %"tcl ::mrtest::calc -1929838593.Frame" = type { void (%"tcl ::mrtest::calc -1929838593.Frame"*)*, ; resume address void (%"tcl ::mrtest::calc -1929838593.Frame"*)*, ; destroy address %"tcl ::mrtest::calc -1929838593.promise", ; promise i3, ; resume index ; ...followed by lots more stuff: %Tcl_Obj*, i32, %Tcl_Obj**, i32, %Tcl_Obj**, i32, %Tcl_Obj**, i32, %Tcl_Obj**, i32, i32, %Tcl_Obj**, i32, %Tcl_Obj**, i32, %Tcl_Obj**, i32, %Tcl_Obj**, i32, %Tcl_Obj**, i32, %Tcl_Obj**, i32, %Tcl_Obj**, %Interp*, %Tcl_Obj*, %Tcl_Obj*, i8*, %Tcl_Obj*, %Tcl_Obj*, i8*, %Tcl_Obj*, %Tcl_Obj*, i8*, %Tcl_Obj*, i8*, %Tcl_Obj*, %Tcl_Obj*, i8* } The promise is: %"tcl ::mrtest::calc -1929838593.promise" = type { i32, { i32, %INT } } And of course %INT is %INT = type { i1, i32, i64 } It would appear to me that the coro.begin logic in the thunk and the 'getelementptr' logic within the coroutine differ on the size and alignment of the promise, causing all the trouble. Unfortunately, it's not clear to me how to get out of this gracefully. I don't know whether it would help to allocate the promise simply as a character array of the requisite size (rounded to a multiple of 2*sizeof(pointer)) - but how to determine what that size needs to be? Alternatively, we'd need to figure out how the size computation is going astray. One thing that's suggestive is that https://llvm.org/docs/Coroutines.html#areas-requiring-attention shows 6. Alignment is ignored by coro.begin and coro.free intrinsics. which seems to be what's happening here. I suppose that the next thing to try would be to overallocate a character array for the promise. It appears that the only types that are legitimate for the result are: CALLFRAME? FAIL? IMPURE? ( ZEROONE | INT | DOUBLE | NUMERIC | STRING ) The worst case should be FAIL IMPURE NUMERIC, which would need: - 8 bytes (int32 + structure alignment padding) for FAIL - 16 bytes (int1 + Tcl_Obj* + structure alignment padding) for IMPURE - 16 bytes for the INT inside the NUMERIC (int1 + int32 + int64 + structure alignment padding - 32 bytes, therefore, for the NUMERIC (int1 + INT + double + structure alignment padding) In addition, there's the 'int32' at the start that carries the Tcl result that arrives at NR callbacks, again with padding for structure alignment. So 8 (Tcl status) + 8 (FAIL) + 16 (IMPURE) + 32 (NUMERIC) would give us 64 bytes for the largest promise. If we allocated this much always (16-byte aligned), carried it around as a character array for LLVM's purposes and bitcast it to the correct structure only in our own code, it might start working. But it's too late tonight to try that change. I welcome other suggestions! |
From: Kevin K. <kev...@gm...> - 2018-04-23 01:43:56
|
Sorry about the truncated message - some key combination sent it prematurely, I'll retransmit once I've finished typing it. On Sun, Apr 22, 2018 at 9:42 PM, Kevin Kenny <kev...@gm...> wrote: > I see what's going on with the crash in 'mrtest::calc' - and why we've > gone so far astray in trying to diagnose it! > > I got a disassembly from the start of thunk::mrtest::calc(STRING), which > inlines the 'on-ramp' of the coroutine. It looks like: > > JIT(0x5555583b75e0)`cmd.thunk::mrtest::calc: > 0x7ffff7fef8e0 <+0>: pushq %r15 > 0x7ffff7fef8e2 <+2>: pushq %r14 > 0x7ffff7fef8e4 <+4>: pushq %rbx > 0x7ffff7fef8e5 <+5>: movq %rsi, %rax > > -> 1 # demo.tcl -- > > -> 0x7ffff7fef8e8 <+8>: cmpl $0x2, %edx > 0x7ffff7fef8eb <+11>: jne 0x7ffff7fef97d ; <+157> > > ** 2 # > 3 # Code that demonstrates the Tcl-to-LLVM compilation capabilities > of > 4 # tclquadcode. Most of this file is demonstration procedures that > we can > > 0x7ffff7fef8f1 <+17>: movq 0x8(%rcx), %r14 > > 4595 params size > 4596 build { > ** 4597 my ret [$api Tcl_Alloc $size] > 4598 } > 4599 > > 0x7ffff7fef8f5 <+21>: movabsq $0x7ffff7a651f0, %rax ; imm = > 0x7FFFF7A651F0 > 0x7ffff7fef8ff <+31>: movl $0x160, %edi ; imm = 0x160 > 0x7ffff7fef904 <+36>: callq *%rax > 0x7ffff7fef906 <+38>: movq %rax, %rbx > 0x7ffff7fef909 <+41>: movabsq $0x7ffff7ff1fa0, %rax ; imm = > 0x7FFFF7FF1FA0 > 0x7ffff7fef913 <+51>: movq %rax, (%rbx) > 0x7ffff7fef916 <+54>: movabsq $0x7ffff7ff3800, %rax ; imm = > 0x7FFFF7FF3800 > 0x7ffff7fef920 <+64>: movq %rax, 0x8(%rbx) > 0x7ffff7fef924 <+68>: movq %r14, 0x30(%rbx) > 0x7ffff7fef928 <+72>: movb $0x5, 0x28(%rbx) > > <+8> through <+11> are checking objc == 2 and jumping off to an error > routine for 'wrong # args'. > > <+21> through <+38> are allocating the coroutine activation record, which > is 0x160 bytes in size. > > <+41> through <+72> are creating the initial content of the activation > record, The layout of the > activation record is: > > +0 void (*resume)(FrameType*) The function that resumes the coroutine > +8 void (*destroy)(FrameType*) The function that destroys the coroutine > +16 ... The coroutine promise. The promise type is: > struct { > int32 status; Tcl result code from the last NR callback > struct { MAYBE INT return value from mrtest::calc > int1 flag; > int32 shortword; > int64 longword; > } retval; > +40? char resumePointIndex The index of the next resume point > +48? Tcl_Obj* x The 'x' parameter to the procedure > .... more stuff, not initialized..... > > The initialization code correctly initializes the resume pointer > <+41>-<+51>, and the destroy pointer <+54>-<+64>. Note the two addresses > that I have marked with ? in the layout of the activation record. The code > is confused about the size and alignment of the promise. The code from > 'coro.begin' appears to be presuming that its size is at most 24 bytes, and > has stored the index of the resume point at offset 40 (0x28) from the start > of the frame <+72>. The resume index is followed with the variable 'x' from > objv[1], extracted at <+17> and saved at offset 48 (0x30) at <+68>, > > The rest of the call thunk does two Tcl_NRAddCallback calls, one for the > return thunk and one for tcl.coro.runner, and returns to the trampoline. > > This all looks OK so far. > > I then see a normal call to tcl.coro.runner. The lead-in of that function > looks like: > > frame #0: 0x00007ffff7fef1b5 JIT(0x5555583b75e0)`tcl.coro.runner at > coro.tcl:57 > 54 build { > 55 > 56 # Get the coroutine handle from client data > -> 57 set coro.handle [my load $clientDataArray "coro.handle"] > 58 > 59 # First, has the NRE proc finished execution? If so, we simply > 60 # want to return back to the trampoline and execute the next > (lldb) di -m -f -c 20 > JIT(0x5555583b75e0)`tcl.coro.runner: > 0x7ffff7fef1b0 <+0>: pushq %rbp > 0x7ffff7fef1b1 <+1>: pushq %rbx > 0x7ffff7fef1b2 <+2>: pushq %rax > 0x7ffff7fef1b3 <+3>: movl %edx, %ebp > > -> 55 > -> 56 # Get the coroutine handle from client data > -> 57 set coro.handle [my load $clientDataArray "coro.handle"] > -> 58 > -> 59 # First, has the NRE proc finished execution? If so, we simply > > -> 0x7ffff7fef1b5 <+5>: movq (%rdi), %rbx > > 62 > 63 set llvm.coro.done [$m intrinsic coro.done] > ** 64 set done [my call ${llvm.coro.done} [list ${coro.handle}] > "doneFlag"] > > 0x7ffff7fef1b8 <+8>: cmpq $0x0, (%rbx) > > ** 65 my condBr $done $finished $needResume > 66 > 67 label needResume: > > 0x7ffff7fef1bc <+12>: je 0x7ffff7fef1ee ; <+62> > > 70 # that the next time it suspends, we'll loop back to here. > 71 > ** 72 $api Tcl_NRAddCallback $interp ${tcl.coro.runner} ${coro.handle} \ > 73 [my null char*] [my null char*] [my null char*] > 74 > > 0x7ffff7fef1be <+14>: movabsq $0x7ffff7fef1b0, %rax ; imm = > 0x7FFFF7FEF1B0 > 0x7ffff7fef1c8 <+24>: movabsq $0x7ffff7a612f0, %r10 ; imm = > 0x7FFFF7A612F0 > 0x7ffff7fef1d2 <+34>: xorl %ecx, %ecx > 0x7ffff7fef1d4 <+36>: xorl %r8d, %r8d > 0x7ffff7fef1d7 <+39>: xorl %r9d, %r9d > 0x7ffff7fef1da <+42>: movq %rsi, %rdi > 0x7ffff7fef1dd <+45>: movq %rax, %rsi > 0x7ffff7fef1e0 <+48>: movq %rbx, %rdx > 0x7ffff7fef1e3 <+51>: callq *%r10 > > 79 set llvm.coro.promise [$m intrinsic coro.promise] > 80 set promise.addr.raw \ > ** 81 [my call ${llvm.coro.promise} \ > 82 [list ${coro.handle} \ > 83 [Const $alignment int32] \ > > 0x7ffff7fef1e6 <+54>: movl %ebp, 0x10(%rbx) > > 85 "promise.addr.raw"] > 86 set promise.addr [my cast(ptr) ${promise.addr.raw} int32 > "promise.addr"] > ** 87 my store $result ${promise.addr} > 88 > 89 # Resume the coroutine, and return to the trampoline to await > > 0x7ffff7fef1e9 <+57>: movq %rbx, %rdi > 0x7ffff7fef1ec <+60>: callq *(%rbx) > > The entry sequence <+0>-<+2>, the copy of the Tcl status from the third > arg to the callback <+3>, the extraction of the frame pointer from the > client data array <+5>, the check for 'done' <+8>-<+12>, and the > rescheduling of Tcl_NRAddCallback <+14>-<+51> all happen normally. The Tcl > status gets moved to the start of the promise <+54>, and then <+57>-<+60> > the actual body of 'mrtest::calc' is invoked. > > The lead-in of that procedure looks like: > > JIT(0x5555583b75e0)`tcl ::mrtest::calc -1929838593.resume: > -> 0x7ffff7ff1fa0 <+0>: pushq %rbp > 0x7ffff7ff1fa1 <+1>: pushq %r15 > 0x7ffff7ff1fa3 <+3>: pushq %r14 > 0x7ffff7ff1fa5 <+5>: pushq %r13 > 0x7ffff7ff1fa7 <+7>: pushq %r12 > 0x7ffff7ff1fa9 <+9>: pushq %rbx > 0x7ffff7ff1faa <+10>: subq $0x38, %rsp > 0x7ffff7ff1fae <+14>: movq %rdi, %r12 > 0x7ffff7ff1fb1 <+17>: movb 0x30(%r12), %al > 0x7ffff7ff1fb6 <+22>: decb %al > 0x7ffff7ff1fb8 <+24>: cmpb $0x4, %al > 0x7ffff7ff1fba <+26>: ja 0x7ffff7ff20c8 ; <+296> > 0x7ffff7ff1fc0 <+32>: movzbl %al, %eax > 0x7ffff7ff1fc3 <+35>: movabsq $0x7ffff7fe0c60, %rcx ; imm = > 0x7FFFF7FE0C60 > 0x7ffff7ff1fcd <+45>: jmpq *(%rcx,%rax,8) > > <+0>-<+10> are creating a stack frame to hold temporaries that do not live > across coroutine suspension. > <+14> is recovering the frame pointer that was created in the call thunk, > and it's indeed at the same address and has the same content. > Now <+17> is recovering the resume index from the coroutine frame. THIS > INDEX WAS SAVED AT OFFSET +0X28 IN THE FRAME, BUT IS BEING RECOVERED FROM > OFFSET 0X30 > > |
From: Kevin K. <kev...@gm...> - 2018-04-23 01:42:59
|
I see what's going on with the crash in 'mrtest::calc' - and why we've gone so far astray in trying to diagnose it! I got a disassembly from the start of thunk::mrtest::calc(STRING), which inlines the 'on-ramp' of the coroutine. It looks like: JIT(0x5555583b75e0)`cmd.thunk::mrtest::calc: 0x7ffff7fef8e0 <+0>: pushq %r15 0x7ffff7fef8e2 <+2>: pushq %r14 0x7ffff7fef8e4 <+4>: pushq %rbx 0x7ffff7fef8e5 <+5>: movq %rsi, %rax -> 1 # demo.tcl -- -> 0x7ffff7fef8e8 <+8>: cmpl $0x2, %edx 0x7ffff7fef8eb <+11>: jne 0x7ffff7fef97d ; <+157> ** 2 # 3 # Code that demonstrates the Tcl-to-LLVM compilation capabilities of 4 # tclquadcode. Most of this file is demonstration procedures that we can 0x7ffff7fef8f1 <+17>: movq 0x8(%rcx), %r14 4595 params size 4596 build { ** 4597 my ret [$api Tcl_Alloc $size] 4598 } 4599 0x7ffff7fef8f5 <+21>: movabsq $0x7ffff7a651f0, %rax ; imm = 0x7FFFF7A651F0 0x7ffff7fef8ff <+31>: movl $0x160, %edi ; imm = 0x160 0x7ffff7fef904 <+36>: callq *%rax 0x7ffff7fef906 <+38>: movq %rax, %rbx 0x7ffff7fef909 <+41>: movabsq $0x7ffff7ff1fa0, %rax ; imm = 0x7FFFF7FF1FA0 0x7ffff7fef913 <+51>: movq %rax, (%rbx) 0x7ffff7fef916 <+54>: movabsq $0x7ffff7ff3800, %rax ; imm = 0x7FFFF7FF3800 0x7ffff7fef920 <+64>: movq %rax, 0x8(%rbx) 0x7ffff7fef924 <+68>: movq %r14, 0x30(%rbx) 0x7ffff7fef928 <+72>: movb $0x5, 0x28(%rbx) <+8> through <+11> are checking objc == 2 and jumping off to an error routine for 'wrong # args'. <+21> through <+38> are allocating the coroutine activation record, which is 0x160 bytes in size. <+41> through <+72> are creating the initial content of the activation record, The layout of the activation record is: +0 void (*resume)(FrameType*) The function that resumes the coroutine +8 void (*destroy)(FrameType*) The function that destroys the coroutine +16 ... The coroutine promise. The promise type is: struct { int32 status; Tcl result code from the last NR callback struct { MAYBE INT return value from mrtest::calc int1 flag; int32 shortword; int64 longword; } retval; +40? char resumePointIndex The index of the next resume point +48? Tcl_Obj* x The 'x' parameter to the procedure .... more stuff, not initialized..... The initialization code correctly initializes the resume pointer <+41>-<+51>, and the destroy pointer <+54>-<+64>. Note the two addresses that I have marked with ? in the layout of the activation record. The code is confused about the size and alignment of the promise. The code from 'coro.begin' appears to be presuming that its size is at most 24 bytes, and has stored the index of the resume point at offset 40 (0x28) from the start of the frame <+72>. The resume index is followed with the variable 'x' from objv[1], extracted at <+17> and saved at offset 48 (0x30) at <+68>, The rest of the call thunk does two Tcl_NRAddCallback calls, one for the return thunk and one for tcl.coro.runner, and returns to the trampoline. This all looks OK so far. I then see a normal call to tcl.coro.runner. The lead-in of that function looks like: frame #0: 0x00007ffff7fef1b5 JIT(0x5555583b75e0)`tcl.coro.runner at coro.tcl:57 54 build { 55 56 # Get the coroutine handle from client data -> 57 set coro.handle [my load $clientDataArray "coro.handle"] 58 59 # First, has the NRE proc finished execution? If so, we simply 60 # want to return back to the trampoline and execute the next (lldb) di -m -f -c 20 JIT(0x5555583b75e0)`tcl.coro.runner: 0x7ffff7fef1b0 <+0>: pushq %rbp 0x7ffff7fef1b1 <+1>: pushq %rbx 0x7ffff7fef1b2 <+2>: pushq %rax 0x7ffff7fef1b3 <+3>: movl %edx, %ebp -> 55 -> 56 # Get the coroutine handle from client data -> 57 set coro.handle [my load $clientDataArray "coro.handle"] -> 58 -> 59 # First, has the NRE proc finished execution? If so, we simply -> 0x7ffff7fef1b5 <+5>: movq (%rdi), %rbx 62 63 set llvm.coro.done [$m intrinsic coro.done] ** 64 set done [my call ${llvm.coro.done} [list ${coro.handle}] "doneFlag"] 0x7ffff7fef1b8 <+8>: cmpq $0x0, (%rbx) ** 65 my condBr $done $finished $needResume 66 67 label needResume: 0x7ffff7fef1bc <+12>: je 0x7ffff7fef1ee ; <+62> 70 # that the next time it suspends, we'll loop back to here. 71 ** 72 $api Tcl_NRAddCallback $interp ${tcl.coro.runner} ${coro.handle} \ 73 [my null char*] [my null char*] [my null char*] 74 0x7ffff7fef1be <+14>: movabsq $0x7ffff7fef1b0, %rax ; imm = 0x7FFFF7FEF1B0 0x7ffff7fef1c8 <+24>: movabsq $0x7ffff7a612f0, %r10 ; imm = 0x7FFFF7A612F0 0x7ffff7fef1d2 <+34>: xorl %ecx, %ecx 0x7ffff7fef1d4 <+36>: xorl %r8d, %r8d 0x7ffff7fef1d7 <+39>: xorl %r9d, %r9d 0x7ffff7fef1da <+42>: movq %rsi, %rdi 0x7ffff7fef1dd <+45>: movq %rax, %rsi 0x7ffff7fef1e0 <+48>: movq %rbx, %rdx 0x7ffff7fef1e3 <+51>: callq *%r10 79 set llvm.coro.promise [$m intrinsic coro.promise] 80 set promise.addr.raw \ ** 81 [my call ${llvm.coro.promise} \ 82 [list ${coro.handle} \ 83 [Const $alignment int32] \ 0x7ffff7fef1e6 <+54>: movl %ebp, 0x10(%rbx) 85 "promise.addr.raw"] 86 set promise.addr [my cast(ptr) ${promise.addr.raw} int32 "promise.addr"] ** 87 my store $result ${promise.addr} 88 89 # Resume the coroutine, and return to the trampoline to await 0x7ffff7fef1e9 <+57>: movq %rbx, %rdi 0x7ffff7fef1ec <+60>: callq *(%rbx) The entry sequence <+0>-<+2>, the copy of the Tcl status from the third arg to the callback <+3>, the extraction of the frame pointer from the client data array <+5>, the check for 'done' <+8>-<+12>, and the rescheduling of Tcl_NRAddCallback <+14>-<+51> all happen normally. The Tcl status gets moved to the start of the promise <+54>, and then <+57>-<+60> the actual body of 'mrtest::calc' is invoked. The lead-in of that procedure looks like: JIT(0x5555583b75e0)`tcl ::mrtest::calc -1929838593.resume: -> 0x7ffff7ff1fa0 <+0>: pushq %rbp 0x7ffff7ff1fa1 <+1>: pushq %r15 0x7ffff7ff1fa3 <+3>: pushq %r14 0x7ffff7ff1fa5 <+5>: pushq %r13 0x7ffff7ff1fa7 <+7>: pushq %r12 0x7ffff7ff1fa9 <+9>: pushq %rbx 0x7ffff7ff1faa <+10>: subq $0x38, %rsp 0x7ffff7ff1fae <+14>: movq %rdi, %r12 0x7ffff7ff1fb1 <+17>: movb 0x30(%r12), %al 0x7ffff7ff1fb6 <+22>: decb %al 0x7ffff7ff1fb8 <+24>: cmpb $0x4, %al 0x7ffff7ff1fba <+26>: ja 0x7ffff7ff20c8 ; <+296> 0x7ffff7ff1fc0 <+32>: movzbl %al, %eax 0x7ffff7ff1fc3 <+35>: movabsq $0x7ffff7fe0c60, %rcx ; imm = 0x7FFFF7FE0C60 0x7ffff7ff1fcd <+45>: jmpq *(%rcx,%rax,8) <+0>-<+10> are creating a stack frame to hold temporaries that do not live across coroutine suspension. <+14> is recovering the frame pointer that was created in the call thunk, and it's indeed at the same address and has the same content. Now <+17> is recovering the resume index from the coroutine frame. THIS INDEX WAS SAVED AT OFFSET +0X28 IN THE FRAME, BUT IS BEING RECOVERED FROM OFFSET 0X30 |
From: Kevin K. <kev...@gm...> - 2018-04-22 04:51:41
|
I found one case where the offending 'llvm.trap' is being generated on procedure exit. Even though the value coming into 'procReturn' (or ${tcl.procedure.return}) is a FAIL, that does not guarantee that the Tcl result code coming out of it is still nonzero! Look at https://core.tcl-lang.org/tclquadcode/artifact?udc=1&ln=4237-4240&name=a73528563f If some inner context has executed [return -level N somevalue], and we're now N levels out from that context, the call to TclUpdateReturnInfo in 'handleReturn' will return TCL_OK, and that code will now leave us with a FAIL object and an OK status code. That looks to be the only path that falls into the one particular trap that I traced back from. Clearly, nothing in the procs is doing [return -level], so that particular trap is likely not the issue, but we surely need to address it anyway. Any ideas what you want to do about it? It appears to me that if a procedure might possibly have a [return -level n] escape from it, then its nth-level caller needs to have the possibility of the [return -level] accounted for. Since it's a fairly unusual operation, I'd be comfortable with saying that if 'return -level' is present at a level of other than 0 or 1, treat the proc and all callers as returning FAIL STRING. We could, of course, attempt bookkeeping for other 'return -level' operations, but that's for when we really start worrying about compilation of user-defined control structures - in other words, after we've worked out inlining to the level where we can do simple things like [uplevel 1 $arg]. That's not the immediate issue at hand. A naked FAIL that is actually a 'success' is not much of a problem until it comes time to widen it to a FAIL SOMETHING - which, in the idea that I'm thinking of, would always be FAIL STRING. At the time we're raising FAIL to FAIL STRING, we should be processing the result of an operation that always throws, and so we should still have the interpreter's result available. That would mean that widening from FAIL to FAIL STRING will need to fetch the interpreter result, bump the ref count and stash it in the FAIL STRING value, along with the TCL_OK status. Am I totally mad here? (Wait. I know I'm totally mad. Is the *idea* totally mad?) |
From: Donal K. F. <don...@ma...> - 2018-04-21 13:27:11
|
On 20/04/2018 15:49, Kevin Kenny wrote: > Also the header of ${tcl.procedure.return} surely is describing its > return value incorrectly. The mapping of a pure FAIL is a plain int32 containing just the Tcl result code. The main complication at that point is that the result in the Tcl_Interp may be updated due to processing a [return -level $x $y] and… oh, yes, that could be very buggy. :-) The documentation of procLeave(INT,STRING) in build.tcl is more thoroughly misleading. Donal. |
From: Kevin K. <kev...@gm...> - 2018-04-20 14:49:26
|
meant to copy list, to capture 'developer stream of consciousness' ---------- Forwarded message ---------- From: Kevin Kenny <kev...@gm...> Date: Fri, Apr 20, 2018 at 10:42 AM Subject: Re: [tclquadcode] Still groping forward on the crash in CoroEarly To: "Donal K. Fellows" <don...@ma...> On Fri, Apr 20, 2018 at 9:43 AM, Donal K. Fellows < don...@ma...> wrote: Poking at a core dump with lldb after building the offending code as a > separate DLL (as that gives me some debugging info!) I see this instruction > sequence leading up the point the crash happened (+276) inside 'tcl > ::mrtest::calc -1929838593.resume': > > 0x1049aeb9c <+204>: callq 0x1049b0e04 ; symbol stub > for: Tcl_AppendObjToErrorInfo > 0x1049aeba1 <+209>: andl $-0x5, %ebx > 0x1049aeba4 <+212>: movl %ebx, 0x1a8(%r13) > 0x1049aebab <+219>: movl 0x4(%rsp), %ebx > 0x1049aebaf <+223>: movq 0x110(%rbp), %rdi > 0x1049aebb6 <+230>: movl (%rdi), %eax > 0x1049aebb8 <+232>: cmpl $0x1, %eax > 0x1049aebbb <+235>: jg 0x1049af08b ; <+1467> > 0x1049aebc1 <+241>: callq 0x1049b0df2 ; symbol stub > for: TclFreeObj > 0x1049aebc6 <+246>: movq 0x118(%rbp), %rdi > 0x1049aebcd <+253>: movl (%rdi), %eax > 0x1049aebcf <+255>: cmpl $0x1, %eax > 0x1049aebd2 <+258>: jle 0x1049aee57 ; <+903> > 0x1049aebd8 <+264>: jmp 0x1049aef77 ; <+1191> > 0x1049aebdd <+269>: movq 0x108(%rbp), %rdi > -> 0x1049aebe4 <+276>: movl 0x14(%rdi), %r12d > 0x1049aebe8 <+280>: movq 0x20(%rdi), %r15 > > I'm guessing that the crash area correlates to the optimised IR code: > > action.required.i162: ; preds = > %tcl.list.index1.exit > %tmp.1074 = extractvalue { i32, %Tcl_Obj* } %tmp.107130, 1, !dbg !13055 > %refCount.i.i157 = getelementptr %Tcl_Obj, %Tcl_Obj* %tmp.1074, i64 0, > i32 0, !dbg !13080 > %refCount1.i.i158 = load i32, i32* %refCount.i.i157, align 4, !dbg > !13083 > %15 = add nsw i32 %refCount1.i.i158, 1, !dbg !13083 > store i32 %15, i32* %refCount.i.i157, align 4, !dbg !13083 > %16 = icmp slt i32 %refCount1.i.i158, 1, !dbg !13084 > br i1 %16, label %freeObject.i.i163, label %leave.i.i164, !dbg !13084 > > Decoding the debug metadata, this appears to be part of the inlining of > tcl.procedure.return within tcl ::mrtest::rfib 252, and I've got lost after > that point. :-) > Me, too! Some weird things happen when LLVM inlines coroutine calls. It looks very much like there's a problem with the preconditions for the > calling of tcl.procedure.return. I've also no idea why we're using an NRE > call for mrtest in the first place; that code doesn't have a 'yield' or > 'coroutine' anywhere near it. > I've been using NRE at the moment for all recursive procedures - that follows Tcl's practice, so as not to blow the C stack. Supporting tailcall and coroutines was Miguel's offer to offset the cost of allocating activation records on the heap, but he told me that his primary motivation was to get Tcl's recursion off the C stack so that large numbers of threads become manageable. Another thing that will require NRE (although that's not quite right in the implementation yet) is any of the I/O calls. This is done specifically because of the existence of the 'coroutine' package in tcllib and of stacked channels. In the presence of channel stacking, it's highly likely that a call such as 'read' or 'write' will yield until completion, and I surely can't tell what the underlying I/O system will be at compile time! I think you're getting farther than I am. I may be deceiving myself, but at least I *was* fairly certain that I was not successfully resuming 'calc' after the initial suspend. With you building a DLL while I'm running JIT, I'm also not sure that we're seeing entirely the same set of passes. Are rectest* and qsort now working for you? They also appear to have the offending return sequences, but don't fail for me now. Anyway, this is indeed a place to look further. I'm trying to look upstream in quadcode, too, for where I might be violating a precondition of 'procReturn' - what are its preconditions, anyway? Also the header of ${tcl.procedure.return} surely is describing its return value incorrectly. |
From: Donal K. F. <don...@ma...> - 2018-04-20 13:43:42
|
On 20/04/2018 05:09, Kevin Kenny wrote: > The resume proc, mrtest::calc(STRING).resume, begins with a short > codeburst that LLVM inserts; I can't instrument this readily, or at > least I don't know how to do it. > > I had a printf inserted immediately following the call to > llvm.coro.suspend in the entry block of mrtest::calc. That printf was > never reached. > > And that's as far as I can get tonight. Poking at a core dump with lldb after building the offending code as a separate DLL (as that gives me some debugging info!) I see this instruction sequence leading up the point the crash happened (+276) inside 'tcl ::mrtest::calc -1929838593.resume': 0x1049aeb9c <+204>: callq 0x1049b0e04 ; symbol stub for: Tcl_AppendObjToErrorInfo 0x1049aeba1 <+209>: andl $-0x5, %ebx 0x1049aeba4 <+212>: movl %ebx, 0x1a8(%r13) 0x1049aebab <+219>: movl 0x4(%rsp), %ebx 0x1049aebaf <+223>: movq 0x110(%rbp), %rdi 0x1049aebb6 <+230>: movl (%rdi), %eax 0x1049aebb8 <+232>: cmpl $0x1, %eax 0x1049aebbb <+235>: jg 0x1049af08b ; <+1467> 0x1049aebc1 <+241>: callq 0x1049b0df2 ; symbol stub for: TclFreeObj 0x1049aebc6 <+246>: movq 0x118(%rbp), %rdi 0x1049aebcd <+253>: movl (%rdi), %eax 0x1049aebcf <+255>: cmpl $0x1, %eax 0x1049aebd2 <+258>: jle 0x1049aee57 ; <+903> 0x1049aebd8 <+264>: jmp 0x1049aef77 ; <+1191> 0x1049aebdd <+269>: movq 0x108(%rbp), %rdi -> 0x1049aebe4 <+276>: movl 0x14(%rdi), %r12d 0x1049aebe8 <+280>: movq 0x20(%rdi), %r15 I'm guessing that the crash area correlates to the optimised IR code: action.required.i162: ; preds = %tcl.list.index1.exit %tmp.1074 = extractvalue { i32, %Tcl_Obj* } %tmp.107130, 1, !dbg !13055 %refCount.i.i157 = getelementptr %Tcl_Obj, %Tcl_Obj* %tmp.1074, i64 0, i32 0, !dbg !13080 %refCount1.i.i158 = load i32, i32* %refCount.i.i157, align 4, !dbg !13083 %15 = add nsw i32 %refCount1.i.i158, 1, !dbg !13083 store i32 %15, i32* %refCount.i.i157, align 4, !dbg !13083 %16 = icmp slt i32 %refCount1.i.i158, 1, !dbg !13084 br i1 %16, label %freeObject.i.i163, label %leave.i.i164, !dbg !13084 Decoding the debug metadata, this appears to be part of the inlining of tcl.procedure.return within tcl ::mrtest::rfib 252, and I've got lost after that point. :-) It looks very much like there's a problem with the preconditions for the calling of tcl.procedure.return. I've also no idea why we're using an NRE call for mrtest in the first place; that code doesn't have a 'yield' or 'coroutine' anywhere near it. Donal. |
From: Kevin K. <kev...@gm...> - 2018-04-20 04:09:24
|
What's going on with mrtest::* really puzzles me, and I'd really appreciate another pair of eyeballs. The tester calls 'mrtest::calc' from Tcl, which enters the call thunk. As far as I can tell, the failing sequence looks to be the same code as the three test cases that pass. By inserting strategic printf's, what I discovered: The initial call from the thunk to 'mrtest::calc(STRING) succeeds. mrtest::calc Tcl_Alloc's the coroutine frame, fills the initial values in, and suspends, which returns the coro frame to the thunk. (This 'on ramp' portion of the proc is actually inlined into the thunk.) The coro frame starts with the resume address (I don't have any idea how to get a link map, so no idea if it's the RIIGHT resume address!), the destroy address (ditto!), the promise (correct at this point; TCL_OK for the Tcl status and an uninitialized INT return value), the resume index (-3), and the parameter 'x'. The call to Tcl_NRAddCallback for the return thunk is made, and returns. The call to Tcl_NRAddCallback for tcl.coro.runner is made, and returns. The thunk returns to the trampoline, and the callback into tcl.coro.runner fires. tcl.coro.runner sees the same pointer to the coro frame that mrtest::calc allocated. It calls whatever the resume address in the coro frame points to. That address is the same address that mrtest::calc put there. As I said, I have no idea whether it's the right address, but the code follows the same pattern as the procedures that work. The resume proc, mrtest::calc(STRING).resume, begins with a short codeburst that LLVM inserts; I can't instrument this readily, or at least I don't know how to do it. I had a printf inserted immediately following the call to llvm.coro.suspend in the entry block of mrtest::calc. That printf was never reached. And that's as far as I can get tonight. |
From: Kevin K. <kev...@gm...> - 2018-04-20 03:47:22
|
On Tue, Apr 17, 2018 at 6:27 PM, Donal K. Fellows < don...@ma...> wrote: > I'm getting reliable SIGSEGVs instead of SIGFPEs; those appear to emanate > from 'llvm.trap' intrinsics, which are the sort of things that turn up when > the issued code is actively wrong in some subtle way. I don't know which > one we might be hitting, but they shouldn't be turning up (assuming we're > not in the nasal-demon case of coro.trap). > > Places to look for trouble: > > * tcl ::rectest1 252.resume (in relation to label action.required.i58) > * tcl ::rectest1 -1929838593.resume (in relation to label > action.required.i145) > * tcl ::rectest1 -1929838593 (in relation to label action.required.i119) > * tcl ::rectest1 252 (in relation to label action.required.i44) > > The 'action.required' labels come from the code to add and drop reference > counts in various scenarios; there's ten of them in stdlib.tcl, all in > trivial functions. It looks (with very little analysis) like something > feeding a NULL into a STRING argument; the code issuer *REALLY* assumes > that that's impossible. Or maybe it is an undef that is leaking through; > I'm not sure, but those are the usual causes. > > Interestingly, the predecessor blocks of all of those labels with problems > have the same label: tcl.procedure.return.exit (a generated label resulting > from the exit of tcl.procedure.return, which is what applies the tricks > that happen to transform exception codes when a procedure terminates). That > hints that there might be problems with marshalling the result across the > boundary of the procedure now that the extra coroutine machinery is in > place? I don't know, but at least that points the finger at one place than > many, even if where that place is remains uncertain. In any case, we're now > able to reliably spit out intelligible optimised IR. It shows that there's > a bug. I have things to the point where rectest1, rectest2 and qsort all run perfectly - and still contain these questionable traps. I've figured out what's generating them, and it isn't NRE-specific. (mrtest::* is still failing - I'll discuss that in a separate message.) The offending sequence appears to be: 110: procLeave {temp 28 110} {temp 27 108} {literal rectest1} : FAIL ⇐ INT STRING 111: free {} {temp 27 108} : VOID ⇐ INT 112: {widenTo -1392967681 {FAIL STRING}} {temp 0 112} {temp 28 110} : {FAIL STRING} ⇐ FAIL 113: free {} {temp 28 110} : VOID ⇐ FAIL 114: NRE.return {} Nothing {temp 0 112} : VOID ⇐ NEXIST {FAIL STRING} In the initial translation, we get: ; procLeave - translates to a call to tcl.procedure.return %tmp.28 = call i32 @tcl.procedure.return(%INT %13, %Tcl_Obj* %str.18, i32* %tcl.errorCode), !dbg !4394 ; freeing an INT does nothing ; widening a pure FAIL to a FAIL STRING - the STRING is properly an UNDEF ; since we started with a pure FAIL %tmp.014 = insertvalue { i32, %Tcl_Obj* } undef, i32 %tmp.28, 0, !dbg !4394 ; add to the refcount of the FAIL STRING call void @tcl.addFailReference({ i32, %Tcl_Obj* } %tmp.014), !dbg !4394 ; gubbins to return from an NRE procedure %15 = getelementptr %"tcl ::rectest1 252.promise", %"tcl ::rectest1 252.promise"* %"coro,promise", i32 0, i32 0, !dbg !4394 store i32 0, i32* %15, !dbg !4394 %16 = getelementptr %"tcl ::rectest1 252.promise", %"tcl ::rectest1 252.promise"* %"coro,promise", i32 0, i32 1, !dbg !4394 store { i32, %Tcl_Obj* } %tmp.014, { i32, %Tcl_Obj* }* %16, !dbg !4394 br label %coro.exit, !dbg !4394 What the optimizer does with this: ; before this point is the inlined code from tcl.procedure.return, which ; isn't actually relevant tcl.procedure.return.exit: ; preds = %done.i, %handle.return.i ; this phi is collecting the Tcl status code from 'tcl.procedure.return' %tmp.2837 = phi i32 [ %36, %handle.return.i ], [ %19, %done.i ] ; now we make the FAIL STRING from the pure FAIL %tmp.014 = insertvalue { i32, %Tcl_Obj* } undef, i32 %tmp.2837, 0, !dbg !10673 ; The inlined 'addFailReference' apparently cannot prove that tcl.procedure.return ; always stored a nonzero value in the FAIL, so it has this test: %40 = icmp eq i32 %tmp.2837, 0, !dbg !10716 br i1 %40, label %action.required.i41, label %coro.exit, !dbg !10716 ; But if the value is zero, it can prove that promoting the FAIL to a ; FAIL STRING will cause reference of an 'undef': action.required.i41: ; preds = %tcl.procedure.return.exit call void @llvm.trap(), !dbg !10718 unreachable My guess is that we have a code path that will get into tcl.procedure.return with a zero code - or at least LLVM is unable to prove that we don't. Should I be worried? |
From: Kevin K. <kev...@gm...> - 2018-04-18 23:34:55
|
Good news: with the change to make the alignment constraint on llvm.coro.promise be [Const $::tcl_platform(pointerSize) int32], everything compiles, and the rectest1 and qsort cases even work. I'm pretty sure that I have a bug in returning error from an NRE proc, and that's what you're seeing. That 'trap; unreachable" sequence is what I expect if an LLVM coro is resumed after a Tcl proc returns. It's precisely the nasal daemons code from coro.trap, and I expect to see it soon after procedure return. I'll add a test case that throws an error from a recursive nest. The good news is that rectest1 and qsort now are working! mrtest::* still isn't, probably owing to some problem related to what you turned up. I fully expected to be generating bad code at this point. I was mildly surprised that anything worked! Onward to debugging, now that the optimizer no longer crashes and burns! |
From: Donal K. F. <don...@ma...> - 2018-04-18 14:13:32
|
On 18/04/2018 15:09, Kevin Kenny wrote: > LLVM gave up a long time ago on portability of bitcode among machines > with different DataLayouts. It's not-so-awful assembler, nothing more. :-) Donal. |
From: Kevin K. <kev...@gm...> - 2018-04-18 14:10:01
|
On Wed, Apr 18, 2018 at 10:02 AM, Donal K. Fellows < don...@ma...> wrote: > On 18/04/2018 13:16, Kevin Kenny wrote: > >> The 'alignment' arg to @llvm.coro.promise must be an integer constant. >> AlignOf doesn't return an integer constant, it returns a complex expression. >> > > It's a constant expression (involving shenanigans with GEP)… so if the > promise builder really wants an integer constant, we're unable to use > alignof. > > What's worse, the documentation for all this says just: > > The second argument is an alignment requirements of the promise. If a >> frontend designated %promise = alloca i32 as a promise, the alignment >> argument to coro.promise should be the alignment of i32 on the target >> platform. If a frontend designated %promise = alloca i32, align 16 as a >> promise, the alignment argument should be 16. This argument only accepts >> constants. >> > > Yes, that it's got to be an actual genuine constant integer (as opposed to > just being a constant with integer value) is entirely undocumented. > > You can probably just use $tcl_platform(wordSize) or > $tcl_platform(pointerSize) as alignments. (After converting to LLVM > constants.) > > I might give that a try. Swift (or was it Rust?) reaches into the DataLayout to construct sizeof and alignof, rather than using the SizeOf and AlignOf calls in the C API. Since we gave up on avoiding C++ quite some time ago, I was thinking of doing the same, but you're right that tcl_platform would be a quick and dirty fix. LLVM gave up a long time ago on portability of bitcode among machines with different DataLayouts. |
From: Donal K. F. <don...@ma...> - 2018-04-18 14:02:39
|
On 18/04/2018 13:16, Kevin Kenny wrote: > The 'alignment' arg to @llvm.coro.promise must be an integer constant. > AlignOf doesn't return an integer constant, it returns a complex expression. It's a constant expression (involving shenanigans with GEP)… so if the promise builder really wants an integer constant, we're unable to use alignof. What's worse, the documentation for all this says just: > The second argument is an alignment requirements of the promise. If a frontend designated %promise = alloca i32 as a promise, the alignment argument to coro.promise should be the alignment of i32 on the target platform. If a frontend designated %promise = alloca i32, align 16 as a promise, the alignment argument should be 16. This argument only accepts constants. > Yes, that it's got to be an actual genuine constant integer (as opposed to just being a constant with integer value) is entirely undocumented. You can probably just use $tcl_platform(wordSize) or $tcl_platform(pointerSize) as alignments. (After converting to LLVM constants.) Donal. |
From: Kevin K. <kev...@gm...> - 2018-04-18 12:16:22
|
On Tue, Apr 17, 2018 at 8:01 PM, Kevin Kenny <kev...@gm...> wrote: > > Wait, you can dump post-optimization code? > > I cannot - I'm getting the SEGV or FPE inside the optimizer. I'm not > entering the generated code at all; [$m optimize] does not return. > > If you're getting through the optimizer from tester.tcl, I'd very much > like to know how! > > I wasn't even expecting the code to *work* at this stage! I haven't been > able even to attempt to run it because of the compiler crashes. > > I'm slogging through the steps in https://llvm.org/docs/ > GettingStarted.html - in hopes that making a debugging build of a more > current version will shed light on the subject. > Having a debug build did indeed shed some light. The 'alignment' arg to @llvm.coro.promise must be an integer constant. AlignOf doesn't return an integer constant, it returns a complex expression. This particular crash should go away if I implement the calls into the DataLayout that I discussed a few days ago, to get sizeof and alignof from there rather than using pointer arithmetic. opt: /storage/kennykb/llvm.org/llvm/include/llvm/Support/Casting.h:255: typename llvm::cast_retty<X, Y*>::ret_type llvm::cast(Y*) [with X = llvm::ConstantInt; Y = llvm::Value; typename llvm::cast_retty<X, Y*>::ret_type = llvm::ConstantInt*]: Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed. Stack dump: 0. Program arguments: /opt/llvm/bin/opt -S -O3 -enable-coroutines -o test1.ll test.bc 1. Running pass 'Lower early coroutine intrinsics' on function '@"tcl ::rectest1 -1929838593"' #0 0x000055d5cf30d66d llvm::sys::PrintStackTrace(llvm::raw_ostream&) /storage/kennykb/llvm.org/llvm/lib/Support/Unix/Signals.inc:398:0 #1 0x000055d5cf30d700 PrintStackTraceSignalHandler(void*) /storage/kennykb/ llvm.org/llvm/lib/Support/Unix/Signals.inc:462:0 #2 0x000055d5cf30b8d5 llvm::sys::RunSignalHandlers() /storage/kennykb/ llvm.org/llvm/lib/Support/Signals.cpp:49:0 #3 0x000055d5cf30ced9 SignalHandler(int) /storage/kennykb/ llvm.org/llvm/lib/Support/Unix/Signals.inc:252:0 #4 0x00007f39da6b9150 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x13150) #5 0x00007f39d93c00bb gsignal /build/glibc-itYbWN/glibc-2.26/signal/../sysdeps/unix/sysv/linux/raise.c:51:0 #6 0x00007f39d93c1f5d abort /build/glibc-itYbWN/glibc-2.26/stdlib/abort.c:92:0 #7 0x00007f39d93b7f17 __assert_fail_base /build/glibc-itYbWN/glibc-2.26/assert/assert.c:92:0 #8 0x00007f39d93b7fc2 (/lib/x86_64-linux-gnu/libc.so.6+0x2efc2) #9 0x000055d5ccea8e1d llvm::cast_retty<llvm::ConstantInt, llvm::Value*>::ret_type llvm::cast<llvm::ConstantInt, llvm::Value>(llvm::Value*) /storage/kennykb/ llvm.org/llvm/include/llvm/Support/Casting.h:255:0 #10 0x000055d5ceba52b5 llvm::CoroPromiseInst::getAlignment() const /storage/kennykb/llvm.org/llvm/lib/Transforms/Coroutines/CoroInstr.h:253:0 #11 0x000055d5ceba39d5 (anonymous namespace)::Lowerer::lowerCoroPromise(llvm::CoroPromiseInst*) /storage/kennykb/llvm.org/llvm/lib/Transforms/Coroutines/CoroEarly.cpp:68:0 #12 0x000055d5ceba44d8 (anonymous namespace)::Lowerer::lowerEarlyIntrinsics(llvm::Function&) /storage/kennykb/ llvm.org/llvm/lib/Transforms/Coroutines/CoroEarly.cpp:201:0 #13 0x000055d5ceba4922 (anonymous namespace)::CoroEarly::runOnFunction(llvm::Function&) /storage/kennykb/ llvm.org/llvm/lib/Transforms/Coroutines/CoroEarly.cpp:248:0 #14 0x000055d5cead5430 llvm::FPPassManager::runOnFunction(llvm::Function&) /storage/kennykb/llvm.org/llvm/lib/IR/LegacyPassManager.cpp:1520:0 #15 0x000055d5cead51c1 llvm::legacy::FunctionPassManagerImpl::run(llvm::Function&) /storage/kennykb/llvm.org/llvm/lib/IR/LegacyPassManager.cpp:1469:0 #16 0x000055d5cead4d96 llvm::legacy::FunctionPassManager::run(llvm::Function&) /storage/kennykb/ llvm.org/llvm/lib/IR/LegacyPassManager.cpp:1394:0 #17 0x000055d5cce6745a main /storage/kennykb/ llvm.org/llvm/tools/opt/opt.cpp:712:0 #18 0x00007f39d93aa1c1 __libc_start_main /build/glibc-itYbWN/glibc-2.26/csu/../csu/libc-start.c:342:0 #19 0x000055d5cce309aa _start (/opt/llvm/bin/opt+0x14db9aa) |
From: Kevin K. <kev...@gm...> - 2018-04-18 00:01:50
|
On Tue, Apr 17, 2018 at 6:27 PM, Donal K. Fellows < don...@ma...> wrote: > On 17/04/2018 02:06, Kevin Kenny wrote: > >> WARNING:struct.tcl:144:no debugging type for i32 (i8**, %Interp*, i32)* >> in 'f351(tcl.nr.add.callback)' >> WARNING:struct.tcl:144:no debugging type for i32 (i8**, %Interp*, i32)* >> in '0' >> > > I've added in the (definitely weird and wonderful) incantation to make > these warnings go away and I can dump post-optimization code (which is > sometimes easier to read than pre-optimization; sometimes…). With that > done, I'm getting reliable SIGSEGVs instead of SIGFPEs; those appear to > emanate from 'llvm.trap' intrinsics, which are the sort of things that turn > up when the issued code is actively wrong in some subtle way. I don't know > which one we might be hitting, but they shouldn't be turning up (assuming > we're not in the nasal-demon case of coro.trap). > > Places to look for trouble: > > * tcl ::rectest1 252.resume (in relation to label action.required.i58) > * tcl ::rectest1 -1929838593.resume (in relation to label > action.required.i145) > * tcl ::rectest1 -1929838593 (in relation to label action.required.i119) > * tcl ::rectest1 252 (in relation to label action.required.i44) > > The 'action.required' labels come from the code to add and drop reference > counts in various scenarios; there's ten of them in stdlib.tcl, all in > trivial functions. It looks (with very little analysis) like something > feeding a NULL into a STRING argument; the code issuer *REALLY* assumes > that that's impossible. Or maybe it is an undef that is leaking through; > I'm not sure, but those are the usual causes. > > Interestingly, the predecessor blocks of all of those labels with problems > have the same label: tcl.procedure.return.exit (a generated label resulting > from the exit of tcl.procedure.return, which is what applies the tricks > that happen to transform exception codes when a procedure terminates). That > hints that there might be problems with marshalling the result across the > boundary of the procedure now that the extra coroutine machinery is in > place? I don't know, but at least that points the finger at one place than > many, even if where that place is remains uncertain. In any case, we're now > able to reliably spit out intelligible optimised IR. It shows that there's > a bug. :-) Wait, you can dump post-optimization code? I cannot - I'm getting the SEGV or FPE inside the optimizer. I'm not entering the generated code at all; [$m optimize] does not return. If you're getting through the optimizer from tester.tcl, I'd very much like to know how! I wasn't even expecting the code to *work* at this stage! I haven't been able even to attempt to run it because of the compiler crashes. I'm slogging through the steps in https://llvm.org/docs/GettingStarted.html - in hopes that making a debugging build of a more current version will shed light on the subject. |
From: Donal K. F. <don...@ma...> - 2018-04-17 22:27:59
|
On 17/04/2018 02:06, Kevin Kenny wrote: > WARNING:struct.tcl:144:no debugging type for i32 (i8**, %Interp*, i32)* > in 'f351(tcl.nr.add.callback)' > WARNING:struct.tcl:144:no debugging type for i32 (i8**, %Interp*, i32)* > in '0' I've added in the (definitely weird and wonderful) incantation to make these warnings go away and I can dump post-optimization code (which is sometimes easier to read than pre-optimization; sometimes…). With that done, I'm getting reliable SIGSEGVs instead of SIGFPEs; those appear to emanate from 'llvm.trap' intrinsics, which are the sort of things that turn up when the issued code is actively wrong in some subtle way. I don't know which one we might be hitting, but they shouldn't be turning up (assuming we're not in the nasal-demon case of coro.trap). Places to look for trouble: * tcl ::rectest1 252.resume (in relation to label action.required.i58) * tcl ::rectest1 -1929838593.resume (in relation to label action.required.i145) * tcl ::rectest1 -1929838593 (in relation to label action.required.i119) * tcl ::rectest1 252 (in relation to label action.required.i44) The 'action.required' labels come from the code to add and drop reference counts in various scenarios; there's ten of them in stdlib.tcl, all in trivial functions. It looks (with very little analysis) like something feeding a NULL into a STRING argument; the code issuer *REALLY* assumes that that's impossible. Or maybe it is an undef that is leaking through; I'm not sure, but those are the usual causes. Interestingly, the predecessor blocks of all of those labels with problems have the same label: tcl.procedure.return.exit (a generated label resulting from the exit of tcl.procedure.return, which is what applies the tricks that happen to transform exception codes when a procedure terminates). That hints that there might be problems with marshalling the result across the boundary of the procedure now that the extra coroutine machinery is in place? I don't know, but at least that points the finger at one place than many, even if where that place is remains uncertain. In any case, we're now able to reliably spit out intelligible optimised IR. It shows that there's a bug. :-) Donal. |
From: Kevin K. <kev...@gm...> - 2018-04-17 01:07:44
|
On Mon, Apr 16, 2018 at 12:50 PM, Kevin Kenny <kev...@gm...> wrote: > You're right, I was siphoning off the bitcode in LLVM::optimise (jit.tcl) > in the 'dump-pre' step, which comes just before verifying the module. > Once I'm in a place to do it, I'll move that after the 'verify' and report > back. (It surely doesn't explain the crash in '$module optimize', which > comes after verification and in any case finalizes the debug information, > but getting the assembly and bitcode after the info is finalized might > offer more insight.) OK, I've done this. (Also, uniqueUndef is exonerated - by virtue of the fact that it's no longer in the codebase. All of our variable references are now handled without having to call ReplaceAllUsesWith for any of them.) Still, as expected, running the optimizer in process crashes: $ tclsh8.7 demos/perftest/tester.tcl -just rectest1 Linker: /usr/lib/llvm-5.0/bin/clang restrict <- rectest1 CONFIGURE: -quadcode-log {} ==================TIME (ORIGINAL)=================== eval rectest1: "..." time rectest1: 1.0779985283296543 microseconds per iteration ======================COMPILING===================== get-command-list: 37 microseconds WARNING:varframe.tcl:2972:callframe.store.array:make.array should trigger traces WARNING:varframe.tcl:783:tcl.callframe.load of array should trigger traces WARNING:struct.tcl:144:no debugging type for i32 (i8**, %Interp*, i32)* in 'f351(tcl.nr.add.callback)' WARNING:struct.tcl:144:no debugging type for i32 (i8**, %Interp*, i32)* in '0' init-module: 773642 microseconds register: 4928 microseconds specialise: 79990 microseconds generate: 36118 microseconds dump-pre: 17040 microseconds Segmentation fault (core dumped) I added a dump of the code after verification but before optimization, and it produces both bitcode and LLVM assembly. The debug information is finalized, and those 'temporary' items are gone. $ grep temporary test.ll (no output) Trying to translate either crashes 'opt'. The crashes are subtly different. $ opt -S test.bc -O3 -enable-coroutines -o optimized.ll #0 0x00007f8c9ee73dfa llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/usr/lib/llvm-5.0/bin/../lib/libLLVM-5.0.so.1+0x7aadfa) #1 0x00007f8c9ee72016 llvm::sys::RunSignalHandlers() (/usr/lib/llvm-5.0/bin/../lib/libLLVM-5.0.so.1+0x7a9016) #2 0x00007f8c9ee7214b (/usr/lib/llvm-5.0/bin/../lib/ libLLVM-5.0.so.1+0x7a914b) #3 0x00007f8ca1ce4150 __restore_rt (/lib/x86_64-linux-gnu/ libpthread.so.0+0x13150) #4 0x00007f8c9faba6ea (/usr/lib/llvm-5.0/bin/../lib/ libLLVM-5.0.so.1+0x13f16ea) #5 0x00007f8c9ef4ca28 llvm::FPPassManager::runOnFunction(llvm::Function&) (/usr/lib/llvm-5.0/bin/../lib/libLLVM-5.0.so.1+0x883a28) #6 0x00007f8c9ef4cb1e llvm::legacy::FunctionPassManagerImpl::run(llvm::Function&) (/usr/lib/llvm-5.0/bin/../lib/libLLVM-5.0.so.1+0x883b1e) #7 0x00007f8c9ef4cf26 llvm::legacy::FunctionPassManager::run(llvm::Function&) (/usr/lib/llvm-5.0/bin/../lib/libLLVM-5.0.so.1+0x883f26) #8 0x00005614bea0f346 (opt+0x9e346) #9 0x00007f8c9d9fc1c1 __libc_start_main /build/glibc-itYbWN/glibc-2. 26/csu/../csu/libc-start.c:342:0 #10 0x00005614bea1d44a (opt+0xac44a) Stack dump: 0. Program arguments: opt -S test.bc -O3 -enable-coroutines -o optimized.ll 1. Running pass 'Lower early coroutine intrinsics' on function '@"tcl ::rectest1 -1929838593"' Segmentation fault (core dumped) $ opt -S test.ll -O3 -enable-coroutines -o optimized.ll #0 0x00007fa40ff53dfa llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/usr/lib/llvm-5.0/bin/../lib/libLLVM-5.0.so.1+0x7aadfa) #1 0x00007fa40ff52016 llvm::sys::RunSignalHandlers() (/usr/lib/llvm-5.0/bin/../lib/libLLVM-5.0.so.1+0x7a9016) #2 0x00007fa40ff5214b (/usr/lib/llvm-5.0/bin/../lib/ libLLVM-5.0.so.1+0x7a914b) #3 0x00007fa412dc4150 __restore_rt (/lib/x86_64-linux-gnu/ libpthread.so.0+0x13150) #4 0x00007fa410b99ca7 (/usr/lib/llvm-5.0/bin/../lib/ libLLVM-5.0.so.1+0x13f0ca7) #5 0x00007fa41002ca28 llvm::FPPassManager::runOnFunction(llvm::Function&) (/usr/lib/llvm-5.0/bin/../lib/libLLVM-5.0.so.1+0x883a28) #6 0x00007fa41002cb1e llvm::legacy::FunctionPassManagerImpl::run(llvm::Function&) (/usr/lib/llvm-5.0/bin/../lib/libLLVM-5.0.so.1+0x883b1e) #7 0x00007fa41002cf26 llvm::legacy::FunctionPassManager::run(llvm::Function&) (/usr/lib/llvm-5.0/bin/../lib/libLLVM-5.0.so.1+0x883f26) #8 0x000055a45a879346 (opt+0x9e346) #9 0x00007fa40eadc1c1 __libc_start_main /build/glibc-itYbWN/glibc-2. 26/csu/../csu/libc-start.c:342:0 #10 0x000055a45a88744a (opt+0xac44a) Stack dump: 0. Program arguments: opt -S test.ll -O3 -enable-coroutines -o optimized.ll 1. Running pass 'Lower early coroutine intrinsics' on function '@"tcl ::rectest1 -1929838593"' Floating point exception (core dumped) If I strip the debug information and then try to run the optimizer from generated assembly language, things are happier. Coroutine operations have been lowered. There are references to coroutine intrinsic functions, but no calls to them. $ opt -strip-debug test.bc -S -o test1.ll $ opt -O3 -enable-coroutines -S test1.ll -o optimized.ll $ grep llvm\\\.coro optimized.ll declare token @llvm.coro.id(i32, i8* readnone, i8* nocapture readonly, i8*) #17 declare i1 @llvm.coro.alloc(token) #4 declare i8* @llvm.coro.begin(token, i8* writeonly) #4 declare i8* @llvm.coro.subfn.addr(i8* nocapture readonly, i8) #17 Somewhat weirdly, I still get the crash if instead I work from bitcode with the debug information stripped: $ opt -strip-debug test.bc -o test1.bc $ opt -O3 -enable-coroutines -S test1.bc -o optimized.ll ... stack trace snipped .... Floating point exception (core dumped) Damned if I know! It's particularly puzzling in light of the fact that CoroEarly doesn't actually do very much - and none of it appears to be floating point. http://llvm.org/doxygen/CoroEarly_8cpp_source.html For what it's worth: $ fossil stat repository: /home/kennykb/FOSSIL/tclquadcode.fossil local-root: /home/kennykb/tcl-core/tclquadcode-nre/ config-db: /home/kennykb/.fossil checkout: 1855f5dc7a0840f77943c5952cadcdc8aafd5192 2018-04-17 00:11:41 UTC parent: 504ceb334bd134dbe77d61b508917bfec264bd37 2018-04-16 23:46:34 UTC merged-from: f71cdfbdaecc0247e5991602514c58ba795a1ecb 2018-04-17 00:09:09 UTC tags: kbk-nre comment: Merge trunk: expunge last uses of 'uniqueUndef' in constructing phi's (user: kbk) |
From: Kevin K. <kev...@gm...> - 2018-04-16 16:50:25
|
On Mon, Apr 16, 2018 at 8:24 AM, Donal K. Fellows <don...@ma...> wrote: >> I've probably overlooked something. This is where a second pair of >> eyeballs might come in handy. > > Indeed. (I'm guessing that the coro passes care because they're doing > surgery to the debug info. Which seems reasonable.) You're right, I was siphoning off the bitcode in LLVM::optimise (jit.tcl) in the 'dump-pre' step, which comes just before verifying the module. Once I'm in a place to do it, I'll move that after the 'verify' and report back. (It surely doesn't explain the crash in '$module optimize', which comes after verification and in any case finalizes the debug information, but getting the assembly and bitcode after the info is finalized might offer more insight.) |
From: Donal K. F. <don...@ma...> - 2018-04-16 12:24:50
|
On 16/04/2018 13:04, Kevin Kenny wrote: > CoroEarly gets the same crash from the public 'optimize' method, which > supposedly does the finalization, so I'm still confused. I knew about > the finalization requirement, and stuck a 'puts' next to the call, and > it's definitely being done. I'm not aware of changing anything in > 'jit.tcl' that would affect the finalization sequence. I'm pretty sure > that I'm seeing unfinished temporaries on the trunk as well. Unfinished debug temporaries are definitely a mark of the debug issuer still being alive; it's supposed to resolve them when it is destroyed. (The issue is that we're ultimately building circular references; the functions refer to the (list of) variables they contain, and the variables refer to their defining function.) > Peculiarly, only the coroutine passes seem to care. If I patch out > 'qsort', 'rectest' and 'mrtest' from tester.tcl, everything compiles in > spite of them. (The .ll code is still unusable, though.) > > I've probably overlooked something. This is where a second pair of > eyeballs might come in handy. Indeed. (I'm guessing that the coro passes care because they're doing surgery to the debug info. Which seems reasonable.) Donal. |
From: Kevin K. <kev...@gm...> - 2018-04-16 12:04:37
|
On 04/16/2018 04:31 AM, Donal K. Fellows wrote: > The variable list starts out as a temporary. It's probably[†] one of > the things that is only converted to a real value during finalisation > of the debug builder. That's done by calling the > *FinalizeDebuggingMetadata* method of module; after that, you can't > write more out. That in turn is called from the public *verify* and > *optimize* methods. I strongly suggest calling *verify* before getting > the *bitcode* of a module; it also catches quite a few other basic > errors and is pretty worthwhile. I'm not going to force finalization > before producing the LLVM IR (via *dump*) since that's a debugging > tool as well. > > Note that verifying a function is less effective and doesn't finalize > anything. (In particular, it doesn't say what's wrong; that's a fault > of the C API. I ought to fix that as there's never a good reason from > Tcl for a verifier to do anything other than produce a proper Tcl > error on failures. https://github.com/dkfellows/llvmtcl/issues/15 made > to track as the C++ API has no such restriction… Yet another example > of a poor-cousin partially-maintained C API causing us trouble. It's > not Jos's fault either; I get the impression that the LLVM C API > really doesn't get much love at all outside the most basic use cases.) CoroEarly gets the same crash from the public 'optimize' method, which supposedly does the finalization, so I'm still confused. I knew about the finalization requirement, and stuck a 'puts' next to the call, and it's definitely being done. I'm not aware of changing anything in 'jit.tcl' that would affect the finalization sequence. I'm pretty sure that I'm seeing unfinished temporaries on the trunk as well. Peculiarly, only the coroutine passes seem to care. If I patch out 'qsort', 'rectest' and 'mrtest' from tester.tcl, everything compiles in spite of them. (The .ll code is still unusable, though.) I've probably overlooked something. This is where a second pair of eyeballs might come in handy. |
From: Kevin K. <kev...@gm...> - 2018-04-16 11:49:20
|
On 04/16/2018 07:21 AM, Donal K. Fellows wrote: > On 16/04/2018 03:15, Kevin Kenny wrote: >> I have a sneaking suspicion that it's the unique undefs that are >> confusing it - this is because that's really all that I see that's >> different in the function that fails when I compare the input and >> output of 'mem2reg'. Since I want to get rid of those anyway, because >> that would make 'phi' generation a lot tidier, that may be what I work >> on next. > > I commented earlier, but the issue there was simply that I needed a > unique value (i.e., non-constant) that I could pass into > ReplaceAllUsesWith to have substituted with something real. (Replacing > all uses of a constant would “work”, but is a Really Bad Idea.) If > we're going to put the variables into a coroutine state presentation, > we don't actually need to do that; the location in the state > presentation would be entirely adequate (and LLVM is pretty good about > optimising those things given that it's a key for getting performance > from C and C++ code). > > It was an awful hack to make things work. If you've found a way to do > a better thing, that's nice. The trick is that the basic blocks are presented to the code issuer in topological order. There's a nice theorem that states that in SSA form, all definitions of variables dominate their uses, except for phis. So everything BUT a phi operation will have its inputs defined and won't need ReplaceAllUsesWith. For phi operations, LLVM provides AddIncoming to patch a value and source block into a phi after creating it. What the fix does is to create the phi nodes without any data sources at all, but just keep a list of {phi sources values}. Then, after processing all the quadcode, the issuer goes back and patches in all the connections. After I sent the emails last night, I added procedures called PHI and FixupPhis to the build helper, so that the assembly code in the library can just write [PHI [Type int] {$value1 $value2} {$source1 $source2} "varName"] without needing the sources and values defined at the time PHI is executed. It's the same trick: issue an empty 'phi' node and patch the data in at the end. |
From: Donal K. F. <don...@ma...> - 2018-04-16 11:22:03
|
On 16/04/2018 03:15, Kevin Kenny wrote: > I have a sneaking suspicion that it's the unique undefs that are > confusing it - this is because that's really all that I see that's > different in the function that fails when I compare the input and > output of 'mem2reg'. Since I want to get rid of those anyway, because > that would make 'phi' generation a lot tidier, that may be what I work > on next. I commented earlier, but the issue there was simply that I needed a unique value (i.e., non-constant) that I could pass into ReplaceAllUsesWith to have substituted with something real. (Replacing all uses of a constant would “work”, but is a Really Bad Idea.) If we're going to put the variables into a coroutine state presentation, we don't actually need to do that; the location in the state presentation would be entirely adequate (and LLVM is pretty good about optimising those things given that it's a key for getting performance from C and C++ code). It was an awful hack to make things work. If you've found a way to do a better thing, that's nice. Donal. |
From: Donal K. F. <don...@ma...> - 2018-04-16 08:32:07
|
On 16/04/2018 03:20, Kevin Kenny wrote: > (a) The variables of "tcl.int.32" - which should be the single > parameter 'x' - are not encoded. > > (b) There is a temporary object at !15, in place of what should be the > variable collection. The variable list starts out as a temporary. It's probably[†] one of the things that is only converted to a real value during finalisation of the debug builder. That's done by calling the *FinalizeDebuggingMetadata* method of module; after that, you can't write more out. That in turn is called from the public *verify* and *optimize* methods. I strongly suggest calling *verify* before getting the *bitcode* of a module; it also catches quite a few other basic errors and is pretty worthwhile. I'm not going to force finalization before producing the LLVM IR (via *dump*) since that's a debugging tool as well. Note that verifying a function is less effective and doesn't finalize anything. (In particular, it doesn't say what's wrong; that's a fault of the C API. I ought to fix that as there's never a good reason from Tcl for a verifier to do anything other than produce a proper Tcl error on failures. https://github.com/dkfellows/llvmtcl/issues/15 made to track as the C++ API has no such restriction… Yet another example of a poor-cousin partially-maintained C API causing us trouble. It's not Jos's fault either; I get the impression that the LLVM C API really doesn't get much love at all outside the most basic use cases.) > (c) Attempting to parse this with > opt -S test.ll > yields: > /usr/lib/llvm-5.0/bin/opt: test.ll:12038:7: error: Expected '!' here > !15 = <temporary!> !{} Yes. If you're wanting to pass things to opt, try using the bitcode method of the module. Though in this case it probably still wouldn't work because the debug info builder wasn't finalized. :-) > (d) 'opt' does not object to reading the bitcode file and can produce > a .ll that parses correctly. The resulting .ll crashes in > CoroEarly with a FloatingPointException. > > $ opt -S test.bc > test2.ll > $ opt -S -O3 -enable-coroutines test2.ll > test3.ll > <crash!> > > (e) If all debug info is stripped from the bitcode in 'opt', then > there is no crash when attempting to optimize the module. > > $ opt -S -strip-debug test.bc -o test2.ll > $ opt -S -O3 -enable-coroutines test2.ll > test3.ll OK, then that's definitely a failure to finalize the debug info. I've seen that sort of thing before and it was much more of a PITA to debug then. :-) Donal. [† I forget exactly what there was that was doing this, but it was something like this. ] |
From: Donal K. F. <don...@ma...> - 2018-04-16 05:06:15
|
On 16/04/2018 03:20, Kevin Kenny wrote: > I've managed on trunk to get rid of all the initialization of quadcode > variables with unique undefs (I still need to work on this in the > libraries). That's curious, because it means that you have a different strategy for pinning variable uses and definitions together. The original issue was that we didn't guarantee that definitions would precede uses in the order of processing of basic blocks during code issue (as opposed to on the code path taken) and this would make the main compiler code as it always assumed that variable references would exist at the point of use. I solved it by actually initially setting each variable to undef, and I obtained a unique one of those by reading it from an undefined unique memory location (as using the undef constant resulted in catastrophic excessive unification). Donal. |
From: Kevin K. <kev...@gm...> - 2018-04-16 02:20:39
|
I've managed on trunk to get rid of all the initialization of quadcode variables with unique undefs (I still need to work on this in the libraries). Unfortunately, it did nothing for buildability - although now mem2reg doesn't alter the code of a simple NRE test proc. There's still the segfault. But now I've got another suspect: the debug information. When I try simply to parse and reformat an LLVM assembly language source dumped from llvmtcl, I'm seeing: define hidden i32 @tcl.int.32(%INT %x) #0 !dbg !3 { ... } And then the debug info has: !3 = distinct !DISubprogram(name: "tcl.int.32", scope: !5, file: !4, type: !6, isLocal: true, isDefinition: true, isOptimized: true, unit: !1, variables: !15) !4 = !DIFile(filename: "mathlib.tcl", directory: "/home/kennykb/tcl-core/tclquadcode-nre/codegen") !5 = !DINamespace(name: "tcl.support", scope: null) !6 = !DISubroutineType(types: !7) !7 = !{!8, !9} !8 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed) !9 = !DICompositeType(tag: DW_TAG_structure_type, name: "INT", scope: !11, file: !10, line: 1545, size: 24, elements: !12) !10 = !DIFile(filename: "struct.tcl", directory: "/home/kennykb/tcl-core/tclquadcode-nre/codegen") !11 = !DINamespace(name: "types", scope: !5) !12 = !{!13, !8, !14} !13 = !DIBasicType(name: "bool", size: 1, encoding: DW_ATE_boolean) !14 = !DIBasicType(name: "int64", size: 64, encoding: DW_ATE_signed) !15 = <temporary!> !{} Observations: (a) The variables of "tcl.int.32" - which should be the single parameter 'x' - are not encoded. (b) There is a temporary object at !15, in place of what should be the variable collection. (c) Attempting to parse this with opt -S test.ll yields: /usr/lib/llvm-5.0/bin/opt: test.ll:12038:7: error: Expected '!' here !15 = <temporary!> !{} (d) 'opt' does not object to reading the bitcode file and can produce a .ll that parses correctly. The resulting .ll crashes in CoroEarly with a FloatingPointException. $ opt -S test.bc > test2.ll $ opt -S -O3 -enable-coroutines test2.ll > test3.ll <crash!> (e) If all debug info is stripped from the bitcode in 'opt', then there is no crash when attempting to optimize the module. $ opt -S -strip-debug test.bc -o test2.ll $ opt -S -O3 -enable-coroutines test2.ll > test3.ll I have absolutely no idea why the variable list is showing up as <temporary!>, or how to get it to be resolved to the actual list of variables. So far, my Google-fu has been failing me as well. I suspect that the <temporary!> thing is a bug in llvmtcl, but the state of documentation of the debug info API is such that tracking this down is going to be challenging. In the course of attempting to track this down, I've discovered that the inlining pass of LLVM inserts correct llvm.lifetime.start/end intrinsic calls surrounding the bodies of inlined functions, so that project is off my list. I still want to get rid of the rest of the 'unique undef' stuff, since that gives rise to extraordinarily unreadable assembly code. I think I'll tackle that next. |