|
From: Paul F. <pj...@wa...> - 2020-01-14 19:16:25
|
Hi

I’m having a quick look at building Valgrind on macOS Catalina.

I’m using this repo:

https://github.com/LouisBrunner/valgrind-macos.git

Plus I’ve merged in the changes up to head from sourceware.

After a few mods to configure.ac and a few of the Darwin files it builds and I get

==39161== Lackey, an example Valgrind tool
==39161== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote.
==39161== Using Valgrind-3.16.0.GIT and LibVEX; rerun with -h for copyright info
==39161== Command: pwd
==39161==
==39161== valgrind: Unrecognised instruction at address 0x1006037bd.

In the past when I’ve seen this sort of thing there was also a vex printf of the opcodes, but not in this case.

Any suggestions what to try next?

A+
Paul |
|
From: Rhys K. <rhy...@gm...> - 2020-01-15 12:47:58
|
On Wed, 15 Jan 2020 at 06:17, Paul Floyd <pj...@wa...> wrote:

> ==39161== valgrind: Unrecognised instruction at address 0x1006037bd.
>
> In the past when I’ve seen this sort of thing there was also a vex printf
> of the opcodes, but not in this case.

Try running with ./valgrind -v <$program> and post the output.

> Any suggestions what to try next?
>
> A+
> Paul |
|
From: Paul F. <pj...@wa...> - 2020-01-15 12:58:29
|
> On 15 Jan 2020, at 13:47, Rhys Kidd <rhy...@gm...> wrote:
>
> Try running with ./valgrind -v <$program> and provide what is output.

Hi Rhys

I’ve attached the output. This is with the latest Xcode and macOS versions. |
|
From: Rhys K. <rhy...@gm...> - 2020-01-18 23:31:47
|
Thanks Paul.

This will need a bit more debugging on a macOS 10.15 Catalina system to get to the bottom of it. I've created a bug report from your debug log to track this (https://bugs.kde.org/show_bug.cgi?id=416436) and marked it under our meta bug for all known macOS 10.15 issues.

Sometimes this class of report about illegal instructions actually has nothing to do with missing x86_64 ISA support; instead, there's a system call which valgrind isn't hooking properly on the new Mach kernel (the macOS kernel).

Regards,
Rhys

On Wed, 15 Jan 2020 at 23:59, Paul Floyd <pj...@wa...> wrote:

> I’ve attached the output. This is with the latest Xcode and macOS versions. |
|
From: Paul F. <pj...@wa...> - 2020-01-19 13:41:45
|
> On 19 Jan 2020, at 00:31, Rhys Kidd <rhy...@gm...> wrote:
>
> Sometimes these class of reports about illegal instructions actually have nothing to do with missing x86_64 ISA support, instead there's a system call which valgrind isn't hooking properly on new Mach kernel (the macOS kernel).

I’ve debugged a bit more and it looks like a ud2 opcode is causing the error

==== SB 2822 (evchecks 301498) [tid 1] 0x1005f5ecb __pthread_init+898 /usr/lib/system/libsystem_pthread.dylib+0xecb
0x1005F5ECB: call 0x1005FD7A6
0x1005FD7A6: leaq 2759(%rip), %rcx
0x1005FD7AD: xorl %eax,%eax
0x1005FD7AF: movq %rcx,11002(%rip)
0x1005FD7B6: movq %rax,11043(%rip)
0x1005FD7BD: ud2
==79936== valgrind: Unrecognised instruction at address 0x1005fd7bd.
==80006== at 0x1005FD7BD: __pthread_init.cold.2 (in /usr/lib/system/libsystem_pthread.dylib)

Looking a bit at the disassembly of libsystem_pthread.dylib, pthread_init function

0000000000000db2 movq 0xa267(%rip), %rax ## literal pool symbol address: __os_xbs_chrooted
0000000000000db9 cmpb $0x0, (%rax)
0000000000000dbc je 0xecb

This seems to be the path that gets taken (0xecb is the address of __pthread_init.cold.2)

I can’t find much on _os_xbs_chrooted.

A+
Paul |
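The runtime addresses in the trace above can be related to the offsets in the static disassembly with a little arithmetic. What follows is only a minimal sketch, assuming that the `libsystem_pthread.dylib+0xecb` annotation in the trace is the offset of the block's first instruction from the start of the mapped image; the helper names `load_base` and `to_offset` are made up for illustration.

```python
# Values copied from the VEX trace quoted in the message above.
BLOCK_START = 0x1005F5ECB   # runtime address of the block, labelled __pthread_init+898
BLOCK_OFFSET = 0xECB        # offset given as libsystem_pthread.dylib+0xecb
CALL_TARGET = 0x1005FD7A6   # target of the call instruction
UD2_ADDRESS = 0x1005FD7BD   # address of the faulting ud2


def load_base(runtime_address, library_offset):
    """Return the address at which the library image appears to be mapped."""
    return runtime_address - library_offset


def to_offset(runtime_address, base):
    """Convert a runtime address back to an offset inside the library."""
    return runtime_address - base


base = load_base(BLOCK_START, BLOCK_OFFSET)
print(hex(base))                          # 0x1005f5000
print(hex(to_offset(CALL_TARGET, base)))  # 0x87a6
print(hex(to_offset(UD2_ADDRESS, base)))  # 0x87bd
```

Under that assumption, 0x87a6 and 0x87bd are the offsets to look for when disassembling the dylib itself.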
|
From: John R. <jr...@bi...> - 2020-01-19 16:16:22
|
> ==== SB 2822 (evchecks 301498) [tid 1] 0x1005f5ecb __pthread_init+898 /usr/lib/system/libsystem_pthread.dylib+0xecb
> 0x1005F5ECB: call 0x1005FD7A6
> 0x1005FD7A6: leaq 2759(%rip), %rcx
> 0x1005FD7AD: xorl %eax,%eax
> 0x1005FD7AF: movq %rcx,11002(%rip)
> 0x1005FD7B6: movq %rax,11043(%rip)
> 0x1005FD7BD: ud2
>
> ==79936== valgrind: Unrecognised instruction at address 0x1005fd7bd.
> ==80006== at 0x1005FD7BD: __pthread_init.cold.2 (in /usr/lib/system/libsystem_pthread.dylib)

The pthread library has detected an impossible situation regarding system calls, and this is the calling sequence to report the fatal error to macOS. The bad emulation happened some time earlier in the run.

See https://bugs.kde.org/show_bug.cgi?id=383723#c23 of 2.5 years ago, where a similar ud2 was found to result from an incomplete emulation of the kevent_qos syscall. |
|
From: Paul F. <pj...@wa...> - 2020-01-22 08:15:40
|
> On 19 Jan 2020, at 16:04, John Reiser <jr...@bi...> wrote:
>
> The pthread library has detected an impossible situation regarding system calls,
> and this is the calling sequence to report the fatal error to MacOS.
> The bad emulation happened some time ago.
>
> See https://bugs.kde.org/show_bug.cgi?id=383723#c23 of 2.5 years ago where a similar ud2
> was found to result from an incomplete emulation of kevent_qos syscall.

Hmm. In this case I don’t think that the open source Darwin code is going to help much.

Perhaps now macOS is doing some chroot shenanigans, like iOS?

A+
Paul |
|
From: Paul F. <pj...@wa...> - 2020-02-07 15:39:19
|
> On 19 Jan 2020, at 16:04, John Reiser <jr...@bi...> wrote:
>
> The pthread library has detected an impossible situation regarding system calls,
> and this is the calling sequence to report the fatal error to MacOS.
> The bad emulation happened some time ago.
>
> See https://bugs.kde.org/show_bug.cgi?id=383723#c23 of 2.5 years ago where a similar ud2
> was found to result from an incomplete emulation of kevent_qos syscall.

I’ve spent a good while looking at this and am still more or less scratching my head.

Here’s what I’ve done.

1. Set up a macOS 10.14 Mojave VM, installed Xcode and the same source. It builds and seems to work (at least for my minimal test).
2. Generated traces with --trace-syscalls=yes and --trace-flags=10000000 to see what the syscalls and VEX are doing.

Obviously there are tons of diffs between these two logs, PIDs and hex addresses to start with. However the logs aren’t enormously long: just 3807 lines on macOS 10.15.

In the logs, I see 3 sections

a. stat/open/mmap/vm_allocate/close to load all of the system .dylibs. The dylibs aren’t loaded in the same order, and macOS 10.15 has added one new one (feature flags).
b. Another big chunk, this time in dyld and ImageLoader functions.
c. __libkernel_init, which is towards the end of the 10.15 log. I’ve mostly been looking in this part, though I’ve no reason to suppose that the previous two sections don’t contain the issue(s).

One thing that I’ve seen in this last section is this on macOS 10.14

==== SB 2625 (evchecks 343156) [tid 1] 0x1005f85d6 _setcontext+144 /usr/lib/system/libsystem_platform.dylib+0x75d6

Whilst the corresponding VEX output on 10.15 is

==== SB 2795 (evchecks 300971) [tid 1] 0x1005e862a _os_semaphore_wait.cold.1+110 /usr/lib/system/libsystem_platform.dylib+0x762a

I’m not sure what is happening here. In both cases the offset is beyond the end of libsystem_platform that I see from the disassembly (0x756a and 0x75d4 respectively). These are the names of the last functions in the library.

The last thing that I see before the failing call to pthread_init is a sysctl syscall for KERN_USRSTACK64, which seems to work OK.

Any other suggestions?

A+
Paul |
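Because PIDs and absolute addresses differ between the two runs, masking them before diffing may cut down the noise. The following is a rough sketch, assuming log lines in the format shown in this thread; the `normalise` and `diff_logs` helpers and the choice of what to mask are hypothetical and will probably need tuning for real logs.

```python
import difflib
import re

# "==39161==" style prefixes that Valgrind puts on its own messages.
PID_PREFIX = re.compile(r"==\d+==")
# Superblock counters such as "SB 2625 (evchecks 343156)".
SB_HEADER = re.compile(r"SB \d+ \(evchecks \d+\)")
# Long hex numbers only (8+ digits), so short library offsets such as
# +0x75d6 are kept and can still be compared.
ABSOLUTE_ADDRESS = re.compile(r"0x[0-9A-Fa-f]{8,}")


def normalise(line):
    """Mask the parts of a log line that vary from run to run."""
    line = PID_PREFIX.sub("==PID==", line)
    line = SB_HEADER.sub("SB N (evchecks N)", line)
    return ABSOLUTE_ADDRESS.sub("0xADDR", line)


def diff_logs(path_a, path_b):
    """Return a unified diff of two logs after normalisation."""
    with open(path_a, errors="replace") as a, open(path_b, errors="replace") as b:
        lines_a = [normalise(line) for line in a]
        lines_b = [normalise(line) for line in b]
    return list(difflib.unified_diff(lines_a, lines_b, path_a, path_b))
```

Calling `diff_logs` on the 10.14 and 10.15 logs (file names of your choosing) would then show mostly the syscall and symbol differences rather than address churn.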
|
From: Louis B. <lou...@gm...> - 2020-02-11 11:36:10
|
Hi,

I have been investigating the problem and I have a fix (patch included).

Some context: it is difficult to be sure without the libpthread sources for 10.15, but if you check libpthread-330.250.2 (for 10.14), you will find a parse_ptr_munge_params function that tries to get the ptr_munge value from the environment (through the Apple environment or an actual environment variable, PTHREAD_PTR_MUNGE_TOKEN). In previous versions, __pthread_init would just carry on even if the value wasn't defined. While stepping through the code, however, I found that macOS 10.15 seems to crash using the ud2 instruction just after the environment variable check, hinting that this value is now required.

My patch adds PTHREAD_PTR_MUNGE_TOKEN with a value of 1 every time valgrind starts a program. Note that a value of 0 is considered an error by pthread.

Disclaimer: while I found where ptr_munge is generated (kernel) and used (some kind of conversion in jmp instructions in libplatform), I don't understand what it does exactly. On the other hand, the dummy value doesn't seem to make a difference when executing a program with valgrind.

Best regards,
Louis Brunner |
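The idea behind the fix described above can be illustrated outside Valgrind. This is not the actual patch, which lives in Valgrind's own C sources and is not shown in this thread; it is only a sketch of "set PTHREAD_PTR_MUNGE_TOKEN to 1 in the environment of the program being started". The helpers `child_env` and `run_with_token` are hypothetical, and keeping an already-set value rather than overwriting it is an assumption.

```python
import os
import subprocess

TOKEN_NAME = "PTHREAD_PTR_MUNGE_TOKEN"


def child_env(base_env=None):
    """Return a copy of the environment with the token present.

    Assumption: an existing value is left alone; the message above only
    says that the variable is added with a value of 1.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.setdefault(TOKEN_NAME, "1")  # 1 is the dummy value; 0 is an error for pthread
    return env


def run_with_token(argv):
    """Start a program with the token set in its environment."""
    return subprocess.run(argv, env=child_env(), check=False)
```

For example, `run_with_token(["pwd"])` starts `pwd` with the variable set, without modifying the environment of the calling process.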