From: Paul F. <pj...@wa...> - 2024-04-25 19:44:37
On 23-04-24 15:52, Carl Love wrote:
> Paul:
>
> I have been digging some more with gdb. I have also put in some print statements to try and figure out when and what syscall the issue occurs on. The issue occurs after processing a system call and we return to running the user code. While running the user code we encounter the seg fault.
>
> Valgrind calls void VG_(client_syscall) in syswrap-main.c to process a system call. The function calls putSyscallStatusIntoGuestState, twice, as part of processing the system call. I see a variable number of calls to putSyscallStatusIntoGuestState before hitting the seg fault. Note, the number of calls to putSyscallStatusIntoGuestState before the seg fault varies when just running valgrind. I see 2474 or 2478 calls to the function while processing system call 90, or I see 3604 or 3608 function calls for system call 6, before we hit the seg fault. I am puzzled by the inconsistency in the number of calls/sys call number before the segmentation fault. It doesn't "feel" like it is the system call processing per se that is the issue but maybe some other issue, just guessing???? Also, when the failure occurs, it isn't the first time that system call has been handled. It looked like we have processed that system call 10 to 20 times previously without a failure. Don't know if that helps or not?
>
> If you have any thoughts as to a possible root cause, please let me know and I will look into it. Thanks.
>

Hi Carl

For client syscalls Valgrind will make several syscalls itself. The sequence is

1. gettid
2. write (that's the syscall 6) to release the fifo big lock
3. sigprocmask to set the guest signal mask
4. the actual client syscall
5. sigprocmask to set the Valgrind signal mask
6. read to acquire the fifo big lock

On multithreaded apps it might be a different thread that performs the read.
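To make the ordering of the six steps above concrete, here is a rough Python sketch. It is not Valgrind's code: `PipeLock` and `run_client_syscall` are hypothetical names, and it assumes the big lock behaves like a token passed through a pipe, which is what the write/read pair suggests. It needs a Unix-like system.

```python
import os
import signal
import threading


class PipeLock:
    """Hypothetical token-in-a-pipe lock: a byte in the pipe means 'free'."""

    def __init__(self):
        self.rfd, self.wfd = os.pipe()
        self.held = True  # pipe starts empty, so the creator holds the lock

    def release(self):
        os.write(self.wfd, b"L")  # put the token back (step 2: write)
        self.held = False

    def acquire(self):
        os.read(self.rfd, 1)  # blocks until a token is available (step 6: read)
        self.held = True


def run_client_syscall(lock, guest_mask, client_call, trace):
    """Mimic the six-step order; names and structure are assumptions."""
    trace.append("gettid")
    threading.get_native_id()            # 1. identify the calling thread
    trace.append("write")
    lock.release()                       # 2. release the big lock
    trace.append("sigprocmask")
    tool_mask = signal.pthread_sigmask(signal.SIG_SETMASK, guest_mask)  # 3.
    trace.append("client syscall")
    result = client_call()               # 4. the call made on the client's behalf
    trace.append("sigprocmask")
    signal.pthread_sigmask(signal.SIG_SETMASK, tool_mask)               # 5.
    trace.append("read")
    lock.acquire()                       # 6. take the big lock again
    return result


lock = PipeLock()
trace = []
result = run_client_syscall(lock, set(), os.getpid, trace)
print(trace)
```

In a multithreaded program another thread could win the read in step 6 first, which is why the thread that released the lock is not necessarily the next one to hold it.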
Going back to your first error

==2282131== Process terminating with default action of signal 11 (SIGSEGV)
==2282131==  Bad permissions for mapped region at address 0x6B50000
==2282131==    at 0x43962B8: __memset_power10 (in /usr/lib64/glibc-hwcaps/power9/libc-2.28.so)
==2282131==    by 0x1013FBFF: PyTuple_New (in /home/carll/anaconda3/envs/faiss/bin/python3.11)

I would expect memcheck to redirect the __memset_power10 call, which was why I was wondering if other tools were OK.

Do the SyscallInfo and ThreadState data structures look OK in gdb when you hit the segfault?

Does running with --sanity-level=4 change anything (other than making it a lot slower)?

A+
Paul