From: Tomas V. <tv...@fu...> - 2022-09-12 10:08:32
On 9/9/22 18:20, John Reiser wrote:
> [[Aggressive snipping, but relevant details preserved.]]
>
>> No threading is used. Postgres is multi-process, and uses shared
>> memory for the shared cache (through shm_open etc.).
>
> Multi-process plus shm_open() IS THREADING! Not pthreads, but multiple
> execution contexts that read and write the same memory, which is
> subject to the same types of synchronization errors as pthreads.
> Perhaps --tool=drd and --tool=helgrind can help.
>

OK, but why would that break core files only with valgrind? When the
program is run directly, the core files work perfectly fine.

>
> [[Another topic]]
>> Sure, but that's more of a workaround: it does not make the core file
>> useful, it provides an alternative way to get to the same result. Plus
>> it requires additional tooling/scripting, and I'd prefer keeping the
>> tooling as simple as possible.
>
> I made a specific suggestion that takes less than one hour: build a
> small test case that performs a short chain of subroutine calls, with
> the last routine generating a deliberate SIGABRT. Run the test case
> under valgrind, get a core file from valgrind, and see if gdb gives the
> correct traceback from that core file. The objective is to provide a
> strong clue about whether *every* core file generated by valgrind (in
> your environment) fails to work well with gdb. Perhaps solving the
> problem that involves your larger and more complex case can be
> subsumed by analyzing something that is much simpler.
>
> Please perform that experiment and report the results here.
>

I did this experiment. Attached is a simple .c file with a trivial
example (3 functions) that triggers a segfault (or an abort). When built
like this:

$ gcc valgrind-core-test.c -O0 -g

the core file produced without valgrind is perfectly fine:

$ gdb ./a.out core
...
Core was generated by `./a.out'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000005594350734 in f3 () at valgrind-core-test.c:6
6         *ptr = 'a';
(gdb) bt
#0  0x0000005594350734 in f3 () at valgrind-core-test.c:6
#1  0x0000005594350750 in f2 () at valgrind-core-test.c:13
#2  0x0000005594350768 in f1 () at valgrind-core-test.c:18
#3  0x0000005594350780 in main () at valgrind-core-test.c:23

but when run under valgrind it looks like this:

$ gdb ./a.out vgcore.1395835
...
Core was generated by `'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000108734 in ?? ()
(gdb) bt
#0  0x0000000000108734 in ?? ()
#1  0x0000000000108780 in ?? ()
#2  0x0000000000108644 in ?? ()

However, when I do the same on x86 (Fedora 34, gcc 11.3.1, valgrind
3.18.1), it works just fine and I get the same backtrace in both cases.
So perhaps this is specific to gcc 10.2 or to the aarch64 platform.

regards
Tomas