Re: [Valgrind-users] armhf: illegal hardware instruction


> I did not make up those strace logs in my head, all I am trying to do
> is Debian bug triaging. Turns out I did a pretty bad job at it:
> 
> 1. The original Debian bug report seems to be PEBCAK, and I'll close
> the bug as wontfix ASAP,

> 2. I was not paying attention to the gcc version I was using.

The original bug report  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=928224
specified the reproducing case "valgrind /bin/true".  That now works for me:
-----
$ valgrind /bin/true
==399== Memcheck, a memory error detector
==399== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==399== Using Valgrind-3.20.0.GIT and LibVEX; rerun with -h for copyright info
==399== Command: /bin/true
==399==
==399==
==399== HEAP SUMMARY:
==399==     in use at exit: 0 bytes in 0 blocks
==399==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==399==
==399== All heap blocks were freed -- no leaks are possible
==399==
==399== For lists of detected and suppressed errors, rerun with: -s
==399== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
-----
in the environment:
-----
$ valgrind --version
valgrind-3.20.0.GIT
$ gcc --version
gcc (Debian 10.2.1-6) 10.2.1 20210110
$ uname -a
Linux rpi2-20220121 5.10.0-15-armmp #1 SMP Debian 5.10.120-1 (2022-06-09) armv7l GNU/Linux
-----
so the original bug report can be closed with "fixed in newer version"
or something like that.

> So if my understanding is correct I can make valgrind produce this
> "Illegal instruction" using either gcc-11 or gcc-12 (Debian package
> from sid), BUT I can make valgrind run using gcc-10 (again Debian
> package from sid). This also seems to be hardware specific since armhf
> binary + gcc-12 runs properly on arm64 (armhf chroot).

Is it easy to install several versions (gcc-10, gcc-11, gcc-12, clang-13)
at the same time, and switch among them by using something like
    CC=/path/to/gcc-12 ./configure
Where can I find hints about this?

> 
> Would you kindly indicate if you believe the bug should be reported
> back to valgrind bug tracker or gcc bug tracker ? If that matters,
> clang 13.0 seems to also mess up valgrind code and binaries produced
> return this "Illegal instruction".

SIGILL should be diagnosed by using gdb to print the instruction stream
and register contents:
-----
(gdb) run args...
Program received signal SIGILL, Illegal instruction.
(gdb) x/i $pc   ## the faulting instruction
(gdb) x/12i $pc-6*4   ## disassemble the surrounding instructions
(gdb) x/12xw $pc-6*4   ## and in 32-bit raw hexadecimal
(gdb) info reg   ## content of all registers
(gdb) x/16xw $sp   ## dump the active end of the stack
(gdb) bt   ## source-level backtrace
-----
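
For an ordinary program, the same commands can be collected into a gdb
command file and run non-interactively. This is only a sketch: the file
name sigill.gdb and the program ./prog are made-up placeholders.

```shell
# Write the gdb commands listed above into a command file.
# The file name sigill.gdb is arbitrary (an assumption for this example).
cat > sigill.gdb <<'EOF'
# run until the first signal stops the program
run
# the faulting instruction
x/i $pc
# the surrounding instructions, then the same words in raw hexadecimal
x/12i $pc-6*4
x/12xw $pc-6*4
# all registers, the active end of the stack, and a backtrace
info reg
x/16xw $sp
bt
EOF
echo "wrote sigill.gdb"
# Then, for a hypothetical program ./prog:
#   gdb -batch -x sigill.gdb --args ./prog args...
```

The script reports on the first stop only. If the program exits without
a signal, the commands after "run" fail because there are no registers
to examine.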
But with valgrind you must just "continue" past the deliberate SIGILL and
SIGSEGV that valgrind uses.  Here is an actual run:
-----
$ gdb valgrind
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
Reading symbols from valgrind...
(gdb) run /bin/true
Starting program: /usr/local/bin/valgrind /bin/true
process 426 is executing new program: /usr/local/libexec/valgrind/memcheck-arm-linux

Program received signal SIGILL, Illegal instruction.
vgPlain_machine_get_hwcaps () at m_machine.c:1719
1719	           __asm__ __volatile__(".word 0xF3044F54"); /* VMAXNM.F32 q2,q2,q2 */

    ## Notice that this SIGILL is from valgrind trying to determine
    ## the actual hardware capabilities.  Valgrind knows what it is doing,
    ## so just 'continue' to let valgrind handle the SIGILL.

(gdb) c
Continuing.
==426== Memcheck, a memory error detector
==426== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==426== Using Valgrind-3.20.0.GIT and LibVEX; rerun with -h for copyright info
==426== Command: /bin/true
==426==

Program received signal SIGSEGV, Segmentation fault.   ## valgrind deliberate
0x62c68cc0 in ?? ()
(gdb) x/i $pc
=> 0x62c68cc0:	str	r3, [r9]
(gdb) p $r9
$1 = 3187663772
(gdb) p/x $r9
$2 = 0xbdffe39c
(gdb) x/12i $pc-6*4
    0x62c68ca8:	ldr	r3, [r8, #424]	; 0x1a8
    0x62c68cac:	mov	r1, r3
    0x62c68cb0:	movw	r2, #62156	; 0xf2cc
    0x62c68cb4:	movt	r2, #22528	; 0x5800
    0x62c68cb8:	blx	r2
    0x62c68cbc:	ldr	r3, [r8, #24]
=> 0x62c68cc0:	str	r3, [r9]
    0x62c68cc4:	add	r7, r9, #4
    0x62c68cc8:	mov	r0, r7
    0x62c68ccc:	ldr	r3, [r8, #428]	; 0x1ac
    0x62c68cd0:	mov	r1, r3
    0x62c68cd4:	movw	r2, #62156	; 0xf2cc
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.   ## valgrind deliberate
0x62c6cc0c in ?? ()
(gdb) x/i $pc
=> 0x62c6cc0c:	str	r9, [r11]
(gdb) x/12i $pc-6*4
    0x62c6cbf4:	ldr	r9, [r8, #416]	; 0x1a0
    0x62c6cbf8:	mov	r1, r9
    0x62c6cbfc:	movw	r2, #62156	; 0xf2cc
    0x62c6cc00:	movt	r2, #22528	; 0x5800
    0x62c6cc04:	blx	r2
    0x62c6cc08:	ldr	r9, [r8, #16]
=> 0x62c6cc0c:	str	r9, [r11]
    0x62c6cc10:	add	r9, r11, #4
    0x62c6cc14:	mov	r0, r9
    0x62c6cc18:	ldr	r3, [r8, #420]	; 0x1a4
    0x62c6cc1c:	mov	r1, r3
    0x62c6cc20:	movw	r2, #62156	; 0xf2cc
(gdb) c
Continuing.
==426==
==426== HEAP SUMMARY:
==426==     in use at exit: 0 bytes in 0 blocks
==426==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==426==
==426== All heap blocks were freed -- no leaks are possible
==426==
==426== For lists of detected and suppressed errors, rerun with: -s
==426== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
[Inferior 1 (process 426) exited normally]
-----
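
A reading aid for the disassembly above: each movw/movt pair builds a
32-bit address in r2 (movw sets the low 16 bits, movt the high 16 bits),
and "blx r2" then calls it. The arithmetic can be checked with plain
shell; this sketch uses only the numbers printed in the transcript.

```shell
# movw r2, #62156 (0xf2cc) supplies the low half of r2,
# movt r2, #22528 (0x5800) supplies the high half.
target=$(printf '0x%08x' $(( (0x5800 << 16) | 0xf2cc )))
echo "blx r2 target: $target"

# gdb printed r9 in decimal first; convert it to compare with "p/x $r9".
r9_hex=$(printf '0x%x' 3187663772)
echo "r9: $r9_hex"
```

So both stops sit right after a call to the same address, and the
faulting instruction in each case is the store that follows it.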

You can also use something like
    objdump --disassemble=subroutine_name /path/to/executable
and compare the output with gdb's disassembly, to be sure that the
executing process matches the built software file.

Right now I cannot reproduce SIGILL, so I cannot dig in further.
Based on the software that I built and ran: valgrind is not to blame;
the problem lies with the compiler, operating system, or hardware.
(In the last two months I have had four hardware failures:
a 5-port ethernet switch, the sound output on a 12-year old
consumer desktop PC, the sound output on a 5-year old
self-built x86_64 desktop, and the power brick for a RaspberryPi.)