|
From: Yang Z. <yan...@in...> - 2022-04-20 08:18:22
|
Hello all,
Recently our QA team used valgrind (valgrind-3.20.0.GIT) to check for memory leaks
in the QEMU release on Intel's new Sapphire Rapids platform. They focused only on
the new Sapphire Rapids features that have been merged into the Linux and QEMU releases.
# The command
/usr/local/bin/valgrind --log-file=/root/valgrind.log --leak-check=full -v \
./qemu-system-x86_64 \
......
# Under valgrind, QEMU reports the following errors:
qemu-system-x86_64: warning: prctl(ARCH_REQ_XCOMP_GUEST_PERM) failure for feature bit 18
qemu-system-x86_64: kvm_init_vcpu: kvm_arch_init_vcpu failed (0): Operation not permitted
Without valgrind, AMX works normally with the latest Linux and QEMU releases.
I checked the QEMU code; the syscall below fails (rc=-1) under valgrind:
    int rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM, bit);
Note: bit=18 is the AMX feature bit in the xstate area.
I also ran the AMX selftest on the Sapphire Rapids platform with the same valgrind options:
/usr/local/bin/valgrind --log-file=/root/valgrind.log --leak-check=full -v linux/tools/testing/selftests/x86/amx_64
amx_64: [FAIL] xstate cpuid: invalid tile data size/offset: 0/0: Success
The failing check comes from linux/tools/testing/selftests/x86/amx.c:
    eax = CPUID_LEAF_XSTATE;
    ecx = XFEATURE_XTILEDATA;
    cpuid(&eax, &ebx, &ecx, &edx);
    /*
     * eax: XTILEDATA state component size
     * ebx: XTILEDATA state component offset in user buffer
     */
    if (!eax || !ebx)
        fatal_error("xstate cpuid: invalid tile data size/offset: %d/%d",
                    eax, ebx);
The code above only uses cpuid to read the size and offset of the AMX XTILEDATA
component within the xstate buffer. Judging from the error message, under valgrind
the AMX selftest cannot read the correct cpuid register values on Sapphire Rapids.
I also ran the same selftest without valgrind on an older Intel platform (without this feature).
It prints the same error there, but that is the expected behavior on such hardware.
root@icx:~/yangzhon/projects/amx/linux# tools/testing/selftests/x86/amx_64
amx_64: [FAIL] xstate cpuid: invalid tile data size/offset: 0/0: Success
So, given the issue above on the new Intel platform, does valgrind need some enabling work to be
compatible with it? It seems valgrind cannot identify the real hardware platform because cpuid
does not return the correct register values. Thanks!
Regards,
Yang
|
|
From: Tom H. <to...@co...> - 2022-04-20 09:31:07
|
On 20/04/2022 09:01, Yang Zhong wrote:

> So, from above issue in Intel new platform, the valgrind need do some enablings to be compatible
> with on new platform? Seems valgrind tool can't identify the real HW platform because cpuid can't
> read correct register value. thanks!

When running under valgrind you are running on an emulated CPU not
the real CPU and the results of cpuid will reflect the capabilities
of that emulated CPU rather than the real CPU.

Do the bits that you are trying to check reflect something (like new
instructions) that valgrind will need to be concerned about?

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/
|
|
From: Yang Z. <yan...@in...> - 2022-04-20 12:34:35
|
On Wed, Apr 20, 2022 at 09:37:17AM +0100, Tom Hughes wrote:
> On 20/04/2022 09:01, Yang Zhong wrote:
>
> >So, from above issue in Intel new platform, the valgrind need do some enablings to be compatible
> >with on new platform? Seems valgrind tool can't identify the real HW platform because cpuid can't
> >read correct register value. thanks!
>
> When running under valgrind you are running on an emulated CPU not
> the real CPU and the results of cpuid will reflect the capabilities
> of that emulated CPU rather than the real CPU.
>
> Do the bits that you are trying to check reflect something (like new
> instructions) that valgrind will need to be concerned about?
>

Thanks, Tom, for your quick response!

AMX is a new feature on Intel's new platform. On the host we can
see the following CPU flags:

  amx_bf16, amx_tile, amx_int8

The spec can be found at:
https://software.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf

The issue I mentioned should be related to AMX features missing from
valgrind's emulated CPU. If someone implements this feature in valgrind,
I can help verify. Thanks!

Yang

> Tom
>
> --
> Tom Hughes (to...@co...)
> http://compton.nu/
|
|
From: John R. <jr...@bi...> - 2022-04-21 17:06:18
|
On 4/20/22 05:18, Yang Zhong wrote:

> The AMX is the NEW feature in Intel new platform and from host, we can
> find below cpu flags:
>
> amx_bf16, amx_tile, amx_int8
>
> The SPEC can be found in:
> https://software.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
>
> The issue I mentioned should be related with AMX features missed in
> valgrind emulated CPU. If someone will implement this feature on valgrind,
> I can help verify. Thanks!

If you really want to help, then start today by collecting and/or writing
actual code that emulates the hardware that implements the feature.
Collect (or find, or write) the code from Chapter 3, "INTEL® AMX INSTRUCTION
SET REFERENCE, A-Z", of that .pdf. Create actual subroutines and data
declarations, and *test* it against your apps. Put the code into a public
repository such as GitHub. The top-level function should be something like

    unsigned char const *emulate_amx(  // returns next instruction pointer
        unsigned char const *ip,  // pointer to first byte of instruction stream
        unsigned long *general_registers[16],  // hardware state
        unsigned long long *zmm_registers[16],  // zmm (ymm, xmm) registers
        struct Xsave *xsave_area,  // tile registers etc.
        ...
    }

which if successful returns a pointer to the next instruction, else an
error code which is the negative of a small positive integer.

Such code will go a long way towards getting AMX supported by valgrind,
because it will enable valgrind-developers to focus on implementing valgrind
instead of on finding, de-ciphering, and mentally interpreting documentation.
|
|
From: Tom H. <to...@co...> - 2022-04-20 12:42:01
|
On 20/04/2022 13:18, Yang Zhong wrote:
> On Wed, Apr 20, 2022 at 09:37:17AM +0100, Tom Hughes wrote:
>> On 20/04/2022 09:01, Yang Zhong wrote:
>>
>>> So, from above issue in Intel new platform, the valgrind need do some enablings to be compatible
>>> with on new platform? Seems valgrind tool can't identify the real HW platform because cpuid can't
>>> read correct register value. thanks!
>>
>> When running under valgrind you are running on an emulated CPU not
>> the real CPU and the results of cpuid will reflect the capabilities
>> of that emulated CPU rather than the real CPU.
>>
>> Do the bits that you are trying to check reflect something (like new
>> instructions) that valgrind will need to be concerned about?
>>
>
> Thanks Tom for your quickly response!
>
> The AMX is the NEW feature in Intel new platform and from host, we can
> find below cpu flags:
>
> amx_bf16, amx_tile, amx_int8

That tells me nothing.

> The SPEC can be found in:
> https://software.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf

No I'm not going to spend my day digging through thousands of pages
of the latest instruction set reference trying to figure out what
exactly this feature is...

> The issue I mentioned should be related with AMX features missed in
> valgrind emulated CPU. If someone will implement this feature on valgrind,
> I can help verify. Thanks!

Again until we know what "AMX features" are it's impossible to comment
in any detail.

If AMX features involved new instructions then yes it will definitely
need somebody to do the work to add support for them.

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/
|
|
From: Tom H. <to...@co...> - 2022-04-20 12:45:03
|
On 20/04/2022 13:41, Tom Hughes via Valgrind-users wrote:

> Again until we know what "AMX features" are it's impossible to comment
> in any detail.

So apparently AMX is this:

https://en.wikipedia.org/wiki/Advanced_Matrix_Extensions

So not only is it new instructions, it is new two dimensional registers
so it's likely to be a huge task to add support.

I think we're still trying to get the AVX512 support merged so that
might give you some idea of the timelines on this sort of change.

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/
|