|
From: Julian S. <js...@ac...> - 2015-08-12 11:26:25
|
In revs 3169, 15522, 15523, I finally implemented the XSAVE and
XRSTOR instructions which are associated with the AVX instruction
set. These provide saving and restoring for the new state introduced
by AVX -- that is, the YMM high half registers. They also generalise
the existing FXSAVE/FXRSTOR instructions (for SSE+X87) and FSAVE/FRSTOR
instructions (X87 only).

As part of this I added a new CPUID implementation that reflects AVX2
capable processors.

As a result of this it should be possible to run code that requires
XSAVE/XRSTOR, in particular AVX and AVX2 code generated by the Intel
compiler.

These instructions are a nightmare of complexity. I think I didn't
break anything, but there is a risk of fallout, particularly for the
OSX and Solaris ports. So it would be good to watch out for that.

J
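
As a rough illustration of what a CPUID implementation has to advertise for such code to run, the sketch below decodes the feature bits a program typically tests before using XSAVE/XRSTOR, AVX and AVX2. The bit positions are the architectural ones (CPUID leaf 1 ECX bits 26 to 28, and leaf 7 sub-leaf 0 EBX bit 5). The struct and function names are hypothetical and are not taken from the Valgrind sources. The decoder works on raw register values, so it does not execute CPUID itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type for this sketch; not a Valgrind structure. */
struct xfeatures {
    bool xsave;    /* CPUID.1:ECX bit 26, XSAVE/XRSTOR supported by the CPU */
    bool osxsave;  /* CPUID.1:ECX bit 27, OS has enabled XSAVE (CR4.OSXSAVE) */
    bool avx;      /* CPUID.1:ECX bit 28, AVX supported */
    bool avx2;     /* CPUID.(EAX=7,ECX=0):EBX bit 5, AVX2 supported */
};

/* Decode the relevant bits from raw CPUID register values. */
struct xfeatures decode_xfeatures(uint32_t leaf1_ecx, uint32_t leaf7_ebx)
{
    struct xfeatures f;
    f.xsave   = (leaf1_ecx >> 26) & 1u;
    f.osxsave = (leaf1_ecx >> 27) & 1u;
    f.avx     = (leaf1_ecx >> 28) & 1u;
    f.avx2    = (leaf7_ebx >> 5) & 1u;
    return f;
}

/* CPUID-level precondition for AVX. A complete check would also read
   XCR0 with XGETBV and confirm that bits 1 (SSE) and 2 (AVX) are set;
   that step is omitted from this sketch. */
bool may_use_avx(struct xfeatures f)
{
    return f.xsave && f.osxsave && f.avx;
}

/* Small self-check of the decoder on hand-built register values. */
void xfeatures_selfcheck(void)
{
    struct xfeatures f = decode_xfeatures(7u << 26, 1u << 5);
    assert(f.xsave && f.osxsave && f.avx && f.avx2);
    /* AVX bit alone is not enough: XSAVE and OSXSAVE are missing. */
    assert(!may_use_avx(decode_xfeatures(1u << 28, 0)));
}
```

On GCC or Clang the two inputs could be obtained with `__get_cpuid` and `__get_cpuid_count` from `<cpuid.h>`.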
|
From: Florian K. <fl...@ei...> - 2015-08-12 11:52:33
|
On 12.08.2015 13:26, Julian Seward wrote:
>
> In revs 3169, 15522, 15523, I finally implemented the XSAVE and
> XRSTOR instructions which are associated with the AVX instruction
> set. These provide saving and restoring for the new state introduced
> by AVX -- that is, the YMM high half registers. They also generalise
> the existing FXSAVE/FXRSTOR instructions (for SSE+X87) and FSAVE/FRSTOR
> instructions (X87 only).
>
> As part of this I added a new CPUID implementation that reflects AVX2
> capable processors.
>
> As a result of this it should be possible to run code that requires
> XSAVE/XRSTOR, in particular AVX and AVX2 code generated by the Intel
> compiler.
>
> These instructions are a nightmare of complexity. I think I didn't
> break anything, but there is a risk of fallout, particularly for the
> OSX and Solaris ports. So it would be good to watch out for that.
I see this:
Making check in memcheck
Making check in .
Making check in tests
Making check in .
Making check in x86
Making check in amd64
xsave-avx.c: In function 'do_setup_then_xsave':
xsave-avx.c:99:4: error: unknown register name 'ymm0' in 'asm'
__asm__ __volatile__("vmovups (%0), %%ymm0" : : "r"(&vec0[0]) :
"ymm0" );
^
xsave-avx.c:100:4: error: unknown register name 'ymm1' in 'asm'
__asm__ __volatile__("vmovups (%0), %%ymm1" : : "r"(&vec1[0]) :
"ymm1" );
^
cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz
stepping : 7
microcode : 0x18
cpu MHz : 800.000
cache size : 6144 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx
est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt
tsc_deadline_timer xsave avx lahf_lm ida arat epb xsaveopt pln pts
dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips : 4390.03
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
Florian
|
|
From: Rhys K. <rhy...@gm...> - 2015-08-12 14:23:25
|
As of r15528, no new regressions on OS X and the test suite runs through
successfully on my hardware.

Regards,
Rhys

On 12 August 2015 at 21:26, Julian Seward <js...@ac...> wrote:
>
> In revs 3169, 15522, 15523, I finally implemented the XSAVE and
> XRSTOR instructions which are associated with the AVX instruction
> set. These provide saving and restoring for the new state introduced
> by AVX -- that is, the YMM high half registers. They also generalise
> the existing FXSAVE/FXRSTOR instructions (for SSE+X87) and FSAVE/FRSTOR
> instructions (X87 only).
>
> As part of this I added a new CPUID implementation that reflects AVX2
> capable processors.
>
> As a result of this it should be possible to run code that requires
> XSAVE/XRSTOR, in particular AVX and AVX2 code generated by the Intel
> compiler.
>
> These instructions are a nightmare of complexity. I think I didn't
> break anything, but there is a risk of fallout, particularly for the
> OSX and Solaris ports. So it would be good to watch out for that.
>
> J
>
|
From: Ivo R. <ivo...@gm...> - 2015-08-13 06:24:08
|
2015-08-12 13:26 GMT+02:00 Julian Seward <js...@ac...>:
>
> These instructions are a nightmare of complexity. I think I didn't
> break anything, but there is a risk of fallout, particularly for the
> OSX and Solaris ports. So it would be good to watch out for that.

Valgrind builds fine and almost all the regression tests run successfully.
In particular:
$ perl tests/vg_regtest memcheck/tests/amd64/xsave-avx
xsave-avx: valgrind -q ./xsave-avx x
== 1 test, 0 stderr failures, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==
================================

However I see the following failure:
perl tests/vg_regtest none/tests/amd64/xacq_xrel
xacq_xrel: valgrind ./xacq_xrel
*** xacq_xrel failed (stdout) ***
== 1 test, 0 stderr failures, 1 stdout failure, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==
none/tests/amd64/xacq_xrel (stdout)

$ cat none/tests/amd64/xacq_xrel.stdout.diff
--- xacq_xrel.stdout.exp 2015-08-13 07:39:34.434676842 +0200
+++ xacq_xrel.stdout.out 2015-08-13 07:55:55.836576834 +0200
@@ -13,7 +13,7 @@
 result for 'btr' is 5555555555554515
 result for 'bts' is 57d555555f555d55
 result for 'cmpxchg' is 271831415927d459
-result for 'cmpxchg8b' is 5566778800000000
+result for 'cmpxchg8b' is 55667788ffffaef0
 result for 'xadd' is d1c2dbecb622f897
 result for 'xchg' is 5555555555555555
 result for 'xchg-no-lock' is 5555555555555555

Native run of xacq_xrel gives:
...
result for 'cmpxchg8b' is 55667788bf754ef0
...

What could be wrong in xacq_xrel w.r.t. cmpxchg8b?
I.
|
From: Julian S. <js...@ac...> - 2015-08-13 09:55:55
|
On 13/08/15 08:23, Ivo Raisr wrote:
> However I see the following failure:
> perl tests/vg_regtest none/tests/amd64/xacq_xrel

I'm not sure why that would have failed as a result of this change.
But anyway. Looking at that test, I think the inline assembly has
always been wrong. It mentions rdx twice but doesn't mention rbx at
all; it should mention each exactly once.

Can you try the diff below and see if you get the same results
natively and on V ? In both cases I now get (on Linux)

result for 'cmpxchg8b' is 55667788bbaa9988

Thanks.

J

Index: none/tests/amd64/xacq_xrel.c
===================================================================
--- none/tests/amd64/xacq_xrel.c (revision 15530)
+++ none/tests/amd64/xacq_xrel.c (working copy)
@@ -165,7 +165,7 @@
"xorq %%rax, %%rax" "\n\t"
"xorq %%rdx, %%rdx" "\n\t"
"movabsq $0x1122334455667788, %%rcx" "\n\t"
- "movabsq $0xffeeddccbbaa9988, %%rdx" "\n\t"
+ "movabsq $0xffeeddccbbaa9988, %%rbx" "\n\t"
"xacquire lock cmpxchg8b (%0)" "\n\t"
"xrelease lock cmpxchg8b (%0)" "\n\t"
: : "r"(&n) : "cc", "memory", "rax", "rdx", "rcx", "rdx"
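
To see why the missing rbx initialisation explains the differing results, here is a minimal C model of the documented CMPXCHG8B semantics: compare EDX:EAX with the 64-bit memory operand; if equal, store ECX:EBX, otherwise load the operand into EDX:EAX. This is a sketch, not code from the test. The names `regs32`, `model_cmpxchg8b` and `run_pair` are hypothetical, the XACQUIRE/XRELEASE prefixes are treated as plain hints and ignored, and the starting value of `n` is an assumption (the outcome of the pair does not depend on it unless it happens to equal ECX:EBX).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register file: only the low 32 bits that CMPXCHG8B uses. */
struct regs32 { uint32_t eax, ebx, ecx, edx; };

/* Model of CMPXCHG8B m64. Returns true when the exchange happened (ZF=1). */
bool model_cmpxchg8b(uint64_t *mem, struct regs32 *r)
{
    uint64_t expected = ((uint64_t)r->edx << 32) | r->eax;
    if (*mem == expected) {
        /* Match: store ECX:EBX to memory. */
        *mem = ((uint64_t)r->ecx << 32) | r->ebx;
        return true;
    }
    /* Mismatch: load the memory operand into EDX:EAX. */
    r->edx = (uint32_t)(*mem >> 32);
    r->eax = (uint32_t)*mem;
    return false;
}

/* The test issues two cmpxchg8b instructions back to back on the same
   location; whichever one fails primes EDX:EAX or leaves memory alone. */
uint64_t run_pair(uint64_t initial, struct regs32 r)
{
    uint64_t n = initial;
    model_cmpxchg8b(&n, &r);
    model_cmpxchg8b(&n, &r);
    return n;
}

void cmpxchg8b_selfcheck(void)
{
    /* With the diff applied: rbx is set, rax and rdx are zeroed. */
    struct regs32 fixed = { .eax = 0, .ebx = 0xbbaa9988u,
                            .ecx = 0x55667788u, .edx = 0 };
    assert(run_pair(0x5555555555555555ull, fixed) == 0x55667788bbaa9988ull);

    /* Before the diff: rdx gets the constant and rbx holds leftovers. */
    struct regs32 broken = { .eax = 0, .ebx = 0xffffaef0u,
                             .ecx = 0x55667788u, .edx = 0xbbaa9988u };
    assert(run_pair(0x5555555555555555ull, broken) == 0x55667788ffffaef0ull);
}
```

In this model the low half of the result is always whatever EBX held, which is consistent with the three different low halves reported above (00000000, ffffaef0, bf754ef0) and with the stable bbaa9988 once rbx is loaded explicitly.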
|
|
From: Ivo R. <ivo...@gm...> - 2015-08-14 17:17:02
|
2015-08-13 11:55 GMT+02:00 Julian Seward <js...@ac...>:

> On 13/08/15 08:23, Ivo Raisr wrote:
> > However I see the following failure:
> > perl tests/vg_regtest none/tests/amd64/xacq_xrel
>
> I'm not sure why that would have failed as a result of this change.
> But anyway. Looking at that test, I think the inline assembly has
> always been wrong. It mentions rdx twice but doesn't mention rbx at
> all; it should mention each exactly once.
>
> Can you try the diff below and see if you get the same results
> natively and on V ? In both cases I now get (on Linux)
>
> result for 'cmpxchg8b' is 55667788bbaa9988
>

Thank you for providing a fix for this test case.
Indeed, it is not caused by the recent change in AVX xsave/xrstor.
Now the test case passes successfully.
I.