On 18.09.2013 19:01, Darren Salt wrote:
> I demand that Roland Scheidegger may or may not have written...
>> On 18.09.2013 16:46, Darren Salt wrote:
>>> I demand that Chris Rankin may or may not have written...
>>>> I've applied this patch to the xine-lib branch, but my attempts also to
>>>> merge it across from there to the xine-lib-1.2 branch have instead
>>>> mangled my local repository in ways I would not have believed possible...
>>> I cherry-picked it and pushed it, assuming that all was well with it,
> [snip run-time breakage]
>>> (CCing the one responsible for the breakage regardless of subscription to
>>> the list.)
>> Ehh that doesn't make sense. Looks like vzeroupper got applied to
>> sse_memcpy instead of avx_memcpy.
> Ah. So the fault is Chris Rankin's...
>> No idea why or how but I've got nothing to do with it :-). My patch was
>> against xine-lib 1.2 fwiw.
> Okay, fixing.
> Incidentally, are there any CPUs out there with AVX but *without* this
No, that's impossible; the instruction is part of AVX. (I'm unsure whether
it is strictly necessary on something like Bulldozer, which really executes
256-bit AVX as two 128-bit instructions and so might not have an AVX-SSE
transition penalty, but in any case the instruction still works there.)
I think emitting vzeroupper after you've touched the ymm registers is a very
common pattern, and pretty much the only sane way to deal with the SSE-AVX
transition penalty, unless you know that all the functions you link in use
only AVX and no SSE (or no SIMD instructions at all, of course). Compilers
also emit it when they use 256-bit AVX instructions themselves.
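
For illustration, here is a minimal sketch of that pattern in C. It is not
the xine-lib avx_memcpy code; copy_avx and copy_bytes are made-up names, and
it assumes GCC or clang on x86 (anything else just falls back to memcpy).

```c
#include <stddef.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>

/* Hypothetical helper: copy with 256-bit loads/stores, then clear the
 * upper halves of the ymm registers before returning to the caller. */
__attribute__((target("avx")))
static void copy_avx(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= 32) {
        __m256i v = _mm256_loadu_si256((const __m256i *)s);
        _mm256_storeu_si256((__m256i *)d, v);
        s += 32;
        d += 32;
        n -= 32;
    }

    /* Emits vzeroupper, so SSE code that runs later does not hit the
     * AVX-SSE transition penalty. */
    _mm256_zeroupper();

    memcpy(d, s, n); /* remaining tail bytes, fewer than 32 */
}
#endif

/* Hypothetical entry point: use the AVX path only if the CPU reports AVX. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("avx")) {
        copy_avx(dst, src, n);
        return dst;
    }
#endif
    return memcpy(dst, src, n);
}
```

The point is only where the vzeroupper goes: once, after the last 256-bit
instruction and before control returns to code that may use SSE.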
(You could use "vzeroall" instead of "vzeroupper", thereby clearing the
registers completely instead of just the upper 128 bits. Nobody seems to
do that, however.)