From: Roland S. <rsc...@hi...> - 2013-09-18 18:14:57
On 18.09.2013 19:01, Darren Salt wrote:
> I demand that Roland Scheidegger may or may not have written...
>
>> On 18.09.2013 16:46, Darren Salt wrote:
>>> I demand that Chris Rankin may or may not have written...
>>>> I've applied this patch to the xine-lib branch, but my attempts also to
>>>> merge it across from there to the xine-lib-1.2 branch have instead
>>>> mangled my local repository in ways I would not have believed possible...
>>> I cherry-picked it and pushed it, assuming that all was well with it,
>>> but...
> [snip run-time breakage]
>>> (CCing the one responsible for the breakage regardless of subscription to
>>> the list.)
>
>> Ehh that doesn't make sense. Looks like vzeroupper got applied to
>> sse_memcpy instead of avx_memcpy.
>
> Ah. So the fault is Chris Rankin's...
>
>> No idea why or how but I've got nothing to do with it :-). My patch was
>> against xine-lib 1.2 fwiw.
>
> Okay, fixing.
>
> Incidentally, are there any CPUs out there with AVX but *without* this
> instruction?
>

No, that's impossible; this instruction is part of AVX. (I'm unsure whether
it is strictly necessary on something like Bulldozer, which really executes
AVX-256 as two 128-bit instructions and so might not have an AVX-SSE
transition penalty, but in any case the instruction still works there.)

I think emitting vzeroupper after you've touched ymm regs is a very, very
common pattern, and pretty much the only sane way to deal with the SSE-AVX
transition penalty, unless you know that all your linked-in functions use
only AVX and not SSE (or no SIMD instructions at all, of course).
Compilers also emit it when they use AVX-256 instructions themselves.

(You could use "vzeroall" instead of "vzeroupper", clearing the registers
completely instead of just the upper 128 bits. No one seems to do that,
however.)

Roland
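
To make the pattern above concrete, here is a minimal sketch of a copy routine that touches the ymm registers and then issues vzeroupper before returning. It is not xine-lib's actual avx_memcpy: the names copy_avx256 and sketch_memcpy are hypothetical, and the code assumes GCC or Clang (it relies on the target("avx") function attribute and __builtin_cpu_supports). On non-x86 builds, or on CPUs without AVX, it simply falls back to memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Hypothetical helper (not the xine-lib routine): copy n bytes with
 * unaligned 256-bit loads/stores, then clear the upper ymm halves. */
__attribute__((target("avx")))
static void copy_avx256(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;

    for (; i + 32 <= n; i += 32) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(s + i));
        _mm256_storeu_si256((__m256i *)(d + i), v);
    }

    /* vzeroupper: zero bits 128..255 of all ymm registers so that
     * legacy SSE code running afterwards avoids the transition penalty. */
    _mm256_zeroupper();

    /* Copy the remaining 0..31 bytes. */
    memcpy(d + i, s + i, n - i);
}
#endif

/* Hypothetical entry point: use the AVX path only when the CPU has AVX. */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx")) {
        copy_avx256(dst, src, n);
        return dst;
    }
#endif
    return memcpy(dst, src, n);
}
```

The important part is the placement: vzeroupper belongs in the routine that uses the 256-bit registers, not in an SSE-only one. The "vzeroall" variant mentioned above corresponds to the _mm256_zeroall() intrinsic, which could be swapped in at the same spot.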