[Valgrind-developers] re: lazy FPU state save/restore

Cobbling together a response to this from the archives, since I didn't
get it via the normal routes.

> This patch makes FPU state changes lazy, so there should only be one
> save/restore pair per basic block.  With this change in place,
> FPU-intensive programs (in my case, some 3D code using OpenGL) are
> significantly faster.

Interesting.  This is something I'd wondered about doing at the time 
I did the FPU stuff in the first place.  

How much faster is "significantly faster"?

So, my main point.  I think this patch is unsafe and will lead to
hard-to-find problems down the line.  The difficulty is that it allows
the simulated FPU state to hang around in the real FPU for long
periods, up to a whole basic block's worth of execution (if I
understand it right).

We only need a skin to call out to a helper function which modifies
the real FPU state on some obscure path, and we're hosed.  Since we don't
have any control over what skins people might plug in, this seems like
an unsafe modification to the core.
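
To make the worry concrete, here is a toy model.  It is not Valgrind
code: the one-register "FPU" and every name in it (restore_sim,
save_sim, skin_helper and so on) are made up for illustration.  It
only shows how a helper that touches the real FPU is harmless when we
save after every op, and corrupts the simulated state when the save is
deferred to the end of the block.

```c
/* Toy model only: a one-register "FPU".  All names are hypothetical. */
typedef struct { double st0; } Fpu;

static Fpu real_fpu;   /* stands in for the hardware FPU            */
static Fpu sim_fpu;    /* simulated FPU state kept in memory        */

static void restore_sim(void) { real_fpu = sim_fpu; }  /* memory -> real FPU */
static void save_sim(void)    { sim_fpu = real_fpu; }  /* real FPU -> memory */

/* A simulated FPU instruction: operates on whatever is in the real FPU. */
static void sim_op_add1(void) { real_fpu.st0 += 1.0; }

/* A skin helper that happens to use the real FPU for its own purposes. */
static void skin_helper(void) { real_fpu.st0 = -999.0; }

/* Eager scheme: restore/save around every op, so the helper runs while
   the simulated state is safely in memory. */
static double run_eager(void)
{
    sim_fpu.st0 = 0.0;
    restore_sim(); sim_op_add1(); save_sim();
    skin_helper();
    restore_sim(); sim_op_add1(); save_sim();
    return sim_fpu.st0;
}

/* Lazy scheme: one restore/save pair per block, so the helper runs while
   the simulated state is still live in the real FPU and clobbers it. */
static double run_lazy(void)
{
    sim_fpu.st0 = 0.0;
    restore_sim();
    sim_op_add1();
    skin_helper();
    sim_op_add1();
    save_sim();
    return sim_fpu.st0;
}
```

In this model run_eager() yields the expected 2.0, while run_lazy()
saves back a value derived from the helper's scribbling.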

The modification I had in mind for a while was a lot more conservative,
and more along the lines of a peephole optimisation.  Essentially,
if we see an FPU-no-mem op followed by another FPU-no-mem op, we can
skip the save at the end of the first and the restore at the start of
the second.

Looking at the stable branch vg_from_ucode.c and the codegen cases
for FPU, FPU_R and FPU_W, it's clear we can also do the same for
FPU_R/W followed by FPU, since there are no calls to helpers in the
gap between these two.
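
Roughly what I have in mind, as a standalone sketch rather than a
patch against vg_from_ucode.c.  The Op enum and the function names are
placeholders I've invented here, not the real UCode definitions.  The
rule is deliberately narrow: the save/restore between two adjacent ops
is dropped only when the first is FPU, FPU_R or FPU_W and the second
is a plain FPU (no-mem) op.

```c
#include <assert.h>

/* Placeholder opcode tags, standing in for the real UCode opcodes. */
typedef enum { OP_OTHER, OP_FPU, OP_FPU_R, OP_FPU_W } Op;

static int is_fpu(Op op)
{
    return op == OP_FPU || op == OP_FPU_R || op == OP_FPU_W;
}

/* May we skip the save after 'prev' and the restore before 'next'?
   Only if the two are adjacent FPU-class ops and 'next' is a no-mem
   FPU op, i.e. the cases argued above to have no helper call in the gap. */
static int can_elide_pair(Op prev, Op next)
{
    return is_fpu(prev) && next == OP_FPU;
}

/* Count the restore/save pairs that would be emitted for a block of
   n ops.  Each maximal run of ops joined by can_elide_pair() costs one
   restore at its start and one save at its end. */
static int count_pairs(const Op *ops, int n)
{
    int pairs = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (!is_fpu(ops[i]))
            continue;                       /* non-FPU op: nothing to do */
        if (i > 0 && can_elide_pair(ops[i - 1], ops[i]))
            continue;                       /* extends the current run   */
        pairs++;                            /* starts a new run          */
    }
    return pairs;
}
```

For the sequence FPU, FPU, FPU_R, FPU, OTHER, FPU this gives 3 pairs
instead of 5.  Note that FPU followed by FPU_R/W is not merged; I've
left that out because only the other direction is covered by the
argument above.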

Or am I missing something?  It would definitely be good to speed up the
FPU stuff a bit, but I need to be convinced that you've got this 100% 
tied down in a not-too-complex way, in the face of arbitrary actions
carried out by skins-not-invented-yet.

J