From: Julian S. <js...@ac...> - 2002-10-04 10:44:56
Cobbling together a response to this from the archives, since I didn't get it via the normal routes.

> This patch makes FPU state changes lazy, so there should only be one
> save/restore pair per basic block. With this change in place,
> FPU-intensive programs (in my case, some 3D code using OpenGL) are
> significantly faster.

Interesting. This is something I'd wondered about doing at the time I did the FPU stuff in the first place. How much faster is "significantly faster"?

So, my main point. I think this patch is unsafe and will lead to hard-to-find problems down the line. The difficulty is that it allows the simulated FPU state to hang around in the real FPU for long periods, up to a whole basic block's worth of execution (if I understand it right). We only need a skin to call out to a helper function which modifies the real FPU state on some obscure path, and we're hosed. Since we don't have any control over what skins people might plug in, this seems like an unsafe modification to the core.

The modification I had in mind for a while was a lot more conservative, and more along the lines of a peephole optimisation. Essentially, if we see an FPU-no-mem op followed by another FPU-no-mem op, we can skip the save at the end of the first and the restore at the start of the second. Looking at the stable branch vg_from_ucode.c and the codegen cases for FPU, FPU_R and FPU_W, it's clear we can also do the same for FPU_R/W followed by FPU, since there are no calls to helpers in the gap between these two. Or am I missing something?

It would definitely be good to speed up the FPU stuff a bit, but I need to be convinced that you've got this 100% tied down in a not-too-complex way, in the face of arbitrary actions carried out by skins-not-invented-yet.

J
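
To make the peephole idea concrete, here is a rough sketch in Python. It is a toy model and not the code in vg_from_ucode.c. The opcode names FPU, FPU_R and FPU_W come from the discussion above. Everything else (the function name, the "restore_fpu" and "save_fpu" markers, and "HELPER_CALL" as a stand-in for anything a skin might insert) is made up for illustration. It elides the save/restore pair only when an FPU, FPU_R or FPU_W op is immediately followed by an FPU-no-mem op.

```python
# Toy model of the conservative peephole described above.
# Assumption: each FPU-class op is normally bracketed by a restore
# (simulated state -> real FPU) and a save (real FPU -> simulated state).
# "restore_fpu", "save_fpu" and "HELPER_CALL" are hypothetical names.

FPU_CLASS = {"FPU", "FPU_R", "FPU_W"}


def emit_fpu_sequence(uinstrs):
    """Return the emitted op list, skipping a save/restore pair only
    when an FPU-class op is directly followed by a plain FPU op."""
    out = []
    state_in_real_fpu = False  # True only across an elided pair

    for i, op in enumerate(uinstrs):
        if op not in FPU_CLASS:
            # Anything else (e.g. a helper call) always sees a saved state,
            # because we only keep state live when the next op is FPU.
            out.append(op)
            continue

        if not state_in_real_fpu:
            out.append("restore_fpu")
        out.append(op)

        next_op = uinstrs[i + 1] if i + 1 < len(uinstrs) else None
        if next_op == "FPU":
            # Skip this op's save and the next op's restore.
            state_in_real_fpu = True
        else:
            out.append("save_fpu")
            state_in_real_fpu = False

    return out


print(emit_fpu_sequence(["FPU_R", "FPU", "HELPER_CALL", "FPU"]))
```

The point of the sketch is that the simulated state is never left in the real FPU across anything other than an adjacent FPU op, so a helper call always runs with the state safely saved. A per-basic-block lazy scheme would not have that property without extra checks.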