From: Julian S. <js...@ac...> - 2002-12-07 00:28:14
On Friday 06 December 2002 11:33 pm, Jeremy Fitzhardinge wrote:
> On Fri, 2002-12-06 at 15:11, Julian Seward wrote:
> > Re my analysis of stack-clearing add, I can't be exactly right, since
> > VG_(emit_add_lit_to_esp) begins with the correct call to new_emit.
>
> No, it isn't correct - True means "operate on Simd flags"; False means
> "non-simd flags".
>
> Fixing this seems to work, and even has a slight performance improvement
> (OO starts up in 47 rather than 48 seconds). At very least it seems
> performance neutral.

Well, changing that True to False makes OO and moz work fine for me,
which is great. So I guess that's pretty much the end of the matter?
It seems like the right fix to me; does it to you as well?

--------

Now that it looks like this is going to work, is it worth keeping
the fancy test stuff you did in synth_jcond_lit? I ask because you
have a better understanding of the ramifications of this latest eflags
stuff, and so can probably make a better decision.

If it's worth keeping ... I did actually write code to also do the 4
missing cases: (SF xor OF) == 0 or 1, ((SF xor OF) or ZF) == 0 or 1,
and it did seem to help sometimes. But that was before your latest
patch.

If it's not worth keeping ... let's nuke it.

----------------

I've been wondering a bit about the dismal performance on P4s.
One thing that occurred to me is that the preamble sequence
"decl bbs_to_go; jnz ..." is going to hit the P4's partial-flag-write
penalty (page 2-55 of the P4 optimisation guide, "Use of the inc and
dec instructions"). It'd be interesting to try changing it to a
"subl $1, ..." as they recommend. Or perhaps not ... according to
their tables the latency difference is only 0.5 cycle.

J
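
For readers following the thread: the four "missing cases" mentioned above correspond to the x86 signed-comparison conditions (jl, jge, jle, jg). The sketch below is only an illustration of how those predicates fall out of a saved EFLAGS word; it is not the code from synth_jcond_lit or from the patch under discussion. The helper names (sf_xor_of, cond_l, cond_ge, cond_le, cond_g) are hypothetical; the bit positions are the architectural ones (ZF = bit 6, SF = bit 7, OF = bit 11).

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 EFLAGS bit masks. */
#define EFLAGS_ZF (1u << 6)   /* zero flag     */
#define EFLAGS_SF (1u << 7)   /* sign flag     */
#define EFLAGS_OF (1u << 11)  /* overflow flag */

/* Hypothetical helper: true when SF and OF differ. */
bool sf_xor_of(uint32_t eflags)
{
    bool sf = (eflags & EFLAGS_SF) != 0;
    bool of = (eflags & EFLAGS_OF) != 0;
    return sf != of;
}

/* (SF xor OF) == 1 : signed "less than" (jl). */
bool cond_l(uint32_t eflags)  { return sf_xor_of(eflags); }

/* (SF xor OF) == 0 : signed "greater or equal" (jge). */
bool cond_ge(uint32_t eflags) { return !sf_xor_of(eflags); }

/* ((SF xor OF) or ZF) == 1 : signed "less or equal" (jle). */
bool cond_le(uint32_t eflags)
{
    return sf_xor_of(eflags) || (eflags & EFLAGS_ZF) != 0;
}

/* ((SF xor OF) or ZF) == 0 : signed "greater than" (jg). */
bool cond_g(uint32_t eflags)  { return !cond_le(eflags); }
```

Each pair is a predicate and its negation, which is why the message lists each expression with "== 0 or 1".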