From: Julian S. <js...@ac...> - 2002-12-04 00:34:39
Hi. Let me say at the outset that I think your patch (62-) is a much
better solution than mine; it makes it less likely to get things wrong
later on, and extends naturally to skins. So I like that.
And the performance improvements really are excellent.
However -- almost all progs now segfault at exit for me, if not before.
I ran my canonical inner-loop program and looked at the code. I'm worried
by this:
  47: SUBL $0x3E7, %eax (-wOSZACP) [------]
        159: 81 E8 E7 03 00 00
                subl $0x3E7, %eax
  48: INCEIPo $6 [------]
        165: C6 45 24 37
                movb $0x37, 0x24(%ebp)
  49: Jleo $0x8048508 (-rOSZACP) [------]
        169: 9C 8F 45 20
                pushfl ; popl 32(%ebp)
        173: 7F 0D
                jnle-8 %eip+13
        175: B8 08 85 04 08
                movl $0x8048508, %eax
        180: 89 45 24
                movl %eax, 0x24(%ebp)
        183: 0F 0B 0F 0B 90
                ud2; ud2; nop
  50: JMPo $0x8048539 ($2) [------]
        188: B8 39 85 04 08
                movl $0x8048539, %eax
        193: 89 45 24
                movl %eax, 0x24(%ebp)
        196: 0F 0B 0F 0B 90
                ud2; ud2; nop
The SUBL is the first sim'd-flag-affecting insn in the block. As I understand
it, your scheme notes that we are setting the sim'd flags here, so it lets
the generated subl set the real %eflags; the Jleo then copies this to the
sim'd %EFLAGS with pushfl ; popl 32(%ebp).
The problem (according to my analysis) is that subl sets O, S, Z, A, C and P,
but it doesn't set D (the string-op direction flag). The result is that the
subsequent %EFLAGS := %eflags copy leaves the sim'd flags state holding
whatever the real machine's D flag was prior to the subl, which is unknown
to us.
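
To make the worry concrete, here is a rough C sketch of the two ways the copy
could behave. This is not Valgrind code: the function names, the FLAGS_OSZACP
mask and the masked variant are all made up for illustration. Only the bit
positions are the architectural x86 ones.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural x86 EFLAGS bit positions. */
#define FLAG_C (1u << 0)
#define FLAG_P (1u << 2)
#define FLAG_A (1u << 4)
#define FLAG_Z (1u << 6)
#define FLAG_S (1u << 7)
#define FLAG_D (1u << 10)
#define FLAG_O (1u << 11)

/* The six flags that subl actually writes (hypothetical helper mask). */
#define FLAGS_OSZACP (FLAG_O | FLAG_S | FLAG_Z | FLAG_A | FLAG_C | FLAG_P)

/* Models "pushfl ; popl 32(%ebp)": the whole real flags word replaces
   the sim'd one, so the real D bit comes along too. */
uint32_t copy_whole(uint32_t simd, uint32_t real)
{
    (void)simd;
    return real;
}

/* Hypothetical alternative: take only the bits the insn wrote and keep
   everything else (including D) from the sim'd state. */
uint32_t copy_masked(uint32_t simd, uint32_t real)
{
    return (simd & ~FLAGS_OSZACP) | (real & FLAGS_OSZACP);
}

/* Small self-check: sim'd D is clear, real D happens to be set, and the
   subl left Z set in the real flags. */
void flags_demo(void)
{
    uint32_t simd = 0;
    uint32_t real = FLAG_D | FLAG_Z;

    assert(copy_whole(simd, real) & FLAG_D);      /* D leaked into sim'd state */
    assert(copy_masked(simd, real) == FLAG_Z);    /* D stays as the client had it */
}
```

The masked version costs extra instructions on every flag copy, so it is only
meant to show which bits are at issue, not to suggest how the patch should
handle it.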
Am I missing something here? I'd love to be, considering the speed gains :)
J