|
From: Julian S. <js...@ac...> - 2002-12-10 00:25:35
Attachments:
typescript-moz-loop.txt
|
Results from this evening's testing of the head:

- Works OK on R H 7.2 (it builds, mozilla-1.0, OO-1.0.1 run on all skins)
- Ditto R H 7.3, R H 8.0, SuSE 8.1
- After some futzing, got it to build again on R H 6.2 (our oldest
  supported platform).

Two strange things:

--skin=cachegrind causes an instant segfault at startup, before anything
is printed. It's so quick I wonder if the dynamic linker is crashing.

mozilla-1.2.1 (binary .tar.gz build downloaded from mozilla.org) runs OK
on nulgrind, addrcheck, but spins after 100 million ish bbs on memcheck,
so it draws part of a window and never progresses.

This could be a R H 6.2 problem or a virtual CPU problem which only shows
up with that 1.2.1 build -- haven't checked on any other distros. I bet
it's some kinda flag weirdness tho, considering it works ok on some skins.

I got a quick trace with gdb and it's definitely in a loop. I'll have a
look at it perhaps late tomorrow night; out of time now. Trace is attached.

J |
|
From: Julian S. <js...@ac...> - 2002-12-10 23:28:07
Attachments:
typescript-moz-loop.txt
|
(mozilla-1.2.1 was looping with memcheck ...)

> > It all _looks_ plausible. I'm a bit mystified. You sure this j[n]p
> > trick in 69- has no strange side-effects? I can't think of any. Perhaps
> > this is a red herring.
>
> Looks OK to me, but it's a bit hard to tell without seeing the original
> code.
>
> What happens if you change it back to the popf slow path? Still happen?

I dunno; I removed the popf stuff.

However, backing out 69- makes it work properly.

I identified the original code:

   0x40224f10 mov 0x4(%edi),%eax
   0x40224f13 mov 0x10(%eax),%eax
   0x40224f16 mov %eax,0x4(%edi)
   0x40224f19 mov 0x10(%eax),%edx
   0x40224f1c mov 0x4(%ecx),%eax
   0x40224f1f cmp 0x4(%edx),%eax
   0x40224f22 jl 0x40224f10

Attached is the cleaned-up and annotated memcheck translation. The stuff
to do with cmp and jl looks OK to me; the %eflags value set by the
cmp (simulation) is correctly copied off to safety before the stuff for
the jl, and the relevant simd test for JL looks right.

So I'm still mystified. One unedifying explanation is that this translation
is correct, and the reason it is looping is that some earlier translation has
written bogus values into memory, which the above loop is picking up and
looping on. I don't fancy chasing that down.

I'm going to back out 69- from cvs until we have a clearer picture what's
going on. Do shout if you have any ideas at all. It makes me uneasy that
I don't know what's going on here.

> One possibility I've been thinking about is whether there's any code
> which depends on the undefined flags behaviour of instructions. It
> would be a (compiler?) bug, but it might change the behaviour of real
> programs.

Um, that's not good. Should I be concerned?

> The simulated CPU will leak lots of real flags into the undefined
> flags. The solution would be to add an undef_flags argument to
> new_emit, and add a --paranoid-flags=yes|no command line option;
> new_emit could then decide whether to fetch the flags or not. A quick
> test to see if that's happening in this case is to force new_emit to
> always make sure the simulated flags are current before every simulated
> instruction.
>
> At one point I hacked none to "instrument" the code to trash the real
> flags between every UInstr. Unfortunately I think I lost this (and it
> was a bit of a blight on none's purity). Is there a good existing skin
> for this kind of skulduggery?

Umm, I'm not sure what you mean by good. Memcheck is probably the most
demanding in that nearly every original ucode is preceded by instrumentation
which very likely trashes (real) eflags. Is that what you meant?

If there's some way in which you could hack a skin to do a
stress-test of your flags machinery, that would be very helpful.

J |
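One way to read the seven-instruction loop quoted above is as a cursor being advanced along a linked structure until a key comparison fails. The C below is only a sketch of that reading: the struct and field names are invented, the offsets in the comments are the ones in the disassembly and line up only on a 32-bit target, and the `max_steps` bound is an addition so the demo can terminate. It illustrates the "bogus values in memory" explanation: with sane data the loop exits, while a cycle of nodes with large keys spins forever even though the cmp/jl logic is correct.

```c
#include <assert.h>
#include <stddef.h>

/* Field names are invented; only the offsets in the comments come from
   the disassembly, and they match only on a 32-bit target. */
struct node {
    int          unknown0;   /* 0x00 */
    int          key;        /* 0x04: memory operand of the cmp */
    int          unknown8;   /* 0x08 */
    int          unknownc;   /* 0x0c */
    struct node *next;       /* 0x10 */
};

struct cursor { int unknown0; struct node *cur; };   /* pointed to by %edi */
struct probe  { int unknown0; int key; };            /* pointed to by %ecx */

/* Returns the number of iterations, or -1 if max_steps is reached.
   The bound is an addition for the demo; the original loop has none. */
int advance(struct cursor *it, const struct probe *p, int max_steps)
{
    struct node *n;
    int steps = 0;
    do {
        if (steps == max_steps)
            return -1;
        n = it->cur->next;   /* mov 0x4(%edi),%eax ; mov 0x10(%eax),%eax */
        it->cur = n;         /* mov %eax,0x4(%edi) */
        steps++;
        /* mov 0x10(%eax),%edx ; mov 0x4(%ecx),%eax ; cmp 0x4(%edx),%eax ; jl */
    } while (p->key < n->next->key);   /* signed compare, as jl implies */
    return steps;
}

/* Well-formed data: the loop stops once the probe key is no longer smaller. */
int demo_sane_list(void)
{
    struct node d = { 0, 3, 0, 0, NULL };
    struct node c = { 0, 10, 0, 0, &d };
    struct node b = { 0, 7, 0, 0, &c };
    struct node a = { 0, 1, 0, 0, &b };
    struct cursor it = { 0, &a };
    struct probe p = { 0, 5 };
    return advance(&it, &p, 1000);
}

/* Corrupted data: two nodes pointing at each other with large keys. */
int demo_cyclic_list(void)
{
    struct node x = { 0, 100, 0, 0, NULL };
    struct node y = { 0, 100, 0, 0, &x };
    struct cursor it = { 0, &x };
    struct probe p = { 0, 5 };
    x.next = &y;
    return advance(&it, &p, 1000);
}
```

Calling `demo_sane_list()` walks two steps and stops; `demo_cyclic_list()` only returns because of the added bound.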
|
From: Jeremy F. <je...@go...> - 2002-12-11 00:12:29
|
On Tue, 2002-12-10 at 15:35, Julian Seward wrote:
> However, backing out 69- makes it work properly.

Very mysterious.

> So I'm still mystified. One unedifying explanation is that this translation
> is correct, and the reason it is looping is that some earlier translation has
> written bogus values into memory, which the above loop is picking up and
> looping on. I don't fancy chasing that down.

Since the only code which cares about flags are the last two
instructions, and they look correct to me, it must be the data they're
operating on...

> I'm going to back out 69- from cvs until we have a clearer picture what's
> going on. Do shout if you have any ideas at all. It makes me uneasy that
> I don't know what's going on here.

So, this only happens with Mozilla on RH6.2, compiled with some version
of egcs? Can you reproduce anything similar with other egcs-generated
code?

> > One possibility I've been thinking about is whether there's any code
> > which depends on the undefined flags behaviour of instructions. It
> > would be a (compiler?) bug, but it might change the behaviour of real
> > programs.
>
> Um, that's not good. Should I be concerned?

Dunno. It's easy to fix: just add a line into VG_(new_emit)() saying
something like:

    if (set_flags != FlagsEmpty)
        maybe_emit_get_flags();

which would always make sure that if anyone sets the flags, they start
with the simulated flags state in the CPU.

A lot of the arithmetic instructions have an undefined effect on some
set of flags. I interpret that as being the same as setting them (that
is to say, no correct program can rely on them being unchanged by the
instruction, so don't bother to preserve their values). It may be that
some code "knows" that undefined actually means unchanged, and relies on
that behaviour. In which case the conservative thing for us to do is
treat undefined as meaning unchanged, and emit considerably more flags
fetches (which basically punts the problem to
Intel/AMD/Via/Transmeta/etc, because the CPU still has to have an
interpretation of what undefined actually means; there's probably a lot
of lore about the detailed behaviour of the instructions which goes way
beyond their formal description in Vol2).

I'm not saying it has any bearing on the present problem, but it would
be an interesting experiment to try.

> Umm, I'm not sure what you mean by good. Memcheck is probably the most
> demanding in that nearly every original ucode is preceded by instrumentation
> which very likely trashes (real) eflags. Is that what you meant?
>
> If there's some way in which you could hack a skin to do a
> stress-test of your flags machinery, that would be very helpful.

Yes. I might put together a testbed skin.

BTW, I'm having a go at implementing your addrcheck cache idea. It
isn't working out quite as well as I'd like.

J |
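The two interpretations of "undefined" discussed above can be put side by side in a toy model. This is not Valgrind's emitter: the function name, its parameters and the bitmask encoding are all assumptions made for illustration. It only captures the decision of whether the simulated flags need to be loaded into the real CPU before an instruction, under a relaxed reading (undefined counts as overwritten) and a paranoid one (undefined counts as unchanged).

```c
#include <assert.h>
#include <stdbool.h>

/* x86 arithmetic flag bits in %eflags. */
enum {
    FLAG_C = 0x001, FLAG_P = 0x004, FLAG_A = 0x010,
    FLAG_Z = 0x040, FLAG_S = 0x080, FLAG_O = 0x800,
    FLAGS_ALL = FLAG_C | FLAG_P | FLAG_A | FLAG_Z | FLAG_S | FLAG_O
};

/* Hypothetical helper: should the simulated flags be fetched into the
   real CPU before emitting an instruction with this flag behaviour?
     use      - flags the instruction reads
     set      - flags it writes with a defined value
     undef    - flags it leaves architecturally undefined
     paranoid - treat undefined as "unchanged" rather than "overwritten"
     in_cpu   - the real flags already hold the simulated state */
bool needs_flag_fetch(unsigned use, unsigned set, unsigned undef,
                      bool paranoid, bool in_cpu)
{
    unsigned touched = set | undef;
    unsigned overwritten = paranoid ? set : touched;

    if (in_cpu)
        return false;            /* nothing to do, state is current */
    if (use != 0)
        return true;             /* instruction reads simulated flags */
    if (touched == 0)
        return false;            /* instruction ignores flags entirely */
    /* Any flag not overwritten ends up in the saved result, so it must
       hold the simulated value beforehand. */
    return overwritten != FLAGS_ALL;
}
```

For example, `mul` defines only C and O and leaves S, Z, A and P undefined, so the model fetches only in paranoid mode; `inc` leaves C unchanged, so it fetches in both modes; `add` defines all six flags and never needs a fetch.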
|
From: Jeremy F. <je...@go...> - 2002-12-11 00:44:48
Attachments:
74-paranoid-flags.patch
|
On Tue, 2002-12-10 at 15:35, Julian Seward wrote:
> (mozilla-1.2.1 was looping with memcheck ...)
>
> > > It all _looks_ plausible. I'm a bit mystified. You sure this j[n]p
> > > trick in 69- has no strange side-effects? I can't think of any. Perhaps
> > > this is a red herring.
> >
> > Looks OK to me, but it's a bit hard to tell without seeing the original
> > code.
> >
> > What happens if you change it back to the popf slow path? Still happen?
>
> I dunno; I removed the popf stuff.
>
> However, backing out 69- makes it work properly.

Try the attached (74-paranoid-flags) with 69- still applied and see if it
helps (try with --paranoid-flags=yes and no). I also found some code
passing the old args to new_emit, which may have been causing a problem.

J |
|
From: Jeremy F. <je...@go...> - 2002-12-11 01:59:13
|
On Tue, 2002-12-10 at 15:35, Julian Seward wrote:
> (mozilla-1.2.1 was looping with memcheck ...)
>
> > > It all _looks_ plausible. I'm a bit mystified. You sure this j[n]p
> > > trick in 69- has no strange side-effects? I can't think of any. Perhaps
> > > this is a red herring.
> >
> > Looks OK to me, but it's a bit hard to tell without seeing the original
> > code.
> >
> > What happens if you change it back to the popf slow path? Still happen?
>
> I dunno; I removed the popf stuff.
>
> However, backing out 69- makes it work properly.
>
> I identified the original code:
>
>    0x40224f10 mov 0x4(%edi),%eax
>    0x40224f13 mov 0x10(%eax),%eax
>    0x40224f16 mov %eax,0x4(%edi)
>    0x40224f19 mov 0x10(%eax),%edx
>    0x40224f1c mov 0x4(%ecx),%eax
>    0x40224f1f cmp 0x4(%edx),%eax
>    0x40224f22 jl 0x40224f10
>
> Attached is the cleaned-up and annotated memcheck translation. The stuff
> to do with cmp and jl looks OK to me; the %eflags value set by the
> cmp (simulation) is correctly copied off to safety before the stuff for
> the jl, and the relevant simd test for JL looks right.

OK, I get the same thing. I'll try playing around with it.

J |
|
From: Julian S. <js...@ac...> - 2002-12-11 00:52:20
|
> So, this only happens with Mozilla on RH6.2, compiled with some version
> of egcs? Can you reproduce anything similar with other egcs-generated
> code?

The mozilla I was running was a 1.2.1 binary build (the straight .tar.gz)
from ftp.mozilla.org, so egcs is not in the picture, and I would expect
this problem to occur using that binary build on any distro -- the loop
is in some .so supplied in the .tar.gz, so it'll be the same for everyone
(I guess).

Thanks for 74-; I'll try it tomorrow evening. Almost out of time now.

> BTW, I'm having a go at implementing your addrcheck cache idea. It
> isn't working out quite as well as I'd like.

You are?! I had better reply to your initial comments on it ...

J |
|
From: Jeremy F. <je...@go...> - 2002-12-11 01:44:10
|
On Tue, 2002-12-10 at 16:59, Julian Seward wrote:
> The mozilla I was running was a 1.2.1 binary build (the straight .tar.gz)
> from ftp.mozilla.org, so egcs is not in the picture, and I would expect
> this problem to occur using that binary build on any distro -- the loop
> is in some .so supplied in the .tar.gz, so it'll be the same for everyone
> (I guess).

But you only see a problem under RH6.2? Is it this build:

http://ftp.mozilla.org/pub/mozilla/releases/mozilla1.2.1/mozilla-i686-pc-linux-gnu-1.2.1.tar.gz

It could still be some interesting interaction between the system
libraries and moz itself... I'll see if I can reproduce the problem.

> > BTW, I'm having a go at implementing your addrcheck cache idea. It
> > isn't working out quite as well as I'd like.
>
> You are?! I had better reply to your initial comments on it ...

My first impression is that cache maintenance overwhelms any benefit of
making the fast path faster. On the other hand, I may still be doing
something wrong. I'll put the patch up for inspection.

The much more interesting contribution is 72-jump, which adds a helper
mechanism for computing relative jump offsets rather than always having
to hand-compute them (and double-guess the emitters). I implemented it
out of necessity because I wanted to do a jump over a sync_ccall site,
but it turned out to work well in every other instance of a jcond_lit,
and it cleans things up nicely.

J |
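The general idea of letting a helper compute a forward jump displacement, instead of hand-counting the bytes of whatever is emitted in between, can be sketched as below. This is not the 72-jump patch, whose contents are not shown in the thread; the buffer type and all function names here are hypothetical. It emits a short x86 conditional jump (opcode 0x70+cc followed by a rel8) with a placeholder displacement, then patches it once the target position is known.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical code buffer; nothing here is taken from Valgrind. */
struct code_buf {
    uint8_t bytes[64];
    size_t  used;
};

void emit_byte(struct code_buf *b, uint8_t v)
{
    assert(b->used < sizeof b->bytes);
    b->bytes[b->used++] = v;
}

/* Emit a short conditional jump with a zero placeholder displacement
   and return the index of the displacement byte for later patching. */
size_t emit_jcc_short(struct code_buf *b, uint8_t cond)
{
    emit_byte(b, (uint8_t)(0x70 | (cond & 0x0f)));
    emit_byte(b, 0);
    return b->used - 1;
}

/* Make the jump land at the current end of the buffer. A rel8 is
   measured from the byte after the displacement. Returns 0 on success,
   -1 if the distance does not fit in a short forward jump. */
int patch_jump_to_here(struct code_buf *b, size_t disp_index)
{
    size_t delta = b->used - (disp_index + 1);
    if (delta > 127)
        return -1;
    b->bytes[disp_index] = (uint8_t)delta;
    return 0;
}

/* Demo: a jl (condition code 0xC) that skips three nops.
   Returns the patched displacement, or -1 on failure. */
int demo_skip_three_nops(void)
{
    struct code_buf b = { {0}, 0 };
    size_t disp = emit_jcc_short(&b, 0x0c);
    emit_byte(&b, 0x90);
    emit_byte(&b, 0x90);
    emit_byte(&b, 0x90);
    if (patch_jump_to_here(&b, disp) != 0)
        return -1;
    return b.bytes[0] == 0x7c ? b.bytes[disp] : -1;
}
```

The caller never has to know how many bytes the skipped code takes, which is the property that matters when the skipped region is something variable-length like a helper call site.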