From: Julian S. <js...@ac...> - 2002-12-06 20:19:25
|
Hmm, I tried the following, of course linked with libpthread and it works
perfectly, so no easy leads there :-(
J
#include <sys/poll.h>
#include <stdio.h>
int main ( void )
{
int fd;
struct pollfd tab[1];
fd = fileno(stdin);
tab[0].fd = fd;
tab[0].events = POLLIN;
printf("poll begins ... press return\n" );
poll( tab, 1, -1 );
printf("done\n");
return 0;
}
|
|
From: Julian S. <js...@ac...> - 2002-12-06 20:05:10
|
> Anyway, tell me what you get with the current versions of 61 and 62.
No improvement with OO.
I tried mozilla. It also won't start up, the simulated machine falling
into an endless sequence of poll() calls separated by nanosleep(13
milliseconds), which afaics is the nonblocking poll() in vg_libpthread.c.
Trying OO with tracing on indicates it spins in the same place.
Hmm. This is very odd. I'm wondering if there is some problem with the
non-D flags (OSZACP) causing "if (res > 0) {" at line 2636 never to
get into the then-clause. Except that if there was such a problem,
most programs wouldn't work (I'd guess).
Opera works, so it's not threaded programs per se. Opera doesn't use
the nonblocking poll, tho.
J
|
|
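The spin described in the message above (a zero-timeout poll() retried after a short nanosleep()) can be sketched roughly as follows. This only illustrates the pattern and is not the code in vg_libpthread.c: the name `poll_by_spinning` and the `max_spins` limit are hypothetical, and only the 13 ms pause is taken from the message.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: emulate a blocking poll() by repeatedly issuing
   a non-blocking poll() (timeout 0) and sleeping between attempts.
   Returns the poll() result, or 0 if max_spins attempts saw nothing. */
int poll_by_spinning(struct pollfd *fds, nfds_t nfds, int max_spins)
{
    struct timespec pause = { 0, 13 * 1000 * 1000 };   /* 13 ms */

    assert(max_spins > 0);
    for (int i = 0; i < max_spins; i++) {
        int res = poll(fds, nfds, 0);    /* timeout 0: never blocks */
        if (res != 0)
            return res;                  /* ready fds (>0) or error (<0) */
        nanosleep(&pause, NULL);         /* give other threads a chance */
    }
    return 0;
}
```

If the `res > 0` case is never taken, a loop of this shape without the `max_spins` bound produces exactly the endless poll()/nanosleep() sequence reported.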
From: Jeremy F. <je...@go...> - 2002-12-06 19:19:06
|
On Thu, 2002-12-05 at 17:15, Julian Seward wrote:
> I just tried 01- and 61- and 62- and things run faster. bzip2,
> xedit, kate work.
>
> However OO 1.0.1 no longer starts up. You know when you start it, there
> is a splash screen, which sits above all other windows. If I now run it
> with --skin=none --trace-children=yes, the same patch of screen behaves
> as if it is the splash screen, except that the picture of the bluish sky
> with the stylised birds does not appear -- that rectangle of screen contains
> whatever was there before.
>
> Does that make any sense? Can you repro it?
Well, I fiddled about a bit, and the symptom has gone away, without any
obvious changes on my part. The only significant thing I remember doing
was fixing a bug in 61-special-d which flipped the state of D over a
context switch (ie, when the dispatch counter dropped to 0). But when I
could reproduce it, it was only with 62 in place - it worked with the
buggy 61. Interestingly, I can still repro it with 63-chained-indirect
in place, even though that works for everything else (and when that
patch has bugs, they're rarely subtle).
So I'm somewhat confused. I wonder if it's a timing/race problem in OO
itself, which we're sometimes hitting and sometimes not?
Anyway, tell me what you get with the current versions of 61 and 62.
J
|
|
From: Jeremy F. <je...@go...> - 2002-12-06 02:26:18
|
On Thu, 2002-12-05 at 17:15, Julian Seward wrote:
> However OO 1.0.1 no longer starts up. You know when you start it, there
> is a splash screen, which sits above all other windows. If I now run it
> with --skin=none --trace-children=yes, the same patch of screen behaves
> as if it is the splash screen, except that the picture of the bluish sky
> with the stylised birds does not appear -- that rectangle of screen contains
> whatever was there before.
>
> Does that make any sense? Can you repro it?
Hm, I don't even get the splash screen (but then I don't normally; maybe
I turned it off at one point). Looking into it. I have some suspicions
around flags and FP instructions.
J
|
|
From: Julian S. <js...@ac...> - 2002-12-06 01:08:10
|
Hi. That's great!

I just tried 01- and 61- and 62- and things run faster. bzip2,
xedit, kate work.

However OO 1.0.1 no longer starts up. You know when you start it, there
is a splash screen, which sits above all other windows. If I now run it
with --skin=none --trace-children=yes, the same patch of screen behaves
as if it is the splash screen, except that the picture of the bluish sky
with the stylised birds does not appear -- that rectangle of screen contains
whatever was there before.

Does that make any sense? Can you repro it?

J
|
|
From: Jeremy F. <je...@go...> - 2002-12-05 09:16:14
|
I found the last bug in lazy-eflags, which was preventing most code from
working under memcheck. It turned out it was Tag_Left[124]'s habit of
using %ebp as a temp if there were no other dead registers. It was
crashing when it tried to use %ebp while saving the flags value during
this sequence. I changed it to push/pop some other register, thereby
keeping %ebp unchanged.
Performance is pretty good. --skin=none is now under 10 times slower
than native (for my gcc3 benchmark), and memcheck is another factor of 5
slower (addrcheck is 3-4 times slower, which is worse than I would have
expected).
The patch (62-lazy-eflags) keeps track of the flags in one of three
states: UPD_Simd (the baseblock is up to date), UPD_Real (the CPU's
%eflags is up to date) and UPD_Both (both are current). UPD_Both isn't
terribly useful, because the only instructions which read flags but
don't set them are SETcc and Jcc, and Jcc is always at the end of a
basic block anyway (and SETcc isn't that common).
Even if an instruction doesn't use any flags, if it doesn't set all the
flags it must read the flag state so that no unexpected flag state leaks
into the emulated state. The big problem here is the D flag, which is
stored in the eflags register, but is functionally completely different
from the status registers.
Patch 62 depends on 61-special-d, which factors out the D flag from the
rest, meaning the live state of D is not in eflags. Almost all
arithmetic instructions overwrite all the remaining flags, meaning that
mostly a flags fetch is not needed for instructions which don't
explicitly use flags input. The only common instructions which don't
affect all flags are INC and DEC, and they aren't terribly common as the
first flags-using instruction in a basic block.
I'm thinking a small improvement might be gained by making the CPU's
eflags register the default home between basic blocks. The dispatch
loop can be responsible for saving it in the base block when dropping
out of execution, and chaining means that the dispatch loop isn't hit
all that much. (Hm, just noticed a bug which was sometimes failing to
fetch eflags for instructions which do a partial flags update. Two
interesting points: it didn't seem to affect any programs I tried, but
more interestingly, when fixed it slowed things down a fair bit, which
makes me think long-life eflags might help more than I thought.)
I suspect it would only really help with --skin=none; when there's a
real instrumenting skin in place, the flags will be saved out pretty
regularly anyway, and most flags uses are Jcc, which, thanks to
fast-jcc, mostly don't need the flags to be present in the CPU to work
well.
J
|
|
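The three-state tracking described in the message above can be sketched as a small translation-time state machine. This is a minimal illustration under stated assumptions, not the 62-lazy-eflags patch itself: the state names come from the message, while the `Emitter` struct, the four function names and the transfer counters are hypothetical.

```c
#include <assert.h>

/* Where the up-to-date copy of the simulated flags lives. */
typedef enum { UPD_Simd, UPD_Real, UPD_Both } FlagState;

/* Hypothetical code-generator state; the counters stand in for the
   flag-transfer instructions that would actually be emitted. */
typedef struct {
    FlagState state;
    int loads;   /* baseblock -> %eflags transfers */
    int saves;   /* %eflags -> baseblock transfers */
} Emitter;

/* Before an instruction that reads the simulated flags (or that only
   partially updates them). */
void want_flags_in_cpu(Emitter *e)
{
    assert(e != 0);
    if (e->state == UPD_Simd) {   /* only the baseblock copy is current */
        e->loads++;
        e->state = UPD_Both;
    }
}

/* After an instruction that writes all the simulated status flags. */
void flags_written_by_cpu(Emitter *e)
{
    assert(e != 0);
    e->state = UPD_Real;          /* the baseblock copy is now stale */
}

/* Before anything that needs the baseblock copy to be current. */
void want_flags_in_mem(Emitter *e)
{
    assert(e != 0);
    if (e->state == UPD_Real) {   /* only %eflags is current */
        e->saves++;
        e->state = UPD_Both;
    }
}

/* Before code that trashes the real %eflags without being meant to
   affect the simulated flags. */
void cpu_flags_clobbered(Emitter *e)
{
    want_flags_in_mem(e);         /* rescue the value if needed */
    e->state = UPD_Simd;
}
```

In this sketch a flag-setting instruction followed directly by a conditional jump costs no transfers at all, which is the saving the message is after.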
From: Jeremy F. <je...@go...> - 2002-12-04 21:53:11
|
On Wed, 2002-12-04 at 12:52, Julian Seward wrote:
> On contemplation, that solution seems to be good to me. It would more or
> less remove the flag-move overhead for the bog-standard ALU ops -- those
> which set exactly OSZACP. Inc, dec, neg and not will still have to go
> via the expensive route, but hopefully they are not so common.
Yes. From looking at generated code, INC and DEC are the only even
vaguely common instructions which hit this, and they aren't that common.
Separating out D was reasonably easy. It complicates the implementation
of GETF/PUTF, but those are very rare (they're only used for pushf/popf,
and I couldn't find any instances of those being used in real programs;
I had to write a specific test; I guess that's why they're implemented
so slow in real silicon).
> I'm trying to approach a new-code freeze for 2.0. I'd like to take
> 44-symbolic-addr and its dependent 45-memcheck-symaddr. How stable
> is 44 -- is it good enough to ship?
It works for me. The main limitation is the lack of DWARF2 support. I
looked at it the other day, but it is fairly complex (I'll need to
implement or steal a Forth interpreter for it). Oh, there is a #define
LAZYSIG 1 which should probably be 0 (it affects whether SIGSEGV is
caught for the whole tracing process or just for each pointer
dereference - at present it is disabled all the time, but this means
that any bugs turn into a silent sulk rather than a useful symptom).
> If this flags stuff can be brought to successful conclusion within the
> next week or so, that would be great to ship too.
That's what I'm currently working on. I think there's an issue around
string ops (which UInstrs are supposed to change Simd flags, and which
aren't?), but I'll send something more detailed later.
J
|
|
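Keeping D apart from the other flags, and the extra work this puts on GETF/PUTF, can be sketched as below. This is an illustration only, assuming a simple two-field representation: the `SimFlags` struct and the `sim_*` function names are hypothetical. The one hard fact used is that the direction flag is bit 10 of EFLAGS.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_DF 0x400u   /* direction flag: bit 10 of EFLAGS */

/* Hypothetical simulated flag state with D factored out. */
typedef struct {
    uint32_t flags_no_d;   /* simulated EFLAGS, D bit always clear */
    int      d_set;        /* simulated D flag, kept separately */
} SimFlags;

/* STD / CLD only touch the separate D state. */
void sim_std(SimFlags *s) { s->d_set = 1; }
void sim_cld(SimFlags *s) { s->d_set = 0; }

/* GETF (as needed for pushf): reassemble the full flags word. */
uint32_t sim_getf(const SimFlags *s)
{
    assert((s->flags_no_d & EFLAGS_DF) == 0);
    return s->flags_no_d | (s->d_set ? EFLAGS_DF : 0u);
}

/* PUTF (as needed for popf): split the word into the two pieces. */
void sim_putf(SimFlags *s, uint32_t word)
{
    s->d_set      = (word & EFLAGS_DF) != 0;
    s->flags_no_d = word & ~EFLAGS_DF;
}
```

The same merge as in `sim_getf` is what would be needed when handing the flags back for native execution.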
From: Julian S. <js...@ac...> - 2002-12-04 20:45:12
|
> I think D is special, and we should treat it as such. As far as I know,
> the only instructions which use D are the string instructions, and the
> only instructions which change it are STD/CLD. If that's the case, then
> we can treat D as a special case. We needn't even store it in EFLAGS;
> we just need a bit of state which we inspect in the code we
> generate for string instructions, and which STD/CLD can change. The only
> slightly tricky thing about that is making sure that we insert the D
> state back into EFLAGS when returning to native execution.
>
> Of course this doesn't work for other flags. NEG only touches C, so if
> we generate:
>
> neg %eax
> pushfl; pop 32(%ebp)
>
> we'll put random crap into the OSZAP flag state if eflags doesn't
> already contain EFLAGS.
>
> The simple solution is to make all instructions which don't update all
> the flags be said to use the flags they leave untouched, which would
> generate a flags get. Since many instructions do touch all the flags
> except D, this wouldn't be too bad in combination with the suggestion
> above.

On contemplation, that solution seems to be good to me. It would more or
less remove the flag-move overhead for the bog-standard ALU ops -- those
which set exactly OSZACP. Inc, dec, neg and not will still have to go
via the expensive route, but hopefully they are not so common.

I'm trying to approach a new-code freeze for 2.0. I'd like to take
44-symbolic-addr and its dependent 45-memcheck-symaddr. How stable
is 44 -- is it good enough to ship? I'll also take 55-ac-clientreq
(non-controversial).

If this flags stuff can be brought to successful conclusion within the
next week or so, that would be great to ship too.

J
|
|
From: Jeremy F. <je...@go...> - 2002-12-04 01:27:12
|
On Tue, 2002-12-03 at 16:41, Julian Seward wrote:
> Hi. Let me say at the outset that I think your patch (62-) is a much
> better solution than mine; it makes it less likely to get things wrong
> later on, and extends naturally to skins. So I like that.
Thanks.
> And the performance improvements really are excellent.
>
> However -- I'm getting almost all progs segfault at exit, if not before.
Yes, I'm still working on that (the patch I mailed you was more a sketch
than working; there's some stuff on my site which works a little better,
but is still pretty buggy). It works fine with --skin=none, but there's
some problems with memcheck. One thing I'm still working on is what
flags state a helper function expects as input and generates as output.
Also, in that patch I kind of conflated two distinct notions. There's
the upd_cc flag, which means (if false) "we're running this instruction
with the intent that it update the simulated CPU state, but we don't
care about the eflags state changes it makes, because they're
overwritten before they're inspected". And there's the first argument
to VG_(new_emit) which means "this instruction does/does not affect the
simulated CPU's flags".
The way it's currently implemented, if an instruction has upd_cc false
(ie, don't care about flags state), it ends up saving the simd flags
state so that it isn't affected by the instruction. I'm wondering how
much this actually happens in practice, and whether it's worth adding
another flag argument to all those functions.
> The SUBL is the first simd-flag-affecting fn in the block. So I see what your
> scheme does is to note that we are setting the simd flags here, so it lets
> the generated subl set %eflags; the Jleo then copies this to %EFLAGS with
> pushfl ; popl 32(%ebp).
>
> The problem (according to my analysis) is that subl sets O S Z A C and P, but
> it doesn't set D (the string-op direction flag). Result is that the
> subsequent %EFLAGS := %eflags copy means that the sim'd flags state winds
> up holding the real machine'd D-flag state prior to the subl, which is
> unknown to us.
>
> Am I missing something here? I'd love to be, considering the speed gains :)
Ah, yes, I remember your concern over D.
I think D is special, and we should treat it as such. As far as I know,
the only instructions which use D are the string instructions, and the
only instructions which change it are STD/CLD. If that's the case, then
we can treat D as a special case. We needn't even store it in EFLAGS;
we just need a bit of state which we inspect in the code we
generate for string instructions, and which STD/CLD can change. The only
slightly tricky thing about that is making sure that we insert the D
state back into EFLAGS when returning to native execution.
Of course this doesn't work for other flags. NEG only touches C, so if
we generate:
neg %eax
pushfl; pop 32(%ebp)
we'll put random crap into the OSZAP flag state if eflags doesn't
already contain EFLAGS.
The simple solution is to make all instructions which don't update all
the flags be said to use the flags they leave untouched, which would
generate a flags get. Since many instructions do touch all the flags
except D, this wouldn't be too bad in combination with the suggestion
above.
The complex solution is to change the uses/sets flags into actual
FlagSets, and manage things on a per-flag level rather than all flags.
I don't think it helps at all, since we can't generate a partial flags
get/put anyway.
J
|
|
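The "simple solution" in the message above, where an instruction that updates only some flags is treated as using the ones it leaves untouched, can be sketched as a set computation. This is an illustration under assumptions, not code from the patch: the bit values are arbitrary (they are not the x86 bit positions) and `flags_needed_before` is a hypothetical name.

```c
#include <assert.h>

/* The six status flags tracked together; D is assumed to be handled
   separately. Bit values here are illustrative only. */
enum {
    FlagO = 1 << 0, FlagS = 1 << 1, FlagZ = 1 << 2,
    FlagA = 1 << 3, FlagC = 1 << 4, FlagP = 1 << 5,
    FlagsOSZACP = FlagO | FlagS | FlagZ | FlagA | FlagC | FlagP
};
typedef unsigned FlagSet;

/* Flags that must be valid in the real %eflags before the instruction
   runs: those it reads, plus (if it writes any at all) those it leaves
   untouched, so that a later whole-register save stores no garbage. */
FlagSet flags_needed_before(FlagSet uses, FlagSet sets)
{
    assert((uses & ~FlagsOSZACP) == 0 && (sets & ~FlagsOSZACP) == 0);
    if (sets == 0)
        return uses;   /* nothing written, so nothing new gets saved */
    return uses | (FlagsOSZACP & ~sets);
}
```

For an instruction that sets all of OSZACP (such as SUB) the result is empty, so no flags fetch is needed. For INC, which leaves C untouched, the result is {C}, and because flags can only be fetched as a whole this costs a full flags get.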
From: Julian S. <js...@ac...> - 2002-12-04 00:34:39
|
Hi. Let me say at the outset that I think your patch (62-) is a much
better solution than mine; it makes it less likely to get things wrong
later on, and extends naturally to skins. So I like that.
And the performance improvements really are excellent.
However -- I'm getting almost all progs segfault at exit, if not before.
I ran my canonical inner-loop program and looked at the code. I'm worried
by this:
47: SUBL $0x3E7, %eax (-wOSZACP) [------]
159: 81 E8 E7 03 00 00
subl $0x3E7, %eax
48: INCEIPo $6 [------]
165: C6 45 24 37
movb $0x37, 0x24(%ebp)
49: Jleo $0x8048508 (-rOSZACP) [------]
169: 9C 8F 45 20
pushfl ; popl 32(%ebp)
173: 7F 0D
jnle-8 %eip+13
175: B8 08 85 04 08
movl $0x8048508, %eax
180: 89 45 24
movl %eax, 0x24(%ebp)
183: 0F 0B 0F 0B 90
ud2; ud2; nop
50: JMPo $0x8048539 ($2) [------]
188: B8 39 85 04 08
movl $0x8048539, %eax
193: 89 45 24
movl %eax, 0x24(%ebp)
196: 0F 0B 0F 0B 90
ud2; ud2; nop
The SUBL is the first simd-flag-affecting fn in the block. So I see what your
scheme does is to note that we are setting the simd flags here, so it lets
the generated subl set %eflags; the Jleo then copies this to %EFLAGS with
pushfl ; popl 32(%ebp).
The problem (according to my analysis) is that subl sets O S Z A C and P, but
it doesn't set D (the string-op direction flag). Result is that the
subsequent %EFLAGS := %eflags copy means that the sim'd flags state winds
up holding the real machine'd D-flag state prior to the subl, which is
unknown to us.
Am I missing something here? I'd love to be, considering the speed gains :)
J
|
|
From: Julian S. <js...@ac...> - 2002-12-03 23:59:18
|
Yes, I dithered for a while about whether to have it as a need or a
detail (and I remember you said a need would be better, a while back :)

Then I was a bit worried about skins which don't specify the size and
get excessive TC flushing as a result, if the avg size turns out to be
more than the default. Hence made it mandatory. But perhaps it isn't a
big deal. I can see you'd be happier if I made it into a need, so I'll
try and do that at the weekend.

-------------

Josef W mailed [yesterday?] about a problem in the granularity of the
code addr -> sourceloc mapping (I'm not claiming to understand it),
possibly to do with some of Jeremy's symtab stuff, or possibly not.
Any chance you could have a look at this before you disappear, if you
have the time? If not, don't worry.

I'd sort-of intended to ship 2.0 over xmas, but I don't think this is
a good idea with you off-line for so long. I think a wiser idea is for
me to assemble 2.0 as best I can, then emit 2.0pre1, pre2, etc, as I
did for the 1.0 series, to stabilise it all. Then if there are
outstanding problems with the framework or the cache profiler, you can
look at them mid-late-Jan on your return, and we'll be in a good
position to ship 2.0 final. How does that sound?

J

----------------------------------------------------------------

On Tuesday 03 December 2002 12:03 pm, you wrote:
> On Fri, 29 Nov 2002, Julian Seward wrote:
> > Complete integration of the new code management (sectored FIFO) story.
> >
> > This commit adds stats gathering / printing (use -v -v), and selection
> > of sector size decided by asking skins, via
> > VG_(details).avg_translation_sizeB, the average size of their
> > translations.
> >
> > + VG_(details_avg_translation_sizeB) ( 106 );
>
> I wonder if this might be done better as a 'need' rather than a 'detail'
> -- ie. make it optional for the skin to set it. Most of the needs are
> bools, yes, but not all (sizeof_shadow_chunk isn't). I think it might be
> nicer for a skin writer to be able to ignore this at first. What do you
> think?
>
> N
|
|
From: Josef W. <Jos...@gm...> - 2002-12-02 22:10:48
|
Hi,

recently I checked cachegrind in CVS HEAD. It gives out its information
at BB level. E.g. for function foobar(), it gives cost values for
foobar, foobar+20, foobar+50.

I don't think this is intended, because the results look strange together
with the usage of source line granularity. And it screws up efficiency in
the used hashes (35 BBs per source file ??).

It's because get_debug_info does a call to the V core which gives back
function names with offset.

Should I come up with a patch for this?

Josef
|
|
From: Jeremy F. <je...@go...> - 2002-12-02 10:02:18
|
On Mon, 2002-12-02 at 01:14, Julian Seward wrote:
> Wow, you have a train commute in which you can actually sit down so as
> to use the laptop. Cor. All the folks who commute here <-> London
> (55 miles each way) by train would be jealous. Fortunately not me.
I'm in California. I'm the only person in the state to use public
transport.
> One way to assess this is to run something like the simple loop prog
> I sent in earlier mail, and/or bzip2, which compute for a long time
> without generating any/many translations. If it's only TC flushes
> that hammer us on P4, then the ratios for those progs should match
> more closely the PIII ratios. I might look at this when I get a mo.
Well, I was trying it with a 50 second gcc run (4 seconds native). I
don't know how the working set changed during that run, so maybe it
isn't a good test.
> > I have a partial patch to do something like this. I hijacked VG_(new_emit)
> > by adding the args (Bool emu_flags, FlagSet uses, FlagSet sets). emu_args
> > [...]
> > selectively load and save flags. I haven't got very far with this yet.
>
> Perhaps hang fire on this a couple of days. Last night I started a scheme
> similar to your FPU-liveness thing for eflags, and it got as far as
> delivering me a constant stream of segfaults before I had to give up and
> sleep. I'll try and get it working tonight.
OK. I'll think about MMX some more.
J
|
|
From: Julian S. <js...@ac...> - 2002-12-02 09:06:48
|
[replying to 2 messages at once here]

> My desktop machine is a 1.8GHz P4, but it's running 2.5, so some things
> don't work. Not sure that oprofile is set up, but it shouldn't be hard
> to do.
>
> The machine I do most development on is a 600MHz P3 laptop (mostly
> because I seem to do a lot of it on the train).

Wow, you have a train commute in which you can actually sit down so as
to use the laptop. Cor. All the folks who commute here <-> London
(55 miles each way) by train would be jealous. Fortunately not me.

> I also followed up your mention of writing within 1k (or 2k) of code in
> the trace cache, and it does seem pretty dire. It looks like writing to
> memory within a 1k sector of something in the trace cache can cause the
> whole trace cache to be dumped. [...]

That would also mean that P4 does poorly when running all sorts of other
dynamic code generation systems (java jits etc) and so if that's the case,
which it could well be, there may be stuff out on the web which documents
it. I'll have a look.

One way to assess this is to run something like the simple loop prog
I sent in earlier mail, and/or bzip2, which compute for a long time
without generating any/many translations. If it's only TC flushes
that hammer us on P4, then the ratios for those progs should match
more closely the PIII ratios. I might look at this when I get a mo.

> > You'll see I just committed a simple hack which seems to substantially
> > mitigate the INCEIP thing; it's a combination of your partial-write idea
> > and my always-write-never-add idea.
>
> So you found that doing the SYNCEIP only-update-when-someone-needs-EIP
> wasn't worthwhile?

It hardly seemed to make any difference, over several runs of several
programs. What is committed now just does one write per INCEIP. I also
experimented with a version which did analysis to skip redundant INCEIPs
-- in which case the results would have been very similar to your SYNCEIP
scheme -- and it made only minor improvements compared with the
improvement had from turning read-modify-write (INCEIP) into write
(SETEIP, in effect).

> I have a partial patch to do something like this. I hijacked VG_(new_emit)
> by adding the args (Bool emu_flags, FlagSet uses, FlagSet sets). emu_args
> [...]
> selectively load and save flags. I haven't got very far with this yet.

Perhaps hang fire on this a couple of days. Last night I started a scheme
similar to your FPU-liveness thing for eflags, and it got as far as
delivering me a constant stream of segfaults before I had to give up and
sleep. I'll try and get it working tonight.

J
|
|
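Turning INCEIP from a read-modify-write into a plain write works because the translator knows the absolute EIP at every instruction boundary. One way to read the "partial write" part (the earlier listing in this thread shows `INCEIPo $6` emitted as a single `movb` into the baseblock) is that a one-byte store suffices whenever only the low byte of EIP changes. The sketch below encodes that reading; it is an assumption about the scheme, not the committed code, and `eip_store_kind` and `StoreKind` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { STORE_BYTE, STORE_WORD32 } StoreKind;

/* Decide, at translation time, how to update the simulated EIP when
   both the previous and the new value are known constants. */
StoreKind eip_store_kind(uint32_t old_eip, uint32_t new_eip)
{
    assert(old_eip != new_eip);
    /* If the upper 24 bits are unchanged, storing the low byte of
       new_eip is enough; otherwise store the full 32-bit value. */
    return ((old_eip ^ new_eip) & ~0xFFu) == 0 ? STORE_BYTE : STORE_WORD32;
}
```

Either kind of store leaves the host condition codes alone, unlike an add to memory, which fits the side effect mentioned later in the thread.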
From: Julian S. <js...@ac...> - 2002-12-01 19:52:18
|
> I did see a problem though. I get this if I try to start OO under V in
> 2.5:
>
> ==15504== Nulgrind, a binary JIT-compiler for x86-linux.
> ==15504== Copyright (C) 2002, and GNU GPL'd, by Nicholas Nethercote.
> ==15504== Using valgrind-1.9.1, a program instrumentation system for x86-linux.
> ==15504== Copyright (C) 2000-2002, and GNU GPL'd, by Julian Seward.
> ==15504== Estimated CPU clock rate is 1848 MHz
> ==15504== For more details, rerun with: -v
> ==15504==
>
> valgrind: vg_scheduler.c:475 (vgPlain_save_thread_state): Assertion `(void*)vgPlain_threads[tid].ldt == (void*)vgPlain_baseBlock[vgOff_ldt]' failed.

Urr. What distro are you running (== what infrastructure do I need in
order to run with a 2.5 kernel?)

> > My vague and half-baked plan was to figure out a minimal extension to
> > ucode which would allow MMX (MMX, not SSE* or 3DNow!) insns to be
> > expressed. I would add GETMMX/PUTMMX to copy a MMX reg to/from a pair of
> > TempRegs, and then there is the question of the minimal set of extra uops
> > needed to support the packed add/shuffle/etc, whatever that is needed.
> > If you want to occupy your feverish imagination considering this, that
> > would be cool; I'd like to ship MMX support if possible. It's going to
> > have to happen sometime, and the lack of it is more and more a problem.
>
> I think a good first step would be to parse the MMX/SSE/3Dnow
> instructions just enough to be able to skip them, and emit illegal
> instructions. That way the executing binary will behave more like a
> pre-MMX P5, and apps which want to see SIGILLs will get them.
>
> Actually emulating the instructions is much harder. I think we're going
> to get more bang-for-effort from improving flag handling at the moment.

Well, I'd like to deliver the MMX functionality, even if the result is
slow.

You'll see I just committed a simple hack which seems to substantially
mitigate the INCEIP thing; it's a combination of your partial-write idea
and my always-write-never-add idea. Unfortunately failed to mention in
the commit message that an intended side effect of the change is that
INCEIPs no longer change the host condition codes, so that lazy eflags
updating, which I think will be helpful, is assisted.

I'm contemplating doing a thing similar to your lazy FPU save/restore.
Except it needs to be marginally cleverer; instead of saying simply
"the most recent %EFLAGS state is in memory or in the host %eflags", it
is helpful to distinguish three cases: most recent in memory, most recent
in %eflags, and both of the above (%eflags == memory value). This 3rd
option allows copy-backs to be NOPd if %eflags hasn't changed. (think:
%eflags is a write-back cache for %EFLAGS, so it's useful to have a
dirty bit).

J
|
|
From: Jeremy F. <je...@go...> - 2002-12-01 17:47:58
|
On Sun, 2002-12-01 at 02:31, Julian Seward wrote:
> > V refuses to configure on 2.5, but it seems to work fine if I hack the
> > config script. Is there any particular reason it wouldn't work with
> > 2.5?
>
> Not afaik -- I just have never tried it -- I don't have a 2.5 install.
> Do you know if 2.5 will work on VMware?
Don't know, but I suspect it would work.
I did see a problem though. I get this if I try to start OO under V in
2.5:
==15504== Nulgrind, a binary JIT-compiler for x86-linux.
==15504== Copyright (C) 2002, and GNU GPL'd, by Nicholas Nethercote.
==15504== Using valgrind-1.9.1, a program instrumentation system for x86-linux.
==15504== Copyright (C) 2000-2002, and GNU GPL'd, by Julian Seward.
==15504== Estimated CPU clock rate is 1848 MHz
==15504== For more details, rerun with: -v
==15504==
valgrind: vg_scheduler.c:475 (vgPlain_save_thread_state): Assertion `(void*)vgPlain_threads[tid].ldt == (void*)vgPlain_baseBlock[vgOff_ldt]' failed.
sched status:
Thread 1: status = Runnable, associated_mx = 0x0, associated_cv = 0x0
==15504== at 0x40A62A6B: pthread_create (vg_libpthread.c:691)
==15504== by 0x406340AF: oslCreateThread (in /usr/lib/OpenOffice.org1.0/program/libsal.so.3.0.1)
==15504== by 0x40634192: osl_createSuspendedThread (in /usr/lib/OpenOffice.org1.0/program/libsal.so.3.0.1)
==15504== by 0x4184F85E: store::OStorePageDaemon::insert(store::OStorePageBIOS*) (in /usr/lib/OpenOffice.org1.0/program/libstore.so.3.0.1)
Thread 2: status = Runnable, associated_mx = 0x0, associated_cv = 0x0
==15504== at 0x0: ???
Please report this bug to: js...@ac...
> My vague and half-baked plan was to figure out a minimal extension to ucode
> which would allow MMX (MMX, not SSE* or 3DNow!) insns to be expressed.
> I would add GETMMX/PUTMMX to copy a MMX reg to/from a pair of TempRegs, and
> then there is the question of the minimal set of extra uops needed to support
> the packed add/shuffle/etc, whatever that is needed. If you want to occupy
> your feverish imagination considering this, that would be cool; I'd like to
> ship MMX support if possible. It's going to have to happen sometime, and the
> lack of it is more and more a problem.
I think a good first step would be to parse the MMX/SSE/3Dnow
instructions just enough to be able to skip them, and emit illegal
instructions. That way the executing binary will behave more like a
pre-MMX P5, and apps which want to see SIGILLs will get them.
Actually emulating the instructions is much harder. I think we're going
to get more bang-for-effort from improving flag handling at the moment.
J
|
|
From: Julian S. <js...@ac...> - 2002-12-01 10:23:53
|
> V refuses to configure on 2.5, but it seems to work fine if I hack the
> config script. Is there any particular reason it wouldn't work with
> 2.5?

Not afaik -- I just have never tried it -- I don't have a 2.5 install.
Do you know if 2.5 will work on VMware?

> Also, what's the state of working with the Nvidia drivers? I know one
> of the hopes of the segment stuff was making them work, but it still
> seems to die on MMX/SSE instructions. Is it that it assumes it will get
> a SIGILL if it tries to use SSE on a CPU which doesn't have it?

The seg-override stuff is in, but now it seems we need MMX at least.
I've been thinking about that a bit. The problem is that MMX has a lot
of arithmetic insns (packed add etc) and we need to do a good job in the
value-tracking stuff for memcheck. People tell me that uninitialised data
is routinely copied via the MMX regs, so the FPU hack (pretend that FPU
state is fully defined, and complain about undefinededness at FPU loads)
won't work here.

My vague and half-baked plan was to figure out a minimal extension to ucode
which would allow MMX (MMX, not SSE* or 3DNow!) insns to be expressed.
I would add GETMMX/PUTMMX to copy a MMX reg to/from a pair of TempRegs, and
then there is the question of the minimal set of extra uops needed to support
the packed add/shuffle/etc, whatever that is needed. If you want to occupy
your feverish imagination considering this, that would be cool; I'd like to
ship MMX support if possible. It's going to have to happen sometime, and the
lack of it is more and more a problem.

J
|
|
From: Jeremy F. <je...@go...> - 2002-12-01 06:50:01
|
V refuses to configure on 2.5, but it seems to work fine if I hack the
config script. Is there any particular reason it wouldn't work with
2.5?
Also, what's the state of working with the Nvidia drivers? I know one
of the hopes of the segment stuff was making them work, but it still
seems to die on MMX/SSE instructions. Is it that it assumes it will get
a SIGILL if it tries to use SSE on a CPU which doesn't have it?
J
|
|
From: Julian S. <js...@ac...> - 2002-12-01 02:02:50
|
> Great. How about the multiply patch?

Maybe, but it is not high priority. We'll see.

> > chainings, # unchainings, and # of jumps via the dispatcher. It gives a
> > worst-case indirect count of about 16% for KDE apps.
>
> Statically or dynamically?

Dynamically.

> > * you had VG_MAX_JUMPS set to 4; almost all bbs have 2 or less
> > jumps. Is there a reason for having it at 4? I changed it to 2.
> > Seems to work; is that OK ? Saved 1 word per TCEntry compared with 4.
>
> Initially it wouldn't work unless every BB had VG_MAX_JUMPS or fewer
> jumpsites. I added the sanity check to generate the fallback path later
> on.

Ha, good point.

> > Next on my hit list is 51-kill-inceip.
>
> So you've decided to go with the SYNCEIP idea?

Well, something definitely needs to be improved here, and SYNCEIP seems
a plausible start point, so I want to look at it.

J
|