From: Jeremy F. <je...@go...> - 2002-11-16 18:17:07
|
On Sat, 2002-11-16 at 04:09, Julian Seward wrote:
> During a long train journey late this summer I worked through most of the
> design details needed to support t-chaining cleanly. Then I forgot most of
> them. I am inclined to agree, it's an obvious (perhaps overdue) optimisation
> which should be looked into. Having said that, my priority is still to freeze
> and ship 2.0 tho.
>
> The basic idea is that each translation exists in one of two states:
> chained and unchained, and can be moved back and forth between them as
> needed.
>
> - chained means that jumps out of it to known addresses jump directly
> to the target translation.
>
> - unchained means we always do a lookup in the orig->new code address
> mapping, ie we go via the dispatcher
>
> New translations are created in the unchained state. Permanently associated
> with each translation is enough metadata to facilitate chaining or unchaining
> it at will.
>
> When an unchained translation wants to make a jump to a known (orig)address,
> it pushes the orig-address it wants to call, and *calls* "patch_me"
> which is a short piece of assembly code. This pops the args (orig-addr)
> and also pops the return address -- which points just after the call
> insn on the original translation. patch_me can arrange to find the
> translation and patch the caller to jump directly to it.
>
> There is some fiddly stuff to be sorted out here:
>
> - how to most cleanly and robustly store info to enable chaining/unchaining
>
> - how to minimise the number of magic assembly code sequences needed (these
> amount, you'll notice, to an ultra-minimal runtime linker)
Well, I suppose there's two: there's the sequence the codegen generates
for jumps, and there's patch_me. Neither of those are complex.
>
> - how to cleanly deal with jumps to unknown addresses, which always require
> a lookup
>
> - how to deal with jumps which have "extra semantics", ie a JumpSyscall or
> JumpClientReq, etc.
Ignore them - just generate the code for them we generate now.
> - how to handle the event-counter falling to zero in chained translations
I think generate the decrement inline and fall into the dispatcher if we
hit 0.
> Finally -- and this is the last part of the trick -- whenever we want to
> move or discard any translations, we first unchain *all* of them.
OK, that's nice and simple.
> - Jeremy: you mentioned something about possibly calling _from_ translations
> to the translator machinery if a target translation is missing. I prefer to
> stick with the structure as it stands on the basis it's less rugged, in
> which translations run as the highest point in the call stack. For all
> exceptional situations (missing translation, JmpSyscall, etc) the
> translations return to C land (the scheduler) which handles the situation.
> Therefore the call stack looks like one of the following:
>
> scheduler(C) -> run_thread_for_a_while(C) -> run_innerloop(ASM) ->
> (translations)
>
> or
>
> scheduler(C) -> translation-generating-machinery(C)
>
> But specifically I never have
>
> scheduler(C) -> run_thread_for_a_while(C) -> run_innerloop(ASM) ->
> some-translation -> translation-generating-machinery(C)
>
> there is no case where C land runs a translation which calls back into C
> land, and I think that is more robust.
>
> [ok, not entirely true; translations call helper fns, but these are
> pretty simple and don't mess with the global translation state at all]
Well, OK, but you didn't address what patch_me would do if the target
address isn't present. Would it fall back into the dispatcher loop, which
would then trigger a codegen, and then the next time through this BB we'd
do the chaining? That seems reasonable to me.
> So: I have no time to chase up any of this stuff (apart from discuss possible
> designs), but if you feel the need to do some feasibility-assessment hacking,
> please do! It would be very interesting to know if the extra performance gain
> is worth the complication.
I'll look at it if I get a moment. I want to finish up everything I've
got open at the moment, with the expectation I'll have very little
hacking time available in a month or two (whereupon my first-born appears
and I get that harried new-parent look).
> If it can be done simply and cleanly I'm in favour. Generally my approach is
> to shoot for 80% of the available performance for 20% of the complication.
> This strikes me as good engineering for a resource-constrained small group.
> See http://www.cs.princeton.edu/software/lcc for a strikingly effective
> demonstration of the same attitude.
Yes, I like lcc's internals.
J
|
|
From: Jeremy F. <je...@go...> - 2002-11-16 17:59:18
|
On Sat, 2002-11-16 at 03:18, Julian Seward wrote:
> 43-nonblock-readwritev
> Does this duplicate wait_for_fd_to_be_writable_or_erring? It seems to be
> added by the patch and I'm sure that function exists already somewhere.
I looked around, but didn't see it. I was a bit surprised too.
> 27-nvalgrind
>
> Small request: could you possibly not reuse the sequence numbers, ever? It's
> confusing, and having a unique sequence is kinda useful in .. well, knowing
> the order in which the patches notionally exist. Thanks.
OK. The main reason is that it's a little cumbersome for me to rename
patches, but that shouldn't be too hard to fix.
Also, I think I'll take you up on CVS access now that things are
beginning to settle down a bit.
> I'm documentation hacking again this weekend. Is there any progress on the
> helgrind docs? That would be nice :-)
No, I'm still working on symbolic addresses. It seems to work well for
my small test programs, but less often for big code. I'm still trying
to work out why, but I think it's because the compiler will see a
definition like:
    struct foo {
        struct bar *thing;
    };
    struct bar {
        ...
    };
and generate an undefined reference to struct bar for foo.thing rather
than a reference to the full structural definition which follows.
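To make the struct case concrete, here is a minimal compilable sketch of the same pattern. It assumes a placeholder `int value` member where the snippet above elides the body of struct bar, and foo_value is a hypothetical helper added only for illustration. At the point where foo.thing is declared, struct bar is still an incomplete type, which is legal C for a pointer member; that is presumably why the debug info emitted there can only refer to a not-yet-defined struct bar.

```c
#include <assert.h>
#include <stddef.h>

struct foo {
    struct bar *thing;   /* struct bar is an incomplete type at this point */
};

struct bar {
    int value;           /* placeholder member (assumption); the original elides the body */
};

/* Hypothetical helper: follows foo.thing once struct bar is complete. */
int foo_value(const struct foo *f)
{
    assert(f != NULL && f->thing != NULL);
    return f->thing->value;
}
```

The compiler only needs the full definition of struct bar when foo.thing is dereferenced, which happens here after the definition has been seen.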
But an example of when it does work:
==19268== Thread 3:
==19268== Possible data race writing variable at 0x8049A60 (g.0+8)
==19268== ...
==19268== Address 0x8049A60 is &(globals[4]->glob2) at mutex.c:50
I think it is close enough to be merged in as a somewhat experimental,
incomplete feature with a clo which defaults to off. Before that, I want
to split the stabs and DWARF2-specific parts of vg_symtab2.c into
separate files, because it's becoming a bit unwieldy at the moment.
J
|
|
From: Julian S. <js...@ac...> - 2002-11-16 12:06:21
|
During a long train journey late this summer I worked through most of the
design details needed to support t-chaining cleanly. Then I forgot most of
them. I am inclined to agree, it's an obvious (perhaps overdue) optimisation
which should be looked into. Having said that, my priority is still to freeze
and ship 2.0 tho.
The basic idea is that each translation exists in one of two states:
chained and unchained, and can be moved back and forth between them as
needed.
- chained means that jumps out of it to known addresses jump directly
to the target translation.
- unchained means we always do a lookup in the orig->new code address
mapping, ie we go via the dispatcher
New translations are created in the unchained state. Permanently associated
with each translation is enough metadata to facilitate chaining or unchaining
it at will.
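A minimal sketch of what that per-translation metadata might look like, in C. All names here (Translation, JumpSite, MAX_EXITS and the translation_* functions) are hypothetical and not taken from the Valgrind sources; a pointer field stands in for the real patched jump instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not taken from the Valgrind sources. */
enum { MAX_EXITS = 4 };

typedef struct Translation Translation;

/* One record per jump out of a translation to a known original address. */
typedef struct {
    uintptr_t    orig_target;  /* original-code address the jump goes to */
    Translation *chained_to;   /* NULL while unchained (go via dispatcher) */
} JumpSite;

struct Translation {
    uintptr_t orig_addr;         /* original-code address this translates */
    size_t    n_exits;
    JumpSite  exits[MAX_EXITS];  /* metadata kept with the translation */
};

/* New translations start with no exits, hence unchained. */
void translation_init(Translation *t, uintptr_t orig_addr)
{
    t->orig_addr = orig_addr;
    t->n_exits = 0;
}

/* Record an exit to a known address; returns its index. */
size_t translation_add_exit(Translation *t, uintptr_t orig_target)
{
    assert(t->n_exits < MAX_EXITS);
    t->exits[t->n_exits].orig_target = orig_target;
    t->exits[t->n_exits].chained_to = NULL;
    return t->n_exits++;
}

/* Chain one exit directly to the translation of its target address. */
void translation_chain(Translation *t, size_t exit_idx, Translation *target)
{
    assert(exit_idx < t->n_exits);
    assert(target->orig_addr == t->exits[exit_idx].orig_target);
    t->exits[exit_idx].chained_to = target;
}

/* Return the translation to the fully unchained state. */
void translation_unchain(Translation *t)
{
    for (size_t i = 0; i < t->n_exits; i++)
        t->exits[i].chained_to = NULL;
}

/* Nonzero if any exit currently jumps directly to another translation. */
int translation_is_chained(const Translation *t)
{
    for (size_t i = 0; i < t->n_exits; i++)
        if (t->exits[i].chained_to != NULL)
            return 1;
    return 0;
}
```

Because each exit remembers its original target address, unchaining loses no information: the exit can be re-chained later from the metadata alone.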
When an unchained translation wants to make a jump to a known (orig)address,
it pushes the orig-address it wants to call, and *calls* "patch_me"
which is a short piece of assembly code. This pops the args (orig-addr)
and also pops the return address -- which points just after the call
insn on the original translation. patch_me can arrange to find the
translation and patch the caller to jump directly to it.
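The patch_me step can be modelled roughly as follows. This is only a sketch in C under stated assumptions: the real thing would be a short assembly fragment that rewrites machine code at the return address, whereas here a hypothetical JumpSlot struct stands in for the call site and a toy direct-mapped table stands in for the orig->new mapping. Returning NULL when the target has no translation yet is also an assumption; the description above does not say what happens in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model; not the Valgrind data structures. */
typedef struct Translation {
    uintptr_t orig_addr;
} Translation;

/* Stands in for the call site in the unchained translation. */
typedef struct {
    uintptr_t    orig_target;  /* the orig-address pushed before the call */
    Translation *direct;       /* NULL: site still calls patch_me */
} JumpSlot;

enum { TABLE_SIZE = 64 };
static Translation *table[TABLE_SIZE];  /* toy direct-mapped orig->new map */

static size_t slot_for(uintptr_t orig_addr)
{
    return (size_t)((orig_addr >> 2) % TABLE_SIZE);
}

/* Register a translation; a colliding older entry is simply overwritten. */
void table_insert(Translation *t)
{
    table[slot_for(t->orig_addr)] = t;
}

Translation *table_lookup(uintptr_t orig_addr)
{
    Translation *t = table[slot_for(orig_addr)];
    return (t != NULL && t->orig_addr == orig_addr) ? t : NULL;
}

/* Stand-in for patch_me: find the target translation and, if it exists,
   patch the calling site to jump to it directly from now on.
   Returns NULL (site left unpatched) when no translation exists yet. */
Translation *patch_me(JumpSlot *caller)
{
    assert(caller != NULL);
    Translation *target = table_lookup(caller->orig_target);
    if (target != NULL)
        caller->direct = target;
    return target;
}
```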
There is some fiddly stuff to be sorted out here:
- how to most cleanly and robustly store info to enable chaining/unchaining
- how to minimise the number of magic assembly code sequences needed (these
amount, you'll notice, to an ultra-minimal runtime linker)
- how to cleanly deal with jumps to unknown addresses, which always require
a lookup
- how to deal with jumps which have "extra semantics", ie a JumpSyscall or
JumpClientReq, etc.
- how to handle the event-counter falling to zero in chained translations
My view was to allow chaining for boring ordinary jumps to known addrs, when
no special semantics are needed. This fast-cases the vast-majority of jumps.
All other cases go through a single slow dispatcher which does whatever is
needed, including an old->new lookup, and never get chained. This is pretty
much identical to the vg_dispatch.S stuff as it stands now.
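The fast-case rule amounts to a two-condition test, sketched below in C. The enum and struct names (JumpKind, JumpInfo, Route) are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical jump classification; names are illustrative only. */
typedef enum { JUMP_BORING, JUMP_SYSCALL, JUMP_CLIENT_REQ } JumpKind;
typedef enum { ROUTE_CHAINED, ROUTE_DISPATCHER } Route;

typedef struct {
    JumpKind kind;
    bool     target_known;  /* false for computed (unknown-address) jumps */
} JumpInfo;

/* Only ordinary jumps to known addresses may be chained; everything
   else always goes through the single slow dispatcher. */
Route route_jump(JumpInfo j)
{
    assert((int)j.kind <= (int)JUMP_CLIENT_REQ);
    return (j.kind == JUMP_BORING && j.target_known) ? ROUTE_CHAINED
                                                     : ROUTE_DISPATCHER;
}
```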
Finally -- and this is the last part of the trick -- whenever we want to
move or discard any translations, we first unchain *all* of them. This
makes them completely self-contained, so we can then mess with them as
we desire. In practice this only really occurs when doing a LRU discard pass,
and those are pretty darned rare. When execution resumes there will be a
little extra work as translations are re-chained, but LRU passes are so rare
that this is assumed to be asymptotically insignificant overhead.
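A rough C sketch of the unchain-everything-first rule, assuming a flat array of translations with a toy timestamp as the LRU criterion. The struct layout and the names unchain_all and lru_discard are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_EXITS = 2 };

/* Hypothetical translation record; exits[] models chained jumps. */
typedef struct Translation {
    struct Translation *exits[MAX_EXITS];  /* NULL = unchained exit */
    unsigned            last_used;         /* toy LRU timestamp */
    int                 live;              /* 0 once discarded */
} Translation;

/* Make every translation self-contained again. */
void unchain_all(Translation *ts, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t e = 0; e < MAX_EXITS; e++)
            ts[i].exits[e] = NULL;
}

/* Discard translations last used before `cutoff`; returns how many went.
   Unchaining comes first, so no survivor keeps a direct jump into a
   translation that is about to disappear. */
size_t lru_discard(Translation *ts, size_t n, unsigned cutoff)
{
    assert(ts != NULL || n == 0);
    size_t discarded = 0;
    unchain_all(ts, n);
    for (size_t i = 0; i < n; i++) {
        if (ts[i].live && ts[i].last_used < cutoff) {
            ts[i].live = 0;
            discarded++;
        }
    }
    return discarded;
}
```

Unchaining all translations rather than only those pointing at victims avoids having to track who jumps to whom, at the cost of re-chaining the survivors afterwards.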
All in all, this is the simplest scheme I could think of which would probably
give good performance.
There are a couple of other points:
- I tried to minimise the amount of magic assembly code sequences needed.
It's much better to put as much of the fancy logic as possible in C-land.
It would be interesting to see how far this can be pushed. For example,
perhaps even the patch_me fragment and slow-path dispatcher could be done
completely in C.
A design in which the target-CPU-specific-aspects are minimised would
get you extra brownie points. Specifically, if V ever gets ported to
x86-64 / IA64 (god forbid), it would be nice not to have to rewrite reams
of assembly code. And it would be nice not to hardwire too deeply
CPU-specific knowledge of how to patch (chain/unchain) translations and
the exact requirements for padding bytes etc at the end of translations.
Sure, some of this is probably unavoidable, but minimising it is a
worthy design goal.
- Jeremy: you mentioned something about possibly calling _from_ translations
to the translator machinery if a target translation is missing. I prefer to
stick with the structure as it stands on the basis it's less rugged, in
which translations run as the highest point in the call stack. For all
exceptional situations (missing translation, JmpSyscall, etc) the
translations return to C land (the scheduler) which handles the situation.
Therefore the call stack looks like one of the following:
scheduler(C) -> run_thread_for_a_while(C) -> run_innerloop(ASM) ->
(translations)
or
scheduler(C) -> translation-generating-machinery(C)
But specifically I never have
scheduler(C) -> run_thread_for_a_while(C) -> run_innerloop(ASM) ->
some-translation -> translation-generating-machinery(C)
there is no case where C land runs a translation which calls back into C
land, and I think that is more robust.
[ok, not entirely true; translations call helper fns, but these are
pretty simple and don't mess with the global translation state at all]
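The preferred control flow can be sketched in C as a loop step in which the translations always return to the scheduler with a reason code, and the scheduler then handles the exceptional case itself. The ExitReason values, the SchedStats counters and demo_innerloop are hypothetical stand-ins; the counters merely record which branch ran where the real scheduler would do the work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reasons for a translation to return to C land. */
typedef enum {
    EXIT_COUNTER_ZERO,
    EXIT_MISSING_TRANSLATION,
    EXIT_SYSCALL
} ExitReason;

/* Stand-in for run_innerloop: runs translations until one returns. */
typedef ExitReason (*InnerLoopFn)(void *state);

typedef struct {
    int translations_made;
    int syscalls_handled;
    int timeslices_ended;
} SchedStats;

/* One scheduler step: the translation has already returned by the time
   any of these cases runs, so the translator is never called from
   underneath a running translation. */
void scheduler_step(InnerLoopFn run_innerloop, void *state, SchedStats *stats)
{
    assert(run_innerloop != NULL && stats != NULL);
    switch (run_innerloop(state)) {
    case EXIT_MISSING_TRANSLATION:
        stats->translations_made++;   /* would invoke the translator here */
        break;
    case EXIT_SYSCALL:
        stats->syscalls_handled++;    /* would perform the syscall here */
        break;
    case EXIT_COUNTER_ZERO:
        stats->timeslices_ended++;    /* would pick the next thread here */
        break;
    }
}

/* Demo stand-in: "runs" by returning the reason stored in *state. */
ExitReason demo_innerloop(void *state)
{
    return *(ExitReason *)state;
}
```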
So: I have no time to chase up any of this stuff (apart from discuss possible
designs), but if you feel the need to do some feasibility-assessment hacking,
please do! It would be very interesting to know if the extra performance gain
is worth the complication.
If it can be done simply and cleanly I'm in favour. Generally my approach is
to shoot for 80% of the available performance for 20% of the complication.
This strikes me as good engineering for a resource-constrained small group.
See http://www.cs.princeton.edu/software/lcc for a strikingly effective
demonstration of the same attitude.
J
|
|
From: Julian S. <js...@ac...> - 2002-11-16 11:11:43
|
Jeremy
3 more went in:
30-hg-fix
43-nonblock-readwritev
Does this duplicate wait_for_fd_to_be_writable_or_erring? It seems to be
added by the patch and I'm sure that function exists already somewhere.
27-nvalgrind
Small request: could you possibly not reuse the sequence numbers, ever? It's
confusing, and having a unique sequence is kinda useful in .. well, knowing
the order in which the patches notionally exist. Thanks.
I'm documentation hacking again this weekend. Is there any progress on the
helgrind docs? That would be nice :-)
J
|