From: Julian S. <js...@ac...> - 2002-11-25 23:50:14
|
Yes, doing a good job of iccs is the hardest part of dynamic binary translation, many would say.

> The really bogus code is where one instruction sets up the flags for the
> immediately following one, but the codegen puts a redundant save/load in
> there:
>
> add %edx, %eax
> pushf
> popl 32(%ebp)
> pushl 32(%ebp)
> popf
> adc %ecx, %ebx
>
> Which should be
>
> add %edx, %eax
> adc %ecx, %ebx
>
> of course.

Doesn't the mythical lazy-eflags-save/restore pass clean up this particular case?

> I don't know how much time we're spending in bogus flag saving vs.
> unavoidable flag saving. Once again, it seems the only way of getting
> very good improvements is to work out a way of increasing the working
> basic block size in order to make our local analysis a bit more global.
> But that sounds like hard work.

Good icc handling is known to be difficult in dynamic translators. I think we can say we're running up against the limits of our local analysis. Most systems which do better (WABI, Daisy, surely others) translate groups of bbs at a time and track/optimise icc liveness across the whole group. Also, that would allow register allocation across the whole group. If I had another spare year and reimplemented the JIT from scratch I'd think about something like this. However, reality being what it is ...

J
|
From: Julian S. <js...@ac...> - 2002-11-25 23:29:54
|
> > Spent ages trying to think of good LRU algorithms which don't use
> > reference bits so they will work with translation chaining. It's very
> > difficult -- many constraints.
>
> What are the constraints? The need for reference bits isn't so hard if
> you're willing to periodically unchain in order to take a measurement.

My understanding of this was enhanced by reading "Page Replacement and Reference Bit Emulation in Mach" (Richard P. Draves), at CMU. Operating systems for the VAX are also of particular interest because the VAX apparently didn't support reference bits, so there is no straightforward way to do LRU.

Anyway. In one sense our problem is simpler, since we don't have to deal with clean vs. dirty "pages"; ours are always clean. However, our problem is more constrained in two ways.

Firstly, we cannot just throw away translations individually; everything has to be unchained before any translation can safely be discarded. We even have to do this if we merely want to move translations around a bit. This is a global action which we cannot afford to do too often. We could conceivably keep track of which blocks point to which other blocks and so do incremental unchaining, but that sounds complex.

Secondly, our "pages" are of varying sizes, and are small. This makes dealing with queues of them awkward, and they are so small that keeping them in linked lists is a significant space overhead.

Pretty much all the papers I looked at on Sunday suggest that the brain-dead throw-it-all-away approach is about as good as it gets. I did find a detailed presentation (37 slides) about WABI at Hot Chips (?). WABI got pretty sophisticated in the end, and the authors suggest an improvement (similar to VMS's way of doing paging on the VAX): operate the cache as a FIFO. When it gets full, unchain everything, throw away the oldest N% of the translations, and keep going (the brain-dead algorithm is the special case where N=100). This is simple to do and, according to them, somewhat ameliorates the phenomenon of throwing away translations still in active use. It is simple enough to implement that I might try it.

VMS does FIFO on its pages, but shifted-out pages are not immediately dumped. Instead they are put in a holding-pen style arrangement from which they can be brought back into active use should a reference arise. This protects VMS from the worst consequences of the FIFO scheme (so-called Second-Chance FIFO). We could do this too -- it is awkward but not impossible with variable-length pages -- but I just can't be bothered.

J
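The WABI-style scheme described above is simple enough to sketch. Below is a toy model of a FIFO translation cache that, on overflow, "unchains" everything and discards the oldest N% of entries. The class, its interface, and the eviction policy details are invented for illustration; this is not Valgrind's actual code.

```python
# Toy model of the WABI-style eviction scheme: operate the translation
# cache as a FIFO, and when it fills, flush ("unchain") everything and
# discard the oldest N% of entries. All names here are illustrative.
from collections import OrderedDict

class FifoTransCache:
    def __init__(self, capacity_bytes, drop_fraction=0.25):
        self.capacity = capacity_bytes
        self.drop_fraction = drop_fraction  # N% to discard on overflow
        self.used = 0
        self.entries = OrderedDict()        # addr -> size; insertion order = age

    def insert(self, addr, size):
        if self.used + size > self.capacity:
            self._evict_oldest()
        self.entries[addr] = size
        self.used += size

    def _evict_oldest(self):
        # In a real translator this is the point where every block would
        # have to be unchained before any translation may be freed.
        target = self.used * (1.0 - self.drop_fraction)
        while self.entries and self.used > target:
            _, size = self.entries.popitem(last=False)  # oldest first
            self.used -= size

cache = FifoTransCache(capacity_bytes=100, drop_fraction=0.5)
for i in range(10):
    cache.insert(i, 20)   # ten 20-byte translations into a 100-byte cache
print(cache.used)         # -> 80
```

Setting `drop_fraction=1.0` recovers the brain-dead throw-it-all-away strategy as a special case, matching the N=100 observation above.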
|
From: Jeremy F. <je...@go...> - 2002-11-25 22:50:19
|
I've been thinking about the flags problem a bit, and it seems that we
can't do a great deal better than we currently do. Well, we can get rid
of some really bogus code, but we still need to keep some moderately
bogus code.
I was thinking that we could probably improve on sequences such as:
and %eax, %eax
pushf
popl 32(%ebp)
test 64, 32(%ebp)
jz XXX
jmp NEXT
Well, we can, but we still need the pushf/popl there, because we need to
save the flags for the next basic block (even though the chances are
high that the flags are considered dead from basic block to basic
block).
In this case, the best we can do is:
and %eax, %eax
pushf
popl 32(%ebp)
jnz XXXXX
jmp NEXT
The really bogus code is where one instruction sets up the flags for the
immediately following one, but the codegen puts a redundant save/load in
there:
add %edx, %eax
pushf
popl 32(%ebp)
pushl 32(%ebp)
popf
adc %ecx, %ebx
Which should be
add %edx, %eax
adc %ecx, %ebx
of course.
I don't know how much time we're spending in bogus flag saving vs.
unavoidable flag saving. Once again, it seems the only way of getting
very good improvements is to work out a way of increasing the working
basic block size in order to make our local analysis a bit more global.
But that sounds like hard work.
J
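The redundant save/reload case above is a fixed four-instruction window, so a toy peephole pass can illustrate the cleanup. This is a sketch only: the string-based instruction representation and the pass itself are invented for illustration, and a real pass would also have to verify that nothing between the flag-defining and flag-using instructions touches eflags.

```python
# Sketch of a peephole cleanup for the "really bogus" pattern: a flag
# spill immediately followed by a reload, sitting between the insn that
# defines the flags and the one that consumes them.
SAVE_FLAGS = ("pushf", "popl 32(%ebp)")    # spill eflags to the shadow slot
LOAD_FLAGS = ("pushl 32(%ebp)", "popf")    # reload eflags from the shadow slot

def drop_redundant_flag_traffic(insns):
    """Remove back-to-back save-then-reload sequences; the flags then
    flow directly from the defining insn to the consuming insn."""
    out = []
    i = 0
    while i < len(insns):
        if tuple(insns[i:i + 4]) == SAVE_FLAGS + LOAD_FLAGS:
            i += 4          # elide all four instructions
        else:
            out.append(insns[i])
            i += 1
    return out

code = ["add %edx, %eax",
        "pushf", "popl 32(%ebp)",
        "pushl 32(%ebp)", "popf",
        "adc %ecx, %ebx"]
print(drop_redundant_flag_traffic(code))
# -> ['add %edx, %eax', 'adc %ecx, %ebx']
```

The first example in the message (pushf/popl feeding a conditional jump) would survive this pass untouched, which matches the point that only the save/reload pair is removable while the save alone must stay for the next basic block.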
|
From: Jeremy F. <je...@go...> - 2002-11-25 22:38:29
|
On Mon, 2002-11-25 at 13:07, Nicholas Nethercote wrote:
> If INCEIP does eventually not generate any code (ie. if it only gets used
> as an x86 instruction boundary marker) then it would be nice to augment
> it with the address of the next instruction (or maybe the current
> instruction? not sure) as well as storing the instruction size, just so
> skins don't have to keep track of x86 instruction addresses themselves.
I'm not sure it will make much difference. They'll either have to have
code which says "current_eip = u->lit32" or "current_eip += u->lit32"
whenever they encounter an INCEIP, and they'll still have to
special-case JMP instructions.
It may be useful to have an absolute rather than relative INCEIP if we
start glomming multiple original basic blocks into larger ucode basic
blocks by following jumps.
J
|
From: Nicholas N. <nj...@ca...> - 2002-11-25 21:07:19
|
On 22 Nov 2002, Jeremy Fitzhardinge wrote:

> I just uploaded a patch which seems to do a good job of killing INCEIP
> without being overly complex or putting undue burden on skins.

Just a minor random point: INCEIP currently has a single operand that indicates the x86 instruction size. If INCEIP does eventually stop generating any code (i.e. if it only gets used as an x86 instruction boundary marker) then it would be nice to augment it with the address of the next instruction (or maybe the current instruction? not sure) as well as the instruction size, just so skins don't have to keep track of x86 instruction addresses themselves.

N
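To make the relative-vs-absolute INCEIP trade-off concrete, here is a toy model of the bookkeeping a skin would do in each case. The marker representation and function names are hypothetical, not Valgrind's actual ucode.

```python
# Toy model: recovering x86 instruction boundaries from INCEIP markers.
def replay_relative(start_eip, markers):
    """markers: instruction sizes (relative INCEIP payloads). The skin
    must accumulate the current address itself: current_eip += size."""
    eip, boundaries = start_eip, []
    for size in markers:
        boundaries.append(eip)
        eip += size
    return boundaries

def replay_absolute(markers):
    """markers: absolute addresses (hypothetical absolute INCEIP).
    No per-skin arithmetic needed: current_eip = marker."""
    return list(markers)

sizes = [1, 3, 2]   # x86 instruction lengths
print(replay_relative(0x8048000, sizes))
```

The relative form breaks down once multiple original basic blocks are glommed into one ucode block by following jumps, since a jump target can't be expressed as a size delta; that is the case where the absolute form earns its keep.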
|
From: Jeremy F. <je...@go...> - 2002-11-25 19:45:18
|
On Sun, 2002-11-24 at 22:05, James Maynard wrote:

> I am having problems with your spam filtering so lets see if at least
> one of you gets this ;-)

Hm, I got it, but I did find it in my spam folder. Oh, yeah, better tell my filter that js...@sf... is OK.

A couple of comments: it would be nice if you could generate patches rather than send the whole source file. Also, it would be nice if you could say which version you started with, since that would avoid confusion. It would be particularly useful if you could grab the current CVS head version (http://sourceforge.net/cvs/?group_id=46268) and generate patches against that, since it is the current development version and is quite different from the 1.0 versions. Ah, OK, it's against 1.0.4.

From a look through the patch, a couple of things sprang to mind:

1. We definitely want the suppressions file to be in *mangled* form rather than demangled. Suppressions can have '*' as a wildcard match, and if the suppressions were demangled, every pointer would look like a wildcard. On the other hand, the mangled form is definitely painful to work with. (Julian, Nick: what were the original motivations for matching suppressions against mangled names?)

2. The code in vg_memory.c:suppressable() seems a bit clumsy. Can't you use a loop rather than assume there are only four levels in the suppression matching stack? And I'm pretty sure that functionality must already exist elsewhere (I guess; I still haven't really looked into the suppression machinery) -- can you work out how to reuse it? Probably something to do with using VG_(maybe_record_error)() rather than VG_(message)() in VG_(detect_memory_leaks)().

J
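The loop-based matching suggested in point 2 might look something like this sketch: each suppression pattern is matched against the corresponding mangled frame name on the error's call stack, with '*' as a wildcard. The function name and frame representation are illustrative, not the actual vg_memory.c code.

```python
# Sketch of loop-based suppression matching, instead of hard-coding a
# fixed number of stack levels. Patterns use shell-style wildcards, so
# '*' can match any (mangled) function name or part of one.
from fnmatch import fnmatchcase

def suppression_matches(suppression_frames, stack_frames):
    """True if every suppression pattern matches the corresponding
    mangled function name on the error's call stack, top-down."""
    if len(suppression_frames) > len(stack_frames):
        return False  # suppression wants more frames than the stack has
    return all(fnmatchcase(fn, pat)
               for pat, fn in zip(suppression_frames, stack_frames))

stack = ["_Znwj", "_ZN5MyLib5allocEv", "main"]   # mangled names (illustrative)
print(suppression_matches(["_Znwj", "*"], stack))    # -> True
print(suppression_matches(["_ZdlPv", "*"], stack))   # -> False
```

The loop also shows why matching must be done on mangled names: a demangled frame such as `operator new(unsigned int)` could itself contain '*' (in pointer types), making literal names indistinguishable from wildcards.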
|
From: Jeremy F. <je...@go...> - 2002-11-25 19:15:46
|
On Sun, 2002-11-24 at 16:47, Julian Seward wrote:
> Spent ages trying to think of good LRU algorithms which don't use reference
> bits so they will work with translation chaining. It's very difficult
> -- many constraints.
What are the constraints? The need for reference bits isn't so hard if
you're willing to periodically unchain in order to take a measurement.
> Eventually measured the simple strategy of throw away all translations and
> try again.
>
> With a 16 MB cache (the default size is 32 M), running kate (69.7 M bbs)
> I get (running memcheck)
>
> LRU 124469 total translations (1.98 M -> 25.2 M), 46.91 seconds
> forget-all 144095 total translations (2.31 M -> 29.4 M), 48.40 seconds
>
> so perhaps this scheme is good enough. With the cache restored to 32 MB
> it will look even better. At 32M the vast majority of programs never manage
> to fill it up even once, so for those, this scheme is no loss at all.
This looks quite reasonable.
J
|
From: Julian S. <js...@ac...> - 2002-11-25 00:40:12
|
Spent ages trying to think of good LRU algorithms which don't use reference bits, so that they will work with translation chaining. It's very difficult -- many constraints.

Eventually measured the simple strategy: throw away all translations and start again. With a 16 MB cache (the default size is 32 MB), running kate (69.7 M bbs) under memcheck, I get:

LRU         124469 total translations (1.98 M -> 25.2 M), 46.91 seconds
forget-all  144095 total translations (2.31 M -> 29.4 M), 48.40 seconds

so perhaps this scheme is good enough. With the cache restored to 32 MB it will look even better. At 32 MB the vast majority of programs never manage to fill the cache even once, so for those this scheme is no loss at all.

J
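For reference, the quoted measurements work out to a modest cost for the forget-all scheme relative to LRU:

```python
# Relative cost of forget-all vs. LRU, from the numbers quoted above
# (16 MB cache, kate under memcheck).
lru_txn, lru_secs = 124469, 46.91
flush_txn, flush_secs = 144095, 48.40

extra_txn = (flush_txn - lru_txn) / lru_txn
slowdown = (flush_secs - lru_secs) / lru_secs
print(f"{extra_txn:.1%} more translations, {slowdown:.1%} slower")
# -> 15.8% more translations, 3.2% slower
```

A roughly 16% increase in retranslation work costing only about 3% of wall-clock time supports the conclusion that the simple scheme is good enough, especially at the default 32 MB cache size where most programs never fill the cache at all.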