From: Jeremy F. <je...@go...> - 2002-12-11 23:40:23
|
I've created a new skin - testbed - for dumping useful test stuff into.
The first is an implementation of a XUInstr TRASHF, which trashes the
flags between every UInstr as a way of exercising the lazy flags
handling. It has two command line arguments: --trash-flags=yes|no
(generate logical flag trashing), and --really-trash=yes|no (physically
change the CPU's flags). --really-trash is really slow.
With --trash-flags=yes --really-trash=yes, big complex things like
mozilla and OO are working fine (albeit slowly).
J
|
|
From: Jeremy F. <je...@go...> - 2002-12-11 06:00:53
|
I realized we can take advantage of P only being generated on the lower
8 bits. For jle, we can generate something like:
    movl EFLAGS(%ebp), %eax
    andl $0x08C0, %eax
    rorl $7, %eax
    js 1f
    jp 2f
1:  movl $target, %eax
    /* jump to target */
2:  /* carry on */
and for jnle:
    movl EFLAGS(%ebp), %eax
    andl $0x08C0, %eax
    rorl $7, %eax
    js 1f
    jnp 1f
    /* jump to target */
1:  /* carry on */
The insight here is that P only tests the lower 8 bits, so we can
independently test Z and O=S. Initially, eflags looks like:
----O--+SZ------
after the rorl it looks like:
Z------+-------+-------+---O---S
We can then use P to test O=S/O!=S and S to test the state of Z.
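As a rough check of the bit layout above, here is a minimal C sketch that models the two tests on the rotated value. The helper names (ror32, parity_even_low8, jle_taken, jnle_taken) are made up for illustration and are not part of any patch; the sketch only evaluates the condition in software and says nothing about instruction timing.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 EFLAGS bit positions used by the signed comparisons. */
#define FLAG_Z (1u << 6)
#define FLAG_S (1u << 7)
#define FLAG_O (1u << 11)

/* Rotate a 32-bit value right by n bits (0 < n < 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

/* Parity as the P flag defines it: true when the low 8 bits
   contain an even number of set bits. */
static bool parity_even_low8(uint32_t x)
{
    unsigned bits = 0;
    for (unsigned i = 0; i < 8; i++)
        bits += (x >> i) & 1u;
    return (bits & 1u) == 0;
}

/* jle is taken when Z is set or S != O. */
static bool jle_taken(uint32_t eflags)
{
    /* After the rotate: Z -> bit 31, O -> bit 4, S -> bit 0. */
    uint32_t r = ror32(eflags & 0x08C0u, 7);
    if (r & 0x80000000u)          /* the "js" test: Z was set */
        return true;
    return !parity_even_low8(r);  /* odd parity means O != S */
}

/* jnle is simply the opposite condition. */
static bool jnle_taken(uint32_t eflags)
{
    return !jle_taken(eflags);
}
```

For example, jle_taken(FLAG_S) is true (S != O), while jle_taken(FLAG_S | FLAG_O) is false because the two bits cancel in the parity.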
I'm pretty sure that these two jumps are cheaper than more arithmetic
ops, because they can take advantage of prediction rather than using up
ALU resources. Unfortunately the ROR is even more expensive than the
shift on the P4; with any luck prediction of the dependent jumps will
absorb the latency. I don't know if this is really much of an
improvement; I mainly did it for hack value.
The code generation for these jumps is much easier with 72-jump, which
automates the offset computation.
Moz 1.2.1 works fine with the improved (fixed) 69-simple-jlo and
75-simple-jle.
J
|
|
From: Jeremy F. <je...@go...> - 2002-12-11 02:42:32
|
Gah! Parity is only computed over the least significant 8 bits! Argh!
I'd always thought the parity flag was a useless hold-over from the 8080
that was completely obsolete. Then I found a use for it. Then I find
they've crippled it so that it isn't actually useful at all. I'm not
happy about Intel at the moment.
So, the only way of recovering the neat hack is some sequence like:
    movl EFLAGS(%ebp), %reg
    shrl $7, %reg
    testl $0x11, %reg
    j[n]p ...
which is workable, but a lot less appealing.
Also goes to show how robust programs are when you completely f*ck up a
whole conditional test.
J
|
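To make the 8-bit limitation concrete, here is a small C sketch for the jl condition (S != O). The function names are invented for illustration, and the "in place" variant is a hypothetical example of a test that keeps O at bit 11; it is not claimed to be what any earlier patch generated.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLAG_S (1u << 7)   /* sign flag, EFLAGS bit 7 */
#define FLAG_O (1u << 11)  /* overflow flag, EFLAGS bit 11 */

/* P flag semantics: set when the low 8 bits of the result hold an
   even number of 1 bits; all higher bits are ignored. */
static bool pf_low8(uint32_t result)
{
    uint32_t b = result & 0xFFu;
    b ^= b >> 4;
    b ^= b >> 2;
    b ^= b >> 1;
    return (b & 1u) == 0;
}

/* Hypothetical in-place test: O sits at bit 11, outside the byte
   that P looks at, so only S influences the answer. */
static bool jl_taken_in_place(uint32_t eflags)
{
    return !pf_low8(eflags & (FLAG_S | FLAG_O));
}

/* Sequence from the message: shift S to bit 0 and O to bit 4, then
   mask with 0x11 so both bits land inside the low byte. */
static bool jl_taken_shifted(uint32_t eflags)
{
    return !pf_low8((eflags >> 7) & 0x11u);
}
```

With only O set, jl_taken_shifted reports the jump as taken, while the in-place variant misses it because bit 11 never reaches the parity computation.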
|
From: Jeremy F. <je...@go...> - 2002-12-11 01:59:13
|
On Tue, 2002-12-10 at 15:35, Julian Seward wrote:
> (mozilla-1.2.1 was looping with memcheck ...)
> >
> > > It all _looks_ plausible. I'm a bit mystified. You sure this j[n]p
> > > trick in 69- has no strange side-effects? I can't think of any. Perhaps
> > > this is a red herring.
> >
> > Looks OK to me, but its a bit hard to tell without seeing the original
> > code.
> >
> > What happens if you change it back to the popf slow path? Still happen?
>
> I dunno; I removed the popf stuff.
>
> However, backing out 69- makes it work properly.
>
> I identified the original code:
>
> 0x40224f10 mov 0x4(%edi),%eax
> 0x40224f13 mov 0x10(%eax),%eax
> 0x40224f16 mov %eax,0x4(%edi)
> 0x40224f19 mov 0x10(%eax),%edx
> 0x40224f1c mov 0x4(%ecx),%eax
> 0x40224f1f cmp 0x4(%edx),%eax
> 0x40224f22 jl 0x40224f10
>
> Attached is the cleaned-up and annotated memcheck translation. The stuff
> to do with cmp and jl looks OK to me; the %eflags value set by the
> cmp (simulation) is correctly copied off to safety before the stuff for
> the jl, and the relevant simd test for JL looks right.
OK, I get the same thing. I'll try playing around with it.
J
|
|
From: Jeremy F. <je...@go...> - 2002-12-11 01:51:11
|
On Tue, 2002-12-10 at 17:31, Julian Seward wrote:
> Hey? That seems like too many instructions to me. The idea is that the
> cache entries are arranged so as to cause lookup failures on misalignment,
> so that the testl and jnz are not needed.
Yep, you're right.
> This is not so good (trashes a second reg), so perhaps your code is better
> here. OTOH, providing enough spare regs exist, all reasonable machines
> have 2 ALUs capable of doing the andls in parallel, so the sequence should be
> fast.
I made the ACCESS UInstr take two args: the address and the rounded
address, so that I didn't have to scrounge for a pair of temps. It would
help if AND accepted a Lit32 argument though.
> movl %vv, %temp
> movl %vv, %temp2
> andl $MASK, %temp -- cache index, as before
> andl $(~2), %temp2 -- dump bit 1 of address (~2 == 111...11101b)
> cmpl cache(%temp), %temp2
> jz done
> slow:
>
> The andl $(~2) is the subtlety. For the lowest two bits it gives the mapping
> 00 -> 00, 01 -> 01, 10 -> 00, 11 -> 01
> So if the address was 2-aligned (00, 10) it produces 00, which can potentially
> match the cache[] entry.
Nice.
J
|
|
From: Jeremy F. <je...@go...> - 2002-12-11 01:44:10
|
On Tue, 2002-12-10 at 16:59, Julian Seward wrote:
> The mozilla I was running was a 1.2.1 binary build (the straight .tar.gz)
> from ftp.mozilla.org, so egcs is not in the picture, and I would expect
> this problem to occur using that binary build on any distro -- the loop
> is in some .so supplied in the .tar.gz, so it'll be the same for everyone
> (I guess).
But you only see a problem under RH6.2? Is it this build:
http://ftp.mozilla.org/pub/mozilla/releases/mozilla1.2.1/mozilla-i686-pc-linux-gnu-1.2.1.tar.gz
It could still be some interesting interaction between the system
libraries and moz itself... I'll see if I can reproduce the problem.
> > BTW, I'm having a go at implementing your addrcheck cache idea. It
> > isn't working out quite as well as I'd like.
>
> You are?! I had better reply to your initial comments on it ...
My first impression is that cache maintenance overwhelms any benefit of
making the fast path faster. On the other hand, I may still be doing
something wrong. I'll put the patch up for inspection.
The much more interesting contribution is 72-jump, which adds a helper
mechanism for computing relative jump offsets rather than always having
to hand-compute them (and double-guess the emitters). I implemented it
out of necessity because I wanted to do a jump over a sync_ccall site,
but it turned out to work well in every other instance of a jcond_lit,
and it cleans things up nicely.
J
|
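For readers unfamiliar with the general idea of letting the emitter compute jump offsets, here is a minimal C sketch of forward-jump backpatching. It is not the 72-jump patch; the Emitter type and the functions emit_byte, emit_jcc_short and bind_jump_here are hypothetical names chosen for this illustration. It relies only on the standard x86 encoding of short conditional jumps (opcode 0x70 + condition code, followed by an 8-bit displacement measured from the end of the instruction).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical emitter state: a flat byte buffer and a write cursor. */
typedef struct {
    uint8_t buf[256];
    size_t  used;
} Emitter;

static void emit_byte(Emitter *e, uint8_t b)
{
    assert(e->used < sizeof e->buf);
    e->buf[e->used++] = b;
}

/* Emit a short conditional jump with a placeholder displacement.
   Returns the index of the rel8 byte so it can be patched later. */
static size_t emit_jcc_short(Emitter *e, uint8_t cc)
{
    emit_byte(e, (uint8_t)(0x70u + (cc & 0x0Fu)));
    emit_byte(e, 0);
    return e->used - 1;
}

/* Bind a pending jump to the current position. rel8 is measured from
   the end of the jump instruction, i.e. the byte after the rel8 slot. */
static void bind_jump_here(Emitter *e, size_t rel8_at)
{
    size_t delta = e->used - (rel8_at + 1);
    assert(delta <= 127);  /* forward short jumps only in this sketch */
    e->buf[rel8_at] = (uint8_t)delta;
}
```

The caller emits the jump, emits whatever code should be skipped, then binds the jump; nobody has to count the bytes of the skipped code by hand.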
|
From: Julian S. <js...@ac...> - 2002-12-11 01:24:17
|
> So I guess the full code for size = 4 would be:
>
> testl $3, %a
> jnz slow
> movl %a, %r
> andl $MASK, %r
> cmpl cache(%r), %a
> jz done
> slow: call slow-path
> done:
Hey? That seems like too many instructions to me. The idea is that the
cache entries are arranged so as to cause lookup failures on misalignment,
so that the testl and jnz are not needed.
If a cache slot mentions (holds) some address a, this means that a .. a + 3
inclusive are addressable. Furthermore we require that a has 00 as its lowest
two bits. (**)
----------------
So a test for a 4-byte access at address vv is
    movl %vv, %temp
    andl $MASK, %temp
    cmpl cache(%temp), %vv
    jz done
slow:
where MASK is (CACHE_MASK << 2) and CACHE_MASK is ((1 << CACHE_BITS)-1).
If %vv ends in anything other than 00, it cannot match any cache[] value
as implied by ** above.
To mark the ith cache slot empty, we place in it the value ((~i) << 2).
That causes all checks to fail since the middle CACHE_BITS cannot ever
then match. It also observes (**).
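The 4-byte check and the empty-slot marker can be sketched in C as follows. This is only a model of the scheme described above: CACHE_BITS = 10 is an arbitrary assumption, and cache_clear, cache_insert and cache_hit4 are illustrative names, not existing functions.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_BITS 10u                        /* assumed size: 1024 slots */
#define CACHE_MASK ((1u << CACHE_BITS) - 1u)
#define MASK       (CACHE_MASK << 2)          /* index bits of an address */

static uint32_t cache[1u << CACHE_BITS];

/* Slot i is empty when it holds (~i) << 2: the low two bits are 00,
   but the index bits are the complement of i, so no address that
   maps to slot i can compare equal. */
static void cache_clear(void)
{
    for (uint32_t i = 0; i <= CACHE_MASK; i++)
        cache[i] = (~i) << 2;
}

/* Record that the aligned word containing a is addressable. */
static void cache_insert(uint32_t a)
{
    cache[(a & MASK) >> 2] = a & ~3u;
}

/* Fast-path test for a 4-byte access: one mask, one compare.
   A misaligned vv has nonzero low bits and so can never match. */
static bool cache_hit4(uint32_t vv)
{
    return cache[(vv & MASK) >> 2] == vv;
}
```

After cache_clear() every lookup misses; after cache_insert(0x1000), only a 4-byte access at exactly 0x1000 hits, and 0x1001 or a different address mapping to the same slot falls through to the slow path.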
----------------
The test for a 1-byte access at address vv is
    movl %vv, %temp
    movl %vv, %temp2
    andl $MASK, %temp -- cache index, as before
    andl $(~3), %temp2 -- dump bits 0 and 1 of address (~3 == 111...11100b)
    cmpl cache(%temp), %temp2
    jz done
slow:
This is not so good (trashes a second reg), so perhaps your code is better
here. OTOH, providing enough spare regs exist, all reasonable machines
have 2 ALUs capable of doing the andls in parallel, so the sequence should be
fast.
----------------
Finally 2-byte is a minor variant of the 1-byte version:
    movl %vv, %temp
    movl %vv, %temp2
    andl $MASK, %temp -- cache index, as before
    andl $(~2), %temp2 -- dump bit 1 of address (~2 == 111...11101b)
    cmpl cache(%temp), %temp2
    jz done
slow:
The andl $(~2) is the subtlety. For the lowest two bits it gives the mapping
00 -> 00, 01 -> 01, 10 -> 00, 11 -> 01
So if the address was 2-aligned (00, 10) it produces 00, which can potentially
match the cache[] entry.
If the address was not 2-aligned (01, 11) it produces 01, which can never
match and forces us to the slow case. It is true to say that this forces
addresses ending in 01 unnecessarily into the slow case whereas your
test-based code doesn't, but misaligned accesses are so rare I think it's
more important to accelerate the common case.
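The 1-byte and 2-byte variants can be modelled the same way. Again this is only a sketch under assumed parameters (CACHE_BITS = 10) with illustrative names (slot_of, cache_hit1, cache_hit2); the cache here starts zero-filled rather than marked empty, which is enough to show the masking.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_BITS 10u                        /* assumed size: 1024 slots */
#define CACHE_MASK ((1u << CACHE_BITS) - 1u)
#define MASK       (CACHE_MASK << 2)          /* index bits of an address */

static uint32_t cache[1u << CACHE_BITS];

/* Slot number for an address (the asm uses the byte offset directly). */
static uint32_t slot_of(uint32_t vv)
{
    return (vv & MASK) >> 2;
}

/* 1-byte access: clear bits 0 and 1, so any byte of the word matches. */
static bool cache_hit1(uint32_t vv)
{
    return cache[slot_of(vv)] == (vv & ~3u);
}

/* 2-byte access: clear only bit 1. Even addresses (..00, ..10) become
   ..00 and can match; odd addresses keep bit 0 set and always miss. */
static bool cache_hit2(uint32_t vv)
{
    return cache[slot_of(vv)] == (vv & ~2u);
}
```

With the word at 0x1000 cached, byte accesses at 0x1000 through 0x1003 hit, 2-byte accesses at 0x1000 and 0x1002 hit, and 2-byte accesses at 0x1001 and 0x1003 go to the slow path.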
J
|
|
From: Julian S. <js...@ac...> - 2002-12-11 00:52:20
|
> So, this only happens with Mozilla on RH6.2, compiled with some version
> of egcs? Can you reproduce anything similar with other egcs-generated
> code?
The mozilla I was running was a 1.2.1 binary build (the straight .tar.gz)
from ftp.mozilla.org, so egcs is not in the picture, and I would expect
this problem to occur using that binary build on any distro -- the loop
is in some .so supplied in the .tar.gz, so it'll be the same for everyone
(I guess).
Thanks for 74-; I'll try it tomorrow evening. Almost out of time now.
> BTW, I'm having a go at implementing your addrcheck cache idea. It
> isn't working out quite as well as I'd like.
You are?! I had better reply to your initial comments on it ...
J
|
|
From: Jeremy F. <je...@go...> - 2002-12-11 00:44:48
|
On Tue, 2002-12-10 at 15:35, Julian Seward wrote:
> (mozilla-1.2.1 was looping with memcheck ...)
> >
> > > It all _looks_ plausible. I'm a bit mystified. You sure this j[n]p
> > > trick in 69- has no strange side-effects? I can't think of any. Perhaps
> > > this is a red herring.
> >
> > Looks OK to me, but its a bit hard to tell without seeing the original
> > code.
> >
> > What happens if you change it back to the popf slow path? Still happen?
>
> I dunno; I removed the popf stuff.
>
> However, backing out 69- makes it work properly.
Try the attached (74-paranoid-flags) with 69- still applied and see if it
helps (try with --paranoid-flags=yes and no). I also found some code
passing the old args to new_emit, which may have been causing a problem.
J
|
|
From: Jeremy F. <je...@go...> - 2002-12-11 00:12:29
|
On Tue, 2002-12-10 at 15:35, Julian Seward wrote:
> However, backing out 69- makes it work properly.
Very mysterious.
> So I'm still mystified. One unedifying explaination is that this translation
> is correct, and the reason it is looping is that some earlier translation has
> written bogus values into memory, which the above loop is picking up and
> looping on. I don't fancy chasing that down.
Since the only code which cares about flags are the last two
instructions, and they look correct to me, it must be the data they're
operating on...
> I'm going to back out 69- from cvs until we have a clearer picture what's
> going on. Do shout if you have any ideas at all. It makes me uneasy that
> I don't know what's going on here.
So, this only happens with Mozilla on RH6.2, compiled with some version
of egcs? Can you reproduce anything similar with other egcs-generated
code?
> > One possibility I've been thinking about is whether there's any code
> > which depends on the undefined flags behaviour of instructions. It
> > would be a (compiler?) bug, but it might change the behaviour of real
> > programs.
>
> Um, that's not good. Should I be concerned?
Dunno. It's easy to fix: just add a line into VG_(new_emit)() saying
something like:
    if (set_flags != FlagsEmpty)
        maybe_emit_get_flags();
which would always make sure that if anyone sets the flags, they start
with the simulated flags state in the CPU.
A lot of the arithmetic instructions have an undefined effect on some
set of flags. I interpret that as being the same as setting them (that
is to say, no correct program can rely on them being unchanged by the
instruction, so don't bother to preserve their values). It may be that
some code "knows" that undefined actually means unchanged, and relies on
that behaviour. In which case the conservative thing for us to do is
treat undefined as meaning unchanged, and emit considerably more flags
fetches (which basically punts the problem to
Intel/AMD/Via/Transmeta/etc, because the CPU still has to have an
interpretation of what undefined actually means; there's probably a lot
of lore about the detailed behaviour of the instructions which goes way
beyond their formal description in Vol2).
I'm not saying it has any bearing on the present problem, but it would
be an interesting experiment to try.
> Umm, I'm not sure what you mean by good. Memcheck is probably the most
> demanding in that nearly every original ucode is preceded by instrumentation
> which very likely trashes (real) eflags. Is that what you meant?
>
> If there's some way in which you could hack a skin to do a
> stress-test of your flags machinery, that would be very helpful.
Yes. I might put together a testbed skin.
BTW, I'm having a go at implementing your addrcheck cache idea. It
isn't working out quite as well as I'd like.
J
|