From: Niall D. <ndo...@bl...> - 2013-04-11 20:40:04
|
> As part of delving into such, I have put together a whole program path
> profiler which is: temporally-preserving, lossless, and capable of
> capturing long (single-threaded) application traces. It is currently
> based upon PIN (www.pintool.org): x86_64 Linux, 64-bit only at present.
> One item on my personal 'todo' list is to determine whether or not such
> could be hooked up to valgrind instead of PIN; if so, I would love to
> contribute it to valgrind for general use.
>
> It is available at: http://gorton-machine.org/rick/PathProfiling - which
> generally describes my collection approach. The C++ version is a
> from-scratch rewrite done in Nov./Dec. 2012. Would that (or some
> variant) help fit your needs?

Sadly we'd need it running on QNX and ARM. But I'll certainly keep it in
mind should profiling improvement suggestions be requested.

Niall
|
From: Niall D. <ndo...@bl...> - 2013-04-11 20:34:19
|
> Am 11.04.2013 17:10, schrieb Niall Douglas:
> >>> desc: I1 cache: 32768 B, 64 B, 8-way associative, 153 picosec
> >
> > Estimated cost times for a cache line, so a D1 cache access costs 153
> > picosec, whereas a LL cache access costs 1112 picosec.
>
> Ok. "0.153 ns hit latency" may be more descriptive.

Is 153 picosec hit latency okay? I try to avoid floating point printing.
I think valgrind's sprintf now supports %f, but not %e. I had to write
my own %e formatter, which is a challenge without a C math library.
Also, the read_picosecond_timer() suggests picosec. As that's the
minimum resolution offered by POSIX, that seems the right unit.

> But these numbers are so small... I assume there is prefetching going
> on, and you are actually measuring the maximal bandwidth from core to a
> cache level. I would assume that an unpredictable LL access has a
> latency more in the range of 15-30 cycles (and not 1ns).

As I mentioned, I do permute the cache line fetches by reversing the
last eight bits of the cache line indexes to make them appear
random-ish. I *really* wish I could just point you at the code, but it's
still tied up in bureaucracy.

> I get the impression that we always should do prefetch simulation, and
> add an event "predicted LL miss" to be not totally off with the time
> estimation.

It's certainly coming to that with Sandy Bridge and later on Intel.
We're not there yet on ARM.

> So this somehow is the latency of an add instruction, using registers.
> I suppose with a multiplication it would be slower, and with a division
> even more so... I.e. that does not really match with any definition of
> "max cost" for me ;-)

There is a bit of method in the madness there. If I have read it
correctly, according to
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.84.5816&rep=rep1&type=pdf
roughly speaking, as an average case, most general purpose code has an
ILP of around 1.0, so if you can calculate the clock speed, you more or
less get average instruction cost.

This raises the question: how do you portably calculate the clock speed?
And by portably, I mean no assembler and no OS specific code (POSIX is
okay). It turns out this isn't easy. The best approach I could think of
was to time the add instruction, given that its latency is *probably*
one clock cycle on most contemporary CPUs [1]. Therefore when I use max
instruction cost in my calculations, it's really a proxy for average
instruction cost based on a number of reasonable assumptions.

[1]: The Qualcomm Krait is not one of these CPUs. It incurs a stochastic
non-linear latency on arithmetic instructions. In the end, what can you
do in this situation?

> > You'll note the max instruction cost is ~300 picosec while the L1
> > cache line is ~150 picosec for 64 bytes. I would assume Intel
> > deliberately chose that to ensure that a single arithmetic
> > instruction adding across cache lines can run at single cycle speed.
>
> Ah, so these results are from an Intel processor?
> 101 ps means 1 cycle with 10GHz. I wondered which ARM processor you
> were using to get these impressive figures. So it's more like 3.3GHz
> and 3 add instructions throughput per cycle.

Correct. Sorry if I confused you. As I must prepare a patch for you
guys, and one which must pass Legal, I ported the work from internal
valgrind to Linux valgrind. Which runs inside a VM on my Intel work PC.
Hence the figures are for that machine. Main memory cost is particularly
incorrect inside a VM. I'll backport the patch to internal valgrind once
internal peer review approves it. There's absolutely no reason it
shouldn't work seamlessly on ARM though.

BTW, this is the custom event I added:

  desc: I1 cache: 32768 B, 64 B, 8-way associative, 157 picosec
  desc: D1 cache: 32768 B, 64 B, 8-way associative, 157 picosec
  desc: LL cache: 6291456 B, 64 B, 12-way associative, 1120 picosec
  desc: main memory: 1362 picosec
  desc: max instruct cost: 316 picosec
  desc: min instruct cost: 107 picosec
  event: EPpsec = 316 Ir + 1120 I1mr + 1120 D1mr + 1120 D1mw + 1362 ILmr + 1362 DLmr + 1362 DLmw
  event: EPpsec : Estimated Possible Picosecs

I think that's a reasonable calculation, more or less. I'll change
"picosec" => "picosec hit latency" shortly.

Niall
|
From: Rick G. <rcg...@ve...> - 2013-04-11 19:49:17
|
On 4/11/2013 11:31 AM, Niall Douglas wrote:
>> An alternative approach that comes to my mind is to collect the hot
>> paths, save them, and then post-process with the chip specific
>> instruction timings.
>>
>> ....
>>
>> That said, if you have further ideas regarding your approach above, I
>> would love to hear those for future reference. Collecting useful
>> non-intrusive performance data is a problem for all of us anywhere in
>> the software industry.
>>
>> Niall

Sure - An area of personal long-term interest has been using path
profiling to permit better optimizations/more context
information/application behavior information. As part of delving into
such, I have put together a whole program path profiler which is:
temporally-preserving, lossless, and capable of capturing long
(single-threaded) application traces. It is currently based upon PIN
(www.pintool.org): x86_64 Linux, 64-bit only at present. One item on my
personal 'todo' list is to determine whether or not such could be hooked
up to valgrind instead of PIN; if so, I would love to contribute it to
valgrind for general use.

It is available at: http://gorton-machine.org/rick/PathProfiling - which
generally describes my collection approach. The C++ version is a
from-scratch rewrite done in Nov./Dec. 2012.

Would that (or some variant) help fit your needs?

Regards,
Richard
|
From: <sv...@va...> - 2013-04-11 17:55:50
|
mjw 2013-04-11 18:55:39 +0100 (Thu, 11 Apr 2013)
New Revision: 13367
Log:
read_unitinfo_dwarf2 DW_FORM_ref_addr is address size in DWARF version 2.
Bug #305513 contained a patch for some extra robustness checks. But
the real cause of crashing in the read_unitinfo_dwarf2 DWARF reader
seemed to have been this issue where DWARF version 2 DWZ partial_units
were read and DW_FORM_ref_addr had an unexpected size. This combination
is rare. DWARF version 4 is the current default version of GCC.
Modified files:
trunk/coregrind/m_debuginfo/readdwarf.c
Modified: trunk/coregrind/m_debuginfo/readdwarf.c (+3 -3)
===================================================================
--- trunk/coregrind/m_debuginfo/readdwarf.c 2013-04-11 17:17:45 +01:00 (rev 13366)
+++ trunk/coregrind/m_debuginfo/readdwarf.c 2013-04-11 18:55:39 +01:00 (rev 13367)
@@ -991,7 +991,7 @@
UInt acode, abcode;
ULong atoffs, blklen;
Int level;
- /* UShort ver; */
+ UShort ver;
UChar addr_size;
UChar* p = unitblock_img;
@@ -1008,7 +1008,7 @@
p += ui->dw64 ? 12 : 4;
/* version should be 2, 3 or 4 */
- /* ver = ML_(read_UShort)(p); */
+ ver = ML_(read_UShort)(p);
p += 2;
/* get offset in abbrev */
@@ -1122,7 +1122,7 @@
case 0x0c: /* FORM_flag */ p++; break;
case 0x0d: /* FORM_sdata */ read_leb128S( &p ); break;
case 0x0f: /* FORM_udata */ read_leb128U( &p ); break;
- case 0x10: /* FORM_ref_addr */ p += ui->dw64 ? 8 : 4; break;
+ case 0x10: /* FORM_ref_addr */ p += (ver == 2) ? addr_size : (ui->dw64 ? 8 : 4); break;
case 0x11: /* FORM_ref1 */ p++; break;
case 0x12: /* FORM_ref2 */ p += 2; break;
case 0x13: /* FORM_ref4 */ p += 4; break;
|
|
From: Josef W. <Jos...@gm...> - 2013-04-11 17:20:32
|
Am 11.04.2013 17:10, schrieb Niall Douglas:
>>> desc: I1 cache: 32768 B, 64 B, 8-way associative, 153 picosec
>
> Estimated cost times for a cache line, so a D1 cache access costs 153
> picosec, whereas a LL cache access costs 1112 picosec.

Ok. "0.153 ns hit latency" may be more descriptive.

But these numbers are so small... I assume there is prefetching going
on, and you are actually measuring the maximal bandwidth from core to a
cache level. I would assume that an unpredictable LL access has a
latency more in the range of 15-30 cycles (and not 1ns).

I get the impression that we always should do prefetch simulation, and
add an event "predicted LL miss" to be not totally off with the time
estimation.

> The max and min instruction cost is a test for CPU ILP, so basically it
> determines max instruction cost like this:
>
>   for(many)
>     a[x]+=a[x-1]
>
> This forces the CPU to stall waiting for the result of each prior
> calculation.

So this somehow is the latency of an add instruction, using registers. I
suppose with a multiplication it would be slower, and with a division
even more so... I.e. that does not really match with any definition of
"max cost" for me ;-)

> Min instruction cost looks more like this:
>
>   for(many)
>     a[x]+=a[x+n]

And this then measures the throughput of an add instruction.

> You'll note the max instruction cost is ~300 picosec while the L1 cache
> line is ~150 picosec for 64 bytes. I would assume Intel deliberately
> chose that to ensure that a single arithmetic instruction adding across
> cache lines can run at single cycle speed.

Ah, so these results are from an Intel processor? 101 ps means 1 cycle
with 10GHz. I wondered which ARM processor you were using to get these
impressive figures. So it's more like 3.3GHz and 3 add instructions
throughput per cycle.

Josef
|
From: <sv...@va...> - 2013-04-11 16:17:52
|
sewardj 2013-04-11 17:17:45 +0100 (Thu, 11 Apr 2013)
New Revision: 13366
Log:
Update.
Modified files:
trunk/NEWS
trunk/docs/internals/3_8_BUGSTATUS.txt
Modified: trunk/NEWS (+19 -0)
===================================================================
--- trunk/NEWS 2013-04-11 14:58:48 +01:00 (rev 13365)
+++ trunk/NEWS 2013-04-11 17:17:45 +01:00 (rev 13366)
@@ -324,7 +324,26 @@
269599] Increase deepest backtrace
FIXED r??
+317444 amd64->IR: 0xC4 0x41 0x2C 0xC2 0xD2 0x8 (vcmpeq_uqps)
+ FIXED 2703 13342
+317461 Fix BMI assembler configure check and avx2/bmi/fma vgtest prereqs
+ FIXED 13343
+
+317463 bmi testcase IR SANITY CHECK FAILURE
+ FIXED 2704
+
+314718 ARM: implement integer divide instruction (sdiv and udiv)
+ FIXED 2706 13365
+
+315689 disInstr(thumb): unhandled instruction: 0xF852 0x0E10 (LDRT)
+ FIXED 2705 13364
+
+317506 memcheck/tests/vbit-test fails with unknown opcode after
+ introduction of new Iops for AVX2, BMI, FMA support
+ FIXED 13347
+
+
Release 3.8.1 (19 September 2012)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3.8.1 is a bug fix release. It fixes some assertion failures in 3.8.0
Modified: trunk/docs/internals/3_8_BUGSTATUS.txt (+12 -18)
===================================================================
--- trunk/docs/internals/3_8_BUGSTATUS.txt 2013-04-11 14:58:48 +01:00 (rev 13365)
+++ trunk/docs/internals/3_8_BUGSTATUS.txt 2013-04-11 17:17:45 +01:00 (rev 13366)
@@ -307,15 +307,9 @@
314365 enable VEX to run asm helpers that do callee register saving
In progress; not sure whether this is a good idea
-314718 ARM: implement integer divide instruction (sdiv and udiv)
- HAS_PATCH, but needs working through
-
315199 vgcore file for threaded app does not show which thread crashed
HAS_PATCH; needs review
-315689 disInstr(thumb): unhandled instruction: 0xF852 0x0E10 (LDRT)
- HAS_PATCH; needs looking at
-
315828 massif "internal error" vgPlain_arena_free when RPATH includes
/usr/local/lib
WONTFIX
@@ -376,8 +370,6 @@
inside pthread/mutex methods
FreeBSD+Helgrind weirdness
--- Tue Mar 26 09:52:13 CET 2013
-
317381 helgrind warns about xchg vs suppressed store
No action so far. Not sure there's an easy fix for this.
@@ -388,17 +380,19 @@
extension
Contains plausible infrastructure patch; no insns so far tho
-317444 amd64->IR: 0xC4 0x41 0x2C 0xC2 0xD2 0x8 (vcmpeq_uqps)
- FIXED 2703 13342
+317698 parse_var_DIE: confused by: DW_TAG_compile_unit using
+ Intel 13.0 update 3 compiler
+ Reporter has queried Intel since this might be an ICC bug
-317461 Fix BMI assembler configure check and avx2/bmi/fma vgtest prereqs
- FIXED 13343
+317893 massif terminates without any message
+ Probably just a memory limit thing. Close as a dup, but
+ of what?
-317463] New: bmi testcase IR SANITY CHECK FAILURE
+318030 addHRegUse takes a lot of CPU time; band-aid speedup
+ patch within
+ No action so far
-317506] New: memcheck/tests/vbit-test fails with unknown opcode after
-introduction of new Iops for AVX2, BMI, FMA support
+318050 libmpiwrap fails to compile with out-of-source build
+ Has simple-sounding fix; should commit.
-317698] New: parse_var_DIE: confused by: DW_TAG_compile_unit using
-Intel 13.0 update 3 compiler
-
+Thu Apr 11 18:16:04 CEST 2013
|
|
From: Niall D. <ndo...@bl...> - 2013-04-11 15:31:31
|
> An alternative approach that comes to my mind is to collect the hot
> paths, save them, and then post-process with the chip specific
> instruction timings.
>
> Would that work?

With my existing patch you can already detach and recombine the chip
specific timings from the callgrind/cachegrind counters. We intend to
keep a set of known good cpucacheconfig.xml files, one per device, and
we intend to execute callgrind only twice per BB10 test item, once for
x86 and once for ARM, with the latter on a very well endowed, non-mobile
ARM chip as it's painful on real mobile hardware. We then synthesise XML
output, one per device. No one is claiming that the estimated times are
anything but an abstraction of reality. But they do make useful charts.

That said, if you have further ideas regarding your approach above, I
would love to hear those for future reference. Collecting useful
non-intrusive performance data is a problem for all of us anywhere in
the software industry.

Niall
|
From: Niall D. <ndo...@bl...> - 2013-04-11 15:11:13
|
> Am 10.04.2013 22:02, schrieb Niall Douglas:
> > The patch is fairly minimal. At the start of cachegrind/callgrind if
> > the cache configuration parameters were not supplied, it looks for a
> > file called 'cpucacheconfig.xml' if one was not supplied. If that is
> > present, it loads it. If it is not present, it calls
>
> I think this behaviour should be guarded by a command line flag:
> e.g. on x86, the timing tests should not run when no cache parameter
> file is present, as it will be supplied via CPUID.

CPUID can't tell you cache/memory access latencies. The code is clever
enough to realize when CPUID has returned cache configuration and it
automatically skips the generic tests for cache configuration, running
only those tests determining latencies.

You may have meant that you want a command line flag that skips the
latency tests too? A cpucacheconfig.xml file with zeroed entries is
sufficient. Is this too non-obvious though? Perhaps an explicit command
line flag would be better.

> > desc: I1 cache: 32768 B, 64 B, 8-way associative, 153 picosec
> > desc: D1 cache: 32768 B, 64 B, 8-way associative, 153 picosec
> > desc: LL cache: 6291456 B, 64 B, 12-way associative, 1112 picosec
> > desc: main memory: 1427 picosec
> > desc: max instruct cost: 302 picosec
> > desc: min instruct cost: 101 picosec
>
> What are these times? It would be nice to be more descriptive here (or
> better in the manual).

Estimated cost times for a cache line, so a D1 cache access costs 153
picosec, whereas a LL cache access costs 1112 picosec. Or, put another
way, a L1 cache miss costs 1112 picosec, a LL cache miss costs 1427
picosec etc.

The max and min instruction cost is a test for CPU ILP, so basically it
determines max instruction cost like this:

  for(many)
    a[x]+=a[x-1]

This forces the CPU to stall waiting for the result of each prior
calculation. Min instruction cost looks more like this:

  for(many)
    a[x]+=a[x+n]

This lets the CPU execute as many adds in parallel as it can.

You'll note the max instruction cost is ~300 picosec while the L1 cache
line is ~150 picosec for 64 bytes. I would assume Intel deliberately
chose that to ensure that a single arithmetic instruction adding across
cache lines can run at single cycle speed.

Regarding the documentation, I am happy to update this once we have the
patch sorted out and into Legal. I don't need approval for documentation
patches thankfully.

> Putting all this information behind "desc:" lines is fine for the
> *_annotate scripts and KCachegrind, as this information is just
> forwarded to the user without trying to parse it. But this also means
> that tools can not rely on it never to change, or to have a specific
> format...

I think your idea about custom event counters is a great one for
specifying in a guaranteed machine readable way what these numbers mean.
I'll have a play with those, maybe today, hopefully tomorrow. I'm
currently caught up in GSoC mentor petitioning :(

Many thanks for your advice Josef.

Niall
|
From: Rick G. <rcg...@ve...> - 2013-04-11 14:52:05
|
Niall,
An alternative approach that comes to my mind is to collect the hot
paths, save them, and then post-process with the chip specific
instruction timings.
Would that work?
Regards,
Richard
|
|
From: Niall D. <ndo...@bl...> - 2013-04-11 14:45:14
|
> I've just finished the port of my overall patch to Linux valgrind, so
> please find attached a patch implementing VG_(read_nanosecond_timer).
> It looks like your bug tracker is for bugs and not feature patches, so
> please do mention if you want this patch to go elsewhere.
>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: 0002-Separate-patch-for-coregrind-adding-nanosecond-timer.patch
> Type: application/octet-stream
> Size: 3256 bytes
> Desc: not available
Well that's very unhelpful.
Here it is copy and pasted so:
>From 10f7e83c3d91baff94fae3f2ec4b13c7a9812bea Mon Sep 17 00:00:00 2001
From: Niall Douglas <ndo...@ri...>
Date: Wed, 10 Apr 2013 14:47:40 -0400
Subject: [PATCH 2/2] Separate patch for coregrind adding nanosecond timer
resolution.
---
coregrind/m_libcproc.c | 17 +++++++++++------
include/pub_tool_libcproc.h | 7 ++++++-
2 files changed, 17 insertions(+), 7 deletions(-)
diff --git a/coregrind/m_libcproc.c b/coregrind/m_libcproc.c
index e801b25..16ebb19 100644
--- a/coregrind/m_libcproc.c
+++ b/coregrind/m_libcproc.c
@@ -608,9 +608,9 @@ Int VG_(fork) ( void )
Timing stuff
------------------------------------------------------------------ */
-UInt VG_(read_millisecond_timer) ( void )
+ULong VG_(read_nanosecond_timer) ( void )
{
- /* 'now' and 'base' are in microseconds */
+ /* 'now' and 'base' are in nanoseconds */
static ULong base = 0;
ULong now;
@@ -620,12 +620,12 @@ UInt VG_(read_millisecond_timer) ( void )
res = VG_(do_syscall2)(__NR_clock_gettime, VKI_CLOCK_MONOTONIC,
(UWord)&ts_now);
if (sr_isError(res) == 0) {
- now = ts_now.tv_sec * 1000000ULL + ts_now.tv_nsec / 1000;
+ now = ts_now.tv_sec * 1000000000ULL + ts_now.tv_nsec;
} else {
struct vki_timeval tv_now;
res = VG_(do_syscall2)(__NR_gettimeofday, (UWord)&tv_now,
(UWord)NULL);
vg_assert(! sr_isError(res));
- now = tv_now.tv_sec * 1000000ULL + tv_now.tv_usec;
+ now = tv_now.tv_sec * 1000000000ULL + tv_now.tv_usec * 1000;
}
}
@@ -638,7 +638,7 @@ UInt VG_(read_millisecond_timer) ( void )
struct vki_timeval tv_now = { 0, 0 };
res = VG_(do_syscall2)(__NR_gettimeofday, (UWord)&tv_now,
(UWord)NULL);
vg_assert(! sr_isError(res));
- now = sr_Res(res) * 1000000ULL + sr_ResHI(res);
+ now = sr_Res(res) * 1000000000ULL + sr_ResHI(res) * 1000;
}
# else
@@ -649,9 +649,14 @@ UInt VG_(read_millisecond_timer) ( void )
if (base == 0)
base = now;
- return (now - base) / 1000;
+ return (now - base);
}
+UInt VG_(read_millisecond_timer) ( void )
+{
+ ULong now = VG_(read_nanosecond_timer)();
+ return (UInt)(now / 1000000ULL);
+}
/* ---------------------------------------------------------------------
atfork()
diff --git a/include/pub_tool_libcproc.h b/include/pub_tool_libcproc.h
index 2ff3f83..a93e609 100644
--- a/include/pub_tool_libcproc.h
+++ b/include/pub_tool_libcproc.h
@@ -81,9 +81,14 @@ extern Int VG_(getegid) ( void );
Timing
------------------------------------------------------------------ */
-// Returns the number of milliseconds passed since the progam started
+// Returns the number of nanoseconds passed since the progam started
// (roughly; it gets initialised partway through Valgrind's initialisation
// steps).
+extern ULong VG_(read_nanosecond_timer) ( void );
+
+// Returns the number of milliseconds passed since the progam started
+// (roughly; it gets initialised partway through Valgrind's initialisation
+// steps). This is VG_(read_nanosecond_timer) divided by one million.
extern UInt VG_(read_millisecond_timer) ( void );
/* ---------------------------------------------------------------------
--
1.7.11.msysgit.1
Niall
|
|
From: <sv...@va...> - 2013-04-11 13:59:02
|
sewardj 2013-04-11 14:58:48 +0100 (Thu, 11 Apr 2013)
New Revision: 13365
Log:
Add test cases for SDIV and UDIV. Pertains to #314178.
Added files:
trunk/none/tests/arm/intdiv.c
trunk/none/tests/arm/intdiv.stderr.exp
trunk/none/tests/arm/intdiv.stdout.exp
trunk/none/tests/arm/intdiv.vgtest
Modified files:
trunk/none/tests/arm/Makefile.am
Added: trunk/none/tests/arm/intdiv.vgtest (+2 -0)
===================================================================
--- trunk/none/tests/arm/intdiv.vgtest 2013-04-11 11:58:18 +01:00 (rev 13364)
+++ trunk/none/tests/arm/intdiv.vgtest 2013-04-11 14:58:48 +01:00 (rev 13365)
@@ -0,0 +1,2 @@
+prog: intdiv
+vgopts: -q
Added: trunk/none/tests/arm/intdiv.stderr.exp (+0 -0)
===================================================================
Added: trunk/none/tests/arm/intdiv.stdout.exp (+14 -0)
===================================================================
--- trunk/none/tests/arm/intdiv.stdout.exp 2013-04-11 11:58:18 +01:00 (rev 13364)
+++ trunk/none/tests/arm/intdiv.stdout.exp 2013-04-11 14:58:48 +01:00 (rev 13365)
@@ -0,0 +1,14 @@
+000001f4 00000032 -> u:0000000a s:0000000a
+000001f4 ffffffce -> u:00000000 s:fffffff6
+fffffe0c 00000032 -> u:051eb847 s:fffffff6
+fffffe0c ffffffce -> u:00000000 s:0000000a
+00000064 00000007 -> u:0000000e s:0000000e
+ffffff9c 00000007 -> u:24924916 s:fffffff2
+00000064 fffffff9 -> u:00000000 s:fffffff2
+ffffff9c fffffff9 -> u:00000000 s:0000000e
+00000001 00000000 -> u:00000000 s:00000000
+00000000 00000000 -> u:00000000 s:00000000
+ffffffff 00000000 -> u:00000000 s:00000000
+80000000 00000000 -> u:00000000 s:00000000
+7fffffff 00000000 -> u:00000000 s:00000000
+80000000 ffffffff -> u:00000000 s:80000000
Added: trunk/none/tests/arm/intdiv.c (+55 -0)
===================================================================
--- trunk/none/tests/arm/intdiv.c 2013-04-11 11:58:18 +01:00 (rev 13364)
+++ trunk/none/tests/arm/intdiv.c 2013-04-11 14:58:48 +01:00 (rev 13365)
@@ -0,0 +1,55 @@
+
+#include <stdio.h>
+
+typedef signed int Int;
+typedef unsigned int UInt;
+
+__attribute__((noinline)) UInt do_udiv32 ( UInt x, UInt y )
+{
+ UInt res;
+ __asm__ __volatile__(
+ "mov r9, %1 ; mov r10, %2 ; udiv r3,r9,r10 ; mov %0, r3"
+ : "=r"(res) : "r"(x), "r"(y) : "r3", "r9", "r10"
+ );
+ return res;
+}
+
+__attribute__((noinline)) Int do_sdiv32 ( Int x, Int y )
+{
+ UInt res;
+ __asm__ __volatile__(
+ "mov r9, %1 ; mov r10, %2 ; sdiv r3,r9,r10 ; mov %0, r3"
+ : "=r"(res) : "r"(x), "r"(y) : "r3", "r9", "r10"
+ );
+ return res;
+}
+
+void test ( UInt x, UInt y )
+{
+ UInt ru = do_udiv32(x,y);
+ Int rs = do_sdiv32(x,y);
+ printf( "%08x %08x -> u:%08x s:%08x\n", x, y, ru, (UInt)rs);
+}
+
+int main ( void )
+{
+ // Check basic operation
+ test( 500, 50 );
+ test( 500, -50 );
+ test( -500, 50 );
+ test( -500, -50 );
+ // Check for rounding towards zero
+ test( 100, 7 ); // 14.285
+ test( -100, 7 );
+ test( 100, -7 );
+ test( -100, -7 );
+ // Division by zero produces zero
+ test( 1, 0 );
+ test( 0, 0 );
+ test( -1, 0 );
+ test( 0x80000000, 0 );
+ test( 0x7FFFFFFF, 0 );
+ // Test signed range ends
+ test( 0x80000000, -1 ); // unrepresentable as signed 32
+ return 0;
+}
Modified: trunk/none/tests/arm/Makefile.am (+4 -1)
===================================================================
--- trunk/none/tests/arm/Makefile.am 2013-04-11 11:58:18 +01:00 (rev 13364)
+++ trunk/none/tests/arm/Makefile.am 2013-04-11 14:58:48 +01:00 (rev 13365)
@@ -4,6 +4,7 @@
dist_noinst_SCRIPTS = filter_stderr
EXTRA_DIST = \
+ intdiv.stdout.exp intdiv.stderr.exp intdiv.vgtest \
ldrt.stdout.exp ldrt.stderr.exp ldrt.vgtest \
neon128.stdout.exp neon128.stderr.exp neon128.vgtest \
neon64.stdout.exp neon64.stderr.exp neon64.vgtest \
@@ -16,6 +17,7 @@
check_PROGRAMS = \
allexec \
+ intdiv \
ldrt \
neon128 \
neon64 \
@@ -56,4 +58,5 @@
-mfpu=neon \
-mthumb
-ldrt_CFLAGS = $(AM_CFLAGS) -g -O0 -mcpu=cortex-a8 -mthumb
+intdiv_CFLAGS = $(AM_CFLAGS) -g -mcpu=cortex-a15 -mthumb
+ldrt_CFLAGS = $(AM_CFLAGS) -g -mcpu=cortex-a8 -mthumb
|
|
From: <sv...@va...> - 2013-04-11 13:57:53
|
sewardj 2013-04-11 14:57:43 +0100 (Thu, 11 Apr 2013)
New Revision: 2706
Log:
Implement ARM SDIV and UDIV instructions. Fixes #314178. Partially
based on a patch by Ben Cheng, bc...@an.... Also renames two
misnamed PPC helpers.
Modified files:
trunk/priv/guest_arm_toIR.c
trunk/priv/host_arm_isel.c
trunk/priv/host_generic_simd64.c
trunk/priv/host_generic_simd64.h
trunk/priv/host_ppc_isel.c
trunk/priv/main_main.c
Modified: trunk/priv/host_generic_simd64.h (+6 -4)
===================================================================
--- trunk/priv/host_generic_simd64.h 2013-04-11 11:56:42 +01:00 (rev 2705)
+++ trunk/priv/host_generic_simd64.h 2013-04-11 14:57:43 +01:00 (rev 2706)
@@ -161,11 +161,13 @@
extern UInt h_generic_calc_CmpNEZ16x2 ( UInt );
extern UInt h_generic_calc_CmpNEZ8x4 ( UInt );
-extern ULong h_DPBtoBCD ( ULong dpb );
-extern ULong h_BCDtoDPB ( ULong bcd );
+extern ULong h_calc_DPBtoBCD ( ULong dpb );
+extern ULong h_calc_BCDtoDPB ( ULong bcd );
-ULong dpb_to_bcd(ULong chunk); // helper for h_DPBtoBCD
-ULong bcd_to_dpb(ULong chunk); // helper for h_BCDtoDPB
+// Signed and unsigned integer division, that behave like
+// the ARMv7 UDIV and SDIV instructions.
+extern UInt h_calc_udiv32_w_arm_semantics ( UInt, UInt );
+extern Int h_calc_sdiv32_w_arm_semantics ( Int, Int );
#endif /* ndef __VEX_HOST_GENERIC_SIMD64_H */
Modified: trunk/priv/host_generic_simd64.c (+36 -6)
===================================================================
--- trunk/priv/host_generic_simd64.c 2013-04-11 11:56:42 +01:00 (rev 2705)
+++ trunk/priv/host_generic_simd64.c 2013-04-11 14:57:43 +01:00 (rev 2706)
@@ -36,9 +36,10 @@
/* Generic helper functions for doing 64-bit SIMD arithmetic in cases
where the instruction selectors cannot generate code in-line.
These are purely back-end entities and cannot be seen/referenced
- from IR. */
+ from IR. There are also helpers for 32-bit arithmetic in here. */
#include "libvex_basictypes.h"
+#include "main_util.h" // LIKELY, UNLIKELY
#include "host_generic_simd64.h"
@@ -1433,7 +1434,7 @@
#define GET( x, y ) ( ( ( x ) & ( 0x1UL << ( y ) ) ) >> ( y ) )
#define PUT( x, y ) ( ( x )<< ( y ) )
-ULong dpb_to_bcd( ULong chunk )
+static ULong dpb_to_bcd( ULong chunk )
{
Short a, b, c, d, e, f, g, h, i, j, k, m;
Short p, q, r, s, t, u, v, w, x, y;
@@ -1473,7 +1474,7 @@
return value;
}
-ULong bcd_to_dpb( ULong chunk )
+static ULong bcd_to_dpb( ULong chunk )
{
Short a, b, c, d, e, f, g, h, i, j, k, m;
Short p, q, r, s, t, u, v, w, x, y;
@@ -1516,7 +1517,7 @@
return value;
}
-ULong h_DPBtoBCD( ULong dpb )
+ULong h_calc_DPBtoBCD( ULong dpb )
{
ULong result, chunk;
Int i;
@@ -1531,7 +1532,7 @@
return result;
}
-ULong h_BCDtoDPB( ULong bcd )
+ULong h_calc_BCDtoDPB( ULong bcd )
{
ULong result, chunk;
Int i;
@@ -1549,7 +1550,36 @@
#undef GET
#undef PUT
+
+/* ----------------------------------------------------- */
+/* Signed and unsigned integer division that behave like
+   the ARMv7 UDIV and SDIV instructions. */
+/* ----------------------------------------------------- */
+
+UInt h_calc_udiv32_w_arm_semantics ( UInt x, UInt y )
+{
+ // Division by zero --> zero
+ if (UNLIKELY(y == 0)) return 0;
+ // C requires rounding towards zero, which is also what we need.
+ return x / y;
+}
+
+Int h_calc_sdiv32_w_arm_semantics ( Int x, Int y )
+{
+ // Division by zero --> zero
+ if (UNLIKELY(y == 0)) return 0;
+  // The single case that produces an unrepresentable result
+ if (UNLIKELY( ((UInt)x) == ((UInt)0x80000000)
+ && ((UInt)y) == ((UInt)0xFFFFFFFF) ))
+ return (Int)(UInt)0x80000000;
+ // Else return the result rounded towards zero. C89 says
+ // this is implementation defined (in the signed case), but gcc
+ // promises to round towards zero. Nevertheless, at startup,
+ // in main_main.c, do a check for that.
+ return x / y;
+}
+
+
/*---------------------------------------------------------------*/
/*--- end host_generic_simd64.c ---*/
/*---------------------------------------------------------------*/
-
Modified: trunk/priv/guest_arm_toIR.c (+83 -0)
===================================================================
--- trunk/priv/guest_arm_toIR.c 2013-04-11 11:56:42 +01:00 (rev 2705)
+++ trunk/priv/guest_arm_toIR.c 2013-04-11 14:57:43 +01:00 (rev 2706)
@@ -13845,6 +13845,51 @@
/* fall through */
}
+ /* --------------------- Integer Divides --------------------- */
+ // SDIV
+ if (BITS8(0,1,1,1,0,0,0,1) == INSN(27,20)
+ && INSN(15,12) == BITS4(1,1,1,1)
+ && INSN(7,4) == BITS4(0,0,0,1)) {
+ UInt rD = INSN(19,16);
+ UInt rM = INSN(11,8);
+ UInt rN = INSN(3,0);
+ if (rD == 15 || rM == 15 || rN == 15) {
+ /* Unpredictable; don't decode; fall through */
+ } else {
+ IRTemp res = newTemp(Ity_I32);
+ IRTemp argL = newTemp(Ity_I32);
+ IRTemp argR = newTemp(Ity_I32);
+ assign(argL, getIRegA(rN));
+ assign(argR, getIRegA(rM));
+ assign(res, binop(Iop_DivS32, mkexpr(argL), mkexpr(argR)));
+ putIRegA(rD, mkexpr(res), condT, Ijk_Boring);
+ DIP("sdiv r%u, r%u, r%u\n", rD, rN, rM);
+ goto decode_success;
+ }
+ }
+
+ // UDIV
+ if (BITS8(0,1,1,1,0,0,1,1) == INSN(27,20)
+ && INSN(15,12) == BITS4(1,1,1,1)
+ && INSN(7,4) == BITS4(0,0,0,1)) {
+ UInt rD = INSN(19,16);
+ UInt rM = INSN(11,8);
+ UInt rN = INSN(3,0);
+ if (rD == 15 || rM == 15 || rN == 15) {
+ /* Unpredictable; don't decode; fall through */
+ } else {
+ IRTemp res = newTemp(Ity_I32);
+ IRTemp argL = newTemp(Ity_I32);
+ IRTemp argR = newTemp(Ity_I32);
+ assign(argL, getIRegA(rN));
+ assign(argR, getIRegA(rM));
+ assign(res, binop(Iop_DivU32, mkexpr(argL), mkexpr(argR)));
+ putIRegA(rD, mkexpr(res), condT, Ijk_Boring);
+ DIP("udiv r%u, r%u, r%u\n", rD, rN, rM);
+ goto decode_success;
+ }
+ }
+
// MLA, MLS
if (BITS8(0,0,0,0,0,0,1,0) == (INSN(27,20) & BITS8(1,1,1,1,1,0,1,0))
&& INSN(7,4) == BITS4(1,0,0,1)) {
@@ -18400,6 +18445,44 @@
}
}
+ /* -------------- SDIV.W Rd, Rn, Rm -------------- */
+ if (INSN0(15,4) == 0xFB9
+ && (INSN1(15,0) & 0xF0F0) == 0xF0F0) {
+ UInt rN = INSN0(3,0);
+ UInt rD = INSN1(11,8);
+ UInt rM = INSN1(3,0);
+ if (!isBadRegT(rD) && !isBadRegT(rN) && !isBadRegT(rM)) {
+ IRTemp res = newTemp(Ity_I32);
+ IRTemp argL = newTemp(Ity_I32);
+ IRTemp argR = newTemp(Ity_I32);
+ assign(argL, getIRegT(rN));
+ assign(argR, getIRegT(rM));
+ assign(res, binop(Iop_DivS32, mkexpr(argL), mkexpr(argR)));
+ putIRegT(rD, mkexpr(res), condT);
+ DIP("sdiv.w r%u, r%u, r%u\n", rD, rN, rM);
+ goto decode_success;
+ }
+ }
+
+ /* -------------- UDIV.W Rd, Rn, Rm -------------- */
+ if (INSN0(15,4) == 0xFBB
+ && (INSN1(15,0) & 0xF0F0) == 0xF0F0) {
+ UInt rN = INSN0(3,0);
+ UInt rD = INSN1(11,8);
+ UInt rM = INSN1(3,0);
+ if (!isBadRegT(rD) && !isBadRegT(rN) && !isBadRegT(rM)) {
+ IRTemp res = newTemp(Ity_I32);
+ IRTemp argL = newTemp(Ity_I32);
+ IRTemp argR = newTemp(Ity_I32);
+ assign(argL, getIRegT(rN));
+ assign(argR, getIRegT(rM));
+ assign(res, binop(Iop_DivU32, mkexpr(argL), mkexpr(argR)));
+ putIRegT(rD, mkexpr(res), condT);
+ DIP("udiv.w r%u, r%u, r%u\n", rD, rN, rM);
+ goto decode_success;
+ }
+ }
+
/* ------------------ {U,S}MULL ------------------ */
if ((INSN0(15,4) == 0xFB8 || INSN0(15,4) == 0xFBA)
&& INSN1(7,4) == BITS4(0,0,0,0)) {
Modified: trunk/priv/main_main.c (+18 -0)
===================================================================
--- trunk/priv/main_main.c 2013-04-11 11:56:42 +01:00 (rev 2705)
+++ trunk/priv/main_main.c 2013-04-11 14:57:43 +01:00 (rev 2706)
@@ -75,6 +75,14 @@
static const HChar* show_hwcaps ( VexArch arch, UInt hwcaps );
+/* --------- helpers --------- */
+
+__attribute__((noinline))
+static UInt udiv32 ( UInt x, UInt y ) { return x/y; }
+__attribute__((noinline))
+static Int sdiv32 ( Int x, Int y ) { return x/y; }
+
+
/* --------- Initialise the library. --------- */
/* Exported to library client. */
@@ -171,6 +179,16 @@
vassert(sizeof(IRStmt) == 32);
}
+ /* Check that signed integer division on the host rounds towards
+ zero. If not, h_calc_sdiv32_w_arm_semantics() won't work
+ correctly. */
+ /* 100.0 / 7.0 == 14.2857 */
+ vassert(udiv32(100, 7) == 14);
+ vassert(sdiv32(100, 7) == 14);
+ vassert(sdiv32(-100, 7) == -14); /* and not -15 */
+ vassert(sdiv32(100, -7) == -14); /* ditto */
+ vassert(sdiv32(-100, -7) == 14); /* not sure what this proves */
+
/* Really start up .. */
vex_debuglevel = debuglevel;
vex_valgrind_support = valgrind_support;
Modified: trunk/priv/host_ppc_isel.c (+4 -4)
===================================================================
--- trunk/priv/host_ppc_isel.c 2013-04-11 11:56:42 +01:00 (rev 2705)
+++ trunk/priv/host_ppc_isel.c 2013-04-11 14:57:43 +01:00 (rev 2706)
@@ -2077,7 +2077,7 @@
cc = mk_PPCCondCode( Pct_ALWAYS, Pcf_NONE );
- fdescr = (HWord*)h_BCDtoDPB;
+ fdescr = (HWord*)h_calc_BCDtoDPB;
addInstr(env, PPCInstr_Call( cc, (Addr64)(fdescr[0]),
argiregs, RetLocInt) );
@@ -2106,7 +2106,7 @@
cc = mk_PPCCondCode( Pct_ALWAYS, Pcf_NONE );
- fdescr = (HWord*)h_DPBtoBCD;
+ fdescr = (HWord*)h_calc_DPBtoBCD;
addInstr(env, PPCInstr_Call( cc, (Addr64)(fdescr[0]),
argiregs, RetLocInt ) );
@@ -3446,7 +3446,7 @@
addInstr( env, mk_iMOVds_RR( argregs[argreg], tmpLo ) );
cc = mk_PPCCondCode( Pct_ALWAYS, Pcf_NONE );
- target = toUInt( Ptr_to_ULong(h_BCDtoDPB ) );
+ target = toUInt( Ptr_to_ULong(h_calc_BCDtoDPB ) );
addInstr( env, PPCInstr_Call( cc, (Addr64)target,
argiregs, RetLoc2Int ) );
@@ -3486,7 +3486,7 @@
cc = mk_PPCCondCode( Pct_ALWAYS, Pcf_NONE );
- target = toUInt( Ptr_to_ULong( h_DPBtoBCD ) );
+ target = toUInt( Ptr_to_ULong( h_calc_DPBtoBCD ) );
addInstr(env, PPCInstr_Call( cc, (Addr64)target,
argiregs, RetLoc2Int ) );
Modified: trunk/priv/host_arm_isel.c (+4 -0)
===================================================================
--- trunk/priv/host_arm_isel.c 2013-04-11 11:56:42 +01:00 (rev 2705)
+++ trunk/priv/host_arm_isel.c 2013-04-11 14:57:43 +01:00 (rev 2706)
@@ -1374,6 +1374,10 @@
fn = &h_generic_calc_QSub32S; break;
case Iop_QSub16Ux2:
fn = &h_generic_calc_QSub16Ux2; break;
+ case Iop_DivU32:
+ fn = &h_calc_udiv32_w_arm_semantics; break;
+ case Iop_DivS32:
+ fn = &h_calc_sdiv32_w_arm_semantics; break;
default:
break;
}
From: <sv...@va...> - 2013-04-11 10:58:25
sewardj 2013-04-11 11:58:18 +0100 (Thu, 11 Apr 2013)
New Revision: 13364
Log:
Add test cases for (T1) LDRT reg+#imm8. See #315689.
Added files:
trunk/none/tests/arm/ldrt.c
trunk/none/tests/arm/ldrt.stderr.exp
trunk/none/tests/arm/ldrt.stdout.exp
trunk/none/tests/arm/ldrt.vgtest
Modified files:
trunk/none/tests/arm/Makefile.am
Added: trunk/none/tests/arm/ldrt.stdout.exp (+1 -0)
===================================================================
--- trunk/none/tests/arm/ldrt.stdout.exp 2013-04-05 14:19:12 +01:00 (rev 13363)
+++ trunk/none/tests/arm/ldrt.stdout.exp 2013-04-11 11:58:18 +01:00 (rev 13364)
@@ -0,0 +1 @@
+result is 0x87868584 (should be 0x87868584)
Added: trunk/none/tests/arm/ldrt.stderr.exp (+0 -0)
===================================================================
Modified: trunk/none/tests/arm/Makefile.am (+4 -0)
===================================================================
--- trunk/none/tests/arm/Makefile.am 2013-04-05 14:19:12 +01:00 (rev 13363)
+++ trunk/none/tests/arm/Makefile.am 2013-04-11 11:58:18 +01:00 (rev 13364)
@@ -4,6 +4,7 @@
dist_noinst_SCRIPTS = filter_stderr
EXTRA_DIST = \
+ ldrt.stdout.exp ldrt.stderr.exp ldrt.vgtest \
neon128.stdout.exp neon128.stderr.exp neon128.vgtest \
neon64.stdout.exp neon64.stderr.exp neon64.vgtest \
v6intARM.stdout.exp v6intARM.stderr.exp v6intARM.vgtest \
@@ -15,6 +16,7 @@
check_PROGRAMS = \
allexec \
+ ldrt \
neon128 \
neon64 \
v6intARM \
@@ -53,3 +55,5 @@
neon64_CFLAGS = $(AM_CFLAGS) -g -O0 -mcpu=cortex-a8 \
-mfpu=neon \
-mthumb
+
+ldrt_CFLAGS = $(AM_CFLAGS) -g -O0 -mcpu=cortex-a8 -mthumb
Added: trunk/none/tests/arm/ldrt.c (+29 -0)
===================================================================
--- trunk/none/tests/arm/ldrt.c 2013-04-05 14:19:12 +01:00 (rev 13363)
+++ trunk/none/tests/arm/ldrt.c 2013-04-11 11:58:18 +01:00 (rev 13364)
@@ -0,0 +1,29 @@
+
+// This should be compiled as Thumb code, since currently V only
+// handles the T1 encoding of ldrt.
+
+#include <stdio.h>
+#include <malloc.h>
+
+typedef unsigned int UInt;
+
+__attribute__((noinline)) UInt do_ldrt_imm_132 ( unsigned char* p )
+{
+ UInt res;
+ __asm__ __volatile__(
+ "mov r5, %1 ; ldrt r6, [r5, #132] ; mov %0, r6"
+ : "=r"(res) : "r"(p) : "r5", "r6"
+ );
+ return res;
+}
+
+int main ( void )
+{
+ UInt i;
+ unsigned char* b = malloc(256);
+ for (i = 0; i < 256; i++) b[i] = (unsigned char)i;
+ UInt r = do_ldrt_imm_132(b);
+ free(b);
+ printf("result is 0x%08x (should be 0x%08x)\n", r, 0x87868584);
+ return 0;
+}
Added: trunk/none/tests/arm/ldrt.vgtest (+2 -0)
===================================================================
--- trunk/none/tests/arm/ldrt.vgtest 2013-04-05 14:19:12 +01:00 (rev 13363)
+++ trunk/none/tests/arm/ldrt.vgtest 2013-04-11 11:58:18 +01:00 (rev 13364)
@@ -0,0 +1,2 @@
+prog: ldrt
+vgopts: -q
From: <sv...@va...> - 2013-04-11 10:56:51
sewardj 2013-04-11 11:56:42 +0100 (Thu, 11 Apr 2013)
New Revision: 2705
Log:
Implement (T1) LDRT reg+#imm8. Fixes #315689.
(Vasily, w.g...@ma...)
Modified files:
trunk/priv/guest_arm_toIR.c
Modified: trunk/priv/guest_arm_toIR.c (+23 -0)
===================================================================
--- trunk/priv/guest_arm_toIR.c 2013-03-27 22:15:36 +00:00 (rev 2704)
+++ trunk/priv/guest_arm_toIR.c 2013-04-11 11:56:42 +01:00 (rev 2705)
@@ -19017,6 +19017,29 @@
}
}
+ /* -------------- (T1) LDRT reg+#imm8 -------------- */
+ /* Load Register Unprivileged:
+ ldrt Rt, [Rn, #imm8]
+ */
+ if (INSN0(15,6) == BITS10(1,1,1,1,1,0,0,0,0,1) && INSN0(5,4) == BITS2(0,1)
+ && INSN1(11,8) == BITS4(1,1,1,0)) {
+ UInt rT = INSN1(15,12);
+ UInt rN = INSN0(3,0);
+ UInt imm8 = INSN1(7,0);
+ Bool valid = True;
+ if (rN == 15 || isBadRegT(rT)) valid = False;
+ if (valid) {
+ put_ITSTATE(old_itstate);
+ IRExpr* ea = binop(Iop_Add32, getIRegT(rN), mkU32(imm8));
+ IRTemp newRt = newTemp(Ity_I32);
+ loadGuardedLE( newRt, ILGop_Ident32, ea, llGetIReg(rT), condT );
+ putIRegT(rT, mkexpr(newRt), IRTemp_INVALID);
+ put_ITSTATE(new_itstate);
+ DIP("ldrt r%u, [r%u, #%u]\n", rT, rN, imm8);
+ goto decode_success;
+ }
+ }
+
/* ----------------------------------------------------------- */
/* -- VFP (CP 10, CP 11) instructions (in Thumb mode) -- */
/* ----------------------------------------------------------- */