|
From: <sv...@va...> - 2007-12-09 02:08:44
|
Author: sewardj
Date: 2007-12-09 02:08:42 +0000 (Sun, 09 Dec 2007)
New Revision: 7283
Log:
Don't do comparisons of (signed) Words by merely subtracting them, as
this does not always produce correct results. Instead use a slower
but correct method. Fixes #147545. (Nick Nethercote, Tom Hughes et
al)
Modified:
trunk/coregrind/m_oset.c
trunk/include/pub_tool_oset.h
Modified: trunk/coregrind/m_oset.c
===================================================================
--- trunk/coregrind/m_oset.c 2007-12-06 02:15:16 UTC (rev 7282)
+++ trunk/coregrind/m_oset.c 2007-12-09 02:08:42 UTC (rev 7283)
@@ -175,7 +175,17 @@
// Compare the first word of each element. Inlining is *crucial*.
static inline Word fast_cmp(void* k, AvlNode* n)
{
- return ( *(Word*)k - *(Word*)elem_of_node(n) );
+ Word w1 = *(Word*)k;
+ Word w2 = *(Word*)elem_of_node(n);
+ // In previous versions, we tried to do this faster by doing
+ // "return w1 - w2". But it didn't work reliably, because the
+ // complete result of subtracting two N-bit numbers is an N+1-bit
+ // number, and what the caller is interested in is the sign of
+ // the complete N+1-bit result. The branching version is slightly
+ // slower, but safer and easier to understand.
+ if (w1 > w2) return 1;
+ if (w1 < w2) return -1;
+ return 0;
}
// Compare a key and an element. Inlining is *crucial*.
@@ -490,22 +500,23 @@
while (True) {
if (curr == NULL) return NULL;
cmpres = slow_cmp(t, k, curr);
- if (cmpres < 0) curr = curr->left; else
- if (cmpres > 0) curr = curr->right; else
- return curr;
+ if (cmpres < 0) curr = curr->left;
+ else if (cmpres > 0) curr = curr->right;
+ else return curr;
}
} else {
// Fast-track special case. We use the no-check version of
// elem_of_node because it saves about 10% on lookup time. This
// shouldn't be very dangerous because each node will have been
// checked on insertion.
- Word kk = *(Word*)k;
+ Word w1 = *(Word*)k;
+ Word w2;
while (True) {
if (curr == NULL) return NULL;
- cmpres = kk - *(Word*)elem_of_node_no_check(curr);
- if (cmpres < 0) curr = curr->left; else
- if (cmpres > 0) curr = curr->right; else
- return curr;
+ w2 = *(Word*)elem_of_node_no_check(curr);
+ if (w1 < w2) curr = curr->left;
+ else if (w1 > w2) curr = curr->right;
+ else return curr;
}
}
}
Modified: trunk/include/pub_tool_oset.h
===================================================================
--- trunk/include/pub_tool_oset.h 2007-12-06 02:15:16 UTC (rev 7282)
+++ trunk/include/pub_tool_oset.h 2007-12-09 02:08:42 UTC (rev 7283)
@@ -72,7 +72,7 @@
typedef struct _OSet OSet;
-// - Cmp: returns -1, 0 or 1 if key is <=, == or >= elem.
+// - Cmp: returns -1, 0 or 1 if key is <, == or > elem.
// - Alloc: allocates a chunk of memory.
// - Free: frees a chunk of memory allocated with Alloc.
|
|
From: Dirk M. <dm...@gm...> - 2007-12-10 14:13:02
|
On Sunday 09 December 2007, sv...@va... wrote: > Don't do comparisons of (signed) Words by merely subtracting them, as > this does not always produce correct results. Instead use a slower > but correct method. Fixes #147545. (Nick Nethercote, Tom Hughes et > al) Did you see John`s followup that suggested a branch-free fast comparison? Greetings, Dirk |
|
From: Julian S. <js...@ac...> - 2007-12-10 14:51:36
|
> Did you see John`s followup that suggested a branch-free fast comparison? I tried it in Helgrind, and resulted in a slowdown compared to the obvious "if x < y then -1 else if x > y then 1 else 0", from 2m19 to 2m29 on one test. gcc generates poor code for it, which I suspect is getting many partial-register-write stalls. This is something we can come back to when the (inevitable) next round of performance tuning happens. J |
|
From: Konstantin S. <kon...@gm...> - 2007-12-10 20:23:24
|
Just curios, is it possible to replace AVL tree in helgrind with a hashtable? Will it perform better, esp on large inputs (i.e. where log(N) becomes too large)? --kcc On Dec 10, 2007 5:50 PM, Julian Seward <js...@ac...> wrote: > > > Did you see John`s followup that suggested a branch-free fast > comparison? > > I tried it in Helgrind, and resulted in a slowdown compared to the > obvious "if x < y then -1 else if x > y then 1 else 0", from 2m19 > to 2m29 on one test. gcc generates poor code for it, which I suspect > is getting many partial-register-write stalls. This is something we > can come back to when the (inevitable) next round of performance tuning > happens. > > J > > ------------------------------------------------------------------------- > SF.Net email is sponsored by: > Check out the new SourceForge.net Marketplace. > It's the best place to buy or sell services for > just about anything Open Source. > http://sourceforge.net/services/buy/index.php > _______________________________________________ > Valgrind-developers mailing list > Val...@li... > https://lists.sourceforge.net/lists/listinfo/valgrind-developers > |
|
From: Nicholas N. <nj...@cs...> - 2007-12-10 21:20:36
|
On Mon, 10 Dec 2007, Julian Seward wrote: >> Did you see John`s followup that suggested a branch-free fast comparison? > > I tried it in Helgrind, and resulted in a slowdown compared to the > obvious "if x < y then -1 else if x > y then 1 else 0", from 2m19 > to 2m29 on one test. gcc generates poor code for it, which I suspect > is getting many partial-register-write stalls. This is something we > can come back to when the (inevitable) next round of performance tuning > happens. John's second suggestion was better. Basically, we're comparing A and B, and returning a negative/zero/positive X based on their relative values. Then we compare X against zero. With some reconfiguration of the code we could, for the fast case, just compare A and B directly in the relevant places. I'll try to take a look at it once the release is done. N |