Re: [Valgrind-users] New skin available: Annelid, a pointer-misuse checker

On Tue, 30 Sep 2003, Erik Corry wrote:

> As an alternative you can have 3 types:
>
> * Integer offset, I
> * Pointer associated with memory area X, P(X)
> * Integer that translates between memory areas Y and Z, T(Y,Z)
>
> I op I         -> I          // Normal offset math
> P(X) - P(X)    -> I          // Normal allowed pointer math
> -P(X)          -> P(X)       // Unary neg
> P(X) + P(X)    -> P(X)       // Together with above allows 'decoupled' sub
> P(Y) - P(Z)    -> T(Y,Z)     // Subtract gets magic translation integer
> P(Y) op P(Z)   -> I          // Other ops just get an integer
> P(X) op T(X,Z) -> P(Z)       // Magically moves to other area
> P(X) op T(Y,X) -> P(Y)       // Magically moves to other area
>
> This one is more restrictive (you can add more rules to make it less
> so), but also covers the cases we've seen until now (except xor linked
> lists).

This is the Right Way to do it.

I tried the simpler approach of forming cliques whenever two pointers were
subtracted, i.e. on p1-p2, join the two pointers' segments such that a
pointer derived from p1 can access p2's block, and vice versa.  The problem
was that the cliques grew enormous, sometimes thousands of entries, and
slowed things down terribly.  I guess it's because certain pointers were
involved in lots of subtractions.
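
For illustration only, the clique idea can be sketched as a union-find over
segment ids.  The names here (Cliques, join, may_access, clique_size) are
hypothetical and are not Annelid's actual data structures; the point is just
to show how a single pointer that takes part in many subtractions pulls every
segment it meets into one clique.

```python
class Cliques:
    """Union-find over segment ids (a sketch, not Annelid's code)."""

    def __init__(self):
        self.parent = {}

    def find(self, seg):
        self.parent.setdefault(seg, seg)
        while self.parent[seg] != seg:
            # Path halving keeps later lookups short.
            self.parent[seg] = self.parent[self.parent[seg]]
            seg = self.parent[seg]
        return seg

    def join(self, a, b):
        """Called on p1 - p2: merge the cliques of the two segments."""
        self.parent[self.find(a)] = self.find(b)

    def may_access(self, ptr_seg, target_seg):
        """A pointer may access any block in its segment's clique."""
        return self.find(ptr_seg) == self.find(target_seg)

    def clique_size(self, seg):
        root = self.find(seg)
        return sum(1 for s in list(self.parent) if self.find(s) == root)
```

Joining one "hub" segment with 999 others leaves a single clique of 1000
segments, in which any member can access any other.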

As for your operations above, it's a bit more complicated than that.
Here's what types I think are necessary:

 * known non-pointer:        n
 * known pointer-difference: nXY
 * unknown:                  ?
 * known pointer:            pX

(apologies for the different notation).

Here are the rules for addition:

 -----------------------
  +   | n    nCD  ?  pY
 -----------------------
  n   | n    nCD  ?  pY
  nAB | nAB  (1)  ?  (2)
  ?   | ?    ?    ?  ?
  pX  | pX   (2)  ?  n!
 -----------------------

 (1) nMN + nNO --> nMO
     nMN + nNM --> n
     otherwise --> n

 (2) nMN + pM  --> pN
     pM  + nMN --> pN
     otherwise --> n

 '!' means "report type error"
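
The addition table can be written out as a small function.  This is a minimal
sketch under assumed encodings: types are tuples, with ("n",) for n, ("?",)
for ?, ("nd", A, B) for nAB and ("p", X) for pX.  The helper names (diff, ptr,
add) are made up for the example.  The function returns the result type plus
a flag for the '!' case.

```python
N = ("n",)          # known non-pointer
UNKNOWN = ("?",)    # unknown


def diff(a, b):
    """nAB: known pointer-difference between areas a and b."""
    return ("nd", a, b)


def ptr(x):
    """pX: known pointer into area x."""
    return ("p", x)


def add(l, r):
    """Return (result_type, is_type_error) for l + r."""
    if UNKNOWN in (l, r):
        return UNKNOWN, False
    if l == N:
        return r, False
    if r == N:
        return l, False
    kinds = (l[0], r[0])
    if kinds == ("p", "p"):
        return N, True                      # 'n!': report a type error
    if kinds == ("nd", "nd"):               # note (1)
        if l[2] == r[1] and l[1] == r[2]:   # nMN + nNM --> n
            return N, False
        if l[2] == r[1]:                    # nMN + nNO --> nMO
            return diff(l[1], r[2]), False
        return N, False                     # otherwise --> n
    # note (2): one pointer and one difference, in either order
    p, d = (l, r) if l[0] == "p" else (r, l)
    if d[1] == p[1]:                        # nMN + pM --> pN
        return ptr(d[2]), False
    return N, False                         # otherwise --> n
```

For note (1) the sketch matches only the operand order listed there; whether
the swapped order should also chain is left open.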

As a refinement of this, any '?' result that is definitely not a pointer,
as judged by a range test (eg. x < 0x01000000 || x > 0xff000000), gets
converted to 'n'.
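
A sketch of that refinement, using the example bounds above.  The function
name and the use of plain strings for the types are assumptions made for
brevity.

```python
LOW, HIGH = 0x01000000, 0xFF000000   # bounds from the example range test


def refine(result_type, value):
    """Downgrade a '?' result to 'n' when value cannot be a pointer."""
    if result_type == "?" and (value < LOW or value > HIGH):
        return "n"
    return result_type
```

So a '?' result with value 42 becomes 'n', while one with value 0x08048000
stays '?'.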

Subtraction:

 -----------------------
  -   | n    nCD  ?  pY
 -----------------------
  n   | n    n    ?  n+
  nAB | nAB  (1)  ?  n
  ?   | ?    ?    ?* ?*
  pX  | pX   (2)  ?* nYX
 -----------------------

 (1) nMN - nMO --> nON
     nMN - nNM --> n
     otherwise --> n

 (2) pX  - nYX --> pY
     otherwise --> n
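
As a partial sketch, here is the pX row of the subtraction table for known
operands only (no '?' handling).  It reads the pX - pY cell as the
pointer-difference type nYX, so that the difference added back to pY gives
pX again.  The tuple encoding and helper names are assumptions, as before.

```python
N = ("n",)          # known non-pointer


def diff(a, b):
    """nAB: known pointer-difference between areas a and b."""
    return ("nd", a, b)


def ptr(x):
    """pX: known pointer into area x."""
    return ("p", x)


def sub_from_ptr(p, r):
    """Type of p - r, where p is pX and r is n, nAB or pY."""
    assert p[0] == "p"
    if r == N:
        return p                    # pX - n   --> pX
    if r[0] == "p":
        return diff(r[1], p[1])     # pX - pY  --> nYX
    if r[0] == "nd" and r[2] == p[1]:
        return ptr(r[1])            # pX - nYX --> pY   (note 2)
    return N                        # otherwise --> n
```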

Again, there's a range test on '?' results, except for those marked with a
'*'.  Why not those?  Because we can't convert what might be a pointer
difference into a nonptr.  If you take a known pointer 'p' and an unknown
pointer 'u' and do this:

  diff = u - p

if you ever mark 'diff' as a nonptr due to a range check, then Annelid
would think this:

  p[diff]

is accessing p's block, when it's really accessing u's block, and give a
false positive.  (This exact case has happened to me before.  Believe me,
the typing of all these operations is significantly trickier to get right
than you would first think;  I have been surprised several times by such
things while implementing Annelid.)
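
The false positive can be reproduced with plain arithmetic.  The block
addresses and sizes below are invented for the example, and the indexing is
in bytes.

```python
LOW, HIGH = 0x01000000, 0xFF000000   # bounds from the example range test

p_base, p_size = 0x08100000, 64      # hypothetical block that p points into
u_base, u_size = 0x08100400, 64      # hypothetical block that u points into

p = p_base                           # known pointer
u = u_base + 8                       # pointer whose type is unknown ('?')
diff = u - p                         # a small value, despite being a difference

# The range test alone would call diff a non-pointer ...
looks_like_nonptr = diff < LOW or diff > HIGH

# ... so p + diff would be typed as a pointer into p's block, yet the
# address it produces lies in u's block.
addr = p + diff
in_p = p_base <= addr < p_base + p_size
in_u = u_base <= addr < u_base + u_size
```

Here looks_like_nonptr is true, in_p is false and in_u is true: a bounds check
against p's block fails even though the access is valid.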

Other operations (&, |, ^, etc) would treat the types 'n' and 'nAB' as
equivalent, but results would always be downgraded to 'n' (eg. nAB & n --> n).
(The exact rules for every different op vary, however -- you can't just
lump them all together, although this is what you might expect.)

I think this can work.

N