|
From: Josef W. <Jos...@gm...> - 2005-05-04 09:57:12
|
On Wednesday 04 May 2005 09:26, you wrote: > Hi Josef. > > I'm writing a valgrind skin that needs to track function entry and exit, > and Nicholas mentioned that you had written a good description of how it > is done in callgrind. I couldn't find the message from the list. Attached (you already found it). It would be cool to have a lightweight version integrated into valgrind, fo= r=20 (optional) use to other skins. Even for memcheck it would be interesting to= =20 have the option to switch on function tracing, to get detailed stack traces= =20 for error output from the shadow call stacks, even if the current stack is= =20 completely corrupted. Or if tail recursion optimization is used. An acceptance problem seems to be the call to a helper at start of every ba= sic=20 block. Perhaps this cost can be reduced. The main question here is on which control transfer you have to take dynami= c=20 action. If you could do function tracing only by statically instrumenting a= =20 few places, it would be much simpler: this is possible if you only want to= =20 track function enter, as function entries are static properties of the code. But static instrumentation is not possible with unpredictable control flow= =20 changes (also returns: there can be arbitrary addresses on the stack!).=20 Static instrumentation is only possible if a control flow from one BB to=20 another is always the same, like in "boring jumps", or in static calls. Another thing is that function tracing has to be robust, even in the case o= f=20 handcrafted assembler wildly jumping around. You have to take care of=20 arbitrary call stack transfers via SP changes (as used with longjmp), and s= o=20 on. For this, I always synchronize the shadow call stack (which contains th= e=20 SP values at function enter times) with the current SP, and unwind if neede= d. But perhaps you have better ideas. Josef > > Would you happen to have a copy you could send me? > > Thanks. > > -- ams 6.35.250.206) by mx0.gmx.net (mx012-rz3) with SMTP; 26 Jan 2004 15:03:22 +0100 Received: from localhost ([127.0.0.1] helo=3Dprojects.sourceforge.net) by sc8-sf-list2.sourceforge.net with esmtp (Exim 4.30) id 1Al7KY-0005TC-5s; Mon, 26 Jan 2004 06:03:10 -0800 Received: from sc8-sf-mx2-b.sourceforge.net ([10.3.1.12]=20 helo=3Dsc8-sf-mx2.sourceforge.net) by sc8-sf-list2.sourceforge.net with esmtp (Exim 4.30) id 1Al7KK-0005NN-Li for val...@li...; Mon, 26 Jan 2004 06:02:56=20 =2D0800 Received: from mail.gmx.net ([213.165.64.20]) by sc8-sf-mx2.sourceforge.net with smtp (Exim 4.30) id 1Al7KJ-0005ih-VK for val...@li...; Mon, 26 Jan 2004 06:02:56=20 =2D0800 Received: (qmail 15023 invoked by uid 65534); 26 Jan 2004 14:02:48 -0000 Received: from atbode100.informatik.tu-muenchen.de (EHLO=20 atbode100.informatik.tu-muenchen.de) (131.159.32.72) by mail.gmx.net (mp008) with SMTP; 26 Jan 2004 15:02:48 +0100 X-Authenticated: #352111 =46rom: Josef Weidendorfer <Jos...@gm...> To: Nicholas Nethercote <nj...@ca...> User-Agent: KMail/1.5 References: <Pin...@ye...> In-Reply-To: <Pin...@ye...> Cc: val...@li... MIME-Version: 1.0 Content-Type: text/plain; charset=3D"iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200...@gm...> X-Spam-Score: 0.0 (/) X-Spam-Report: Spam Filtering performed by sourceforge.net. See http://spamassassin.org/tag/ for more details. Report problems to http://sf.net/tracker/?func=3Dadd&group_id=3D1&atid=3D2= 00001 Subject: [Valgrind-developers] Re: Tracking function entry/exit Sender: val...@li... Errors-To: val...@li... X-BeenThere: val...@li... X-Mailman-Version: 2.0.9-sf.net Precedence: bulk List-Unsubscribe:=20 <https://lists.sourceforge.net/lists/listinfo/valgrind-developers>, =09 <mailto:val...@li...?subject=3Dunsubsc= ribe> List-Id: Technical discussion for valgrind developers=20 <valgrind-developers.lists.sourceforge.net> List-Post: <mailto:val...@li...> List-Help:=20 <mailto:val...@li...?subject=3Dhelp> List-Subscribe:=20 <https://lists.sourceforge.net/lists/listinfo/valgrind-developers>, <mailto:val...@li...?subject=3Dsubscr= ibe> List-Archive:=20 <http://sourceforge.net/mailarchive/forum.php?forum=3Dvalgrind-developers> Date: Mon, 26 Jan 2004 15:02:17 +0100 X-GMX-Antivirus: -1 (not scanned, may not use virus scanner) X-GMX-Antispam: 0 (Mail was not recognized as spam) On Sunday 25 January 2004 16:53, Nicholas Nethercote wrote: > Josef, > > The topic of tracking function entry/exit has come up a few times on the > mailing lists recently. My usual answer is that it's difficult to do > correctly. However, you seem to do it with Calltree. I looked at the > source code a bit, and it looks like you are doing some reasonably > complicated things to get it right, eg. unwinding the stack. How robust > is your approach? Can you briefly explain how it works? A note before describing the mechanism: I need to have a helper call at sta= rt of every BB anyway, so I use this helper to do the tracking. This of course= =20 has some overhead, and perhaps can be avoided, but it seems to add to the=20 robustness. I have a bug fix here for reentrent entering of a signal handle= r=20 (2 bug reports). Otherwise I have no bug reports, so I assume that the=20 mechanism to be quite robust. I have a shadow call stack for every thread. For signal handlers of a threa= d,=20 I first PUSH a separation marker on the shadow stack, and use the stack as= =20 normal. The marker is used for unwinding when leaving the signal handler.=20 This is fine as there is no scheduling among signal handlers of one thread. Instrumentation of calltree: * Store at the end of each basic block the jmpkind into a tool-global, stat= ic=20 variable. * At the start of every BB, jump to a helper function. The helper function does the following regarding function call tracking: =2D for a control transfer to another ELF object/ELF section, override jmpk= ind=20 with a CALL (*1) =2D for a control transfer to the 1st basic block of a function, override=20 jmpkind with a CALL (*2) =2D do unwinding if needed (i.e, POPs of the shadow call stack) =2D if jmpkind is RET and there was no unwinding/POP: - if our call stack is empty, simulate a CALL lasting from beginning (with= =20 Valgrind 2.1.x, this is not needed any more, as we run on simulated CPU fro= m=20 first client instruction) - otherwise this is a JMP using a RET instruction (typically used in the=20 runtime linker). Do a POP, setting previous BB address to call site and=20 override jmpkind with a CALL. By this, you get 2 function calls from a=20 calling site. =2D when jmpkind is a CALL, push new function call from previous BB to curr= ent=20 BB on shadow call stack. =2D Save current BB address to be available for call to handler in next BB. Special care is needed at thread switches and enter/leave of signal handler= s,=20 as we need separate shadow call stacks. Known bug: We should check for the need of unwinding when ESP is explicitly= =20 written to. I hope this doesn't create too much overhead. Remarks: (*1) Jumps between ELF objects are function calls to a shared library. This= is=20 mainly done to catch the JMP from PLT code. (*2) This is what your function tracking skin/tool does. It is needed here= =20 mainly to catch tail recursion. In general, for functions doing a "return=20 otherfunction()", GCC produces JMPs with -O2.=20 Additional points: =2D If I need a name for a function, but there is no debug info, I use the= =20 instruction address minus the load offset of the corresponding ELF object (= if=20 there is one) to get a relative address for that ELF object. This offset ca= n=20 be used with objdump later in postprocessing tools (e.g. objdump). I would= =20 suggest this change even for cachegrind instead of a "???". =2D I introduced the ability to specify functions to be "skipped". This mea= ns=20 that execution of these functions is attributed to the calling function. Th= e=20 default is to skip all functions located in PLT sections. Thus, in effect,= =20 costs of PLT functions are attributed to callers, and the call to a shared= =20 library function starts directly with code in the other ELF object. =2D As Vg 2.1.x does pointerchecking, the instrumentation can't write to me= mory=20 space of Valgrind any longer. Currently, my tool needs "--pointercheck=3Dno= " to=20 be able to run. Jeremy and me already agreed on replacing current LD/ST wit= h=20 a CLD/CST (Client Load/Store) with pointer check and keep original LD/ST fo= r=20 tool usage without pointerchecking. Looking at these things, it seems possible to do function tracking at end o= f a=20 basic block instead of the beginning of the next BB. This way, we can perha= ps=20 avoid calls to helpers at every BB. =46rom my point of view, it would be great to integrate optional function=20 tracking into Valgrind core with some hooks. Josef =2D------------------------------------------------------ The SF.Net email is sponsored by EclipseCon 2004 Premiere Conference on Open Tools Development and Integration See the breadth of Eclipse activity. February 3-5 in Anaheim, CA. http://www.eclipsecon.org/osdn _______________________________________________ Valgrind-developers mailing list Val...@li... https://lists.sourceforge.net/lists/listinfo/valgrind-developers Josef, I've finally (it only took 6 months!) had a close look at your description of how Calltree accurately tracks function entry/exit. I've paraphrased your description to help me understand it better, but I'm still not quite clear on some points. I looked at the code, but found it hard to understand. Could you help me? I've written my questions in square brackets. Here's the description. =2D------- Data structures: =2D have a shadow call stack for every thread [not sure exactly what goes on this] Action at BB start -- depends on jmp_kind from previous BB: =2D If jmp_kind is neither JmpCall nor JmpRet (ie. is JmpNone, JmpBoring, JmpCond or JmpSyscall) and we transferred from one ELF object/section to another, it must be a function call to a shared library -- treat as a call. This catches jmps from PLT code. =2D If this is the first BB of a function, treat as a call. This catches tail calls (which gcc uses for "return f()" with -O2). [What if a function had a 'goto' back to its beginning? Would that be interpreted as a call?] =2D Unwind the shadow call stack if necessary. [when is "necessary"? If the real %esp > the shadow stack %esp?] =2D If this is a function return and there was no shadow stack unwinding, this must be a RET control transfer (typically used in the runtime linker). Pop the shadow call stack, setting the previous BB address to call site and override jmpkind with a CALL. By this, you get 2 function calls from a calling site. [I don't understand this... What is a "RET control transfer"? Why do you end up with 2 function calls -- is that a bad thing?] =2D If we're treating the control transfer as a call, push new function call from previous BB to current BB on shadow call stack. [when is this information used?] =2D Save current BB address to be available for call to handler in next BB. Other actions: When entering a signal handler, first push a separation marker on the thread's shadow stack, then use it as normal. The marker is used for unwinding when leaving the signal handler. This is fine as there is no scheduling among signal handlers of one thread. Special care is needed at thread switches and enter/leave of signal handler= s, as we need separate shadow call stacks. [Do you mean "separate shadow call stacks for each thread"?] Known bugs: =2D should check if unwinding is necessary when ESP is explicitly written to. =2D------- What about stack switching -- does it cope with that? (Not that Valgrind in general does...) N =2D------------------------------------------------------ This SF.Net email is sponsored by The 2004 JavaOne(SM) Conference Learn from the experts at JavaOne(SM), Sun's Worldwide Java Developer Conference, June 28 - July 1 at the Moscone Center in San Francisco, CA REGISTER AND SAVE! http://java.sun.com/javaone/sf Priority Code NWMGYKND _______________________________________________ Valgrind-developers mailing list Val...@li... https://lists.sourceforge.net/lists/listinfo/valgrind-developers On Monday 21 June 2004 12:15, Nicholas Nethercote wrote: > Josef, > > I've finally (it only took 6 months!) had a close look at your description > of how Calltree accurately tracks function entry/exit. > > I've paraphrased your description to help me understand it better, but I'm > still not quite clear on some points. I looked at the code, but found it > hard to understand. Could you help me? I've written my questions in > square brackets. Here's the description. > > -------- > > Data structures: > > - have a shadow call stack for every thread > [not sure exactly what goes on this] That's the resizable array of struct _call_entry's. Probably most important for call tracking is the %ESP value directly after a CALL, and a pointer to some struct storing information about the call arc or the called function. The esp value is needed to be able to robustly unwind correctly at %esp=20 changes with %esp > stored esp on shadow stack. > Action at BB start -- depends on jmp_kind from previous BB: > > - If jmp_kind is neither JmpCall nor JmpRet (ie. is JmpNone, JmpBoring, > JmpCond or JmpSyscall) and we transferred from one ELF object/section to > another, it must be a function call to a shared library -- treat as a > call. This catches jmps from PLT code. > > - If this is the first BB of a function, treat as a call. This catches > tail calls (which gcc uses for "return f()" with -O2). > [What if a function had a 'goto' back to its beginning? Would that be > interpreted as a call?] Yes. IMHO, there is no way to distinguish between optimized tail recursion= =20 using a jump and regular jumping. But as most functions need parameters on= =20 the stack, a normal jump will rarely jump to the first BB of a function,=20 wouldn't it? > - Unwind the shadow call stack if necessary. > [when is "necessary"? If the real %esp > the shadow stack %esp?] Yes. Currently I do this at every BB boundary, but perhaps it should be=20 checked at every %esp change. Then, OTOH, it would look strange to attribut= e=20 instructions of one BB to different functions? > - If this is a function return and there was no shadow stack unwinding, > this must be a RET control transfer (typically used in the runtime > linker). Pop the shadow call stack, setting the previous BB address to > call site and override jmpkind with a CALL. By this, you get 2 function > calls from a calling site. > [I don't understand this... What is a "RET control transfer"? Why do > you end up with 2 function calls -- is that a bad thing?] If there is a RET instruction, this usually should unwind (i.e. leave a=20 function) at least one entry of the shadow call stack. But this doesn't nee= d=20 to be the case, i.e. even after a RET, %esp could be lower or equal to the= =20 one on the shadow stack. E.g. suppose PUSH addr RET This is only another way of saying "JMP addr", and doesn't add/remove any=20 stack frame at all. Now, if addr is (according to debug information) inside of another function= ,=20 this is a JMP between functions, let's say from B to C. Suppose B was calle= d=20 from A, I generate a RETURN event to A and a CALL event from A to C in this= =20 case. > - If we're treating the control transfer as a call, push new function call > from previous BB to current BB on shadow call stack. > [when is this information used?] I meant: Append a struct call_entry to the shadow stack (together with the= =20 current %esp value). As I said before, the shadow stack is used for robust= =20 unwinding. > - Save current BB address to be available for call to handler in next BB. > > > Other actions: > > When entering a signal handler, first push a separation marker on the > thread's shadow stack, then use it as normal. The marker is used for > unwinding when leaving the signal handler. This is fine as there is no > scheduling among signal handlers of one thread. > > Special care is needed at thread switches and enter/leave of signal > handlers, as we need separate shadow call stacks. > [Do you mean "separate shadow call stacks for each thread"?] Yes. > What about stack switching -- does it cope with that? (Not that Valgrind > in general does...) No. If you could give me a hint how to do it, I would be pleased. The problem h= ere=20 IMHO is: How to distinguish among a stack switch and allocating a huge arra= y=20 on the stack? Josef =2D------------------------------------------------------ This SF.Net email sponsored by Black Hat Briefings & Training. Attend Black Hat Briefings & Training, Las Vegas July 24-29 -=20 digital self defense, top technical experts, no vendor pitches,=20 unmatched networking opportunities. Visit www.blackhat.com _______________________________________________ Valgrind-developers mailing list Val...@li... https://lists.sourceforge.net/lists/listinfo/valgrind-developers |