From: Sébastien S. <sa...@us...> - 2008-02-11 16:28:42
Hi Josef,
Thanks a lot for those explanations! They were exactly what I needed.
Following your explanation, I have been able to push Python calls onto the
stack, and I can now see them correctly in KCachegrind, which makes
profiling my application much easier.
I added a --python-mode option to callgrind, which activates the
detection of PyEval_EvalFrameEx and uses this function to obtain the
Python context.
The reason I prefer callgrind over hotshot/cProfile/profile is that my
application mixes C and Python (about 30% Python, 15% C generated by
Pyrex, and 55% legacy C), so hotshot gives me only very superficial
results.
I am interested in the time spent in the Python runtime, but I need to
know which Python code it corresponds to in order to reduce that time.
I also found that since hotshot uses timestamps rather than instruction
counts like callgrind, its results are much less reliable (especially
when profiling very fast loops, where the time spent collecting the
profiling data can skew the measurements).
I attach a first patch. It still needs a few changes, as I would like to
pass a valid obj_node to get_fn_node_inseg so that I can more easily
distinguish Python modules in KCachegrind. But it works well as is.
Just compile Valgrind with --enable-python, then run:

    valgrind --tool=callgrind --python-mode python foo.py

Regards,
--
Sébastien Sablé
Josef Weidendorfer wrote:
> On Friday 08 February 2008, Sébastien Sablé wrote:
>> When running callgrind on such an application, I can clearly see the C
>> calls of the application, but the whole Python part is represented by a
>> complex tree of Py* functions representing Python's C internals.
>>
>> Using callgrind's --fn-skip='Py*' option makes it possible to ignore most
>> of the Python internals, but then a lot of information is missing.
>>
>> So I would like to extend callgrind so that it can handle Python calls.
>
> Interesting.
> Why don't you use the profiler built into Python itself, for example
> hotshot? There is a converter provided with the KCachegrind package.
>
> On the other hand, if you are interested in cache misses and number of
> instructions executed, your approach is reasonable. However, I am not
> really sure this is the right approach, as the results will depend on the
> implementation of the python interpreter/byte code compiler, and only
> indirectly on your code (hmm.. but so will the real runtime).
>
>> Python makes it possible to attach a C function that is called each
>> time a Python function is called, via PyEval_SetTrace. So I made a very
>> simple callgrind_helper module to enable this callback.
>>
>> Then by modifying callgrind/callstack.c:push_call_stack I can get the
>> Python context (file, function and lineno) for which
>> callgrind_helper_callback is called (see provided patch).
>
> Hmm... not really the right place ;-)
>
> First:
> To get a function name into the output, you have to create a fn_node
> struct for it. A factory function for these structs is
> "get_fn_node_inseg( si, filename, fnname)" in callgrind/fn.c
> The first parameter specifies the image (shared lib/binary), a value
> of 0 should be OK here.
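Because Python functions are only discovered at run time, a factory used for them needs to hand back the same node on every lookup, so that each Python function is created only once. A minimal standalone model in C, assuming nodes are keyed only by file and function name: `fn_node_model` and `get_fn_node_model()` are hypothetical, deliberately ignore the image parameter, and are not callgrind's actual `get_fn_node_inseg()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for callgrind's fn_node, reduced to the fields
   needed to show "create once, then reuse" (not callgrind code). */
typedef struct fn_node_model {
    char *file;
    char *name;
    struct fn_node_model *next;   /* chaining within one hash bucket */
} fn_node_model;

#define N_BUCKETS 64
static fn_node_model *buckets[N_BUCKETS];

/* djb2-style hash over "file:name". */
static unsigned hash_key(const char *file, const char *name)
{
    unsigned h = 5381;
    for (const char *p = file; *p != '\0'; p++)
        h = h * 33 + (unsigned char)*p;
    h = h * 33 + ':';
    for (const char *p = name; *p != '\0'; p++)
        h = h * 33 + (unsigned char)*p;
    return h % N_BUCKETS;
}

/* Heap copy of a string (strdup is POSIX, not ISO C). */
static char *dup_str(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        abort();
    memcpy(copy, s, len);
    return copy;
}

/* Return the node for (file, name), allocating it only on first use. */
fn_node_model *get_fn_node_model(const char *file, const char *name)
{
    assert(file != NULL && name != NULL);
    unsigned b = hash_key(file, name);
    for (fn_node_model *n = buckets[b]; n != NULL; n = n->next)
        if (strcmp(n->file, file) == 0 && strcmp(n->name, name) == 0)
            return n;                       /* seen before: reuse it */

    fn_node_model *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->file = dup_str(file);
    n->name = dup_str(name);
    n->next = buckets[b];                   /* prepend to the bucket */
    buckets[b] = n;
    return n;
}
```

Calling `get_fn_node_model("foo.py", "compute")` twice returns the same pointer, while `("bar.py", "compute")` yields a separate node.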
>
> Second:
> You have to be aware that there are actually
> two shadow stacks in callgrind. The one which is modified in
> "callgrind/callstack.c:push_call_stack" is "CLG_(current_call_stack)".
> This one exactly mirrors the real stack and is synchronised via the
> real stack pointer, e.g. to handle longjmp and C++ exceptions correctly.
>
> The other one ("CLG_(current_fn_stack)"), which can differ from the
> shadow of the real stack, is responsible for separating counter values
> and for the profile output later on. For example, your use of "--fn-skip"
> makes the two shadow stacks diverge: some real functions never make it
> onto the second stack, and thus will not appear in the output.
> The second stack is changed e.g. with push_cxt(). The line you most
> likely need to change is callgrind/bbcc.c:756:
> ...
> CLG_(push_cxt)(CLG_(get_fn_node)(bb));
> ...
>
> Instead of pushing the fn_node struct for the current basic block (bb),
> you could push another fn_node struct onto the second call stack here,
> created dynamically (but please only once for every python function).
> Of course, this should only happen when "bb" maps to your helper callback.
> You should be able to access your python FrameObject via the "sp" pointer.
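The interplay between the two stacks can be modelled in a few lines of standalone C. This is a hedged sketch, not callgrind's implementation: `name_stack`, `enter()` and `leave()` are hypothetical, `fnmatch()` stands in for callgrind's own wildcard matching of --fn-skip patterns, and the Python label is passed in directly instead of being read from the FrameObject via "sp". It shows that every call lands on the first stack, skipped functions never reach the second one, and the helper callback is replaced by a per-Python-function entry.

```c
#include <assert.h>
#include <fnmatch.h>
#include <string.h>

#define MAX_DEPTH 128

/* Hypothetical model of callgrind's two shadow stacks (not callgrind code). */
typedef struct {
    const char *frames[MAX_DEPTH];
    int depth;
} name_stack;

static name_stack call_stack;      /* mirrors every real call, like CLG_(current_call_stack) */
static name_stack fn_stack;        /* what reaches the profile, like CLG_(current_fn_stack) */
static int pushed_fn[MAX_DEPTH];   /* per call_stack frame: did it also push fn_stack? */

static void push(name_stack *s, const char *name)
{
    assert(s->depth < MAX_DEPTH);
    s->frames[s->depth++] = name;
}

/* Enter function fn. skip_pattern models --fn-skip (may be NULL);
   py_context is a "file:function" label for the Python frame when fn
   is the helper callback, or NULL otherwise. */
void enter(const char *fn, const char *skip_pattern, const char *py_context)
{
    int pushed = 1;

    push(&call_stack, fn);  /* the real-stack mirror always records the call */
    if (py_context != NULL && strcmp(fn, "callgrind_helper_callback") == 0)
        push(&fn_stack, py_context);   /* Python entry instead of the C one */
    else if (skip_pattern != NULL && fnmatch(skip_pattern, fn, 0) == 0)
        pushed = 0;                    /* skipped: never reaches fn_stack */
    else
        push(&fn_stack, fn);
    pushed_fn[call_stack.depth - 1] = pushed;
}

/* Leave the innermost function, popping fn_stack only if enter() pushed it. */
void leave(void)
{
    assert(call_stack.depth > 0);
    call_stack.depth--;
    if (pushed_fn[call_stack.depth])
        fn_stack.depth--;
}
```

For example, entering main, PyEval_EvalFrameEx and callgrind_helper_callback (with context "foo.py:compute") under the pattern "Py*" leaves three entries on call_stack but only main and foo.py:compute on fn_stack.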
>
> Hope this helps,
> Josef
>
>> Now I need to push these calls onto callgrind's call stack. I have started
>> to analyze the callgrind code to see how this can be done, but I would
>> appreciate some hints from someone familiar with callgrind internals on
>> how best to do it.
>>
>> Thanks in advance
>>
>> --
>> Sébastien Sablé