Edward Loper - 2008-03-07

I'll try to write up some better docs on this when I get a chance. In the meantime, here's a short explanation.

A pstat file is a file that contains profiling statistics. You can generate it using one of three modules: hotshot, cProfile, or profile. The profile module (and cProfile, I think) is part of the Python distribution, as downloaded from the Python site; but some distributions (e.g. Debian) split it off into a separate package (python-profiler for Debian).

Using hotshot:
>>> import hotshot, hotshot.stats
>>> prof = hotshot.Profile('hotshot.out')
>>> prof = prof.runctx('some_function()', globals(), {})
>>> prof.close() # flush the log before loading it
>>> hotshot.stats.load('hotshot.out').dump_stats('my.pstat')

Using profile or cProfile:
>>> from profile import Profile # (or: from cProfile import Profile)
>>> prof = Profile()
>>> prof = prof.runctx('some_function()', globals(), {})
>>> prof.dump_stats('my.pstat')

If you're using profile, then note that there was a bug in Python 2.4's implementation of the profile module. It was fixed in 2.4maint: <http://mail.python.org/pipermail/python-checkins/2005-September/047099.html>. You can check for it, and fix it if it's present, with the following code:

>>> if (hasattr(Profile, 'dispatch') and
...     Profile.dispatch['c_exception'] is
...     Profile.trace_dispatch_exception.im_func):
...     trace_dispatch_return = Profile.trace_dispatch_return.im_func
...     Profile.dispatch['c_exception'] = trace_dispatch_return

With any of these methods, "some_function()" should be a function that provides fairly good coverage of your code. Typically, it will run your regression test suite. In epydoc's case, I just run epydoc on itself. The profiler collects information about which functions get called from where, and that information is used to construct the call graphs. So if function f() *can* call function g(), but doesn't call it in the course of executing some_function(), then epydoc won't show a call graph link from f() to g(). That's why it's important for some_function() to be something with very good code coverage.
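To make the coverage point concrete, here is a minimal sketch, written for cProfile on a current Python rather than hotshot. The functions f, g, h and the helper call_edges are hypothetical names made up for this illustration. Reading the `stats` dictionary of a pstats.Stats object is an assumption about how the file can be inspected; it is not necessarily how epydoc reads it. The sketch uses runcall() instead of runctx() only so that it doesn't depend on what globals() contains.

```python
import cProfile
import os
import pstats
import tempfile


def g():
    return 1


def h():
    return 2


def f(use_g):
    # f() *can* call g(), but only does so when use_g is true.
    return g() if use_g else h()


def some_function():
    # Hypothetical workload: it never exercises the g() branch of f().
    return f(False)


def call_edges(pstat_path):
    """Return the (caller name, callee name) pairs recorded in a pstat file."""
    stats = pstats.Stats(pstat_path)
    edges = set()
    # Each key is (filename, line number, function name); the last item of
    # each value is a dict keyed by the functions that called this one.
    for callee, (cc, nc, tt, ct, callers) in stats.stats.items():
        for caller in callers:
            edges.add((caller[2], callee[2]))
    return edges


prof = cProfile.Profile()
prof.runcall(some_function)
pstat_path = os.path.join(tempfile.mkdtemp(), 'my.pstat')
prof.dump_stats(pstat_path)

edges = call_edges(pstat_path)
print(sorted(edges))
```

The edge from f to h is recorded, but there is no edge from f to g, because g() was never reached during this run. A call graph built from this file would be missing that link.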

It's necessary to do things this way because of Python's dynamic nature -- you can't really tell what function is going to be called until run time. E.g., if your code contains the function:

>>> def foo(x):
...     return x.bar() + x.baz()

Then epydoc can't tell by inspection what x's type is, so it can't tell what's actually getting called here. It would be possible to make good guesses for some of these cases, and with an advanced type inference engine it might even be possible to get most cases right; but that would involve fairly sophisticated analysis. If anyone wants to contribute code that does this, I'd be happy to use it. But for now, epydoc just constructs call graphs based on profiling information.
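As a rough illustration of why inspection isn't enough, the sketch below profiles foo() with two unrelated argument types and lists what foo() ended up calling each time. The classes Duck and Robot and the helper callees_of_foo are hypothetical, invented for this example, and as above the `stats` dictionary is read directly, which is an assumption about pstats rather than anything epydoc is known to do.

```python
import cProfile
import pstats


class Duck:
    def bar(self):
        return 1

    def baz(self):
        return 2


class Robot:
    def bar(self):
        return 10

    def baz(self):
        return 20


def foo(x):
    return x.bar() + x.baz()


def callees_of_foo(arg):
    """Profile foo(arg) and return (line number, name) for each function foo called."""
    prof = cProfile.Profile()
    prof.runcall(foo, arg)
    stats = pstats.Stats(prof)
    found = set()
    for callee, (cc, nc, tt, ct, callers) in stats.stats.items():
        # Keep only the functions that list foo among their callers.
        if any(caller[2] == 'foo' for caller in callers):
            found.add((callee[1], callee[2]))
    return found


duck_callees = callees_of_foo(Duck())
robot_callees = callees_of_foo(Robot())
print(sorted(duck_callees))
print(sorted(robot_callees))
```

Both runs report a bar and a baz, but at different line numbers: the same source line in foo() reaches different methods depending on what was passed in, and that only becomes visible at run time.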

As for which of the three profiling modules to use, hotshot will probably run fastest. Other than that, there shouldn't be much difference -- they should all give identical results. For more information on the Python profiler modules, see: http://docs.python.org/lib/profile.html

Let me know if you have any more questions, either by posting here or by email. I'm cc'ing this as private email to the address you gave in your bug report.