pyprof-devel Mailing List for Threading-Aware Profiler for Python
Status: Alpha
Brought to you by:
apexo
2009 archive: May (8 messages)
From: Christian S. <ma...@ap...> - 2009-05-04 22:12:43
|
Dear Geremy,

On 2009-05-04 22:02:06 you wrote:
> I just saw your post on python-dev and set up your module, but I'm not sure
> how to go about actually using it. Mind posting a walkthrough someplace?
>
> thanks for your time,
> Geremy Condra

Currently there's no documentation except for the initial posting. Some more notes:

- the sampling profiler can only handle threads in which Profiler.thread_init() has been invoked (on Linux we need the thread ID (tid) of the thread to query per-thread metrics, but we can only query the tid from within the thread); if the application you're trying to profile only creates threads via the threading.Thread interface, then this can be easily accomplished by hooking the Thread.__bootstrap_inner method, e.g.:

    from Profiler import sampling_profiler, thread_init
    import threading

    # keep a reference to the original (name-mangled) bootstrap method
    old_bootstrap = threading.Thread._Thread__bootstrap_inner

    def new_bootstrap(self):
        thread_init()  # runs inside the newly started thread
        return old_bootstrap(self)

    threading.Thread._Thread__bootstrap_inner = new_bootstrap

- the deterministic profiler only "sees" threads that are created while the profiled function is being executed

For the moment I'd recommend patching the application you're trying to profile. Most of the time, you'll have some main() function, e.g.:

    def main():
        ...

    if __name__ == '__main__':
        main()

To use the profiler, the call to main() needs to be delegated to the profiler function, e.g. sampling_profiler (and for this profiler you also need the thread_init() hook if you're trying to profile a multi-threaded program):

    from Profiler import sampling_profiler, thread_init
    import threading

    old_bootstrap = threading.Thread._Thread__bootstrap_inner

    def new_bootstrap(self):
        thread_init()
        return old_bootstrap(self)

    threading.Thread._Thread__bootstrap_inner = new_bootstrap

    def main():
        ...

    if __name__ == '__main__':
        sampling_profiler(main)

In a single-threaded program the hook is unnecessary, but thread_init() must still be invoked for the main thread:

    from Profiler import sampling_profiler, thread_init

    def main():
        ...

    if __name__ == '__main__':
        thread_init()
        sampling_profiler(main)

Building the module itself should not be too difficult. Depending on the Python version in use, you might have to alter PYTHON_VERSION in the Makefile; additionally you will need at least python-dev, libboost-dev, g++, linux-headers, make (Debian package names). If everything is in place, simply typing "make" should do the job of building the native module.

Best regards,
Christian |
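The hook in the message above targets Python 2, where the private method is name-mangled to _Thread__bootstrap_inner. A minimal sketch of the same idea on Python 3 might look like the following. It assumes CPython 3.8 or later, where the private method is called _bootstrap_inner and threading.get_native_id() is available. The thread_init() here is a hypothetical stand-in that only records the thread ID; it is not the Profiler module's function.

```python
import threading

# Native thread IDs recorded by the stand-in hook (illustrative only).
seen_tids = []


def thread_init():
    """Hypothetical stand-in for Profiler.thread_init(): record the caller's tid."""
    # The tid can only be queried from within the thread itself.
    seen_tids.append(threading.get_native_id())  # Python 3.8+


# Assumption: CPython 3 names the private method _bootstrap_inner
# (Python 2 mangled it to _Thread__bootstrap_inner).
old_bootstrap = threading.Thread._bootstrap_inner


def new_bootstrap(self):
    thread_init()  # runs inside the new thread, before its target
    return old_bootstrap(self)


threading.Thread._bootstrap_inner = new_bootstrap

worker = threading.Thread(target=lambda: None)
worker.start()
worker.join()
print(worker.native_id in seen_tids)
```

Because the patch relies on a private method, it may break between interpreter versions; wrapping Thread.run would be a more conservative alternative, at the cost of missing subclasses that override run.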
From: Christian S. <ma...@ap...> - 2009-05-04 21:53:22
|
Hi,

On 2009-05-04 17:56:04 Aahz wrote:
> If you want to discuss this, please subscribe to python-ideas and repost
> your message. Generally speaking, in order to include modules like this,
> they need to prove themselves over time and may require PEP approval. If
> you choose to move the discussion to python-ideas, it would help if you
> mention known uses of your module.

At the moment there are no known users except myself. So basically I'm trying to figure out whether this is something which is needed by others (thus worth putting further work in), or whether it's a complete waste of time. I might reconsider reposting this at python-* if there is enough interest - time will tell.

Regards,
Christian |
From: Christian S. <ma...@ap...> - 2009-05-04 21:48:10
|
Hi Paul,

On 2009-05-04 21:52:02 you wrote:
> 2009/5/4 Bill Janssen <ja...@pa...>:
> > Hi, Christian.
> >
> > Christian Schubert <ma...@ap...> wrote:
> >
> >> I've created an alternative profiler module which queries per-thread
> >> CPU usage via netlink/taskstats, which limits the applicability to
> >> Linux (which shouldn't be much of an issue, profiling is usually not
> >> done by end users).
> >
> > A surprisingly large # of developers are running on OS X these days,
> > though. I suggest make it work there, too.
>
> And Windows. I doubt that the various Windows-specific modules
> available were developed on Linux. And I wouldn't assume that all of
> the platform-neutral modules are developed on Linux, or even that the
> developers have access to Linux. (I know I don't, short of building a
> brand new virtual machine...)

I don't see a problem there. As long as OS X and Windows provide functions to query the appropriate metrics, this should not be much of an issue. Except that I don't have access to an OS X machine to test on (and I haven't done development on Windows in ages), so patches from interested parties might shorten the way to a semi-interoperable solution here.

Regards,
Christian |
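As an illustration of what "functions to query the appropriate metrics" could mean outside of netlink/taskstats, here is a small sketch that reads per-thread CPU time with time.thread_time() from the Python 3.7+ standard library. This is a different technique from the one pyprof uses, and it is only available where the platform provides a per-thread CPU clock; the worker names and durations are made up for the example.

```python
import threading
import time

# CPU seconds consumed by each worker, keyed by an illustrative name.
cpu_seconds = {}


def worker(name, spin):
    """Either burn CPU or sleep for about 0.2 s, then record own CPU time."""
    start = time.thread_time()  # CPU time of the *calling* thread only
    if spin:
        deadline = time.monotonic() + 0.2
        while time.monotonic() < deadline:
            pass  # busy loop: accumulates CPU time
    else:
        time.sleep(0.2)  # sleeping: accumulates almost no CPU time
    cpu_seconds[name] = time.thread_time() - start


threads = [
    threading.Thread(target=worker, args=("spinner", True)),
    threading.Thread(target=worker, args=("sleeper", False)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

for name, seconds in sorted(cpu_seconds.items()):
    print(name, round(seconds, 3))
```

Like the tid lookup discussed on this list, the clock has to be read from within the thread being measured, so a profiler built on it would still need a per-thread hook.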
From: geremy c. <deb...@gm...> - 2009-05-04 20:02:17
|
I just saw your post on python-dev and set up your module, but I'm not sure how to go about actually using it. Mind posting a walkthrough someplace?

thanks for your time,
Geremy Condra

--
OpenMigration LLC- Open Source solutions for your business. Visit us at http://OpenMigration.net. |
From: Paul M. <p.f...@gm...> - 2009-05-04 19:52:13
|
2009/5/4 Bill Janssen <ja...@pa...>:
> Hi, Christian.
>
> Christian Schubert <ma...@ap...> wrote:
>
>> I've created an alternative profiler module which queries per-thread
>> CPU usage via netlink/taskstats, which limits the applicability to
>> Linux (which shouldn't be much of an issue, profiling is usually not
>> done by end users).
>
> A surprisingly large # of developers are running on OS X these days,
> though. I suggest make it work there, too.

And Windows. I doubt that the various Windows-specific modules available were developed on Linux. And I wouldn't assume that all of the platform-neutral modules are developed on Linux, or even that the developers have access to Linux. (I know I don't, short of building a brand new virtual machine...)

Paul. |
From: Bill J. <ja...@pa...> - 2009-05-04 16:39:18
|
Hi, Christian.

Christian Schubert <ma...@ap...> wrote:

> I've created an alternative profiler module which queries per-thread
> CPU usage via netlink/taskstats, which limits the applicability to
> Linux (which shouldn't be much of an issue, profiling is usually not
> done by end users).

A surprisingly large # of developers are running on OS X these days, though. I suggest make it work there, too.

Bill |
From: Aahz <aa...@py...> - 2009-05-04 15:56:10
|
On Mon, May 04, 2009, Christian Schubert wrote:
>
> Python ships with a profiler module which, unfortunately, is almost
> useless in a multi-threaded environment. *
>
> I've created an alternative profiler module which queries per-thread
> CPU usage via netlink/taskstats, which limits the applicability to
> Linux (which shouldn't be much of an issue, profiling is usually
> not done by end users). It implements two modes: a "sampling" (does
> CPU time accounting based on stack frames 100 times per second, by
> default) and a "deterministic" profiler (does CPU time accounting
> on each function call/return, based on sys.setprofile interface). The
> deterministic profiler is currently implemented in pure python (except
> for taskstats interface) and much slower than the sampling profiler.

If you want to discuss this, please subscribe to python-ideas and repost your message. Generally speaking, in order to include modules like this, they need to prove themselves over time and may require PEP approval. If you choose to move the discussion to python-ideas, it would help if you mention known uses of your module.

--
Aahz (aa...@py...) <*> http://www.pythoncraft.com/

"It is easier to optimize correct code than to correct optimized code." --Bill Harlan |
From: Christian S. <ma...@ap...> - 2009-05-04 15:41:44
|
Hi,

Python ships with a profiler module which, unfortunately, is almost useless in a multi-threaded environment. *

I've created an alternative profiler module which queries per-thread CPU usage via netlink/taskstats, which limits the applicability to Linux (which shouldn't be much of an issue, profiling is usually not done by end users). It implements two modes: a "sampling" profiler (does CPU time accounting based on stack frames 100 times per second, by default) and a "deterministic" profiler (does CPU time accounting on each function call/return, based on the sys.setprofile interface). The deterministic profiler is currently implemented in pure Python (except for the taskstats interface) and is much slower than the sampling profiler.

Usage (don't forget to run make to build the C module):

    python
    >> from Profiler import *
    >> def f(): do_something()
    >> sampling_profiler(f)
    or
    >> deterministic_profiler(f)

Output is currently in the form of annotated source code (xyz.py.html, in the same directory where xyz.py resides). Before the *_profiler function returns, it iterates over all code objects it encountered and annotates the source files with 2 columns in front:

- 1st column: real time
- 2nd column: CPU time

Numbers are log2(time_in_ns); colors are green-to-yellow for below-average and yellow-to-red for above-average metrics (relative to the average metric for all lines of the code object with a metric > 0).

Is there common need for such a module?

Is it possible to have this included in the standard CPython distribution?

Which functional changes (besides a modification of the annotation output which shouldn't spread its result all over the FS) would be required to get this included?

Which non-functional changes would be required to get this included?

Please direct traffic regarding this subject to pyp...@li... (no, I'm not subscribed to python-dev).

SF project page: https://sourceforge.net/projects/pyprof/
git repository: git://pyprof.git.sourceforge.net/gitroot/pyprof

Regards,
Christian

*) To be more exact, there are at least three profiler modules: profile, cProfile, and hotshot. I only tried (and failed) to use profile in a multi-threaded environment (by manually setting threading.profile to the profiling function); glancing at the source, I'm pretty sure that cProfile behaves similarly. I didn't test the hotshot module, but it makes some other trade-offs (space-for-time), so I think that "pyprof" still adds some value. |
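The "sampling" mode described above can be approximated in pure Python to show the basic idea. The sketch below is not pyprof's implementation: it assumes CPython's sys._current_frames() and simply counts how often each source line is the innermost frame of another thread, so it measures wall-clock samples rather than per-thread CPU time from taskstats. The function names and the 0.3 s duration are made up for the example; the 0.01 s interval mirrors the 100 samples per second mentioned in the message.

```python
import collections
import sys
import threading
import time


def sample_stacks(duration=0.3, interval=0.01):
    """Count samples per (filename, lineno) of every other thread's top frame."""
    counts = collections.Counter()
    me = threading.get_ident()
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        # sys._current_frames() maps thread ident -> current top stack frame.
        for ident, frame in sys._current_frames().items():
            if ident != me:  # skip the sampler's own stack
                counts[(frame.f_code.co_filename, frame.f_lineno)] += 1
        time.sleep(interval)  # roughly 100 samples per second
    return counts


def busy(stop):
    """Illustrative workload: burn CPU until asked to stop."""
    while not stop.is_set():
        sum(range(1000))


stop = threading.Event()
worker = threading.Thread(target=busy, args=(stop,))
worker.start()
counts = sample_stacks()
stop.set()
worker.join()

# Show the three most frequently sampled source lines.
for (filename, lineno), n in counts.most_common(3):
    print(filename, lineno, n)
```

A real sampling profiler would walk each frame's f_back chain to attribute time to callers as well, and would weight each sample by the CPU time the thread actually consumed since the previous sample.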