From: Karol L. <kar...@kn...> - 2007-05-07 08:28:09
On Thursday 03 May 2007 13:21, Noel O'Boyle wrote:
> > > How we test each line has a large effect on efficiency. I point out
> > > again that using line[x:y]=="jklj" is much faster than using "word in
> > > line", or line.find(), and so these should be some of the first
> > > targets for improving efficiency.
>>
> > langner@slim:~/tmp/python/cclib/trunk/data/GAMESS/basicGAMESS-US$ python
> > Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
> > [GCC 3.4.6 (Debian 3.4.6-5)] on linux2
> > Type "help", "copyright", "credits" or "license" for more information.
> >
> > >>> import cclib
> > >>> a = cclib.parser.ccopen("C_bigbasis.out")
> > >>> import profile
> > >>> profile.run("a.parse()", "parse.prof")
> > >>> import pstats
> > >>> s = pstats.Stats("parse.prof")
> > >>> s.sort_stats("time")
> > >>> s.print_stats(.12)
> >
> > Thu May 3 14:43:04 2007 parse.prof
> >
> > 199815 function calls in 9.069 CPU seconds
> >
> > Ordered by: internal time
> > List reduced from 96 to 12 due to restriction <0.12>
> >
> > ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
> >   8581    4.548    0.001    8.625    0.001  /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/gamessparser.py:90(extract)
> > 137355    3.080    0.000    3.080    0.000  :0(find)
> >  20310    0.480    0.000    0.480    0.000  :0(len)
> >      1    0.316    0.316    9.069    9.069  /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:165(parse)
> >   8600    0.184    0.000    0.184    0.000  :0(rstrip)
> >   2143    0.140    0.000    0.140    0.000  :0(split)
> >   2055    0.124    0.000    0.124    0.000  :0(range)
> >   9145    0.076    0.000    0.076    0.000  :0(strip)
> >   8868    0.060    0.000    0.060    0.000  /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:375(updateprogress)
> >    370    0.016    0.000    0.016    0.000  :0(append)
> >    218    0.004    0.000    0.004    0.000  :0(replace)
> >     31    0.004    0.000    0.032    0.001  /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:153(__setattr__)
>
> I've never used the profiler. Can you interpret this for me in simple
> language?
The profiler measures the time used for function calls when executing a
command. In the columns you have:
ncalls - number of times a function was called
tottime - time spent in the given function (excluding time in sub-functions)
percall - tottime/ncalls
cumtime - time in function including subfunctions (from invocation to exit)
percall (2nd) - cumtime/ncalls
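
To make the column definitions concrete, here is a small sketch that recomputes the two percall values for extract and find from the figures quoted above. The variable names are only illustrative. The printed table rounds to three decimals, which is why find shows a percall of 0.000.

```python
# Figures copied from the profile output quoted above.
total_time = 9.069                   # total CPU seconds reported

extract_ncalls = 8581
extract_tottime = 4.548              # time inside extract itself
extract_cumtime = 8.625              # extract plus everything it calls

find_ncalls = 137355
find_tottime = 3.080

# percall (1st) = tottime / ncalls, percall (2nd) = cumtime / ncalls
extract_percall_tot = extract_tottime / extract_ncalls
extract_percall_cum = extract_cumtime / extract_ncalls
find_percall_tot = find_tottime / find_ncalls

# tottime excludes sub-calls, so the two values can be added safely.
share = (extract_tottime + find_tottime) / total_time

print("extract percall (tottime): %.6f s" % extract_percall_tot)
print("extract percall (cumtime): %.6f s" % extract_percall_cum)
print("find percall (tottime):    %.8f s" % find_percall_tot)
print("extract + find share of total: %.1f%%" % (100 * share))
```

So, going by these numbers, extract and the find() calls together account for most of the run time.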
Now that I think about all this, though, statements such as "word in line"
and "line[i:j] == word" are not measured here, since they are not function
calls (their time is accumulated into the time of extract).
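
A minimal sketch of that effect, assuming a current Python 3 with cProfile (not the Python 2.5 profile module used above). The two scan functions and the sample lines are made up for illustration and are not cclib code. The find() version gets its own row in the report, while the "in" version only shows up as time inside the calling function.

```python
import cProfile
import io
import pstats

# Hypothetical input lines, only for illustration.
LINES = ["some log line number %d with a keyword" % i for i in range(2000)]


def scan_with_find(lines):
    """Count matching lines using the str.find method call."""
    hits = 0
    for line in lines:
        if line.find("keyword") >= 0:
            hits += 1
    return hits


def scan_with_in(lines):
    """Count matching lines using the 'in' operator (no method call)."""
    hits = 0
    for line in lines:
        if "keyword" in line:
            hits += 1
    return hits


def profile_report(func, lines):
    """Profile one call of func(lines) and return the pstats report as text."""
    profiler = cProfile.Profile()
    profiler.runcall(func, lines)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("tottime").print_stats()
    return stream.getvalue()


find_report = profile_report(scan_with_find, LINES)
in_report = profile_report(scan_with_in, LINES)

print(find_report)   # has a separate row for the 'find' method
print(in_report)     # the 'in' tests are folded into scan_with_in
```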
A simple little test shows that find() is in fact the worst, but "word in
line" is at least comparable to "line[i:j] == word":
>>> import timeit
>>> t1 = timeit.Timer("'a' in 'abcdefg'")
>>> t2 = timeit.Timer("'abcdefg'[:1] == 'a'")
>>> t3 = timeit.Timer("'abcdefg'.find('a')")
>>> min(t1.repeat(repeat=100, number=1000000))
0.18727612495422363
>>> min(t2.repeat(repeat=100, number=1000000))
0.3044281005859375
>>> min(t3.repeat(repeat=100, number=1000000))
0.7338860034942627
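
The same comparison can be repeated on something closer to a real log line. This is only a sketch for a current Python 3: the line, the keyword and the helper name best_time are invented for the example, and the timings will differ from machine to machine. Note also that the three tests are not equivalent, since "in" and find() look for the word anywhere in the line while the slice only checks one fixed position.

```python
import timeit

# Hypothetical log line and keyword, chosen only for illustration.
NAMESPACE = {
    "line": "          TOTAL ENERGY =     -75.0123456789",
    "word": "TOTAL ENERGY",
}
# Slice bounds for the position where the keyword sits in this line.
NAMESPACE["i"] = NAMESPACE["line"].index(NAMESPACE["word"])
NAMESPACE["j"] = NAMESPACE["i"] + len(NAMESPACE["word"])

STATEMENTS = {
    "in": "word in line",
    "slice": "line[i:j] == word",
    "find": "line.find(word) >= 0",
}


def best_time(stmt, repeat=5, number=100000):
    """Return the best of `repeat` timings of `number` executions of stmt."""
    timer = timeit.Timer(stmt, globals=NAMESPACE)
    return min(timer.repeat(repeat=repeat, number=number))


results = {name: best_time(stmt) for name, stmt in STATEMENTS.items()}

# Print the fastest variant first.
for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print("%-6s %.4f s" % (name, seconds))
```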
- Karol
--
written by Karol Langner
Mon May 7 11:47:55 CEST 2007