Re: [cclib-devel] further parser refactoring


On Thursday 03 May 2007 13:21, Noel O'Boyle wrote:
> > > How we test each line has a large effect on efficiency. I point out
> > > again that using line[x:y]=="jklj" is much faster than using "word in
> > > line", or line.find(), and so these should be some of the first
> > > targets for improving efficiency.
>>
> > langner@slim:~/tmp/python/cclib/trunk/data/GAMESS/basicGAMESS-US$ python
> > Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
> > [GCC 3.4.6 (Debian 3.4.6-5)] on linux2
> > Type "help", "copyright", "credits" or "license" for more information.
> >
> > >>> import cclib
> > >>> a = cclib.parser.ccopen("C_bigbasis.out")
> > >>> import profile
> > >>> profile.run("a.parse()", "parse.prof")
> > >>> import pstats
> > >>> s = pstats.Stats("parse.prof")
> > >>> s.sort_stats("time")
> > >>> s.print_stats(.12)
> >
> > Thu May  3 14:43:04 2007    parse.prof
> >
> >          199815 function calls in 9.069 CPU seconds
> >
> >    Ordered by: internal time
> >    List reduced from 96 to 12 due to restriction <0.12>
> >
> >    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
> >      8581    4.548    0.001    8.625    0.001 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/gamessparser.py:90(extract)
> >    137355    3.080    0.000    3.080    0.000 :0(find)
> >     20310    0.480    0.000    0.480    0.000 :0(len)
> >         1    0.316    0.316    9.069    9.069 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:165(parse)
> >      8600    0.184    0.000    0.184    0.000 :0(rstrip)
> >      2143    0.140    0.000    0.140    0.000 :0(split)
> >      2055    0.124    0.000    0.124    0.000 :0(range)
> >      9145    0.076    0.000    0.076    0.000 :0(strip)
> >      8868    0.060    0.000    0.060    0.000 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:375(updateprogress)
> >       370    0.016    0.000    0.016    0.000 :0(append)
> >       218    0.004    0.000    0.004    0.000 :0(replace)
> >        31    0.004    0.000    0.032    0.001 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:153(__setattr__)
>
> I've never used the profiler. Can you interpret this for me in simple
> language?

The profiler measures the time used for function calls when executing a 
command. In the columns you have:
ncalls - number of times a function was called
tottime - time spent in the given function (excluding time in sub-functions)
percall - tottime/ncalls
cumtime - time in function including subfunctions (from invocation to exit)
percall (2nd) - cumtime/ncalls
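
To make these columns concrete, here is a minimal sketch using cProfile, which offers the same interface as the profile module. The functions inner and outer are made up for the example and have nothing to do with cclib, and reading the undocumented stats attribute of pstats.Stats is only a convenient way to get at the raw numbers.

```python
import cProfile
import pstats


def inner(n):
    # All of the arithmetic happens here, so this function has a high tottime.
    total = 0
    for i in range(n):
        total += i * i
    return total


def outer(n, repeats):
    # Does little work itself; its cumtime is dominated by the calls to inner().
    results = []
    for _ in range(repeats):
        results.append(inner(n))
    return results


def column_values(stats, function_name):
    # Stats.stats maps (filename, lineno, name) to
    # (primitive calls, total calls, tottime, cumtime, callers).
    for (_filename, _lineno, name), (_cc, nc, tt, ct, _callers) in stats.stats.items():
        if name == function_name:
            return {"ncalls": nc, "tottime": tt, "cumtime": ct}
    raise KeyError(function_name)


profiler = cProfile.Profile()
profiler.runcall(outer, 2000, 50)
stats = pstats.Stats(profiler)

for name in ("outer", "inner"):
    print(name, column_values(stats, name))
```

For outer, tottime should be small and cumtime large; for inner the two should be nearly equal, since it calls almost nothing else.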

Now that I think about all this, though, statements such as "word in line" 
and "line[i:j] == word" are not measured here, since they are not function 
calls (their time is folded into the tottime of extract).
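
A small sketch of this effect, again with cProfile and with made-up helper names (count_with_find, count_with_in, profiled_names are not part of cclib): the find() variant shows up as its own entry in the profile, while the "in" variant leaves no separate trace. The sample lines are invented for illustration.

```python
import cProfile
import pstats


def count_with_find(lines, word):
    # str.find() is a method call, so the profiler records it separately.
    count = 0
    for line in lines:
        if line.find(word) >= 0:
            count += 1
    return count


def count_with_in(lines, word):
    # The "in" operator is not a function call; its cost stays inside
    # the tottime of this function.
    count = 0
    for line in lines:
        if word in line:
            count += 1
    return count


def profiled_names(func, *args):
    # Run func under the profiler and return the names of everything recorded.
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stats = pstats.Stats(profiler)
    return sorted(name for (_filename, _lineno, name) in stats.stats)


# Invented sample data, only for illustration.
sample_lines = ["  TOTAL ENERGY = -1.0", "  some other line", "TOTAL ENERGY again"]

names_find = profiled_names(count_with_find, sample_lines, "TOTAL ENERGY")
names_in = profiled_names(count_with_in, sample_lines, "TOTAL ENERGY")

print(names_find)
print(names_in)
```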

A simple little test shows that find() is in fact the slowest, while "word in 
line" is at least as fast as "line[i:j] == word" (here it is even faster):
>>> import timeit
>>> t1 = timeit.Timer("'a' in 'abcdefg'")
>>> t2 = timeit.Timer("'abcdefg'[:1] == 'a'")
>>> t3 = timeit.Timer("'abcdefg'.find('a')")
>>> min(t1.repeat(repeat=100, number=1000000))
0.18727612495422363
>>> min(t2.repeat(repeat=100, number=1000000))
0.3044281005859375
>>> min(t3.repeat(repeat=100, number=1000000))
0.7338860034942627
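
The test above uses a seven-character string with the match at position 0, so it may not say much about real log lines. A hedged sketch of the same comparison on a longer line follows; the line contents, the slice offsets and the benchmark helper are all assumptions made up for illustration, not taken from any cclib parser.

```python
import timeit

# Hypothetical log line: ten spaces of indentation, then the keyword.
SETUP = (
    "line = '          TOTAL ENERGY =     -37.7358 HARTREE'; "
    "word = 'TOTAL ENERGY'"
)

# The three ways of testing a line that are being compared.
STATEMENTS = {
    "in": "word in line",
    "slice": "line[10:22] == word",
    "find": "line.find(word) >= 0",
}


def benchmark(statements=STATEMENTS, setup=SETUP, repeat=5, number=100000):
    # Take the minimum over several repeats, as in the session above,
    # to reduce the influence of other processes on the machine.
    results = {}
    for label, stmt in statements.items():
        timings = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
        results[label] = min(timings)
    return results


for label, seconds in sorted(benchmark().items(), key=lambda item: item[1]):
    print("%-6s %.4f s" % (label, seconds))
```

The absolute numbers depend on the machine and the Python version, so only the relative ordering is worth comparing.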

- Karol

-- 
written by Karol Langner
Mon May  7 11:47:55 CEST 2007