cclib-devel Mailing List for cclib (Page 39)

Brought to you by: atenderholt, baoilleach, langner

cclib-devel — Developer's mailing list


Re: [cclib-devel] Numeric -> NumPy transition

From: Karol L. <kar...@kn...> - 2007-05-15 23:16:19

On Wednesday 16 May 2007 00:50, Noel O'Boyle wrote:
> It's likely that all algorithms have been broken - I assume I'm going
> to have similar problems with GaussSum. It's best to make a clean
> break with Numeric at this point though, as it's no longer available
> for Python 2.5 on Windows for example, which about 50% of people are
> now using.
>
> We will fix all these issues before the next release (promise!).
Not all algorithms are broken; the MPA bug comes from me copy-pasting the 
wrong function name. I'm all for a clean break, although supporting Numeric 
is not an issue in cclib, since no functionality unique to NumPy is used 
yet. And if it's not a problem, why not keep it for some time?

> On 15/05/07, Adam Tenderholt <a-t...@st...> wrote:
> > The Mulliken and C-squared population analysis are broken with
> > revision 624. I haven't really explored it other than noticing that
> > the numbers are way off and that it looks like every aoresult for a
> > given MO has the same number (~1).
> >
> > Adam
Thanks for pointing these out... the bug with MPA is obvious to me now that I 
look at the diff - a misclick. I'm just about to commit the fix, along with a 
test for MPA (this probably wouldn't have been overlooked if there had been 
an MPA test before). I'll look at the CSPA problem after that.

- Karol

-- 
written by Karol Langner
Wed May 16 01:08:30 CEST 2007
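
A test of the kind mentioned above could check a simple invariant: for a normalized MO, the Mulliken contributions of the individual basis functions add up to 1, while the individual values generally differ from 1. The following is only a minimal sketch with NumPy, not cclib's actual MPA code or test; the function name mulliken_contributions and the two-function toy basis are made up for illustration.

```python
import numpy as np

def mulliken_contributions(mocoeffs, overlap):
    """Per-AO Mulliken contributions for each MO.

    mocoeffs: array of shape (nmo, nbasis), one MO per row.
    overlap:  symmetric AO overlap matrix S of shape (nbasis, nbasis).
    Element [i, a] is c_ia * sum_b(c_ib * S_ab).
    """
    return mocoeffs * np.dot(mocoeffs, overlap)

# Toy two-function basis with overlap s (hypothetical numbers).
s = 0.4
overlap = np.array([[1.0, s], [s, 1.0]])

# Bonding and antibonding combinations, normalized so that c S c^T = 1.
mocoeffs = np.array([
    np.array([1.0, 1.0]) / np.sqrt(2 + 2 * s),
    np.array([1.0, -1.0]) / np.sqrt(2 - 2 * s),
])

contrib = mulliken_contributions(mocoeffs, overlap)

# Each row (one MO) should sum to 1; a result where every single
# entry is ~1 would point to a bug.
per_mo_totals = contrib.sum(axis=1)
print(contrib)
print(per_mo_totals)
```

In this symmetric toy case each basis function carries half of each MO, so a result where every entry is close to 1 would be flagged immediately.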

Re: [cclib-devel] Numeric -> NumPy transition

From: Noel O'B. <bao...@gm...> - 2007-05-15 22:50:05

It's likely that all algorithms have been broken - I assume I'm going
to have similar problems with GaussSum. It's best to make a clean
break with Numeric at this point though, as it's no longer available
for Python 2.5 on Windows for example, which about 50% of people are
now using.

We will fix all these issues before the next release (promise!).

Noel

On 15/05/07, Adam Tenderholt <a-t...@st...> wrote:
> > As far as I can tell, everything in cclib still works with Numeric.
> > I haven't actually run the tests until now (just committed a little
> > hack to get the test code working, though), but they do finish fine.
> > The regression tests are also OK. It is wise, however, to drop
> > Numeric at some point, since it is not supported anymore.
>
> The Mulliken and C-squared population analysis are broken with
> revision 624. I haven't really explored it other than noticing that
> the numbers are way off and that it looks like every aoresult for a
> given MO has the same number (~1).
>
> Adam
>
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by DB2 Express
> Download DB2 Express C - the FREE version of DB2 express and take
> control of your XML. No limits. Just data. Click to get it now.
> http://sourceforge.net/powerbar/db2/
> _______________________________________________
> cclib-devel mailing list
> ccl...@li...
> https://lists.sourceforge.net/lists/listinfo/cclib-devel
>

Re: [cclib-devel] Numeric -> NumPy transition

From: Adam T. <a-t...@st...> - 2007-05-15 20:37:34

> As far as I can tell, everything in cclib still works with Numeric.
> I haven't actually run the tests until now (just committed a little
> hack to get the test code working, though), but they do finish fine.
> The regression tests are also OK. It is wise, however, to drop
> Numeric at some point, since it is not supported anymore.

The Mulliken and C-squared population analysis are broken with  
revision 624. I haven't really explored it other than noticing that  
the numbers are way off and that it looks like every aoresult for a  
given MO has the same number (~1).

Adam

Re: [cclib-devel] broken gaussian file

From: Noel O'B. <bao...@gm...> - 2007-05-15 09:01:06

On 07/05/07, Adam Tenderholt <a-t...@st...> wrote:
> One of my labmates has a Gaussian 98 frequency calculation that cclib
> chokes on. The issue is the assert nbasis == self.nbasis found on
> line 532 of the gaussian parser. For some reason, it says it uses 290
> basis functions at one point and then says there are 292 basis
> functions. It has the following route section:
>
> #P UBP86/LanL2DZ 5D 7F SCF(MaxCycle=500,conver=8) Pop=(NPA) Freq IOP(7/33=1) guess=read geom=check
>
> Have we seen this before? I have no idea what's going on and the 7/33
> IOP doesn't exist in the Gaussian 03 Documentation.

It would be useful to have the actual input file, although I suspect
that it is not going to help much. The original input file for the
geometry optimisation would also be useful: for example, did the
geo-opt use exactly the same basis set settings (there is a
"guess=read", so it's possible that this has an effect)?

The quick fix for your friend is to remove the second NBasis line from
the log file (note: I haven't tested this).

The G98 docs are available on the web, but don't contain 7/33.
However, some googling shows that it is simply related to the output
of the force constant matrix.

Actually, I now notice that the error occurs in the "Dens" section,
probably due to "DoDens=T"@3432, and there's a "ToCart=T". The
following matrix is 292x290 and is referred to as a transformation
matrix. So it sounds like there's a transformation from something to
Cartesian, which causes the basis set number to increase from 290 to
292. This may be due to NBO or due to the fact that 5D and 7F were used
up till that point, and they cannot be applied to the density (??).

I'm not sure what to do about this problem. If your colleague had
used iop(3/33=1,3/36=-1) and pop=full, which figure would have been
used for the matrices: 290 or 292? I think we need to reproduce this
for dvb, and see what the consequences are for the data that we
extract.

> Adam
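
The 5D/7F versus Cartesian idea can be made concrete by counting functions per shell: a pure d shell has 5 functions and a Cartesian one has 6, a pure f shell has 7 and a Cartesian one has 10. The sketch below uses an invented shell list, not the one from the log file in question, and the helper name is hypothetical. It only shows that each d shell adds one function on conversion to Cartesian, so a difference of 2 would be consistent with two d shells. Whether that is what actually happened in this calculation is not established here.

```python
# Number of functions per shell type (standard counts).
PURE = {"s": 1, "p": 3, "d": 5, "f": 7}
CARTESIAN = {"s": 1, "p": 3, "d": 6, "f": 10}

def count_basis_functions(shells, table):
    """Sum the number of basis functions over a list of shell labels."""
    return sum(table[shell] for shell in shells)

# Hypothetical shell list: two d shells in an otherwise s/p basis.
shells = ["s", "s", "p", "d", "d"]

n_pure = count_basis_functions(shells, PURE)        # 1 + 1 + 3 + 5 + 5
n_cart = count_basis_functions(shells, CARTESIAN)   # 1 + 1 + 3 + 6 + 6
print(n_pure, n_cart, n_cart - n_pure)
```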

Re: [cclib-devel] Numeric -> NumPy transition

From: Karol L. <kar...@kn...> - 2007-05-10 18:29:42

On Monday 07 May 2007 20:44, Adam Tenderholt wrote:
> So does this mean that Numeric will no longer work and NumPy is now
> required? Just curious...
>
> Adam

As far as I can tell, everything in cclib still works with Numeric. I haven't 
actually run the tests until now (just committed a little hack to get the test 
code working, though), but they do finish fine. The regression tests are also 
OK. It is wise, however, to drop Numeric at some point, since it is not 
supported anymore.

- Karol

-- 
written by Karol Langner
Thu May 10 20:25:26 CEST 2007

[cclib-devel] broken gaussian file

From: Adam T. <a-t...@st...> - 2007-05-07 21:29:36

One of my labmates has a Gaussian 98 frequency calculation that cclib  
chokes on. The issue is the assert nbasis == self.nbasis found on  
line 532 of the gaussian parser. For some reason, it says it uses 290  
basis functions at one point and then says there are 292 basis  
functions. It has the following route section:

#P UBP86/LanL2DZ 5D 7F SCF(MaxCycle=500,conver=8) Pop=(NPA) Freq IOP(7/33=1) guess=read geom=check

Have we seen this before? I have no idea what's going on and the 7/33  
IOP doesn't exist in the Gaussian 03 Documentation.

Adam

Re: [cclib-devel] Numeric -> NumPy transition

From: Adam T. <a-t...@st...> - 2007-05-07 18:44:31

So does this mean that Numeric will no longer work and NumPy is now  
required? Just curious...

Adam

On Apr 28, 2007, at 3:17 AM, Noel O'Boyle wrote:

> Nice work!
>
> On 28/04/07, Karol Langner <kar...@kn...> wrote:
>> Sure. I'll just mention the more important changes in names:
>>
>> Numeric.matrixmultiply -> numpy.dot
>> Numeric.outerproduct -> numpy.outer
>> LinearAlgebra.inverse -> numpy.linalg.inv
>> Numeric.typecode -> numpy.dtype
>>
>> The last was the problematic one, since Numeric.typecode is a  
>> function, while
>> numpy.dtype is a type.
>>
>> On Friday 27 April 2007 15:47, Noel O'Boyle wrote:
>>> Let's leave a few days "cooling off period" to look through the code
>>> for anything that doesn't seem right.
>>>
>>> On 27/04/07, Karol Langner <kar...@kn...> wrote:
>>>> I changed all references to NumPy, but left the Numeric imports  
>>>> (when
>>>> importing numpy fails). Some method names had to be changed. All  
>>>> the
>>>> tests and regressions go fine, although I'm worried we don't  
>>>> have tests
>>>> for all the methods, since that's where the most Numeric  
>>>> functions are
>>>> used.
>>>>
>>>> --
>>>> written by Karol Langner
>>>> Fri Apr 27 17:36:53 CEST 2007
>>
>> --
>> written by Karol Langner
>> Sat Apr 28 11:36:59 CEST 2007
>>
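
For reference, the NumPy side of the renames listed in this thread looks roughly like the following. This is a small sketch written for current Python and NumPy; the Numeric names appear only in comments, as given in the message, since Numeric itself is not imported here. The arrays are arbitrary example values.

```python
import numpy

a = numpy.array([[2.0, 0.0], [0.0, 4.0]])
v = numpy.array([1.0, 2.0])

product = numpy.dot(a, v)          # was Numeric.matrixmultiply
outer = numpy.outer(v, v)          # was Numeric.outerproduct
inverse = numpy.linalg.inv(a)      # was LinearAlgebra.inverse

# The thread notes that the Numeric typecode was obtained through a call,
# while in NumPy dtype is a type; on an array it is read as an attribute.
kind = a.dtype                     # attribute access, not a call
letter = a.dtype.char              # one-letter code, 'd' for float64

print(product, outer, inverse, kind, letter)
```

The last two lines show why a plain search-and-replace is not enough for the typecode case: code that called a function has to be changed to read an attribute.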

Re: [cclib-devel] further parser refactoring

From: Noel O'B. <bao...@gm...> - 2007-05-07 08:45:27

> A simple little test shows that find() is in fact the worst, but "word in
> line" is at least comparable to "line[i:j] == word":
> >>> import timeit
> >>> t1 = timeit.Timer("'a' in 'abcdefg'")
> >>> t2 = timeit.Timer("'abcdefg'[:1] == 'a'")
> >>> t3 = timeit.Timer("'abcdefg'.find('a')")
> >>> min(t1.repeat(repeat=100, number=1000000))
> 0.18727612495422363
> >>> min(t2.repeat(repeat=100, number=1000000))
> 0.3044281005859375
> >>> min(t3.repeat(repeat=100, number=1000000))
> 0.7338860034942627
I was surprised by this, but it's the same for me. However, my earlier
experiments showed 'in' to be the worst, and that is because most
lines don't match the expression. I will show some timings for a large
log file when I get a chance.

Noel

Re: [cclib-devel] further parser refactoring

From: Karol L. <kar...@kn...> - 2007-05-07 08:28:09

On Thursday 03 May 2007 13:21, Noel O'Boyle wrote:
> > > How we test each line has a large effect on efficiency. I point out
> > > again that using line[x:y]=="jklj" is much faster than using "word in
> > > line", or line.find(), and so these should be some of the first
> > > targets for improving efficiency.
> >
> > langner@slim:~/tmp/python/cclib/trunk/data/GAMESS/basicGAMESS-US$ python
> > Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
> > [GCC 3.4.6 (Debian 3.4.6-5)] on linux2
> > Type "help", "copyright", "credits" or "license" for more information.
> >
> > >>> import cclib
> > >>> a = cclib.parser.ccopen("C_bigbasis.out")
> > >>> import profile
> > >>> profile.run("a.parse()", "parse.prof")
> > >>> import pstats
> > >>> s = pstats.Stats("parse.prof")
> > >>> s.sort_stats("time")
> > >>> s.print_stats(.12)
> >
> > Thu May  3 14:43:04 2007    parse.prof
> >
> >          199815 function calls in 9.069 CPU seconds
> >
> >    Ordered by: internal time
> >    List reduced from 96 to 12 due to restriction <0.12>
> >
> >    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
> >      8581    4.548    0.001    8.625    0.001 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/gamessparser.py:90(extract)
> >    137355    3.080    0.000    3.080    0.000 :0(find)
> >     20310    0.480    0.000    0.480    0.000 :0(len)
> >         1    0.316    0.316    9.069    9.069 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:165(parse)
> >      8600    0.184    0.000    0.184    0.000 :0(rstrip)
> >      2143    0.140    0.000    0.140    0.000 :0(split)
> >      2055    0.124    0.000    0.124    0.000 :0(range)
> >      9145    0.076    0.000    0.076    0.000 :0(strip)
> >      8868    0.060    0.000    0.060    0.000 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:375(updateprogress)
> >       370    0.016    0.000    0.016    0.000 :0(append)
> >       218    0.004    0.000    0.004    0.000 :0(replace)
> >        31    0.004    0.000    0.032    0.001 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:153(__setattr__)
>
> I've never used the profiler. Can you interpret this for me in simple
> language?

The profiler measures the time used for function calls when executing a 
command. In the columns you have:
ncalls - number of times a function was called
tottime - time spent in the given function (excluding time in sub-functions)
percall - tottime/ncalls
cumtime - time in function including subfunctions (from invocation to exit)
percall (2nd) - cumtime/ncalls

Now that I think about all this, though, statements such as "word in line" 
and "line[i:j] == word" are not measured here, since they are not function 
calls (their time is accumulated into the time of extract).
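
The same kind of report can be produced on a toy example, which makes the columns easier to relate to the code. This sketch is written for current Python 3 and uses cProfile rather than the profile module from the session above; the functions find_hits and parse are hypothetical stand-ins, not cclib code.

```python
import cProfile
import io
import pstats

def find_hits(lines, word):
    """Count the lines containing word, calling str.find on every line."""
    return sum(1 for line in lines if line.find(word) >= 0)

def parse(lines):
    """Stand-in for a parser loop (hypothetical, not cclib code)."""
    return find_hits(lines, "SCF") + find_hits(lines, "Done")

lines = ["SCF Done", "nothing here", ""] * 10000

profiler = cProfile.Profile()
profiler.enable()
total = parse(lines)
profiler.disable()

# Collect the report in a string instead of printing it directly.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("time").print_stats(5)   # five most expensive entries
print(report.getvalue())
```

The str.find calls get their own row because they are function calls, whereas an "in" test or a slice comparison would only show up as time inside the calling function.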

A simple little test shows that find() is in fact the worst, but "word in 
line" is at least comparable to "line[i:j] == word":
>>> import timeit
>>> t1 = timeit.Timer("'a' in 'abcdefg'")
>>> t2 = timeit.Timer("'abcdefg'[:1] == 'a'")
>>> t3 = timeit.Timer("'abcdefg'.find('a')")
>>> min(t1.repeat(repeat=100, number=1000000))
0.18727612495422363
>>> min(t2.repeat(repeat=100, number=1000000))
0.3044281005859375
>>> min(t3.repeat(repeat=100, number=1000000))
0.7338860034942627

- Karol

-- 
written by Karol Langner
Mon May  7 11:47:55 CEST 2007
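
Since most lines in a log file do not match a given keyword, it may be worth timing the non-matching case as well as the matching one. The sketch below, written for current Python 3, does that for four ways of testing a line. The sample lines and the keyword are invented, and the four tests are not strictly equivalent: "in" and find() search the whole line, while the slice and startswith() only look at the beginning. No timings are quoted here because they depend on the interpreter and machine.

```python
import timeit

# Two sample lines: one that starts with the keyword and one that does not.
SAMPLES = {
    "match": "SCF Done:  E(RHF) = -75.0",
    "no match": " " * 60 + "some unrelated output line",
}
WORD = "SCF Done"   # 8 characters, hence the [:8] slice below

TESTS = {
    "in": "WORD in line",
    "slice": "line[:8] == WORD",
    "find": "line.find(WORD) >= 0",
    "startswith": "line.startswith(WORD)",
}

def time_tests(number=100000, repeat=5):
    """Return {(sample, test): best time in seconds} for every combination."""
    results = {}
    for sample_name, line in SAMPLES.items():
        for test_name, stmt in TESTS.items():
            timer = timeit.Timer(stmt, globals={"line": line, "WORD": WORD})
            results[(sample_name, test_name)] = min(
                timer.repeat(repeat=repeat, number=number))
    return results

results = time_tests()
for key, seconds in sorted(results.items()):
    print(key, "%.4f" % seconds)
```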

Re: [cclib-devel] strange logger bug

From: Karol L. <kar...@kn...> - 2007-05-06 23:12:24

 It seems that logging.getLogger() returns the same instance when called with 
the same "logname", presumably so that loggers don't have to be passed around 
in an application that uses them. So, all parser instances in cclib of the same 
class were adding duplicate handlers to one shared logger (each instance added 
another one). A quick check that LogFile.logger.handlers is empty fixes this.

 I wonder if there is any advantage in having at most one logger for each 
kind of parser class - that's the way it is now. If not, it might be 
good to establish a clearer logging strategy:
1) create a global logger when cclib is loaded (or rather cclib.parser), and 
use only this one throughout by all parsers
2) create a new logger for any parser instance, each one with a unique name 
(the docs say that dots determine the hierarchy for loggers - such as a.b.c).
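
The behaviour described at the top of this message can be reproduced with the standard logging module alone. The following is a minimal sketch, not the cclib code; the logger names and the two helper functions are made up for illustration.

```python
import logging

def make_logger_naive(name):
    """Add a new handler on every call, as each parser instance did."""
    logger = logging.getLogger(name)
    logger.addHandler(logging.StreamHandler())
    return logger

def make_logger_checked(name):
    """Add a handler only if the shared logger does not have one yet."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        logger.addHandler(logging.StreamHandler())
    return logger

# Logger names here are hypothetical.
first = make_logger_naive("demo.naive")
second = make_logger_naive("demo.naive")
same_object = first is second              # getLogger returns one shared instance
naive_handlers = len(second.handlers)      # two handlers, so messages print twice

make_logger_checked("demo.checked")
checked = make_logger_checked("demo.checked")
checked_handlers = len(checked.handlers)   # still one handler

print(same_object, naive_handlers, checked_handlers)
```

With option 2 from the list, each instance would pass a unique dotted name to getLogger, so no handler check would be needed, at the cost of one logger object per parser instance.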

Cheers,
- Karol

On Thursday 03 May 2007 12:26, Noel O'Boyle wrote:
> Sorry, just checked. clean() doesn't remove the logger, nor should it.
> If you parse the same file a second time (e.g. this is common with
> GaussSum users following a geometry optimisation) you still want a
> logger.
>
> The problem you found occurs if you create two instances of parsers
> with the same name. We either ignore this problem, or implement some
> complicated way of tracking file names. Perhaps if we only create the
> logger when parse() is called, then we can remove it when clean() is
> called.
>
> On 03/05/07, Noel O'Boyle <bao...@gm...> wrote:
> > This is a "feature" of the logger (yes, I agree it's annoying, but
> > it's not a bug). The problem is that now you have two loggers both
> > doing the exact same thing. Loggers are distinguished by their names;
> > so if you parse the same file twice, without using clean() (which I
> > hope deletes the logger), you get this effect.
> >
> > On 03/05/07, Karol Langner <kar...@kn...> wrote:
> > > While working interactively with cclib on many files, I noticed a bug
> > > related to the logger. For some reason, it prints as many messages for
> > > each attribute as there were instances created of the specific parser
> > > (GAMESS, Gaussian, counted separately), so you get an increasingly long
> > > output from the logger. The problem is not related to any recent
> > > changes, it goes as far back as version 0.5, as you can see here:
> > >
> > > langner@slim:~/tmp/python/cclib/trunk/data/Gaussian/basicGaussian03$ python
> > > Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
> > > [GCC 3.4.6 (Debian 3.4.6-5)] on linux2
> > > Type "help", "copyright", "credits" or "license" for more information.
> > >
> > > >>> import cclib
> > > >>> print cclib.__version__
> > >
> > > 0.5
> > >
> > > >>> print cclib.parser.logfileparser.__revision__
> > >
> > > $Revision: 240 $
> > >
> > > >>> print cclib.parser.gamessparser.__revision__
> > >
> > > $Revision: 240 $
> > >
> > > >>> cclib.parser.Gaussian("water_mp2.log").parse()
> > >
> > > [Gaussian water_mp2.log INFO] Creating attribute atomcoords[]
> > > [Gaussian water_mp2.log INFO] Creating attribute atomnos[]
> > > [Gaussian water_mp2.log INFO] Creating attribute natom: 3
> > > [Gaussian water_mp2.log INFO] Creating attribute nbasis: 7
> > > [Gaussian water_mp2.log INFO] Creating attribute nmo: 7
> > > [Gaussian water_mp2.log INFO] Creating attribute scftargets[]
> > > [Gaussian water_mp2.log INFO] Creating attribute scfvalues
> > > [Gaussian water_mp2.log INFO] Creating attribute scfenergies[]
> > > [Gaussian water_mp2.log INFO] Creating attribute moenergies[[]]
> > > [Gaussian water_mp2.log INFO] Creating attribute homos[]
> > >
> > > >>> cclib.parser.Gaussian("water_mp2.log").parse()
> > >
> > > [Gaussian water_mp2.log INFO] Creating attribute atomcoords[]
> > > [Gaussian water_mp2.log INFO] Creating attribute atomcoords[]
> > > [Gaussian water_mp2.log INFO] Creating attribute atomnos[]
> > > [Gaussian water_mp2.log INFO] Creating attribute atomnos[]
> > > [Gaussian water_mp2.log INFO] Creating attribute natom: 3
> > > [Gaussian water_mp2.log INFO] Creating attribute natom: 3
> > > [Gaussian water_mp2.log INFO] Creating attribute nbasis: 7
> > > [Gaussian water_mp2.log INFO] Creating attribute nbasis: 7
> > > [Gaussian water_mp2.log INFO] Creating attribute nmo: 7
> > > [Gaussian water_mp2.log INFO] Creating attribute nmo: 7
> > > [Gaussian water_mp2.log INFO] Creating attribute scftargets[]
> > > [Gaussian water_mp2.log INFO] Creating attribute scftargets[]
> > > [Gaussian water_mp2.log INFO] Creating attribute scfvalues
> > > [Gaussian water_mp2.log INFO] Creating attribute scfvalues
> > > [Gaussian water_mp2.log INFO] Creating attribute scfenergies[]
> > > [Gaussian water_mp2.log INFO] Creating attribute scfenergies[]
> > > [Gaussian water_mp2.log INFO] Creating attribute moenergies[[]]
> > > [Gaussian water_mp2.log INFO] Creating attribute moenergies[[]]
> > > [Gaussian water_mp2.log INFO] Creating attribute homos[]
> > > [Gaussian water_mp2.log INFO] Creating attribute homos[]
> > >
> > > >>> cclib.parser.GAMESS("../../GAMESS/basicGAMESS-US/water_mp2.out").parse()
> > >
> > > [GAMESS ../GAMESS/water_mp2.out INFO] Creating attribute atomcoords, atomnos
> > > [GAMESS ../water_mp2.out INFO] Creating attribute nbasis
> > > [GAMESS ../water_mp2.out INFO] Creating attribute homos
> > > [GAMESS ../water_mp2.out INFO] Creating attribute natom
> > > [GAMESS ..//water_mp2.out INFO] Creating attribute scftargets
> > > [GAMESS ../water_mp2.out INFO] Creating attribute scfvalues
> > > [GAMESS ../water_mp2.out INFO] Creating attribute scfenergies[]
> > > [GAMESS ..//water_mp2.out INFO] Creating attributes moenergies, mosyms
> > > [GAMESS ../water_mp2.out INFO] Creating attribute nmo with default value
> > > [GAMESS ../water_mp2.out INFO] Creating attribute mocoeffs
> > > [GAMESS ../water_mp2.out INFO] Creating attribute geotargets[] with default values
> > >
> > > Notice how only the same parser class contributes to the repeats. The
> > > problem is not in parsing, though, since it happens whenever
> > > logger.info is called. Consider this from the current revision in
> > > trunk, where it is called in LogFile.__setattr__ whenever an attribute
> > > from _attrlist is set that did not exist previously:
> > >
> > > langner@slim:~/tmp/python/cclib/trunk/data$ python
> > > Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
> > > [GCC 3.4.6 (Debian 3.4.6-5)] on linux2
> > > Type "help", "copyright", "credits" or "license" for more information.
> > >
> > > >>> import cclib
> > > >>> cclib.__version__
> > >
> > > '0.7'
> > >
> > > >>> cclib.parser.logfileparser.__revision__
> > >
> > > '$Revision: 620 $'
> > >
> > > >>> a = cclib.parser.ccopen("basicGaussian03/water_mp2.log")
> > > >>> a.mult = 1
> > >
> > > [Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
> > >
> > > >>> a = cclib.parser.ccopen("basicGaussian03/water_mp2.log")
> > > >>> a.mult = 1
> > >
> > > [Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
> > > [Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
> > >
> > > >>> b = cclib.parser.ccopen("basicGAMESS-US/water_mp2.out")
> > > >>> b.mult = 1
> > >
> > > [GAMESS basicGAMESS-US/water_mp2.out INFO] Creating attribute mult: 1
> > >
> > > >>> a.mult = 1
> > > >>> a.clean()
> > > >>> a.mult = 1
> > >
> > > [Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
> > > [Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
> > >
> > > This looks pretty strange.
> > >
> > > - Karol
> > >
> > > --
> > > written by Karol Langner
> > > Thu May  3 13:25:54 CEST 2007
> > >
> > > -----------------------------------------------------------------------
> > >-- This SF.net email is sponsored by DB2 Express
> > > Download DB2 Express C - the FREE version of DB2 express and take
> > > control of your XML. No limits. Just data. Click to get it now.
> > > http://sourceforge.net/powerbar/db2/
> > > _______________________________________________
> > > cclib-devel mailing list
> > > ccl...@li...
> > > https://lists.sourceforge.net/lists/listinfo/cclib-devel

-- 
written by Karol Langner
Mon May  7 02:51:00 CEST 2007

Re: [cclib-devel] further parser refactoring

From: Noel O'B. <bao...@gm...> - 2007-05-03 11:21:30

On 03/05/07, Karol Langner <kar...@kn...> wrote:
> On Thursday 03 May 2007 10:23, Noel O'Boyle wrote:
> > For instance, what was the effect of the recent change where you
> > avoided calling extract() when the line was empty? It seems reasonable
> > that this would speed things up, but did it in fact? What's the
> > fastest way of testing whether a line is empty (must be
> > cross-platform)? And so on.
>
> Below, "parse_slower" is the same method as "parse" from the trunk without the
> condition that checks if the line is empty.
>
> langner@slim:~/tmp/python/cclib/trunk/data/Gaussian/Gaussian03$ python
> Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
> [GCC 3.4.6 (Debian 3.4.6-5)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import cclib
> >>> a = cclib.parser.ccopen("chn1.log.gz")
> >>> import timeit
> >>> t = timeit.Timer("a.clean(); a.parse()", "from __main__ import a")
> >>> min(t.repeat(repeat=10,number=5))
> .... logger output ....
> 0.92677688598632812
> >>> t_slower = timeit.Timer("a.clean();a.parse_slower()", "from __main__ import a")
> >>> min(t_slower.repeat(repeat=10,number=5))
> ... logger output ...
> 0.92177586353772345
>
> I tried a bigger file and it also had no visible effect. So... what seemed
> reasonable to me was wrong. I guess that revision can be reverted :)

Maybe there's a quicker way of testing for an empty line. Or even
better, all lines less than 4 characters or something (although
clearly we need to be careful not to skip important lines).

> > How we test each line has a large effect on efficiency. I point out
> > again that using line[x:y]=="jklj" is much faster than using "word in
> > line", or line.find(), and so these should be some of the first
> > targets for improving efficiency.
>
> Good point, confirmed by a profiling run:
>
> langner@slim:~/tmp/python/cclib/trunk/data/GAMESS/basicGAMESS-US$ python
> Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
> [GCC 3.4.6 (Debian 3.4.6-5)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import cclib
> >>> a = cclib.parser.ccopen("C_bigbasis.out")
> >>> import profile
> >>> profile.run("a.parse()", "parse.prof")
> >>> import pstats
> >>> s = pstats.Stats("parse.prof")
> >>> s.sort_stats("time")
> >>> s.print_stats(.12)
> Thu May  3 14:43:04 2007    parse.prof
>
>          199815 function calls in 9.069 CPU seconds
>
>    Ordered by: internal time
>    List reduced from 96 to 12 due to restriction <0.12>
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>      8581    4.548    0.001    8.625    0.001 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/gamessparser.py:90(extract)
>    137355    3.080    0.000    3.080    0.000 :0(find)
>     20310    0.480    0.000    0.480    0.000 :0(len)
>         1    0.316    0.316    9.069    9.069 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:165(parse)
>      8600    0.184    0.000    0.184    0.000 :0(rstrip)
>      2143    0.140    0.000    0.140    0.000 :0(split)
>      2055    0.124    0.000    0.124    0.000 :0(range)
>      9145    0.076    0.000    0.076    0.000 :0(strip)
>      8868    0.060    0.000    0.060    0.000 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:375(updateprogress)
>       370    0.016    0.000    0.016    0.000 :0(append)
>       218    0.004    0.000    0.004    0.000 :0(replace)
>        31    0.004    0.000    0.032    0.001 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:153(__setattr__)

I've never used the profiler. Can you interpret this for me in simple language?

> > On 03/05/07, Karol Langner <kar...@kn...> wrote:
> > > Some thoughts about more refactoring to the parser...
> > >
> > > If you take a look at the parsers after the recent refactoring, it is now
> > > more evident that they are quite inefficient. That isn't a problem, since
> > > cclib isn't about efficiency, but it would be nice. For example, even
> > > something as simple as putting a 'return' statement at the end of each
> > > parsing block would speed things up (the following conditions are not
> > > evaluated). Anyway, this already suggests that it would be useful to
> > > break up the extract() method into pieces, one for each block of parsed
> > > output.
> >
> > I think that there is one case where the block shouldn't return, but
> > in general it would be fine. However, it wouldn't speed things up that
> > much, so I feel it is not worth doing. If you think about it, most
> > lines don't match any of the 'if' statements. If each block is
> > executed once, and there are 10 blocks, then the number of wasteful
> > 'if' statements will be 9 + 8 + 7 + ...  + 1 = 45.
>
> There are also the lines between blocks that don't match any condition, but in
> principle you're right, it's not worth it.

> > > I've been hovering around this subject for some time, and turning it
> > > around in my mind. A dictionary of functions seems appropriate (with
> > > regexps or something as keys), and easier to manage than the current
> > > "long" function. I don't think we can do away with the functions, since
> > > sometimes pretty complicated operations are done with the parsed output.
> > > The problem I see is where to define all these functions (30-40 separate
> > > parsed blocks)?
> >
> > I don't think defining the functions is the problem - just define them
> > in the gaussian parser for example. We could do this already without
> > affecting anything, and leave the dictionary of functions idea till a
> > later date.
>
> What do you mean by 'gaussian parser' - the file gaussianparser.py or the
> class? I think I didn't make myself clear - my worry is that if we define all
> these functions in the parser class, then when you go "a =
> ccopen("....").parse(); print dir(a)" you will get flooded by function names.

Ah, ok. I see. I think I would prefer users to use help(a), rather
than dir(a). I think that if you use function names starting with _ they
shouldn't appear in this list, although I haven't tested this.
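
The underscore question is quick to check. In the small sketch below (the Demo class is a hypothetical stand-in for a parsed log file object), dir() does list names that start with an underscore, so hiding them from a dir() listing needs an explicit filter. As far as I know, it is help() that leaves out names with a single leading underscore.

```python
class Demo(object):
    """Hypothetical stand-in for a parsed log file object."""

    def parse(self):
        return self._parse_charge()

    def _parse_charge(self):
        return 0

names = dir(Demo())

# dir() includes the underscore-prefixed method ...
has_private = "_parse_charge" in names

# ... so a listing of only the public names has to filter them out.
public_names = [name for name in names if not name.startswith("_")]
print(has_private, public_names)
```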

> > > How about this: the functions would be defined in a different class, not
> > > LogFile. What I'm suggesting, is to separate from the class that
> > > represents a parsed log file a class that represents the parser.
> > > Currently, they are one. An instance of the parser class would be an
> > > attribute of the log file class, say "_parser". This object would hold
> > > all the parsing functions and a dict used by the parse() method of
> > > LogFile, and any other stuff needed for parsing. An additional advantage
> > > is that the parser becomes less visible to the casual user, leaving only
> > > parsed attributes in the log file object.
> > >
> > > Summarizing, I propose two layers of classes:
> > > LogFile - subclasses Gaussian, GAMESS, ...
> > > LogFileParser - subclasses GaussianParser, GAMESSParser, ...
> > > The first remains as is (at least for the user), except that everything
> > > related to parsing is put in the second. Of course, instances of the
> > > latter class should be attributes of the instances of the former.
> >
> > I think you'll have to explain this some more...I'm not sure what the
> > advantage is in doing this. I guess I don't have enough time right now
> > to think this through fully...
>
> Let me sketch out the idea. Snippets of the parser class (second layer):
>
> class GaussianParser(LogFileParser):
>         (...)
>         def parse_charge(self, inputfile, line):
>                 super(GaussianParser, self).charge = (...)
>         def parse_scfenergy(self, inputfile, line):
>                 super(GaussianParser, self).scfenergies.append(...)
>         (...)
>         self.parse_dict = {
>                 <regexp_charge>:                self.parse_charge,
>                 <regexp_scfenergy>:     self.parse_scfenergy,
>                 (...)
>         }
>
> Now the first layer, the log file class:
>
> class Gaussian(LogFile):
>         self._parser = GaussianParser(...)
>         (...)
>         def parse(self, ...):
>                 (...)
>                 for line in inputfile:
>                         for regexp in self._parser.parse_dict:
>                                 if re.match(regexp, line):
>                                         self._parser.parse_dict[regexp](inputfile, line)
>
> I hope that's clearer.

OK, thanks for that. As I said, it needs more time...

> No problem, I'm just brainstorming on a holiday.

If you want to do something neat, you can see if you can integrate
cclib with PyVib2, or whether you can get cclib to run under Jython
(there has been some interest in this, more later...)...I would hate
for you not to have something to do on your holiday :-)

Noel

Re: [cclib-devel] further parser refactoring

From: Karol L. <kar...@kn...> - 2007-05-03 11:11:28

On Thursday 03 May 2007 10:23, Noel O'Boyle wrote:
> For instance, what was the effect of the recent change where you
> avoided calling extract() when the line was empty? It seems reasonable
> that this would speed things up, but did it in fact? What's the
> fastest way of testing whether a line is empty (must be
> cross-platform)? And so on.

Below, "parse_slower" is the same method as "parse" from the trunk without the 
condition that checks if the line is empty.

langner@slim:~/tmp/python/cclib/trunk/data/Gaussian/Gaussian03$ python
Python 2.5 (r25:51908, Apr 30 2007, 15:03:13) 
[GCC 3.4.6 (Debian 3.4.6-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cclib
>>> a = cclib.parser.ccopen("chn1.log.gz")
>>> import timeit
>>> t = timeit.Timer("a.clean(); a.parse()", "from __main__ import a")
>>> min(t.repeat(repeat=10,number=5))
.... logger output ....
0.92677688598632812
>>> t_slower = timeit.Timer("a.clean();a.parse_slower()", "from __main__ import a")
>>> min(t_slower.repeat(repeat=10,number=5))
... logger output ...
0.92177586353772345

I tried a bigger file and it also had no visible effect. So... what seemed 
reasonable to me was wrong. I guess that revision can be reverted :)

> How we test each line has a large effect on efficiency. I point out
> again that using line[x:y]=="jklj" is much faster than using "word in
> line", or line.find(), and so these should be some of the first
> targets for improving efficiency.

Good point, confirmed by a profiling run:

langner@slim:~/tmp/python/cclib/trunk/data/GAMESS/basicGAMESS-US$ python
Python 2.5 (r25:51908, Apr 30 2007, 15:03:13) 
[GCC 3.4.6 (Debian 3.4.6-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cclib
>>> a = cclib.parser.ccopen("C_bigbasis.out")
>>> import profile
>>> profile.run("a.parse()", "parse.prof")
>>> import pstats
>>> s = pstats.Stats("parse.prof")
>>> s.sort_stats("time")
>>> s.print_stats(.12)
Thu May  3 14:43:04 2007    parse.prof

         199815 function calls in 9.069 CPU seconds

   Ordered by: internal time
   List reduced from 96 to 12 due to restriction <0.12>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     8581    4.548    0.001    8.625    0.001 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/gamessparser.py:90(extract)
   137355    3.080    0.000    3.080    0.000 :0(find)
    20310    0.480    0.000    0.480    0.000 :0(len)
        1    0.316    0.316    9.069    9.069 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:165(parse)
     8600    0.184    0.000    0.184    0.000 :0(rstrip)
     2143    0.140    0.000    0.140    0.000 :0(split)
     2055    0.124    0.000    0.124    0.000 :0(range)
     9145    0.076    0.000    0.076    0.000 :0(strip)
     8868    0.060    0.000    0.060    0.000 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:375(updateprogress)
      370    0.016    0.000    0.016    0.000 :0(append)
      218    0.004    0.000    0.004    0.000 :0(replace)
       31    0.004    0.000    0.032    0.001 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:153(__setattr__)

> On 03/05/07, Karol Langner <kar...@kn...> wrote:
> > Some thoughts about more refactoring to the parser...
> >
> > If you take a look at the parsers after the recent refactoring, it is now
> > more evident that they are quite inefficient. That isn't a problem, since
> > cclib isn't about efficiency, but it would be nice. For example, even
> > something as simple as putting a 'return' statement at the end of each
> > parsing block would speed things up (the following conditions are not
> > evaluated). Anyway, this already suggests that it would be useful to
> > break up the extract() method into pieces, one for each block of parsed
> > output.
>
> I think that there is one case where the block shouldn't return, but
> in general it would be fine. However, it wouldn't speed things up that
> much, so I feel it is not worth doing. If you think about it, most
> lines don't match any of the 'if' statements. If each block is
> executed once, and there are 10 blocks, then the number of wasteful
> 'if' statements will be 9 + 8 + 7 + ...  + 1 = 45.

There are also the lines between blocks that don't match any condition, but in 
principle you're right, it's not worth it.

> > I've been hovering around this subject for some time, and turning it
> > around in my mind. A dictionary of functions seems appropriate (with
> > regexps or something as keys), and easier to manage than the current
> > "long" function. I don't think we can do away with the functions, since
> > sometimes pretty complicated operations are done with the parsed output.
> > The problem I see is where to define all these functions (30-40 separate
> > parsed blocks)?
>
> I don't think defining the functions is the problem - just define them
> in the gaussian parser for example. We could do this already without
> affecting anything, and leave the dictionary of functions idea till a
> later date.

What do you mean by 'gaussian parser' - the file gaussianparser.py or the 
class? I think I didn't make myself clear - my worry is that if we define all 
these functions in the parser class, then when you go "a = 
ccopen("....").parse(); print dir(a)" you will get flooded by function names.

> > How about this: the functions would be defined in a different class, not
> > LogFile. What I'm suggesting, is to separate from the class that
> > represents a parsed log file a class that represents the parser.
> > Currently, they are one. An instance of the parser class would be an
> > attribute of the log file class, say "_parser". This object would hold
> > all the parsing functions and a dict used by the parse() method of
> > LogFile, and any other stuff needed for parsing. An additional advantage
> > is that the parser becomes less visible to the casual user, leaving only
> > parsed attributes in the log file object.
> >
> > Summarizing, I propose two layers of classes:
> > LogFile - subclasses Gaussian, GAMESS, ...
> > LogFileParser - subclasses GaussianParser, GAMESSParser, ...
> > The first remains as is (at least for the user), except that everything
> > related to parsing is put in the second. Of course, instances of the
> > latter class should be attributes of the instances of the former.
>
> I think you'll have to explain this some more...I'm not sure what the
> advantage is in doing this. I guess I don't have enough time right now
> to think this through fully...

Let me sketch out the idea. Snippets of the parse class (second layer):

class GaussianParser(LogFileParser):
	(...)
	def parse_charge(self, inputfile, line):
		super(GaussianParser, self).charge = (...)
	def parse_scfenergy(self, inputfile, line):
		super(GaussianParser, self).scfenergies.append(...)
	(...)
	self.parse_dict = {
		<regexp_charge>:		self.parse_charge,
		<regexp_scfenergy>:	self.parse_scfenergy,
		(...)
	}

Now the first layer, the log file class:

class Gaussian(LogFile):
	self._parser = GaussianParser(...)
	(...)
	def parse(self, ...):
		(...)
		for line in inputfile:
			for regexp in self._parser.parse_dict:
				if re.match(regexp, line):
					self._parser.parse_dict[regexp](inputfile, line)

I hope that's clearer.
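
For readers who want something they can run, the two-layer idea sketched above might look roughly like this. It is only an illustration under several assumptions: it targets Python 3 rather than the Python 2.5 used in this thread, the regular expressions and sample lines are invented to resemble Gaussian output, the `logfile` back-reference, the `parser_class` attribute and the extra `match` argument are hypothetical names, and none of it is taken from the cclib source.

```python
import re


class LogFileParser:
    """Second layer: holds the parsing functions and the dispatch table."""

    def __init__(self, logfile):
        # Hypothetical back-reference: the parser writes parsed values
        # onto the log file object it serves.
        self.logfile = logfile
        self.parse_dict = {}


class GaussianParser(LogFileParser):
    def __init__(self, logfile):
        super().__init__(logfile)
        # Illustrative patterns only; not the ones used by the real parser.
        self.parse_dict = {
            re.compile(r"\s*Charge\s*=\s*(-?\d+)"): self.parse_charge,
            re.compile(r"\s*SCF Done:.*=\s*(-?\d+\.\d+)"): self.parse_scfenergy,
        }

    def parse_charge(self, inputfile, line, match):
        self.logfile.charge = int(match.group(1))

    def parse_scfenergy(self, inputfile, line, match):
        self.logfile.scfenergies.append(float(match.group(1)))


class LogFile:
    """First layer: what the user sees; carries only parsed attributes."""

    parser_class = LogFileParser

    def __init__(self):
        self.scfenergies = []
        self._parser = self.parser_class(self)

    def parse(self, inputfile):
        for line in inputfile:
            for regexp, handler in self._parser.parse_dict.items():
                match = regexp.match(line)
                if match:
                    handler(inputfile, line, match)
                    break  # at most one handler per line
        return self


class Gaussian(LogFile):
    parser_class = GaussianParser


# Made-up lines that only resemble Gaussian output.
sample = [
    " Charge =  0 Multiplicity = 1\n",
    " SCF Done:  E(RHF) =  -74.9607165  A.U. after 7 cycles\n",
]
data = Gaussian().parse(iter(sample))

# Only parsed attributes are public; the parsing machinery sits behind _parser.
public = [name for name in vars(data) if not name.startswith("_")]
print(data.charge, data.scfenergies, public)
```

The point of the split shows up in `public`: the parsing functions never appear on the log file object, so dir() on a parsed file is not flooded with parse_* names.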

> > Waiting to hear what you think about this idea,
>
> I think I need some more time....

No problem, I'm just brainstorming on a holiday.

- Karol

-- 
written by Karol Langner
Thu May  3 14:15:39 CEST 2007

Re: [cclib-devel] strange logger bug

From: Noel O'B. <bao...@gm...> - 2007-05-03 10:26:12

Sorry, just checked. clean() doesn't remove the logger, nor should it.
If you parse the same file a second time (e.g. this is common with
GaussSum users following a geometry optimisation) you still want a
logger.

The problem you found occurs if you create two instances of parsers
with the same name. We either ignore this problem, or implement some
complicated way of tracking file names. Perhaps if we only create the
logger when parse() is called, then we can remove it when clean() is
called.
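
A minimal sketch of the effect with the standard logging module, in Python 3. It assumes (the thread does not confirm this) that a new handler is attached every time a parser instance is created; since logging.getLogger() returns the same logger object for the same name, the handlers pile up on it. The helper names and the `guard` flag are hypothetical and only serve the illustration.

```python
import io
import logging


def make_logger(name, stream, guard):
    """Return the logger called `name`, attaching a handler that writes to `stream`."""
    logger = logging.getLogger(name)  # same name -> same logger object
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if guard and logger.handlers:
        return logger  # already configured; do not add a second handler
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[%(name)s %(levelname)s] %(message)s"))
    logger.addHandler(handler)
    return logger


def count_messages(name, guard):
    """Create two 'parser instances' for one name and count the lines one message yields."""
    stream = io.StringIO()
    for _ in range(2):
        logger = make_logger(name, stream, guard)
    logger.info("Creating attribute mult: 1")
    return len(stream.getvalue().splitlines())


unguarded = count_messages("Gaussian water_mp2.log", guard=False)
guarded = count_messages("Gaussian water_mp2.log (guarded)", guard=True)
print(unguarded, guarded)
```

Checking logger.handlers before adding one is an alternative to creating the logger in parse() and removing it in clean(); which fits cclib better is a design choice for the list.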

On 03/05/07, Noel O'Boyle <bao...@gm...> wrote:
> This is a "feature" of the logger (yes, I agree it's annoying, but
> it's not a bug). The problem is that now you have two loggers both
> doing the exact same thing. Loggers are distinguished by their names;
> so if you parse the same file twice, without using clean() (which I
> hope deletes the logger), you get this effect.
>
> On 03/05/07, Karol Langner <kar...@kn...> wrote:
> > While working interactively with cclib on many files, I noticed a bug related
> > to the logger. For some reason, it prints as many messages for each attribute
> > as there were instances created of the specific parser (GAMESS, Gaussian,
> > counted separately), so you get an increasingly long output from the logger.
> > The problem is not related to any recent changes, it goes as far back as
> > version 0.5, as you can see here:
> >
> > langner@slim:~/tmp/python/cclib/trunk/data/Gaussian/basicGaussian03$ python
> > Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
> > [GCC 3.4.6 (Debian 3.4.6-5)] on linux2
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> import cclib
> > >>> print cclib.__version__
> > 0.5
> > >>> print cclib.parser.logfileparser.__revision__
> > $Revision: 240 $
> > >>> print cclib.parser.gamessparser.__revision__
> > $Revision: 240 $
> > >>> cclib.parser.Gaussian("water_mp2.log").parse()
> > [Gaussian water_mp2.log INFO] Creating attribute atomcoords[]
> > [Gaussian water_mp2.log INFO] Creating attribute atomnos[]
> > [Gaussian water_mp2.log INFO] Creating attribute natom: 3
> > [Gaussian water_mp2.log INFO] Creating attribute nbasis: 7
> > [Gaussian water_mp2.log INFO] Creating attribute nmo: 7
> > [Gaussian water_mp2.log INFO] Creating attribute scftargets[]
> > [Gaussian water_mp2.log INFO] Creating attribute scfvalues
> > [Gaussian water_mp2.log INFO] Creating attribute scfenergies[]
> > [Gaussian water_mp2.log INFO] Creating attribute moenergies[[]]
> > [Gaussian water_mp2.log INFO] Creating attribute homos[]
> > >>> cclib.parser.Gaussian("water_mp2.log").parse()
> > [Gaussian water_mp2.log INFO] Creating attribute atomcoords[]
> > [Gaussian water_mp2.log INFO] Creating attribute atomcoords[]
> > [Gaussian water_mp2.log INFO] Creating attribute atomnos[]
> > [Gaussian water_mp2.log INFO] Creating attribute atomnos[]
> > [Gaussian water_mp2.log INFO] Creating attribute natom: 3
> > [Gaussian water_mp2.log INFO] Creating attribute natom: 3
> > [Gaussian water_mp2.log INFO] Creating attribute nbasis: 7
> > [Gaussian water_mp2.log INFO] Creating attribute nbasis: 7
> > [Gaussian water_mp2.log INFO] Creating attribute nmo: 7
> > [Gaussian water_mp2.log INFO] Creating attribute nmo: 7
> > [Gaussian water_mp2.log INFO] Creating attribute scftargets[]
> > [Gaussian water_mp2.log INFO] Creating attribute scftargets[]
> > [Gaussian water_mp2.log INFO] Creating attribute scfvalues
> > [Gaussian water_mp2.log INFO] Creating attribute scfvalues
> > [Gaussian water_mp2.log INFO] Creating attribute scfenergies[]
> > [Gaussian water_mp2.log INFO] Creating attribute scfenergies[]
> > [Gaussian water_mp2.log INFO] Creating attribute moenergies[[]]
> > [Gaussian water_mp2.log INFO] Creating attribute moenergies[[]]
> > [Gaussian water_mp2.log INFO] Creating attribute homos[]
> > [Gaussian water_mp2.log INFO] Creating attribute homos[]
> > >>> cclib.parser.GAMESS("../../GAMESS/basicGAMESS-US/water_mp2.out").parse()
> > [GAMESS ../GAMESS/water_mp2.out INFO] Creating attribute atomcoords, atomnos
> > [GAMESS ../water_mp2.out INFO] Creating attribute nbasis
> > [GAMESS ../water_mp2.out INFO] Creating attribute homos
> > [GAMESS ../water_mp2.out INFO] Creating attribute natom
> > [GAMESS ..//water_mp2.out INFO] Creating attribute scftargets
> > [GAMESS ../water_mp2.out INFO] Creating attribute scfvalues
> > [GAMESS ../water_mp2.out INFO] Creating attribute scfenergies[]
> > [GAMESS ..//water_mp2.out INFO] Creating attributes moenergies, mosyms
> > [GAMESS ../water_mp2.out INFO] Creating attribute nmo with default value
> > [GAMESS ../water_mp2.out INFO] Creating attribute mocoeffs
> > [GAMESS ../water_mp2.out INFO] Creating attribute geotargets[] with default
> > values
> >
> > Notice how only the same parser class contributes to the repeats. The problem
> > is not in parsing, though, since it happens whenever logger.info is called.
> > Consider this from the current revision in trunk, where it is called in
> > LogFile.__setattr__ whenever an attribute from _attrlist is set that did not
> > exist previously:
> >
> > langner@slim:~/tmp/python/cclib/trunk/data$ python
> > Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
> > [GCC 3.4.6 (Debian 3.4.6-5)] on linux2
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> import cclib
> > >>> cclib.__version__
> > '0.7'
> > >>> cclib.parser.logfileparser.__revision__
> > '$Revision: 620 $'
> > >>> a = cclib.parser.ccopen("basicGaussian03/water_mp2.log")
> > >>> a.mult = 1
> > [Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
> > >>> a = cclib.parser.ccopen("basicGaussian03/water_mp2.log")
> > >>> a.mult = 1
> > [Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
> > [Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
> > >>> b = cclib.parser.ccopen("basicGAMESS-US/water_mp2.out")
> > >>> b.mult = 1
> > [GAMESS basicGAMESS-US/water_mp2.out INFO] Creating attribute mult: 1
> > >>> a.mult = 1
> > >>> a.clean()
> > >>> a.mult = 1
> > [Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
> > [Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
> >
> > This looks pretty strange.
> >
> > - Karol
> >
> > --
> > written by Karol Langner
> > Thu May  3 13:25:54 CEST 2007
> >
> > -------------------------------------------------------------------------
> > This SF.net email is sponsored by DB2 Express
> > Download DB2 Express C - the FREE version of DB2 express and take
> > control of your XML. No limits. Just data. Click to get it now.
> > http://sourceforge.net/powerbar/db2/
> > _______________________________________________
> > cclib-devel mailing list
> > ccl...@li...
> > https://lists.sourceforge.net/lists/listinfo/cclib-devel
> >
>

Re: [cclib-devel] strange logger bug

From: Noel O'B. <bao...@gm...> - 2007-05-03 10:18:19

This is a "feature" of the logger (yes, I agree it's annoying, but
it's not a bug). The problem is that now you have two loggers both
doing the exact same thing. Loggers are distinguished by their names;
so if you parse the same file twice, without using clean() (which I
hope deletes the logger), you get this effect.

On 03/05/07, Karol Langner <kar...@kn...> wrote:
> While working interactively with cclib on many files, I noticed a bug related
> to the logger. For some reason, it prints as many messages for each attribute
> as there were instances created of the specific parser (GAMESS, Gaussian,
> counted separately), so you get an increasingly long output from the logger.
> The problem is not related to any recent changes, it goes as far back as
> version 0.5, as you can see here:
>
> langner@slim:~/tmp/python/cclib/trunk/data/Gaussian/basicGaussian03$ python
> Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
> [GCC 3.4.6 (Debian 3.4.6-5)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import cclib
> >>> print cclib.__version__
> 0.5
> >>> print cclib.parser.logfileparser.__revision__
> $Revision: 240 $
> >>> print cclib.parser.gamessparser.__revision__
> $Revision: 240 $
> >>> cclib.parser.Gaussian("water_mp2.log").parse()
> [Gaussian water_mp2.log INFO] Creating attribute atomcoords[]
> [Gaussian water_mp2.log INFO] Creating attribute atomnos[]
> [Gaussian water_mp2.log INFO] Creating attribute natom: 3
> [Gaussian water_mp2.log INFO] Creating attribute nbasis: 7
> [Gaussian water_mp2.log INFO] Creating attribute nmo: 7
> [Gaussian water_mp2.log INFO] Creating attribute scftargets[]
> [Gaussian water_mp2.log INFO] Creating attribute scfvalues
> [Gaussian water_mp2.log INFO] Creating attribute scfenergies[]
> [Gaussian water_mp2.log INFO] Creating attribute moenergies[[]]
> [Gaussian water_mp2.log INFO] Creating attribute homos[]
> >>> cclib.parser.Gaussian("water_mp2.log").parse()
> [Gaussian water_mp2.log INFO] Creating attribute atomcoords[]
> [Gaussian water_mp2.log INFO] Creating attribute atomcoords[]
> [Gaussian water_mp2.log INFO] Creating attribute atomnos[]
> [Gaussian water_mp2.log INFO] Creating attribute atomnos[]
> [Gaussian water_mp2.log INFO] Creating attribute natom: 3
> [Gaussian water_mp2.log INFO] Creating attribute natom: 3
> [Gaussian water_mp2.log INFO] Creating attribute nbasis: 7
> [Gaussian water_mp2.log INFO] Creating attribute nbasis: 7
> [Gaussian water_mp2.log INFO] Creating attribute nmo: 7
> [Gaussian water_mp2.log INFO] Creating attribute nmo: 7
> [Gaussian water_mp2.log INFO] Creating attribute scftargets[]
> [Gaussian water_mp2.log INFO] Creating attribute scftargets[]
> [Gaussian water_mp2.log INFO] Creating attribute scfvalues
> [Gaussian water_mp2.log INFO] Creating attribute scfvalues
> [Gaussian water_mp2.log INFO] Creating attribute scfenergies[]
> [Gaussian water_mp2.log INFO] Creating attribute scfenergies[]
> [Gaussian water_mp2.log INFO] Creating attribute moenergies[[]]
> [Gaussian water_mp2.log INFO] Creating attribute moenergies[[]]
> [Gaussian water_mp2.log INFO] Creating attribute homos[]
> [Gaussian water_mp2.log INFO] Creating attribute homos[]
> >>> cclib.parser.GAMESS("../../GAMESS/basicGAMESS-US/water_mp2.out").parse()
> [GAMESS ../GAMESS/water_mp2.out INFO] Creating attribute atomcoords, atomnos
> [GAMESS ../water_mp2.out INFO] Creating attribute nbasis
> [GAMESS ../water_mp2.out INFO] Creating attribute homos
> [GAMESS ../water_mp2.out INFO] Creating attribute natom
> [GAMESS ..//water_mp2.out INFO] Creating attribute scftargets
> [GAMESS ../water_mp2.out INFO] Creating attribute scfvalues
> [GAMESS ../water_mp2.out INFO] Creating attribute scfenergies[]
> [GAMESS ..//water_mp2.out INFO] Creating attributes moenergies, mosyms
> [GAMESS ../water_mp2.out INFO] Creating attribute nmo with default value
> [GAMESS ../water_mp2.out INFO] Creating attribute mocoeffs
> [GAMESS ../water_mp2.out INFO] Creating attribute geotargets[] with default
> values
>
> Notice how only the same parser class contributes to the repeats. The problem
> is not in parsing, though, since it happens whenever logger.info is called.
> Consider this from the current revision in trunk, where it is called in
> LogFile.__setattr__ whenever an attribute from _attrlist is set that did not
> exist previously:
>
> langner@slim:~/tmp/python/cclib/trunk/data$ python
> Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
> [GCC 3.4.6 (Debian 3.4.6-5)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import cclib
> >>> cclib.__version__
> '0.7'
> >>> cclib.parser.logfileparser.__revision__
> '$Revision: 620 $'
> >>> a = cclib.parser.ccopen("basicGaussian03/water_mp2.log")
> >>> a.mult = 1
> [Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
> >>> a = cclib.parser.ccopen("basicGaussian03/water_mp2.log")
> >>> a.mult = 1
> [Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
> [Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
> >>> b = cclib.parser.ccopen("basicGAMESS-US/water_mp2.out")
> >>> b.mult = 1
> [GAMESS basicGAMESS-US/water_mp2.out INFO] Creating attribute mult: 1
> >>> a.mult = 1
> >>> a.clean()
> >>> a.mult = 1
> [Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
> [Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
>
> This looks pretty strange.
>
> - Karol
>
> --
> written by Karol Langner
> Thu May  3 13:25:54 CEST 2007
>

[cclib-devel] strange logger bug

From: Karol L. <kar...@kn...> - 2007-05-03 10:13:10

While working interactively with cclib on many files, I noticed a bug related 
to the logger. For some reason, it prints as many messages for each attribute 
as there were instances created of the specific parser (GAMESS, Gaussian, 
counted separately), so you get an increasingly long output from the logger. 
The problem is not related to any recent changes, it goes as far back as 
version 0.5, as you can see here:

langner@slim:~/tmp/python/cclib/trunk/data/Gaussian/basicGaussian03$ python
Python 2.5 (r25:51908, Apr 30 2007, 15:03:13) 
[GCC 3.4.6 (Debian 3.4.6-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cclib
>>> print cclib.__version__
0.5
>>> print cclib.parser.logfileparser.__revision__
$Revision: 240 $
>>> print cclib.parser.gamessparser.__revision__
$Revision: 240 $
>>> cclib.parser.Gaussian("water_mp2.log").parse()
[Gaussian water_mp2.log INFO] Creating attribute atomcoords[]
[Gaussian water_mp2.log INFO] Creating attribute atomnos[]
[Gaussian water_mp2.log INFO] Creating attribute natom: 3
[Gaussian water_mp2.log INFO] Creating attribute nbasis: 7
[Gaussian water_mp2.log INFO] Creating attribute nmo: 7
[Gaussian water_mp2.log INFO] Creating attribute scftargets[]
[Gaussian water_mp2.log INFO] Creating attribute scfvalues
[Gaussian water_mp2.log INFO] Creating attribute scfenergies[]
[Gaussian water_mp2.log INFO] Creating attribute moenergies[[]]
[Gaussian water_mp2.log INFO] Creating attribute homos[]
>>> cclib.parser.Gaussian("water_mp2.log").parse()
[Gaussian water_mp2.log INFO] Creating attribute atomcoords[]
[Gaussian water_mp2.log INFO] Creating attribute atomcoords[]
[Gaussian water_mp2.log INFO] Creating attribute atomnos[]
[Gaussian water_mp2.log INFO] Creating attribute atomnos[]
[Gaussian water_mp2.log INFO] Creating attribute natom: 3
[Gaussian water_mp2.log INFO] Creating attribute natom: 3
[Gaussian water_mp2.log INFO] Creating attribute nbasis: 7
[Gaussian water_mp2.log INFO] Creating attribute nbasis: 7
[Gaussian water_mp2.log INFO] Creating attribute nmo: 7
[Gaussian water_mp2.log INFO] Creating attribute nmo: 7
[Gaussian water_mp2.log INFO] Creating attribute scftargets[]
[Gaussian water_mp2.log INFO] Creating attribute scftargets[]
[Gaussian water_mp2.log INFO] Creating attribute scfvalues
[Gaussian water_mp2.log INFO] Creating attribute scfvalues
[Gaussian water_mp2.log INFO] Creating attribute scfenergies[]
[Gaussian water_mp2.log INFO] Creating attribute scfenergies[]
[Gaussian water_mp2.log INFO] Creating attribute moenergies[[]]
[Gaussian water_mp2.log INFO] Creating attribute moenergies[[]]
[Gaussian water_mp2.log INFO] Creating attribute homos[]
[Gaussian water_mp2.log INFO] Creating attribute homos[]
>>> cclib.parser.GAMESS("../../GAMESS/basicGAMESS-US/water_mp2.out").parse()
[GAMESS ../GAMESS/water_mp2.out INFO] Creating attribute atomcoords, atomnos
[GAMESS ../water_mp2.out INFO] Creating attribute nbasis
[GAMESS ../water_mp2.out INFO] Creating attribute homos
[GAMESS ../water_mp2.out INFO] Creating attribute natom
[GAMESS ..//water_mp2.out INFO] Creating attribute scftargets
[GAMESS ../water_mp2.out INFO] Creating attribute scfvalues
[GAMESS ../water_mp2.out INFO] Creating attribute scfenergies[]
[GAMESS ..//water_mp2.out INFO] Creating attributes moenergies, mosyms
[GAMESS ../water_mp2.out INFO] Creating attribute nmo with default value
[GAMESS ../water_mp2.out INFO] Creating attribute mocoeffs
[GAMESS ../water_mp2.out INFO] Creating attribute geotargets[] with default 
values

Notice how only the same parser class contributes to the repeats. The problem 
is not in parsing, though, since it happens whenever logger.info is called. 
Consider this from the current revision in trunk, where it is called in 
LogFile.__setattr__ whenever an attribute from _attrlist is set that did not 
exist previously:

langner@slim:~/tmp/python/cclib/trunk/data$ python
Python 2.5 (r25:51908, Apr 30 2007, 15:03:13) 
[GCC 3.4.6 (Debian 3.4.6-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cclib
>>> cclib.__version__
'0.7'
>>> cclib.parser.logfileparser.__revision__
'$Revision: 620 $'
>>> a = cclib.parser.ccopen("basicGaussian03/water_mp2.log")
>>> a.mult = 1
[Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
>>> a = cclib.parser.ccopen("basicGaussian03/water_mp2.log")
>>> a.mult = 1
[Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
[Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
>>> b = cclib.parser.ccopen("basicGAMESS-US/water_mp2.out")
>>> b.mult = 1
[GAMESS basicGAMESS-US/water_mp2.out INFO] Creating attribute mult: 1
>>> a.mult = 1
>>> a.clean()
>>> a.mult = 1
[Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
[Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1

This looks pretty strange.

- Karol

-- 
written by Karol Langner
Thu May  3 13:25:54 CEST 2007

Re: [cclib-devel] further parser refactoring

From: Noel O'B. <bao...@gm...> - 2007-05-03 08:23:33

If you are going to think about efficiency, I'd like to see some
timings. That means something like timing the parsing of a particular
very large file (several times, and take the minimum). I don't like to
speculate too much on whether certain changes will make everything
more efficient.

For instance, what was the effect of the recent change where you
avoided calling extract() when the line was empty? It seems reasonable
that this would speed things up, but did it in fact? What's the
fastest way of testing whether a line is empty (must be
cross-platform)? And so on.

How we test each line has a large effect on efficiency. I point out
again that using line[x:y]=="jklj" is much faster than using "word in
line", or line.find(), and so these should be some of the first
targets for improving efficiency.
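
In the spirit of asking for timings first, the three line tests can be compared with a sketch like the one below. It uses Python 3, an invented sample line and an assumed column offset of 20; the relative speeds depend on the interpreter version and on the line, so no result is claimed here.

```python
import timeit

# Invented sample line: the keyword starts at column 20.
line = " " * 20 + "SCF Done:  E(RHF) =  -74.9607165     A.U. after    7 cycles"

candidates = {
    "slice": 'line[20:28] == "SCF Done"',
    "in": '"SCF Done" in line',
    "find": 'line.find("SCF Done") >= 0',
}

# All three tests must agree before their timings are worth comparing.
assert all(eval(expr, {"line": line}) for expr in candidates.values())

# Repeat several times and take the minimum, as suggested above.
timings = {
    name: min(timeit.repeat(expr, globals={"line": line}, repeat=5, number=100_000))
    for name, expr in candidates.items()
}

for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:>5}: {seconds:.4f} s per 100000 tests")
```

Note that the slice test only works when the keyword sits at a fixed column, which is a stronger assumption about the output format than the other two make.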

On 03/05/07, Karol Langner <kar...@kn...> wrote:
> Some thoughts about more refactoring to the parser...
>
> If you take a look at the parsers after the recent refactoring, it is now more
> evident that they are quite inefficient. That isn't a problem, since cclib
> isn't about efficiency, but it would be nice. For example, even something as
> simple as putting a 'return' statement at the end of each parsing block would
> speed things up (the following conditions are not evaluated). Anyway, this
> already suggests that it would be useful to break up the extract() method
> into pieces, one for each block of parsed output.

I think that there is one case where the block shouldn't return, but
in general it would be fine. However, it wouldn't speed things up that
much, so I feel it is not worth doing. If you think about it, most
lines don't match any of the 'if' statements. If each block is
executed once, and there are 10 blocks, then the number of wasteful
'if' statements will be 9 + 8 + 7 + ...  + 1 = 45.
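
The count can be checked with a few lines of Python. The sketch assumes exactly what the paragraph above assumes: 10 blocks, each matching one line, with the conditions tested in a fixed order. The function name is made up for the illustration.

```python
def evaluated_conditions(n_blocks, early_return):
    """Count 'if' tests run on the lines that do match a block.

    Assumes block i (1-based) matches exactly one line and that the
    conditions are tested in order 1..n_blocks.
    """
    total = 0
    for matching_block in range(1, n_blocks + 1):
        # With an early return the scan stops at the match; otherwise
        # all n_blocks conditions are tested for that line.
        total += matching_block if early_return else n_blocks
    return total


n = 10
saved = evaluated_conditions(n, early_return=False) - evaluated_conditions(n, early_return=True)
print(saved)  # 45
```

So a 'return' saves 45 tests over the whole file, which is negligible next to the 10 tests spent on each of the thousands of lines that match nothing.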

> I've been hovering around this subject for some time, and turning it around in
> my mind. A dictionary of functions seems appropriate (with regexps or
> something as keys), and easier to manage than the current "long" function.
> I don't think we can do away with the functions, since sometimes pretty
> complicated operations are done with the parsed output. The problem I see is
> where to define all these functions (30-40 separate parsed blocks)?

I don't think defining the functions is the problem - just define them
in the gaussian parser for example. We could do this already without
affecting anything, and leave the dictionary of functions idea till a
later date.

> How about this: the functions would be defined in a different class, not
> LogFile. What I'm suggesting, is to separate from the class that represents a
> parsed log file a class that represents the parser. Currently, they are one.
> An instance of the parser class would be an attribute of the log file class,
> say "_parser". This object would hold all the parsing functions and a dict
> used by the parse() method of LogFile, and any other stuff needed for
> parsing. An additional advantage is that the parser becomes less visible to
> the casual user, leaving only parsed attributes in the log file object.
>
> Summarizing, I propose two layers of classes:
> LogFile - subclasses Gaussian, GAMESS, ...
> LogFileParser - subclasses GaussianParser, GAMESSParser, ...
> The first remains as is (at least for the user), except that everything
> related to parsing is put in the second. Of course, instances of the latter
> class should be attributes of the instances of the former.

I think you'll have to explain this some more...I'm not sure what the
advantage is in doing this. I guess I don't have enough time right now
to think this through fully...

> Waiting to hear what you think about this idea,
I think I need some more time....

Noel

[cclib-devel] further parser refactoring

From: Karol L. <kar...@kn...> - 2007-05-02 21:50:03

Some thoughts about more refactoring to the parser...

If you take a look at the parsers after the recent refactoring, it is now more 
evident that they are quite inefficient. That isn't a problem, since cclib 
isn't about efficiency, but it would be nice. For example, even something as 
simple as putting a 'return' statement at the end of each parsing block would 
speed things up (the following conditions are not evaluated). Anyway, this 
already suggests that it would be useful to break up the extract() method 
into pieces, one for each block of parsed output.

I've been hovering around this subject for some time, and turning it around in 
my mind. A dictionary of functions seems appropriate (with regexps or 
something as keys), and easier to manage than the current "long" function. 
I don't think we can do away with the functions, since sometimes pretty 
complicated operations are done with the parsed output. The problem I see is 
where to define all these functions (30-40 separate parsed blocks)?

How about this: the functions would be defined in a different class, not 
LogFile. What I'm suggesting, is to separate from the class that represents a 
parsed log file a class that represents the parser. Currently, they are one. 
An instance of the parser class would be an attribute of the log file class, 
say "_parser". This object would hold all the parsing functions and a dict 
used by the parse() method of LogFile, and any other stuff needed for 
parsing. An additional advantage is that the parser becomes less visible to 
the casual user, leaving only parsed attributes in the log file object.

Summarizing, I propose two layers of classes:
LogFile - subclasses Gaussian, GAMESS, ...
LogFileParser - subclasses GaussianParser, GAMESSParser, ...
The first remains as is (at least for the user), except that everything 
related to parsing is put in the second. Of course, instances of the latter 
class should be attributes of the instances of the former.
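A rough sketch of the two layers, just to show the shape. Only the class names and the "_parser" attribute come from the proposal above; the method names, the parser_class hook, and the substring triggers are placeholders made up for illustration.

```python
class LogFileParser:
    """Holds the parsing functions; the casual user never touches it."""

    def __init__(self):
        # Maps a trigger substring to the method handling that block.
        self.handlers = {}

    def parse_line(self, line, logfile):
        for trigger, handler in self.handlers.items():
            if trigger in line:
                handler(line, logfile)
                return  # skip the remaining triggers for this line


class GaussianParser(LogFileParser):
    def __init__(self):
        super().__init__()
        self.handlers["NAtoms="] = self.parse_natom

    def parse_natom(self, line, logfile):
        # Hypothetical block: the parsed value lands on the log file object.
        logfile.natom = int(line.split()[1])


class LogFile:
    parser_class = LogFileParser  # each subclass names its own parser

    def __init__(self, lines):
        self._lines = list(lines)
        self._parser = self.parser_class()

    def parse(self):
        for line in self._lines:
            self._parser.parse_line(line, self)


class Gaussian(LogFile):
    parser_class = GaussianParser
```

With this, `Gaussian(lines).parse()` leaves only parsed attributes such as natom visible on the log file object, while everything parser-related sits behind _parser.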

Waiting to hear what you think about this idea,
Karol

-- 
written by Karol Langner
Thu May  3 01:20:44 CEST 2007

Re: [cclib-devel] charge and multiplicity

From: Karol L. <kar...@kn...> - 2007-05-02 14:48:31

On Monday 30 April 2007 08:44, Noel O'Boyle wrote:
> I've added charge and mult (multiplicity) to cclib. Any ideas on
> better names? Perhaps spin instead of mult, or would that just be
> confusing?
>
> Noel

The names are fine by me - 'spin' would be confusing if it were still 2S+1! I 
added the attributes to the docstring of LogFile and to the wiki.

- Karol

-- 
written by Karol Langner
Wed May  2 18:46:34 CEST 2007

Re: [cclib-devel] Another Refactoring change

From: Karol L. <kar...@kn...> - 2007-04-30 08:18:55

On Monday 30 April 2007 08:54, Noel O'Boyle wrote:
> Maybe someone has already suggested this, but here's an idea to handle
> updating the "self.updateprogress".
>
> Every time an attribute is set by __setattr__, it should change the
> updateprogress string to the name of the attribute rather than having
> to do this in the subclass.

Good idea. This doesn't cover the whole range of situations 
LogFile.updateprogress is used in, though.

> Just a note:
>
> Line 197 of logfileparser:
>             self.updateprogress(inputfile, "Unsupported information", cupdate)
>
> If inputfile and cupdate are attributes of logfileparser, then there's
> no need to pass them in to updateprogress.

Yup.

-Karol

-- 
written by Karol Langner
Mon Apr 30 12:16:37 CEST 2007

[cclib-devel] Another Refactoring change

From: Noel O'B. <bao...@gm...> - 2007-04-30 06:54:44

Maybe someone has already suggested this, but here's an idea to handle
updating the "self.updateprogress".

Every time an attribute is set by __setattr__, it should change the
updateprogress string to the name of the attribute rather than having
to do this in the subclass.

Just a note:

Line 197 of logfileparser:
            self.updateprogress(inputfile, "Unsupported information", cupdate)

If inputfile and cupdate are attributes of logfileparser, then there's
no need to pass them in to updateprogress.
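Something along these lines, as a minimal sketch of both points. This is not the actual logfileparser code: the _internal set, the progress_messages list, and the one-argument updateprogress signature are assumptions made for illustration, and cupdate would be stored the same way as inputfile.

```python
class LogFile:
    # Bookkeeping attributes that should not trigger a progress update.
    _internal = {"inputfile", "progress_messages"}

    def __init__(self, inputfile):
        # Stored once, so updateprogress() does not need it as an argument.
        self.inputfile = inputfile
        self.progress_messages = []

    def updateprogress(self, message):
        # Stand-in for the real progress callback: just record the message.
        self.progress_messages.append((self.inputfile, message))

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name not in self._internal:
            # Every parsed attribute reports itself by name.
            self.updateprogress(name)
```

A subclass then only has to assign, e.g. `self.natom = 3`, and the progress text becomes "natom" without any extra call.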

Noel

[cclib-devel] charge and multiplicity

From: Noel O'B. <bao...@gm...> - 2007-04-30 06:44:34

I've added charge and mult (multiplicity) to cclib. Any ideas on
better names? Perhaps spin instead of mult, or would that just be
confusing?

Noel

Re: [cclib-devel] Numeric -> NumPy transition

From: Noel O'B. <bao...@gm...> - 2007-04-28 10:17:44

Nice work!

On 28/04/07, Karol Langner <kar...@kn...> wrote:
> Sure. I'll just mention the more important changes in names:
>
> Numeric.matrixmultiply -> numpy.dot
> Numeric.outerproduct -> numpy.outer
> LinearAlgebra.inverse -> numpy.linalg.inv
> Numeric.typecode -> numpy.dtype
>
> The last was the problematic one, since Numeric.typecode is a function, while
> numpy.dtype is a type.
>
> On Friday 27 April 2007 15:47, Noel O'Boyle wrote:
> > Let's leave a few days "cooling off period" to look through the code
> > for anything that doesn't seem right.
> >
> > On 27/04/07, Karol Langner <kar...@kn...> wrote:
> > > I changed all references to NumPy, but left the Numeric imports (when
> > > importing numpy fails). Some method names had to be changed. All the
> > > tests and regressions go fine, although I'm worried we don't have tests
> > > for all the methods, since that's where the most Numeric functions are
> > > used.
> > >
> > > --
> > > written by Karol Langner
> > > Fri Apr 27 17:36:53 CEST 2007
>
> --
> written by Karol Langner
> Sat Apr 28 11:36:59 CEST 2007
>

Re: [cclib-devel] Numeric -> NumPy transition

From: Karol L. <kar...@kn...> - 2007-04-28 07:45:03

Sure. I'll just mention the more important changes in names:

Numeric.matrixmultiply -> numpy.dot
Numeric.outerproduct -> numpy.outer
LinearAlgebra.inverse -> numpy.linalg.inv
Numeric.typecode -> numpy.dtype

The last was the problematic one, since Numeric.typecode is a function, while 
numpy.dtype is a type.
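For reference, a short sketch of the NumPy side of these renames. The arrays are made-up sample data, and the Numeric calls appear only in comments because Numeric is not assumed to be installed.

```python
import numpy

a = numpy.array([[2.0, 0.0], [0.0, 4.0]])
v = numpy.array([1.0, 2.0])

product = numpy.dot(a, v)        # was Numeric.matrixmultiply(a, v)
outer = numpy.outer(v, v)        # was Numeric.outerproduct(v, v)
inverse = numpy.linalg.inv(a)    # was LinearAlgebra.inverse(a)

# The typecode used to come from a function call; in NumPy the array
# carries a dtype object, and its .char attribute is the one-letter code.
kind = a.dtype
code = a.dtype.char
```

Code that compared typecodes as strings would need to compare against `a.dtype.char` or against a numpy.dtype instance instead.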

On Friday 27 April 2007 15:47, Noel O'Boyle wrote:
> Let's leave a few days "cooling off period" to look through the code
> for anything that doesn't seem right.
>
> On 27/04/07, Karol Langner <kar...@kn...> wrote:
> > I changed all references to NumPy, but left the Numeric imports (when
> > importing numpy fails). Some method names had to be changed. All the
> > tests and regressions go fine, although I'm worried we don't have tests
> > for all the methods, since that's where the most Numeric functions are
> > used.
> >
> > --
> > written by Karol Langner
> > Fri Apr 27 17:36:53 CEST 2007

-- 
written by Karol Langner
Sat Apr 28 11:36:59 CEST 2007

Re: [cclib-devel] Numeric -> NumPy transition

From: Noel O'B. <bao...@gm...> - 2007-04-27 13:47:25

Let's leave a few days "cooling off period" to look through the code
for anything that doesn't seem right.

On 27/04/07, Karol Langner <kar...@kn...> wrote:
> I changed all references to NumPy, but left the Numeric imports (when
> importing numpy fails). Some method names had to be changed. All the tests
> and regressions go fine, although I'm worried we don't have tests for all the
> methods, since that's where the most Numeric functions are used.
>
> - Karol
>
> --
> written by Karol Langner
> Fri Apr 27 17:36:53 CEST 2007

[cclib-devel] Numeric -> NumPy transition

From: Karol L. <kar...@kn...> - 2007-04-27 13:41:15

I changed all references to NumPy, but left the Numeric imports (when 
importing numpy fails). Some method names had to be changed. All the tests 
and regressions go fine, although I'm worried we don't have tests for all the 
methods, since that's where the most Numeric functions are used.
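The fallback import can be sketched roughly as below. The helper name import_first_available is hypothetical and is not what the cclib code does verbatim; it only illustrates trying numpy first and falling back to Numeric.

```python
import importlib


def import_first_available(names):
    """Return the first module in names that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate
    raise ImportError("none of these modules could be imported: " + ", ".join(names))


# Intended use (left as a comment, since neither package is assumed here):
# numerics = import_first_available(["numpy", "Numeric"])
```

Since several function names differ between the two packages, code relying on such a fallback still has to branch on which module was actually loaded.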

- Karol

-- 
written by Karol Langner
Fri Apr 27 17:36:53 CEST 2007

17 messages have been excluded from this view by a project administrator.
