This list is closed, nobody may subscribe to it.
From: Karol L. <kar...@kn...> - 2007-05-15 23:16:19
|
On Wednesday 16 May 2007 00:50, Noel O'Boyle wrote:
> It's likely that all algorithms have been broken - I assume I'm going
> to have similar problems with GaussSum. It's best to make a clean
> break with Numeric at this point though, as it's no longer available
> for Python 2.5 on Windows for example, which about 50% of people are
> now using.
>
> We will fix all these issues before the next release (promise!).

Not all algorithms are broken; the MPA bug is related to me copy-pasting the wrong function name. I'm all for a clean break, although supporting Numeric is not an issue in cclib, since no functionality unique to numpy is used yet. And if it's not a problem, why not keep it for some time?

> On 15/05/07, Adam Tenderholt <a-t...@st...> wrote:
> > The Mulliken and C-squared population analysis are broken with
> > revision 624. I haven't really explored it other than noticing that
> > the numbers are way off and that it looks like every aoresult for a
> > given MO has the same number (~1).
> >
> > Adam

Thanks for pointing these out... the bug with MPA is obvious to me now that I look at the diff - a misclick. I'm just about to commit the fix, along with a test for MPA (this probably wouldn't have been overlooked if there had been an MPA test before). I'll look at the CSPA problem after that.

- Karol

--
written by Karol Langner
Wed May 16 01:08:30 CEST 2007 |
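The remark that an MPA test would have caught the bug suggests a simple invariant check. What follows is only a minimal sketch under stated assumptions, not cclib's implementation: the function name `mulliken_contributions` and the two-function toy system are made up for illustration. It relies on the fact that, for a normalized MO, the Mulliken contributions of all basis functions add up to 1, so a result in which every AO contribution is ~1 would fail immediately.

```python
import math


def mulliken_contributions(coeffs, overlap):
    """Mulliken contribution of each basis function to a single MO.

    coeffs  -- list of MO coefficients c_mu (toy input, hypothetical)
    overlap -- square overlap matrix S as a list of lists
    Returns c_mu * sum_nu(S[mu][nu] * c_nu) for every mu.
    """
    n = len(coeffs)
    return [coeffs[mu] * sum(overlap[mu][nu] * coeffs[nu] for nu in range(n))
            for mu in range(n)]


# Toy system: two basis functions with overlap s (H2-like).
s = 0.4
S = [[1.0, s], [s, 1.0]]
bonding = [1.0 / math.sqrt(2 * (1 + s))] * 2
antibonding = [1.0 / math.sqrt(2 * (1 - s)), -1.0 / math.sqrt(2 * (1 - s))]

for mo in (bonding, antibonding):
    contributions = mulliken_contributions(mo, S)
    # For a normalized MO the contributions must sum to 1.
    assert abs(sum(contributions) - 1.0) < 1e-12
```

A real test would take the coefficients and overlap matrix from a parsed log file; the invariant being checked stays the same.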
From: Noel O'B. <bao...@gm...> - 2007-05-15 22:50:05
|
It's likely that all algorithms have been broken - I assume I'm going to have similar problems with GaussSum. It's best to make a clean break with Numeric at this point though, as it's no longer available for Python 2.5 on Windows for example, which about 50% of people are now using.

We will fix all these issues before the next release (promise!).

Noel

On 15/05/07, Adam Tenderholt <a-t...@st...> wrote:
> > As far as I can tell, everything in cclib still works with Numeric.
> > I haven't actually run the tests until now (just committed a little
> > hack to get the test code working, though), but they do finish fine.
> > The regression tests are also OK. It is wise, however, to drop
> > Numeric at some point, since it is not supported anymore.
>
> The Mulliken and C-squared population analysis are broken with
> revision 624. I haven't really explored it other than noticing that
> the numbers are way off and that it looks like every aoresult for a
> given MO has the same number (~1).
>
> Adam |
From: Adam T. <a-t...@st...> - 2007-05-15 20:37:34
|
> As far as I can tell, everything in cclib still works with Numeric.
> I haven't actually run the tests until now (just committed a little
> hack to get the test code working, though), but they do finish fine.
> The regression tests are also OK. It is wise, however, to drop
> Numeric at some point, since it is not supported anymore.

The Mulliken and C-squared population analysis are broken with revision 624. I haven't really explored it other than noticing that the numbers are way off and that it looks like every aoresult for a given MO has the same number (~1).

Adam |
From: Noel O'B. <bao...@gm...> - 2007-05-15 09:01:06
|
On 07/05/07, Adam Tenderholt <a-t...@st...> wrote:
> One of my labmates has a Gaussian 98 frequency calculation that cclib
> chokes on. The issue is the assert nbasis == self.nbasis found on
> line 532 of the gaussian parser. For some reason, it says it uses 290
> basis functions at one point and then says there are 292 basis
> functions. It has the following route section:
>
> #P UBP86/LanL2DZ 5D 7F SCF(MaxCycle=500,conver=8) Pop=(NPA) Freq IOP
> (7/33=1) guess=read geom=check
>
> Have we seen this before? I have no idea what's going on and the 7/33
> IOP doesn't exist in the Gaussian 03 Documentation.

It would be useful to have the actual input file, although I suspect that that is not going to help much. Also the original input file for the geometry optimisation, e.g. did the geo-opt use exactly the same settings for basis set (there is a "guess=read" so it's possible that this might have an effect)?

The quick fix for your friend is to remove the second NBasis line from the log file (note: I haven't tested this).

The G98 docs are available on the web, but don't contain 7/33. However, some googling shows that it is simply related to the output of the force constant matrix.

Actually, I now notice that the error occurs in the "Dens" section, probably due to "DoDens=T"@3432, and there's a "ToCart=T". The following matrix is 292x290 and is referred to as a transformation matrix. So it sounds like there's a transformation from something to Cartesian, which causes the basis set number to increase from 290 to 292. This may be due to NBO or due to the fact that 5D and 7F were used up till that point, and they cannot be applied to the density (??).

What to do about this problem I'm not sure... if your colleague had used iop(3/33=1,3/36=-1) and pop=full, which figure would have been used for the matrices: 290 or 292? I think we need to reproduce this for dvb, and see what the consequences are for the data that we extract.

> Adam |
From: Karol L. <kar...@kn...> - 2007-05-10 18:29:42
|
On Monday 07 May 2007 20:44, Adam Tenderholt wrote:
> So does this mean that Numeric will no longer work and NumPy is now
> required? Just curious...
>
> Adam

As far as I can tell, everything in cclib still works with Numeric. I haven't actually run the tests until now (just committed a little hack to get the test code working, though), but they do finish fine. The regression tests are also OK. It is wise, however, to drop Numeric at some point, since it is not supported anymore.

- Karol

--
written by Karol Langner
Thu May 10 20:25:26 CEST 2007 |
From: Adam T. <a-t...@st...> - 2007-05-07 21:29:36
|
One of my labmates has a Gaussian 98 frequency calculation that cclib chokes on. The issue is the assert nbasis == self.nbasis found on line 532 of the gaussian parser. For some reason, it says it uses 290 basis functions at one point and then says there are 292 basis functions. It has the following route section:

#P UBP86/LanL2DZ 5D 7F SCF(MaxCycle=500,conver=8) Pop=(NPA) Freq IOP (7/33=1) guess=read geom=check

Have we seen this before? I have no idea what's going on and the 7/33 IOP doesn't exist in the Gaussian 03 Documentation.

Adam |
From: Adam T. <a-t...@st...> - 2007-05-07 18:44:31
|
So does this mean that Numeric will no longer work and NumPy is now required? Just curious...

Adam

On Apr 28, 2007, at 3:17 AM, Noel O'Boyle wrote:
> Nice work!
>
> On 28/04/07, Karol Langner <kar...@kn...> wrote:
>> Sure. I'll just mention the more important changes in names:
>>
>> Numeric.matrixmultiply -> numpy.dot
>> Numeric.outerproduct -> numpy.outer
>> LinearAlgebra.inverse -> numpy.linalg.inv
>> Numeric.typecode -> numpy.dtype
>>
>> The last was the problematic one, since Numeric.typecode is a
>> function, while numpy.dtype is a type.
>>
>> On Friday 27 April 2007 15:47, Noel O'Boyle wrote:
>>> Let's leave a few days "cooling off period" to look through the code
>>> for anything that doesn't seem right.
>>>
>>> On 27/04/07, Karol Langner <kar...@kn...> wrote:
>>>> I changed all references to NumPy, but left the Numeric imports
>>>> (when importing numpy fails). Some method names had to be changed.
>>>> All the tests and regressions go fine, although I'm worried we
>>>> don't have tests for all the methods, since that's where the most
>>>> Numeric functions are used.
>>>>
>>>> --
>>>> written by Karol Langner
>>>> Fri Apr 27 17:36:53 CEST 2007
>>
>> --
>> written by Karol Langner
>> Sat Apr 28 11:36:59 CEST 2007 |
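For readers following the rename list quoted above, here is a small sketch of the numpy side of each mapping. The arrays are arbitrary illustration values, and the comments about the old Numeric names simply repeat what the thread says; nothing here is taken from the cclib source.

```python
import numpy

a = numpy.array([[2.0, 0.0], [0.0, 4.0]])
v = numpy.array([1.0, 2.0])

product = numpy.dot(a, v)       # thread: replaces Numeric.matrixmultiply
outer = numpy.outer(v, v)       # thread: replaces Numeric.outerproduct
inverse = numpy.linalg.inv(a)   # thread: replaces LinearAlgebra.inverse

# The awkward case from the thread: typecode was a function in Numeric,
# whereas in numpy the dtype is an object (an instance of the numpy.dtype
# type) stored as an attribute, so call sites change shape, not just name.
kind = a.dtype
is_dtype_instance = isinstance(kind, numpy.dtype)
```

Because the last mapping is not a one-to-one rename, it is the kind of change most worth covering with a test.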
From: Noel O'B. <bao...@gm...> - 2007-05-07 08:45:27
|
> A simple little test shows that find() is in fact the worst, but "word in
> line" is at least comparable to "line[i:j] == word":
> >>> import timeit
> >>> t1 = timeit.Timer("'a' in 'abcdefg'")
> >>> t2 = timeit.Timer("'abcdefg'[:1] == 'a'")
> >>> t3 = timeit.Timer("'abcdefg'.find('a')")
> >>> min(t1.repeat(repeat=100, number=1000000))
> 0.18727612495422363
> >>> min(t2.repeat(repeat=100, number=1000000))
> 0.3044281005859375
> >>> min(t3.repeat(repeat=100, number=1000000))
> 0.7338860034942627

I was surprised by this, but it's the same for me. However, my earlier experiments showed 'in' to be the worst, and that is because most lines don't match the expression. I will show some timings for a large log file when I get a chance.

Noel |
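The point about non-matching lines can be checked with the same `timeit` approach. The sketch below is an assumption-laden illustration: the log line and keyword are invented, and the slice test only inspects one fixed position, so it is not equivalent to the other two tests. No timings are claimed here, since they vary by machine and Python version.

```python
import timeit

# Hypothetical log line that does NOT contain the keyword, which is the
# common case when scanning a large output file.
setup = "line = ' ' * 70 + 'nothing of interest here'; word = 'SCF Done'"

tests = {
    "in": "word in line",
    "slice": "line[1:9] == word",   # checks a fixed column range only
    "find": "line.find(word) >= 0",
}

timings = {}
for name, stmt in tests.items():
    timer = timeit.Timer(stmt, setup)
    # Take the minimum of a few repeats, as in the session quoted above.
    timings[name] = min(timer.repeat(repeat=3, number=20000))

for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print("%-6s %.4f s" % (name, seconds))
```

Swapping the setup string for a line that does contain the keyword gives the matching case for comparison.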
From: Karol L. <kar...@kn...> - 2007-05-07 08:28:09
|
On Thursday 03 May 2007 13:21, Noel O'Boyle wrote:
> > > How we test each line has a large effect on efficiency. I point out
> > > again that using line[x:y]=="jklj" is much faster than using "word in
> > > line", or line.find(), and so these should be some of the first
> > > targets for improving efficiency.
> >
> > langner@slim:~/tmp/python/cclib/trunk/data/GAMESS/basicGAMESS-US$ python
> > Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
> > [GCC 3.4.6 (Debian 3.4.6-5)] on linux2
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> import cclib
> > >>> a = cclib.parser.ccopen("C_bigbasis.out")
> > >>> import profile
> > >>> profile.run("a.parse()", "parse.prof")
> > >>> import pstats
> > >>> s = pstats.Stats("parse.prof")
> > >>> s.sort_stats("time")
> > >>> s.print_stats(.12)
> > Thu May 3 14:43:04 2007 parse.prof
> >
> > 199815 function calls in 9.069 CPU seconds
> >
> > Ordered by: internal time
> > List reduced from 96 to 12 due to restriction <0.12>
> >
> > ncalls tottime percall cumtime percall filename:lineno(function)
> > 8581 4.548 0.001 8.625 0.001 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/gamessparser.py:90(extract)
> > 137355 3.080 0.000 3.080 0.000 :0(find)
> > 20310 0.480 0.000 0.480 0.000 :0(len)
> > 1 0.316 0.316 9.069 9.069 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:165(parse)
> > 8600 0.184 0.000 0.184 0.000 :0(rstrip)
> > 2143 0.140 0.000 0.140 0.000 :0(split)
> > 2055 0.124 0.000 0.124 0.000 :0(range)
> > 9145 0.076 0.000 0.076 0.000 :0(strip)
> > 8868 0.060 0.000 0.060 0.000 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:375(updateprogress)
> > 370 0.016 0.000 0.016 0.000 :0(append)
> > 218 0.004 0.000 0.004 0.000 :0(replace)
> > 31 0.004 0.000 0.032 0.001 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:153(__setattr__)
>
> I've never used the profiler. Can you interpret this for me in simple
> language?

The profiler measures the time used for function calls when executing a command. In the columns you have:

ncalls - number of times a function was called
tottime - time spent in the given function (excluding time in sub-functions)
percall - tottime/ncalls
cumtime - time in function including subfunctions (from invocation to exit)
percall (2nd) - cumtime/ncalls

Now that I think about all this, though, statements such as "word in line" and "line[i:j] == word" are not measured here, since they are not function calls (the time is cumulated into the time of extract).

A simple little test shows that find() is in fact the worst, but "word in line" is at least comparable to "line[i:j] == word":

>>> import timeit
>>> t1 = timeit.Timer("'a' in 'abcdefg'")
>>> t2 = timeit.Timer("'abcdefg'[:1] == 'a'")
>>> t3 = timeit.Timer("'abcdefg'.find('a')")
>>> min(t1.repeat(repeat=100, number=1000000))
0.18727612495422363
>>> min(t2.repeat(repeat=100, number=1000000))
0.3044281005859375
>>> min(t3.repeat(repeat=100, number=1000000))
0.7338860034942627

- Karol

--
written by Karol Langner
Mon May 7 11:47:55 CEST 2007 |
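The column meanings described above can be seen on a toy example. This is a hedged sketch: `extract` and `parse` are hypothetical stand-ins for a parser's methods, not cclib code, and it uses `cProfile`, which offers the same interface as the `profile` module used in the thread.

```python
import cProfile
import io
import pstats


def extract(line):
    """Stand-in for a per-line parsing method (hypothetical)."""
    return line.find("SCF Done") >= 0


def parse(lines):
    """Call extract() on every line, as a parse() loop would."""
    return sum(1 for line in lines if extract(line))


lines = ["filler line %d" % i for i in range(5000)] + [" SCF Done: E = -1.0"]

profiler = cProfile.Profile()
profiler.enable()
hits = parse(lines)
profiler.disable()

# Send the report to a string so it can be inspected or printed.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("tottime").print_stats(5)
report = stream.getvalue()
print(report)
```

In the report, the time of the `find` call is listed on its own row, while any inline test such as `word in line` would be folded into the `tottime` of `extract`, which is the limitation noted above.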
From: Karol L. <kar...@kn...> - 2007-05-06 23:12:24
|
It seems that logging.getLogger() returns the same instance when called with the same "logname", supposedly to not have to pass loggers around in an application that uses them. So, all parser instances in cclib of the same class were duplicating their own handlers (since each instance added another one). So a quick check that LogFile.logger.handlers is empty fixes this.

I wonder if there is any advantage in having at maximum one logger for each kind of parser class there is - that's the way it is now. If not, it might be good to establish a clearer logging strategy:

1) create a global logger when cclib is loaded (or rather cclib.parser), and use only this one throughout by all parsers
2) create a new logger for any parser instance, each one with a unique name (the docs say that dots determine the hierarchy for loggers - such as a.b.c).

Cheers,
- Karol

On Thursday 03 May 2007 12:26, Noel O'Boyle wrote:
> Sorry, just checked. clean() doesn't remove the logger, nor should it.
> If you parse the same file a second time (e.g. this is common with
> GaussSum users following a geometry optimisation) you still want a
> logger.
>
> The problem you found occurs if you create two instances of parsers
> with the same name. We either ignore this problem, or implement some
> complicated way of tracking file names. Perhaps if we only create the
> logger when parse() is called, then we can remove it when clean() is
> called.
>
> On 03/05/07, Noel O'Boyle <bao...@gm...> wrote:
> > This is a "feature" of the logger (yes, I agree it's annoying, but
> > it's not a bug). The problem is that now you have two loggers both
> > doing the exact same thing. Loggers are distinguished by their names;
> > so if you parse the same file twice, without using clean() (which I
> > hope deletes the logger), you get this effect.
> >
> > On 03/05/07, Karol Langner <kar...@kn...> wrote:
> > > While working interactively with cclib on many files, I noticed a bug
> > > related to the logger. For some reason, it prints as many messages for
> > > each attribute as there were instances created of the specific parser
> > > (GAMESS, Gaussian, counted separately), so you get an increasingly long
> > > output from the logger. The problem is not related to any recent
> > > changes, it goes as far back as version 0.5, as you can see here:
> > >
> > > langner@slim:~/tmp/python/cclib/trunk/data/Gaussian/basicGaussian03$ python
> > > Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
> > > [GCC 3.4.6 (Debian 3.4.6-5)] on linux2
> > > Type "help", "copyright", "credits" or "license" for more information.
> > > >>> import cclib
> > > >>> print cclib.__version__
> > > 0.5
> > > >>> print cclib.parser.logfileparser.__revision__
> > > $Revision: 240 $
> > > >>> print cclib.parser.gamessparser.__revision__
> > > $Revision: 240 $
> > > >>> cclib.parser.Gaussian("water_mp2.log").parse()
> > > [Gaussian water_mp2.log INFO] Creating attribute atomcoords[]
> > > [Gaussian water_mp2.log INFO] Creating attribute atomnos[]
> > > [Gaussian water_mp2.log INFO] Creating attribute natom: 3
> > > [Gaussian water_mp2.log INFO] Creating attribute nbasis: 7
> > > [Gaussian water_mp2.log INFO] Creating attribute nmo: 7
> > > [Gaussian water_mp2.log INFO] Creating attribute scftargets[]
> > > [Gaussian water_mp2.log INFO] Creating attribute scfvalues
> > > [Gaussian water_mp2.log INFO] Creating attribute scfenergies[]
> > > [Gaussian water_mp2.log INFO] Creating attribute moenergies[[]]
> > > [Gaussian water_mp2.log INFO] Creating attribute homos[]
> > > >>> cclib.parser.Gaussian("water_mp2.log").parse()
> > > [Gaussian water_mp2.log INFO] Creating attribute atomcoords[]
> > > [Gaussian water_mp2.log INFO] Creating attribute atomcoords[]
> > > [Gaussian water_mp2.log INFO] Creating attribute atomnos[]
> > > [Gaussian water_mp2.log INFO] Creating attribute atomnos[]
> > > [Gaussian water_mp2.log INFO] Creating attribute natom: 3
> > > [Gaussian water_mp2.log INFO] Creating attribute natom: 3
> > > [Gaussian water_mp2.log INFO] Creating attribute nbasis: 7
> > > [Gaussian water_mp2.log INFO] Creating attribute nbasis: 7
> > > [Gaussian water_mp2.log INFO] Creating attribute nmo: 7
> > > [Gaussian water_mp2.log INFO] Creating attribute nmo: 7
> > > [Gaussian water_mp2.log INFO] Creating attribute scftargets[]
> > > [Gaussian water_mp2.log INFO] Creating attribute scftargets[]
> > > [Gaussian water_mp2.log INFO] Creating attribute scfvalues
> > > [Gaussian water_mp2.log INFO] Creating attribute scfvalues
> > > [Gaussian water_mp2.log INFO] Creating attribute scfenergies[]
> > > [Gaussian water_mp2.log INFO] Creating attribute scfenergies[]
> > > [Gaussian water_mp2.log INFO] Creating attribute moenergies[[]]
> > > [Gaussian water_mp2.log INFO] Creating attribute moenergies[[]]
> > > [Gaussian water_mp2.log INFO] Creating attribute homos[]
> > > [Gaussian water_mp2.log INFO] Creating attribute homos[]
> > > >>> cclib.parser.GAMESS("../../GAMESS/basicGAMESS-US/water_mp2.out").parse()
> > > [GAMESS ../GAMESS/water_mp2.out INFO] Creating attribute atomcoords, atomnos
> > > [GAMESS ../water_mp2.out INFO] Creating attribute nbasis
> > > [GAMESS ../water_mp2.out INFO] Creating attribute homos
> > > [GAMESS ../water_mp2.out INFO] Creating attribute natom
> > > [GAMESS ..//water_mp2.out INFO] Creating attribute scftargets
> > > [GAMESS ../water_mp2.out INFO] Creating attribute scfvalues
> > > [GAMESS ../water_mp2.out INFO] Creating attribute scfenergies[]
> > > [GAMESS ..//water_mp2.out INFO] Creating attributes moenergies, mosyms
> > > [GAMESS ../water_mp2.out INFO] Creating attribute nmo with default value
> > > [GAMESS ../water_mp2.out INFO] Creating attribute mocoeffs
> > > [GAMESS ../water_mp2.out INFO] Creating attribute geotargets[] with default values
> > >
> > > Notice how only the same parser class contributes to the repeats. The
> > > problem is not in parsing, though, since it happens whenever
> > > logger.info is called. Consider this from the current revision in
> > > trunk, where it is called in LogFile.__setattr__ whenever an attribute
> > > from _attrlist is set that did not exist previously:
> > >
> > > langner@slim:~/tmp/python/cclib/trunk/data$ python
> > > Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
> > > [GCC 3.4.6 (Debian 3.4.6-5)] on linux2
> > > Type "help", "copyright", "credits" or "license" for more information.
> > > >>> import cclib
> > > >>> cclib.__version__
> > > '0.7'
> > > >>> cclib.parser.logfileparser.__revision__
> > > '$Revision: 620 $'
> > > >>> a = cclib.parser.ccopen("basicGaussian03/water_mp2.log")
> > > >>> a.mult = 1
> > > [Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
> > > >>> a = cclib.parser.ccopen("basicGaussian03/water_mp2.log")
> > > >>> a.mult = 1
> > > [Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
> > > [Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
> > > >>> b = cclib.parser.ccopen("basicGAMESS-US/water_mp2.out")
> > > >>> b.mult = 1
> > > [GAMESS basicGAMESS-US/water_mp2.out INFO] Creating attribute mult: 1
> > > >>> a.mult = 1
> > > >>> a.clean()
> > > >>> a.mult = 1
> > > [Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
> > > [Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
> > >
> > > This looks pretty strange.
> > >
> > > - Karol
> > >
> > > --
> > > written by Karol Langner
> > > Thu May 3 13:25:54 CEST 2007

--
written by Karol Langner
Mon May 7 02:51:00 CEST 2007 |
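The duplicated output discussed in this message can be reproduced with the standard `logging` module alone. The following is a minimal sketch, with made-up logger names and helper functions, of the behaviour and of the "check that handlers is empty" guard; it is not the actual cclib fix.

```python
import logging


def make_logger_naive(name):
    """Adds a handler on every call, so a reused name prints duplicates."""
    logger = logging.getLogger(name)
    logger.addHandler(logging.StreamHandler())
    return logger


def make_logger_guarded(name):
    """Adds a handler only if the named logger does not have one yet."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        logger.addHandler(logging.StreamHandler())
    return logger


# Two "parser instances" asking for the same logger name.
first = make_logger_naive("demo.naive")
second = make_logger_naive("demo.naive")
same_object = first is second           # getLogger returns one shared object
naive_handlers = len(second.handlers)   # one handler per instance created

make_logger_guarded("demo.guarded")
guarded = make_logger_guarded("demo.guarded")
guarded_handlers = len(guarded.handlers)
```

With the naive version every message is emitted once per handler, which matches the doubled lines in the quoted sessions. The guard keeps one handler per name but does not settle the wider question of one logger per class versus one per instance.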
From: Noel O'B. <bao...@gm...> - 2007-05-03 11:21:30
|
On 03/05/07, Karol Langner <kar...@kn...> wrote:
> On Thursday 03 May 2007 10:23, Noel O'Boyle wrote:
> > For instance, what was the effect of the recent change where you
> > avoided calling extract() when the line was empty? It seems reasonable
> > that this would speed things up, but did it in fact? What's the
> > fastest way of testing whether a line is empty (must be
> > cross-platform)? And so on.
>
> Below, "parse_slower" is the same method as "parse" from the trunk without the
> condition that checks if the line is empty.
>
> langner@slim:~/tmp/python/cclib/trunk/data/Gaussian/Gaussian03$ python
> Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
> [GCC 3.4.6 (Debian 3.4.6-5)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import cclib
> >>> a = cclib.parser.ccopen("chn1.log.gz")
> >>> import timeit
> >>> t = timeit.Timer("a.clean(); a.parse()", "from __main__ import a")
> >>> min(t.repeat(repeat=10,number=5))
> .... logger output ....
> 0.92677688598632812
> >>> t_slower = timeit.Timer("a.clean();a.parse_slower()", "from __main__ import a")
> >>> min(t_slower.repeat(repeat=10,number=5))
> ... logger output ...
> 0.92177586353772345
>
> I tried a bigger file and it also had no visible effect. So... what seemed
> reasonable to me was wrong. I guess that revision can be reverted :)

Maybe there's a quicker way of testing for an empty line. Or even better, all lines less than 4 characters or something (although clearly we need to be careful not to skip important lines).

> > How we test each line has a large effect on efficiency. I point out
> > again that using line[x:y]=="jklj" is much faster than using "word in
> > line", or line.find(), and so these should be some of the first
> > targets for improving efficiency.
>
> Good point, confirmed by a profiling run:
>
> langner@slim:~/tmp/python/cclib/trunk/data/GAMESS/basicGAMESS-US$ python
> Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
> [GCC 3.4.6 (Debian 3.4.6-5)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import cclib
> >>> a = cclib.parser.ccopen("C_bigbasis.out")
> >>> import profile
> >>> profile.run("a.parse()", "parse.prof")
> >>> import pstats
> >>> s = pstats.Stats("parse.prof")
> >>> s.sort_stats("time")
> >>> s.print_stats(.12)
> Thu May 3 14:43:04 2007 parse.prof
>
> 199815 function calls in 9.069 CPU seconds
>
> Ordered by: internal time
> List reduced from 96 to 12 due to restriction <0.12>
>
> ncalls tottime percall cumtime percall filename:lineno(function)
> 8581 4.548 0.001 8.625 0.001 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/gamessparser.py:90(extract)
> 137355 3.080 0.000 3.080 0.000 :0(find)
> 20310 0.480 0.000 0.480 0.000 :0(len)
> 1 0.316 0.316 9.069 9.069 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:165(parse)
> 8600 0.184 0.000 0.184 0.000 :0(rstrip)
> 2143 0.140 0.000 0.140 0.000 :0(split)
> 2055 0.124 0.000 0.124 0.000 :0(range)
> 9145 0.076 0.000 0.076 0.000 :0(strip)
> 8868 0.060 0.000 0.060 0.000 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:375(updateprogress)
> 370 0.016 0.000 0.016 0.000 :0(append)
> 218 0.004 0.000 0.004 0.000 :0(replace)
> 31 0.004 0.000 0.032 0.001 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:153(__setattr__)

I've never used the profiler. Can you interpret this for me in simple language?

> > On 03/05/07, Karol Langner <kar...@kn...> wrote:
> > > Some thoughts about more refactoring to the parser...
> > >
> > > If you take a look at the parsers after the recent refactoring, it is now
> > > more evident that they are quite inefficient. That isn't a problem, since
> > > cclib isn't about efficiency, but it would be nice. For example, even
> > > something as simple as putting a 'return' statement at the end of each
> > > parsing block would speed things up (the following conditions are not
> > > evaluated). Anyway, this already suggests that it would be useful to
> > > break up the extract() method into pieces, one for each block of parsed
> > > output.
> >
> > I think that there is one case where the block shouldn't return, but
> > in general it would be fine. However, it wouldn't speed things up that
> > much, so I feel it is not worth doing. If you think about it, most
> > lines don't match any of the 'if' statements. If each block is
> > executed once, and there are 10 blocks, then the number of wasteful
> > 'if' statements will be 9 + 8 + 7 + ... + 1 = 45.
>
> There are also the lines between blocks that don't match any condition, but in
> principle you're right, it's not worth it.
>
> > > I've been hovering around this subject for some time, and turning it
> > > around in my mind. A dictionary of functions seems appropriate (with
> > > regexps or something as keys), and easier to manage than the current
> > > "long" function. I don't think we can do away with the functions, since
> > > sometimes pretty complicated operations are done with the parsed output.
> > > The problem I see is where to define all these functions (30-40 separate
> > > parsed blocks)?
> >
> > I don't think defining the functions is the problem - just define them
> > in the gaussian parser for example. We could do this already without
> > affecting anything, and leave the dictionary of functions idea till a
> > later date.
>
> What do you mean by 'gaussian parser' - the file gaussianparser.py or the
> class? I think I didn't make myself clear - my worry is that if we define all
> these functions in the parser class, then when you go "a =
> ccopen("....").parse(); print dir(a)" you will get flooded by function names.

Ah, ok. I see. I think I would prefer users to use help(a), rather than dir(a). I think that if you use function names starting with _ it shouldn't appear in this list, although I haven't tested this.

> > > How about this: the functions would be defined in a different class, not
> > > LogFile. What I'm suggesting, is to separate from the class that
> > > represents a parsed log file a class that represents the parser.
> > > Currently, they are one. An instance of the parser class would be an
> > > attribute of the log file class, say "_parser". This object would hold
> > > all the parsing functions and a dict used by the parse() method of
> > > LogFile, and any other stuff needed for parsing. An additional advantage
> > > is that the parser becomes less visible to the casual user, leaving only
> > > parsed attributes in the log file object.
> > >
> > > Summarizing, I propose two layers of classes:
> > > LogFile - subclasses Gaussian, GAMESS, ...
> > > LogFileParser - subclasses GaussianParser, GAMESSParser, ...
> > > The first remains as is (at least for the user), except that everything
> > > related to parsing is put in the second. Of course, instances of the
> > > latter class should be attributes of the instances of the former.
> >
> > I think you'll have to explain this some more...I'm not sure what the
> > advantage is in doing this. I guess I don't have enough time right now
> > to think this through fully...
>
> Let me sketch out the idea. Snippets of the parse class (second layer):
>
> class GaussianParser(LogFileParser):
>     (...)
>     def parse_charge(self, inputfile, line):
>         super(GaussianParser, self).charge = (...)
>     def parse_scfenergy(self, inputfile, line):
>         super(GaussianParser, self).scfenergies.append(...)
>     (...)
>     self.parse_dict = {
>         <regexp_charge>: self.parse_charge,
>         <regexp_scfenergy>: self.parse_scfenergy,
>         (...)
>     }
>
> Now the first layer, the log file class:
>
> class Gaussian(LogFile):
>     self._parser = GaussianParser(...)
>     (...)
>     def parse(self, ...):
>         (...)
>         for line in inputfile:
>             for regexp in self._parser.parse_dict:
>                 if re.match(regexp, line):
>                     self._parser.parse_dict[regexp](inputfile, line)
>
> I hope that's clearer.

OK, thanks for that. As I said, it needs more time...

> No problem, I'm just brainstorming on a holiday.

If you want to do something neat, you can see if you can integrate cclib with PyVib2, or whether you can get cclib to run under Jython (there has been some interest in this, more later...)...I would hate for you not to have something to do on your holiday :-)

Noel |
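The two-layer proposal quoted above is only an outline, so here is one way it could be made runnable. Everything specific in this sketch is an assumption for illustration: the class bodies, the simplified regular expressions, the sample lines and the handler signature `(match, inputfile)` are invented, and none of it is cclib's actual design.

```python
import re


class LogFileParser:
    """Second layer: owns the parsing functions and the dispatch table."""

    def __init__(self, logfile):
        self.logfile = logfile
        # Maps a compiled pattern to the method handling matching lines.
        self.parse_dict = {}


class GaussianParser(LogFileParser):
    def __init__(self, logfile):
        super().__init__(logfile)
        # Simplified stand-in patterns, not the real Gaussian formats.
        self.parse_dict = {
            re.compile(r"^ Charge =\s*(-?\d+)"): self.parse_charge,
            re.compile(r"^ SCF Done:.*=\s*(-?\d+\.\d+)"): self.parse_scfenergy,
        }

    def parse_charge(self, match, inputfile):
        self.logfile.charge = int(match.group(1))

    def parse_scfenergy(self, match, inputfile):
        self.logfile.scfenergies.append(float(match.group(1)))


class Gaussian:
    """First layer: the parsed-data object the user works with."""

    def __init__(self):
        self.scfenergies = []
        self._parser = GaussianParser(self)

    def parse(self, inputfile):
        for line in inputfile:
            for regexp, handler in self._parser.parse_dict.items():
                match = regexp.match(line)
                if match:
                    handler(match, inputfile)
                    break  # at most one handler per line in this sketch


sample = iter([
    " Charge =  0 Multiplicity = 1\n",
    " some unrelated line\n",
    " SCF Done:  E(RHF) =  -74.9646  A.U. after 5 cycles\n",
])
data = Gaussian()
data.parse(sample)
public_names = [name for name in dir(data) if not name.startswith("_")]
```

The handlers write to the log file object through a stored reference rather than through `super()`, and the public names on `data` are limited to the parsed attributes plus `parse`, which is the visibility benefit argued for in the proposal.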
From: Karol L. <kar...@kn...> - 2007-05-03 11:11:28
|
On Thursday 03 May 2007 10:23, Noel O'Boyle wrote:
> For instance, what was the effect of the recent change where you
> avoided calling extract() when the line was empty? It seems reasonable
> that this would speed things up, but did it in fact? What's the
> fastest way of testing whether a line is empty (must be
> cross-platform)? And so on.

Below, "parse_slower" is the same method as "parse" from the trunk without the
condition that checks if the line is empty.

langner@slim:~/tmp/python/cclib/trunk/data/Gaussian/Gaussian03$ python
Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
[GCC 3.4.6 (Debian 3.4.6-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cclib
>>> a = cclib.parser.ccopen("chn1.log.gz")
>>> import timeit
>>> t = timeit.Timer("a.clean(); a.parse()", "from __main__ import a")
>>> min(t.repeat(repeat=10,number=5))
.... logger output ....
0.92677688598632812
>>> t_slower = timeit.Timer("a.clean();a.parse_slower()", "from __main__ import a")
>>> min(t_slower.repeat(repeat=10,number=5))
... logger output ...
0.92177586353772345

I tried a bigger file and it also had no visible effect. So... what seemed
reasonable to me was wrong. I guess that revision can be reverted :)

> How we test each line has a large effect on efficiency. I point out
> again that using line[x:y]=="jklj" is much faster than using "word in
> line", or line.find(), and so these should be some of the first
> targets for improving efficiency.

Good point, confirmed by a profiling run:

langner@slim:~/tmp/python/cclib/trunk/data/GAMESS/basicGAMESS-US$ python
Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
[GCC 3.4.6 (Debian 3.4.6-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cclib
>>> a = cclib.parser.ccopen("C_bigbasis.out")
>>> import profile
>>> profile.run("a.parse()", "parse.prof")
>>> import pstats
>>> s = pstats.Stats("parse.prof")
>>> s.sort_stats("time")
>>> s.print_stats(.12)
Thu May 3 14:43:04 2007    parse.prof

         199815 function calls in 9.069 CPU seconds

   Ordered by: internal time
   List reduced from 96 to 12 due to restriction <0.12>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     8581    4.548    0.001    8.625    0.001 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/gamessparser.py:90(extract)
   137355    3.080    0.000    3.080    0.000 :0(find)
    20310    0.480    0.000    0.480    0.000 :0(len)
        1    0.316    0.316    9.069    9.069 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:165(parse)
     8600    0.184    0.000    0.184    0.000 :0(rstrip)
     2143    0.140    0.000    0.140    0.000 :0(split)
     2055    0.124    0.000    0.124    0.000 :0(range)
     9145    0.076    0.000    0.076    0.000 :0(strip)
     8868    0.060    0.000    0.060    0.000 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:375(updateprogress)
      370    0.016    0.000    0.016    0.000 :0(append)
      218    0.004    0.000    0.004    0.000 :0(replace)
       31    0.004    0.000    0.032    0.001 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:153(__setattr__)

> On 03/05/07, Karol Langner <kar...@kn...> wrote:
> > Some thoughts about more refactoring to the parser...
> >
> > If you take a look at the parsers after the recent refactoring, it is now
> > more evident that they are quite inefficient. That isn't a problem, since
> > cclib isn't about efficiency, but it would be nice. For example, even
> > something as simple as putting a 'return' statement at the end of each
> > parsing block would speed things up (the following conditions are not
> > evaluated). Anyway, this already suggests that it would be useful to
> > break up the extract() method into pieces, one for each block of parsed
> > output.
>
> I think that there is one case where the block shouldn't return, but
> in general it would be fine. However, it wouldn't speed things up that
> much, so I feel it is not worth doing. If you think about it, most
> lines don't match any of the 'if' statements. If each block is
> executed once, and there are 10 blocks, then the number of wasteful
> 'if' statements will be 9 + 8 + 7 + ... + 1 = 45.

There are also the lines between blocks that don't match any condition, but in
principle you're right, it's not worth it.

> > I've been hovering around this subject for some time, and turning it
> > around in my mind. A dictionary of functions seems appropriate (with
> > regexps or something as keys), and easier to manage than the current
> > "long" function. I don't think we can do away with the functions, since
> > sometimes pretty complicated operations are done with the parsed output.
> > The problem I see is where to define all these functions (30-40 separate
> > parsed blocks)?
>
> I don't think defining the functions is the problem - just define them
> in the gaussian parser for example. We could do this already without
> affecting anything, and leave the dictionary of functions idea till a
> later date.

What do you mean by 'gaussian parser' - the file gaussianparser.py or the
class? I think I didn't make myself clear - my worry is that if we define all
these functions in the parser class, then when you go
"a = ccopen("....").parse(); print dir(a)" you will get flooded by function
names.

> > How about this: the functions would be defined in a different class, not
> > LogFile. What I'm suggesting, is to separate from the class that
> > represents a parsed log file a class that represents the parser.
> > Currently, they are one. An instance of the parser class would be an
> > attribute of the log file class, say "_parser". This object would hold
> > all the parsing functions and a dict used by the parse() method of
> > LogFile, and any other stuff needed for parsing. An additional advantage
> > is that the parser becomes less visible to the casual user, leaving only
> > parsed attributes in the log file object.
> >
> > Summarizing, I propose two layers of classes:
> > LogFile - subclasses Gaussian, GAMESS, ...
> > LogFileParser - subclasses GaussianParser, GAMESSParser, ...
> > The first remains as is (at least for the user), except that everything
> > related to parsing is put in the second. Of course, instances of the
> > latter class should be attributes of the instances of the former.
>
> I think you'll have to explain this some more...I'm not sure what the
> advantage is in doing this. I guess I don't have enough time right now
> to think this through fully...

Let me sketch out the idea. Snippets of the parser class (second layer):

class GaussianParser(LogFileParser):
    (...)
    def parse_charge(self, line, inputfile):
        super(GaussianParser, self).charge = (...)
    def parse_scfenergy(self, line, inputfile):
        super(GaussianParser, self).scfenergies.append(...)
    (...)
    self.parse_dict = {
        <regexp_charge>: self.parse_charge,
        <regexp_scfenergy>: self.parse_scfenergy,
        (...)
    }

Now the first layer, the log file class:

class Gaussian(LogFile):
    self._parser = GaussianParser(...)
    (...)
    def parse(self, ...):
        (...)
        for line in inputfile:
            for regexp in self._parser.parse_dict:
                if re.match(regexp, line):
                    self._parser.parse_dict[regexp](line, inputfile)

I hope that's clearer.

> > Waiting to hear what you think about this idea,
>
> I think I need some more time....

No problem, I'm just brainstorming on a holiday.

- Karol

--
written by Karol Langner
Thu May 3 14:15:39 CEST 2007 |
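For readers who want to try the two-layer idea, here is a minimal runnable sketch of it. It is not cclib code: the class bodies, the two regular expressions, the handler signature (match, line, inputfile) and the sample lines are all assumptions made up for illustration. The parser object writes results onto the log file object it was given, so the user-facing object carries only parsed attributes plus parse().

```python
import re


class LogFileParser(object):
    """Second layer: owns the parsing functions and the dispatch table."""

    def __init__(self, logfile):
        self.logfile = logfile   # object that receives the parsed attributes
        self.parse_dict = {}     # compiled regexp -> bound handler method


class GaussianParser(LogFileParser):
    def __init__(self, logfile):
        super(GaussianParser, self).__init__(logfile)
        # Illustrative patterns only; cclib's real conditions differ.
        self.parse_dict = {
            re.compile(r"\s*Charge\s*=\s*(-?\d+)"): self.parse_charge,
            re.compile(r"\s*SCF Done:.*=\s*(-?\d+\.\d+)"): self.parse_scfenergy,
        }

    def parse_charge(self, match, line, inputfile):
        self.logfile.charge = int(match.group(1))

    def parse_scfenergy(self, match, line, inputfile):
        self.logfile.scfenergies.append(float(match.group(1)))


class Gaussian(object):
    """First layer: what the user sees after parsing."""

    def __init__(self):
        self.scfenergies = []
        self._parser = GaussianParser(self)

    def parse(self, inputfile):
        for line in inputfile:
            for regexp, handler in self._parser.parse_dict.items():
                match = regexp.match(line)
                if match:
                    # Handlers also get inputfile so they can read ahead.
                    handler(match, line, inputfile)
                    break
        return self


# Made-up sample lines standing in for a log file.
sample = iter([
    " Charge =  0 Multiplicity = 1\n",
    " SCF Done:  E(RHF) =  -74.9650  A.U.\n",
])
data = Gaussian().parse(sample)
```

With this layout, dir(data) lists charge, scfenergies, parse and the single private _parser attribute, rather than 30-40 parsing functions.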
From: Noel O'B. <bao...@gm...> - 2007-05-03 10:26:12
|
Sorry, just checked. clean() doesn't remove the logger, nor should it.
If you parse the same file a second time (e.g. this is common with
GaussSum users following a geometry optimisation) you still want a
logger.

The problem you found occurs if you create two instances of parsers
with the same name. We either ignore this problem, or implement some
complicated way of tracking file names. Perhaps if we only create the
logger when parse() is called, then we can remove it when clean() is
called.

On 03/05/07, Noel O'Boyle <bao...@gm...> wrote:
> This is a "feature" of the logger (yes, I agree it's annoying, but
> it's not a bug). The problem is that now you have two loggers both
> doing the exact same thing. Loggers are distinguished by their names;
> so if you parse the same file twice, without using clear() (which I
> hope deletes the logger), you get this effect.
|
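The last suggestion in this message (set up logging in parse(), tear it down in clean()) could look roughly like the sketch below. LogFileSketch, its attributes and the logger name are hypothetical and are not cclib's API; only the standard logging calls are real. One assumption worth stating: the sketch removes the handler it attached rather than the logger itself, since the logging module keeps named loggers alive for the life of the process.

```python
import io
import logging


class LogFileSketch(object):
    """Hypothetical stand-in for a parser class; not cclib's LogFile."""

    def __init__(self, name, stream):
        self.name = name
        self.stream = stream
        self.logger = None
        self._handler = None

    def parse(self):
        # Attach the handler only when parsing actually starts.
        self.logger = logging.getLogger(self.name)
        self.logger.setLevel(logging.INFO)
        self.logger.propagate = False
        self._handler = logging.StreamHandler(self.stream)
        self.logger.addHandler(self._handler)
        self.logger.info("Creating attribute natom: 3")

    def clean(self):
        # Detach our own handler so a later parse() starts from zero.
        if self._handler is not None:
            self.logger.removeHandler(self._handler)
            self._handler = None


stream = io.StringIO()
a = LogFileSketch("Gaussian sketch.log", stream)
a.parse()
a.clean()
a.parse()
a.clean()
messages = stream.getvalue().splitlines()
```

Each parse() here produces one message. Two live instances with the same name that are both parsed before either is cleaned would still double up, which is the file-name tracking problem mentioned above.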
From: Noel O'B. <bao...@gm...> - 2007-05-03 10:18:19
|
This is a "feature" of the logger (yes, I agree it's annoying, but
it's not a bug). The problem is that now you have two loggers both
doing the exact same thing. Loggers are distinguished by their names;
so if you parse the same file twice, without using clear() (which I
hope deletes the logger), you get this effect.

On 03/05/07, Karol Langner <kar...@kn...> wrote:
> While working interactively with cclib on many files, I noticed a bug related
> to the logger. For some reason, it prints as many messages for each attribute
> as there were instances created of the specific parser (GAMESS, Gaussian,
> counted separately), so you get an increasingly long output from the logger.
|
From: Karol L. <kar...@kn...> - 2007-05-03 10:13:10
|
While working interactively with cclib on many files, I noticed a bug related
to the logger. For some reason, it prints as many messages for each attribute
as there were instances created of the specific parser (GAMESS, Gaussian,
counted separately), so you get an increasingly long output from the logger.
The problem is not related to any recent changes, it goes as far back as
version 0.5, as you can see here:

langner@slim:~/tmp/python/cclib/trunk/data/Gaussian/basicGaussian03$ python
Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
[GCC 3.4.6 (Debian 3.4.6-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cclib
>>> print cclib.__version__
0.5
>>> print cclib.parser.logfileparser.__revision__
$Revision: 240 $
>>> print cclib.parser.gamessparser.__revision__
$Revision: 240 $
>>> cclib.parser.Gaussian("water_mp2.log").parse()
[Gaussian water_mp2.log INFO] Creating attribute atomcoords[]
[Gaussian water_mp2.log INFO] Creating attribute atomnos[]
[Gaussian water_mp2.log INFO] Creating attribute natom: 3
[Gaussian water_mp2.log INFO] Creating attribute nbasis: 7
[Gaussian water_mp2.log INFO] Creating attribute nmo: 7
[Gaussian water_mp2.log INFO] Creating attribute scftargets[]
[Gaussian water_mp2.log INFO] Creating attribute scfvalues
[Gaussian water_mp2.log INFO] Creating attribute scfenergies[]
[Gaussian water_mp2.log INFO] Creating attribute moenergies[[]]
[Gaussian water_mp2.log INFO] Creating attribute homos[]
>>> cclib.parser.Gaussian("water_mp2.log").parse()
[Gaussian water_mp2.log INFO] Creating attribute atomcoords[]
[Gaussian water_mp2.log INFO] Creating attribute atomcoords[]
[Gaussian water_mp2.log INFO] Creating attribute atomnos[]
[Gaussian water_mp2.log INFO] Creating attribute atomnos[]
[Gaussian water_mp2.log INFO] Creating attribute natom: 3
[Gaussian water_mp2.log INFO] Creating attribute natom: 3
[Gaussian water_mp2.log INFO] Creating attribute nbasis: 7
[Gaussian water_mp2.log INFO] Creating attribute nbasis: 7
[Gaussian water_mp2.log INFO] Creating attribute nmo: 7
[Gaussian water_mp2.log INFO] Creating attribute nmo: 7
[Gaussian water_mp2.log INFO] Creating attribute scftargets[]
[Gaussian water_mp2.log INFO] Creating attribute scftargets[]
[Gaussian water_mp2.log INFO] Creating attribute scfvalues
[Gaussian water_mp2.log INFO] Creating attribute scfvalues
[Gaussian water_mp2.log INFO] Creating attribute scfenergies[]
[Gaussian water_mp2.log INFO] Creating attribute scfenergies[]
[Gaussian water_mp2.log INFO] Creating attribute moenergies[[]]
[Gaussian water_mp2.log INFO] Creating attribute moenergies[[]]
[Gaussian water_mp2.log INFO] Creating attribute homos[]
[Gaussian water_mp2.log INFO] Creating attribute homos[]
>>> cclib.parser.GAMESS("../../GAMESS/basicGAMESS-US/water_mp2.out").parse()
[GAMESS ../GAMESS/water_mp2.out INFO] Creating attribute atomcoords, atomnos
[GAMESS ../water_mp2.out INFO] Creating attribute nbasis
[GAMESS ../water_mp2.out INFO] Creating attribute homos
[GAMESS ../water_mp2.out INFO] Creating attribute natom
[GAMESS ..//water_mp2.out INFO] Creating attribute scftargets
[GAMESS ../water_mp2.out INFO] Creating attribute scfvalues
[GAMESS ../water_mp2.out INFO] Creating attribute scfenergies[]
[GAMESS ..//water_mp2.out INFO] Creating attributes moenergies, mosyms
[GAMESS ../water_mp2.out INFO] Creating attribute nmo with default value
[GAMESS ../water_mp2.out INFO] Creating attribute mocoeffs
[GAMESS ../water_mp2.out INFO] Creating attribute geotargets[] with default values

Notice how only the same parser class contributes to the repeats. The problem
is not in parsing, though, since it happens whenever logger.info is called.
Consider this from the current revision in trunk, where it is called in
LogFile.__setattr__ whenever an attribute from _attrlist is set that did not
exist previously:

langner@slim:~/tmp/python/cclib/trunk/data$ python
Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
[GCC 3.4.6 (Debian 3.4.6-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cclib
>>> cclib.__version__
'0.7'
>>> cclib.parser.logfileparser.__revision__
'$Revision: 620 $'
>>> a = cclib.parser.ccopen("basicGaussian03/water_mp2.log")
>>> a.mult = 1
[Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
>>> a = cclib.parser.ccopen("basicGaussian03/water_mp2.log")
>>> a.mult = 1
[Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
[Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
>>> b = cclib.parser.ccopen("basicGAMESS-US/water_mp2.out")
>>> b.mult = 1
[GAMESS basicGAMESS-US/water_mp2.out INFO] Creating attribute mult: 1
>>> a.mult = 1
>>> a.clean()
>>> a.mult = 1
[Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1
[Gaussian basicGaussian03/water_mp2.log INFO] Creating attribute mult: 1

This looks pretty strange.

- Karol

--
written by Karol Langner
Thu May 3 13:25:54 CEST 2007 |
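The doubling seen in these sessions can be reproduced with the standard logging module alone. The sketch below assumes that a constructor calls something like make_parser_logger() every time an instance is created; that helper, the format string and the logger name are invented for illustration, and whether cclib attaches its handler exactly this way is an assumption. What is certain is that logging.getLogger() returns the same object for the same name, so each extra addHandler() call adds one more copy of every message.

```python
import io
import logging


def make_parser_logger(name, stream):
    """Mimic a constructor that configures logging each time it runs."""
    logger = logging.getLogger(name)   # same name -> same Logger object
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("[%(name)s %(levelname)s] %(message)s"))
    logger.addHandler(handler)         # one more handler on every call
    return logger


stream = io.StringIO()
first = make_parser_logger("Gaussian water_mp2.log", stream)
second = make_parser_logger("Gaussian water_mp2.log", stream)

# A single info() call now goes through both handlers.
second.info("Creating attribute natom: 3")
lines = stream.getvalue().splitlines()
```

Here first and second are the same object, and lines holds the message twice, matching the repeated output in the second interactive session.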
From: Noel O'B. <bao...@gm...> - 2007-05-03 08:23:33
|
If you are going to think about efficiency, I'd like to see some
timings. That means something like timing the parsing of a particular
very large file (several times, and take the minimum). I don't like to
speculate too much on whether certain changes will make everything
more efficient.

For instance, what was the effect of the recent change where you
avoided calling extract() when the line was empty? It seems reasonable
that this would speed things up, but did it in fact? What's the
fastest way of testing whether a line is empty (must be
cross-platform)? And so on.

How we test each line has a large effect on efficiency. I point out
again that using line[x:y]=="jklj" is much faster than using "word in
line", or line.find(), and so these should be some of the first
targets for improving efficiency.

On 03/05/07, Karol Langner <kar...@kn...> wrote:
> Some thoughts about more refactoring to the parser...
>
> If you take a look at the parsers after the recent refactoring, it is now more
> evident that they are quite inefficient. That isn't a problem, since cclib
> isn't about efficiency, but it would be nice. For example, even something as
> simple as putting a 'return' statement at the end of each parsing block would
> speed things up (the following conditions are not evaluated). Anyway, this
> already suggests that it would be useful to break up the extract() method
> into pieces, one for each block of parsed output.

I think that there is one case where the block shouldn't return, but
in general it would be fine. However, it wouldn't speed things up that
much, so I feel it is not worth doing. If you think about it, most
lines don't match any of the 'if' statements. If each block is
executed once, and there are 10 blocks, then the number of wasteful
'if' statements will be 9 + 8 + 7 + ... + 1 = 45.

> I've been hovering around this subject for some time, and turning it around in
> my mind. A dictionary of functions seems appropriate (with regexps or
> something as keys), and easier to manage than the current "long" function.
> I don't think we can do away with the functions, since sometimes pretty
> complicated operations are done with the parsed output. The problem I see is
> where to define all these functions (30-40 separate parsed blocks)?

I don't think defining the functions is the problem - just define them
in the gaussian parser for example. We could do this already without
affecting anything, and leave the dictionary of functions idea till a
later date.

> How about this: the functions would be defined in a different class, not
> LogFile. What I'm suggesting, is to separate from the class that represents a
> parsed log file a class that represents the parser. Currently, they are one.
> An instance of the parser class would be an attribute of the log file class,
> say "_parser". This object would hold all the parsing functions and a dict
> used by the parse() method of LogFile, and any other stuff needed for
> parsing. An additional advantage is that the parser becomes less visible to
> the casual user, leaving only parsed attributes in the log file object.
>
> Summarizing, I propose two layers of classes:
> LogFile - subclasses Gaussian, GAMESS, ...
> LogFileParser - subclasses GaussianParser, GAMESSParser, ...
> The first remains as is (at least for the user), except that everything
> related to parsing is put in the second. Of course, instances of the latter
> class should be attributes of the instances of the former.

I think you'll have to explain this some more...I'm not sure what the
advantage is in doing this. I guess I don't have enough time right now
to think this through fully...

> Waiting to hear what you think about this idea,

I think I need some more time....

Noel |
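In the spirit of "I'd like to see some timings", the three line tests mentioned in this message can be compared with timeit, taking the minimum over several repeats as suggested. The sample line, the slice bounds and the repeat counts below are assumptions chosen for illustration, and the ranking is deliberately not stated here because it depends on the interpreter version; the point is only how to measure it. The last line checks the 9 + 8 + ... + 1 count of skipped comparisons for 10 blocks.

```python
import timeit

# A made-up log line; the marker starts at column 1.
line = " SCF Done:  E(RHF) =  -74.9650 A.U. after 5 cycles"

candidates = {
    "slice": 'line[1:9] == "SCF Done"',
    "in": '"SCF Done" in line',
    "find": 'line.find("SCF Done") >= 0',
}


def best_time(stmt, repeat=5, number=20000):
    """Return the fastest of `repeat` runs of `number` executions each."""
    timer = timeit.Timer(stmt, globals={"line": line})
    return min(timer.repeat(repeat=repeat, number=number))


timings = {name: best_time(stmt) for name, stmt in candidates.items()}

# Comparisons saved by returning early: 9 + 8 + ... + 1 for 10 blocks.
wasted = sum(range(1, 10))

for name in sorted(timings, key=timings.get):
    print("%-5s %.6f s" % (name, timings[name]))
```

A slice test only works when the marker sits at a fixed column, which is a design constraint to weigh against any speed gain.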
From: Karol L. <kar...@kn...> - 2007-05-02 21:50:03
|
Some thoughts about more refactoring to the parser...

If you take a look at the parsers after the recent refactoring, it is now more
evident that they are quite inefficient. That isn't a problem, since cclib
isn't about efficiency, but it would be nice. For example, even something as
simple as putting a 'return' statement at the end of each parsing block would
speed things up (the following conditions are not evaluated). Anyway, this
already suggests that it would be useful to break up the extract() method
into pieces, one for each block of parsed output.

I've been hovering around this subject for some time, and turning it around in
my mind. A dictionary of functions seems appropriate (with regexps or
something as keys), and easier to manage than the current "long" function.
I don't think we can do away with the functions, since sometimes pretty
complicated operations are done with the parsed output. The problem I see is
where to define all these functions (30-40 separate parsed blocks)?

How about this: the functions would be defined in a different class, not
LogFile. What I'm suggesting is to separate the class that represents the
parser from the class that represents a parsed log file. Currently, they are
one. An instance of the parser class would be an attribute of the log file
class, say "_parser". This object would hold all the parsing functions and a
dict used by the parse() method of LogFile, and any other stuff needed for
parsing. An additional advantage is that the parser becomes less visible to
the casual user, leaving only parsed attributes in the log file object.

Summarizing, I propose two layers of classes:
LogFile - subclasses Gaussian, GAMESS, ...
LogFileParser - subclasses GaussianParser, GAMESSParser, ...
The first remains as is (at least for the user), except that everything
related to parsing is put in the second. Of course, instances of the latter
class should be attributes of the instances of the former.

Waiting to hear what you think about this idea,
Karol

--
written by Karol Langner
Thu May 3 01:20:44 CEST 2007 |
From: Karol L. <kar...@kn...> - 2007-05-02 14:48:31
|
On Monday 30 April 2007 08:44, Noel O'Boyle wrote:
> I've added charge and mult (multiplicity) to cclib. Any ideas on
> better names? Perhaps spin instead of mult, or would that just be
> confusing?
>
> Noel

The names are fine by me - 'spin' would be confusing if it were still 2S+1! I
added the attributes to the docstring of LogFile and to the wiki.

- Karol

--
written by Karol Langner
Wed May 2 18:46:34 CEST 2007 |
From: Karol L. <kar...@kn...> - 2007-04-30 08:18:55
|
On Monday 30 April 2007 08:54, Noel O'Boyle wrote:
> Maybe someone has already suggested this, but here's an idea to handle
> updating the "self.updateprogress".
>
> Every time an attribute is set by __setattr__, it should change the
> updateprogress string to the name of the attribute rather than having
> to do this in the subclass.

Good idea. This doesn't cover the whole range of situations
LogFile.updateprogress is used in, though.

> Just a note:
>
> Line 197 of logfileparser:
>     self.updateprogress(inputfile, "Unsupported information", cupdate)
>
> If inputfile and cupdate are attributes of logfileparser, then there's
> no need to pass them in to updateprogress.

Yup.

-Karol

--
written by Karol Langner
Mon Apr 30 12:16:37 CEST 2007 |
From: Noel O'B. <bao...@gm...> - 2007-04-30 06:54:44
|
Maybe someone has already suggested this, but here's an idea to handle
updating the "self.updateprogress".

Every time an attribute is set by __setattr__, it should change the
updateprogress string to the name of the attribute rather than having
to do this in the subclass.

Just a note:

Line 197 of logfileparser:
    self.updateprogress(inputfile, "Unsupported information", cupdate)

If inputfile and cupdate are attributes of logfileparser, then there's
no need to pass them in to updateprogress.

Noel |
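The __setattr__ idea above might be sketched as follows. ProgressSketch, its progress_text field and the short _attrlist are hypothetical and are not taken from cclib's logfileparser; the sketch only shows how a base class could record the name of the attribute being set, so that subclasses no longer pass a progress string themselves.

```python
class ProgressSketch(object):
    """Hypothetical base class; not cclib's LogFile."""

    # Attributes whose creation should show up in the progress text.
    _attrlist = ["charge", "mult", "natom"]

    def __init__(self):
        # Use object.__setattr__ so bookkeeping does not trigger the hook.
        object.__setattr__(self, "progress_text", "")

    def __setattr__(self, name, value):
        if name in self._attrlist:
            # Record which parsed attribute is currently being set.
            object.__setattr__(self, "progress_text", name)
        object.__setattr__(self, name, value)


p = ProgressSketch()
p.charge = 0
after_charge = p.progress_text
p.mult = 1
after_mult = p.progress_text
p.scratch = "not a parsed attribute"   # leaves progress_text untouched
```

As noted in the reply, this would not cover progress updates that are unrelated to setting an attribute, such as the "Unsupported information" case.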
From: Noel O'B. <bao...@gm...> - 2007-04-30 06:44:34
|
I've added charge and mult (multiplicity) to cclib. Any ideas on better names? Perhaps spin instead of mult, or would that just be confusing? Noel |
From: Noel O'B. <bao...@gm...> - 2007-04-28 10:17:44
|
Nice work!

On 28/04/07, Karol Langner <kar...@kn...> wrote:
> Sure. I'll just mention the more important changes in names:
>
> Numeric.matrixmultiply -> numpy.dot
> Numeric.outerproduct -> numpy.outer
> LinearAlgebra.inverse -> numpy.linalg.inv
> Numeric.typecode -> numpy.dtype
>
> The last was the problematic one, since Numeric.typecode is a function, while
> numpy.dtype is a type.
>
> On Friday 27 April 2007 15:47, Noel O'Boyle wrote:
> > Let's leave a few days "cooling off period" to look through the code
> > for anything that doesn't seem right.
> >
> > On 27/04/07, Karol Langner <kar...@kn...> wrote:
> > > I changed all references to NumPy, but left the Numeric imports (when
> > > importing numpy fails). Some method names had to be changed. All the
> > > tests and regressions go fine, although I'm worried we don't have tests
> > > for all the methods, since that's where the most Numeric functions are
> > > used.
> > >
> > > --
> > > written by Karol Langner
> > > Fri Apr 27 17:36:53 CEST 2007
>
> --
> written by Karol Langner
> Sat Apr 28 11:36:59 CEST 2007
|
From: Karol L. <kar...@kn...> - 2007-04-28 07:45:03
|
Sure. I'll just mention the more important changes in names:

Numeric.matrixmultiply -> numpy.dot
Numeric.outerproduct -> numpy.outer
LinearAlgebra.inverse -> numpy.linalg.inv
Numeric.typecode -> numpy.dtype

The last was the problematic one, since Numeric.typecode is a function, while
numpy.dtype is a type.

On Friday 27 April 2007 15:47, Noel O'Boyle wrote:
> Let's leave a few days "cooling off period" to look through the code
> for anything that doesn't seem right.
>
> On 27/04/07, Karol Langner <kar...@kn...> wrote:
> > I changed all references to NumPy, but left the Numeric imports (when
> > importing numpy fails). Some method names had to be changed. All the
> > tests and regressions go fine, although I'm worried we don't have tests
> > for all the methods, since that's where the most Numeric functions are
> > used.
> >
> > --
> > written by Karol Langner
> > Fri Apr 27 17:36:53 CEST 2007

--
written by Karol Langner
Sat Apr 28 11:36:59 CEST 2007 |
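For illustration, the NumPy side of the renames listed in this message might be used as in the sketch below. The arrays and variable names are made up for the example and are not taken from cclib; the Numeric names appear only in comments, following the mapping given in the message.

```python
import numpy

# Made-up example data: a diagonal matrix and a vector.
a = numpy.array([[2.0, 0.0], [0.0, 4.0]])
v = numpy.array([1.0, 2.0])

# Matrix-vector product (listed above as replacing Numeric.matrixmultiply).
product = numpy.dot(a, v)

# Outer product of two vectors (listed as replacing Numeric.outerproduct).
outer = numpy.outer(v, v)

# Matrix inverse (listed as replacing LinearAlgebra.inverse).
inverse = numpy.linalg.inv(a)

# numpy.dtype is a class, and an array exposes its dtype as an attribute,
# so it is read with `a.dtype` rather than obtained through a function call.
has_float64_dtype = a.dtype == numpy.dtype("float64")
```

The last line hints at why the `typecode` change was the awkward one: code that called a function has to be rewritten to read an attribute and compare against a `numpy.dtype` object instead.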
From: Noel O'B. <bao...@gm...> - 2007-04-27 13:47:25
|
Let's leave a few days "cooling off period" to look through the code
for anything that doesn't seem right.

On 27/04/07, Karol Langner <kar...@kn...> wrote:
> I changed all references to NumPy, but left the Numeric imports (when
> importing numpy fails). Some method names had to be changed. All the tests
> and regressions go fine, although I'm worried we don't have tests for all the
> methods, since that's where the most Numeric functions are used.
>
> - Karol
>
> --
> written by Karol Langner
> Fri Apr 27 17:36:53 CEST 2007
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by DB2 Express
> Download DB2 Express C - the FREE version of DB2 express and take
> control of your XML. No limits. Just data. Click to get it now.
> http://sourceforge.net/powerbar/db2/
> _______________________________________________
> cclib-devel mailing list
> ccl...@li...
> https://lists.sourceforge.net/lists/listinfo/cclib-devel
> |
From: Karol L. <kar...@kn...> - 2007-04-27 13:41:15
|
I changed all references to NumPy, but left the Numeric imports (when
importing numpy fails). Some method names had to be changed. All the
tests and regressions go fine, although I'm worried we don't have tests
for all the methods, since that's where the most Numeric functions are
used.

- Karol

--
written by Karol Langner
Fri Apr 27 17:36:53 CEST 2007 |
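The "import numpy, fall back to Numeric" arrangement described in this message could be sketched roughly as follows. The helper name `import_first_available` is hypothetical and this is not necessarily how cclib implemented the fallback; it only shows the general try-in-order pattern.

```python
import importlib


def import_first_available(module_names):
    """Return the first module in module_names that can be imported.

    Hypothetical helper for illustration: candidates are tried in order,
    so the preferred package should be listed first.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # This candidate is not installed; try the next one.
            continue
    raise ImportError(
        "none of these modules could be imported: %s" % ", ".join(module_names)
    )
```

Calling `import_first_available(["numpy", "Numeric"])` would prefer NumPy and only fall back to Numeric if NumPy is missing. Since some function names differ between the two packages, as noted in the message, a fallback like this only helps for code that sticks to names both packages share.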