Re: [cclib-devel] Analysis of scan results


Hi,

On second thoughts, it probably makes more sense to use an array of rank 1 to store the energies and an array of rank n (where n is the number of variables scanned) to store the scan parameters. Not only does this fit better with the rest of cclib, but it also makes for nicer code!

***************************************************************************************

#!/software/languages/python/2.5.2/gnu-4.2.4/bin/python

import sys

filename = sys.argv[1]
scanEng = []
scanParm = []

try:
    inputfile = open(filename, "r")
except IOError:
    # stop here: without an open file the loop below cannot run
    print ("file cannot be opened")
    sys.exit(1)

for line in inputfile:
    if "Summary of the potential surface scan:" in line:
        colmNames = inputfile.next()
        hyphens = inputfile.next()

        line = inputfile.next()
        while line != hyphens:
            broken = line.split()
            scanEng.append(broken[-1])
            scanParm.append(broken[1:-1])
            line = inputfile.next()

print scanEng, scanParm

***************************************************************************************
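
The script above keeps every value as a string. As a minimal sketch of the same idea with numeric storage, the function below returns the energies as a flat list of floats and the scan parameters as one list of floats per step. It is written for Python 3 rather than the 2.5 interpreter used above, and the name parse_rigid_scan is hypothetical, not part of cclib.

```python
def parse_rigid_scan(lines):
    """Return (energies, parameters) parsed from the lines of a Gaussian log.

    energies is a flat list of floats; parameters holds one list of
    floats per scan step, with one entry per scanned variable.
    Hypothetical helper for illustration only, not a cclib function.
    """
    energies = []
    parameters = []
    lines = iter(lines)
    for line in lines:
        if "Summary of the potential surface scan:" in line:
            next(lines)                    # column names (N, A, ..., SCF)
            hyphens = next(lines).strip()  # rule of hyphens under the header
            for row in lines:
                if row.strip() == hyphens:
                    break                  # the closing rule ends the table
                fields = row.split()
                energies.append(float(fields[-1]))
                parameters.append([float(x) for x in fields[1:-1]])
    return energies, parameters


# First two rows of the rigid scan summary quoted further down.
sample = """\
 Summary of the potential surface scan:
   N       A          SCF
 ----  ---------  -----------
    1   109.0000    -76.43373
    2   119.0000    -76.43011
 ----  ---------  -----------
"""
energies, parameters = parse_rigid_scan(sample.splitlines())
print(energies)
print(parameters)
```

Lists of floats like these could be handed straight to whatever array type cclib uses for its other attributes.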

Yours

Ed

On 19 Dec 2011, at 16:27, Edward Holland wrote:

> Hi Karol, 
> 
> I think there must have been a problem with the attachments I sent. For ease, I'll just include the relevant sections for each case.
> 
> The simplest case is the rigid scan.
> 
> *****************Rigid scan section from gaussian output file**********************
> Summary of the potential surface scan:
>   N       A          SCF
> ----  ---------  -----------
>    1   109.0000    -76.43373
>    2   119.0000    -76.43011
>    3   129.0000    -76.42311
>    4   139.0000    -76.41398
>    5   149.0000    -76.40420
>    6   159.0000    -76.39541
>    7   169.0000    -76.38916
>    8   179.0000    -76.38664
>    9   189.0000    -76.38833
>   10   199.0000    -76.39391
>   11   209.0000    -76.40231
> ----  ---------  -----------
> ***************************************************************************************
> 
> This is the simplest example, as it varies only one value. More complex scans have additional columns like A.
> 
> I think this can be parsed fairly simply using code such as the following (it will handle an arbitrary number of columns like A):
> 
> ***************************************************************************************
> #!/software/languages/python/2.5.2/gnu-4.2.4/bin/python
> 
> import sys
> 
> filename = sys.argv[1]
> scanres = []
> 
> try:
>    inputfile = open(filename, "r")
> except IOError:
>    # stop here: without an open file the loop below cannot run
>    print ("file cannot be opened")
>    sys.exit(1)
> 
> for line in inputfile:
>    if "Summary of the potential surface scan:" in line:
>        colmNames = inputfile.next()
>        hyphens = inputfile.next()
> 
>        line = inputfile.next()
>        while line != hyphens:
>            broken = line.split()
>            # energy is the last column; the scan parameters keep their column order
>            scanres.append( (broken[-1], tuple(broken[1:-1])) )
>            line = inputfile.next()
> 
> for res in scanres:
>    print res[0] + '\t',
>    for parm in res[1]:
>        print parm+ '\t',
>    print
> ***************************************************************************************
> 
> I haven't written a patch for cclib to do something similar, as I thought it worthwhile to discuss the data structure used to hold this data first. A list of tuples, each containing a float and a tuple, doesn't seem like the most elegant solution!
> 
> The other type of scan is the relaxed scan. 
> 
> *****************Relaxed scan section from gaussian output file**********************
> Summary of Optimized Potential Surface Scan
>                           1         2         3         4         5
>     EIGENVALUES --   -76.43381 -76.43364 -76.43343 -76.43318 -76.43289
>           R1           0.96360   0.96297   0.96250   0.96204   0.96157
>           R2           0.96360   0.96297   0.96250   0.96204   0.96157
>           A1         109.00000 110.00000 111.00000 112.00000 113.00000
>                           6         7         8         9        10
>     EIGENVALUES --   -76.43256 -76.43219 -76.43178 -76.43134 -76.43086
>           R1           0.96112   0.96067   0.96022   0.95978   0.95934
>           R2           0.96112   0.96067   0.96022   0.95978   0.95934
>           A1         114.00000 115.00000 116.00000 117.00000 118.00000
>                          11
>     EIGENVALUES --   -76.43035
>           R1           0.95891
>           R2           0.95891
>           A1         119.00000
> ***************************************************************************************
> 
> As you can see, this case will be much more complicated to parse, as Gaussian outputs the optimised geometries as a series of z-matrix variables.
> 
> My thought regarding this is to cheat somewhat. Instead of trying to convert the z-matrix, we could search through the structures already found by cclib to find the optimised structures.
> This is a very "hacky" way to solve the problem, and I would be interested to know if you can think of a better way of doing it.
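
A minimal sketch of reading the relaxed scan summary itself, without any conversion to Cartesian coordinates: it collects the energies and the named z-matrix variables across the column blocks. It assumes Python 3, assumes the columns are separated by whitespace as in the sample above, and stops at the first line that does not look like part of the table. The name parse_relaxed_scan is hypothetical, not part of cclib.

```python
def parse_relaxed_scan(lines):
    """Return (energies, variables) from a relaxed scan summary.

    energies is a flat list of floats, one per scan step; variables maps
    each z-matrix variable name to its list of optimised values.
    Hypothetical helper for illustration only, not a cclib function.
    """
    energies = []
    variables = {}
    in_summary = False
    for line in lines:
        if "Summary of Optimized Potential Surface Scan" in line:
            in_summary = True
            continue
        if not in_summary:
            continue
        fields = line.split()
        if not fields:
            break                              # blank line: table is over
        if fields[0] == "EIGENVALUES":
            # fields[1] is the "--" separator, the energies follow it
            energies.extend(float(x) for x in fields[2:])
        elif all(f.isdigit() for f in fields):
            continue                           # row of step numbers
        elif len(fields) < 2:
            break                              # not a "name value ..." row
        else:
            try:
                values = [float(x) for x in fields[1:]]
            except ValueError:
                break                          # ordinary text: table is over
            variables.setdefault(fields[0], []).extend(values)
    return energies, variables


# The relaxed scan section quoted above.
sample = """\
 Summary of Optimized Potential Surface Scan
                           1         2         3         4         5
     EIGENVALUES --   -76.43381 -76.43364 -76.43343 -76.43318 -76.43289
           R1           0.96360   0.96297   0.96250   0.96204   0.96157
           R2           0.96360   0.96297   0.96250   0.96204   0.96157
           A1         109.00000 110.00000 111.00000 112.00000 113.00000
                           6         7         8         9        10
     EIGENVALUES --   -76.43256 -76.43219 -76.43178 -76.43134 -76.43086
           R1           0.96112   0.96067   0.96022   0.95978   0.95934
           R2           0.96112   0.96067   0.96022   0.95978   0.95934
           A1         114.00000 115.00000 116.00000 117.00000 118.00000
                          11
     EIGENVALUES --   -76.43035
           R1           0.95891
           R2           0.95891
           A1         119.00000
"""
energies, variables = parse_relaxed_scan(sample.splitlines())
print(len(energies), sorted(variables))
```

This only recovers the energies and the scanned variables; matching each step to a full optimised geometry would still need something like the search described above.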
> 
> Yours
> 
> Ed
> 
> 
> On 15 Dec 2011, at 19:28, Karol M. Langner wrote:
> 
>> Hi Ed,
>> 
>> Sorry for not answering sooner, I was bogged down with work.
>> 
>> On Nov 07 2011, Edward Holland wrote:
>>> I managed to run two fairly simple, representative jobs over the weekend. They aren't very interesting jobs, but they correctly illustrate the type of data that will need to be parsed.
>>> The key parts are headed with "Summary of Optimized Potential Surface Scan" or "Summary of the potential surface scan".
>>> As you will be able to see, the non-optimised jobs will be much simpler to parse, as there is no need to store a complete set of geometry data for each step.
>>> One complication I can see is translating the z-matrix variables given in the output into the xyz coordinates normally expected within cclib.
>> 
>> So, could you provide the output file(s) you would like parsed for this? With something to work with, it could be easy to implement what you want.
>> 
>>> Another idea I've had is a boolean that stores whether an output file has terminated at an expected point or been cut short.
>>> I suspect this would be useful for people hoping to make their job submission scripts more intelligent.
>> 
>> I'm afraid I don't understand this idea, and you need to explain in more detail what you mean.
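
One possible reading of the boolean idea, as a minimal sketch: report whether the last non-blank line of the log is the termination message. It assumes Python 3 and assumes that a Gaussian job that finished cleanly ends with a line containing "Normal termination of Gaussian". The name terminated_normally is hypothetical, not part of cclib.

```python
def terminated_normally(lines):
    """Return True if the last non-blank line reports normal termination.

    Assumption: a cleanly finished Gaussian log ends with a line that
    contains "Normal termination of Gaussian".
    Hypothetical helper for illustration only, not a cclib function.
    """
    last = ""
    for line in lines:
        if line.strip():
            last = line        # remember the most recent non-blank line
    return "Normal termination of Gaussian" in last


# Illustrative lines only, not taken from a real output file.
finished = [" ...", " Normal termination of Gaussian", ""]
cut_short = [" ...", " Summary of the potential surface scan:"]
print(terminated_normally(finished), terminated_normally(cut_short))
```

A job submission script could check such a flag before deciding whether to resubmit a job.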
>> 
>> Best regards,
>> Karol
>> 
>>> On 3 Nov 2011, at 13:08, Noel O'Boyle wrote:
>>> 
>>>> Hi Ed,
>>>> 
>>>> Sounds like an interesting project.
>>>> 
>>>> Regarding the scans, you're right - currently we don't support this.
>>>> Can you provide an example? (Preferably a small test case.) Then we'll
>>>> be able to discuss this a bit more.
>>>> 
>>>> - Noel
>>>> 
>>>> On 3 November 2011 12:44, Edward Holland <hol...@ca...> wrote:
>>>>> Hi all,
>>>>> 
>>>>>      I thought that before I got down to the juice of the subject I should introduce myself. I'm Ed Holland, and I'm currently studying for a PhD under Prof Barry K Carpenter in Cardiff on physical organic chemistry. Specifically, we are researching novel H transfer reactions in amine radical cations and their applications to renewable energy. I've come across your project and it's something I would very much like to be involved with in any way I can help. I am primarily a chemist and my programming skills are not as advanced as I would like, but I hope I can contribute in some way!
>>>>> 
>>>>>      Now down to the real core of the message. I've been running a number of scan jobs recently and I would find it incredibly useful to automate the analysis of such jobs. Currently (as far as I can tell) cclib doesn't support this at all. Having had a read through some of the code, I think this would be fairly simple to implement, for the Gaussian parser at least.
>>>>> 
>>>>>      Has anyone thought about doing this before? Can anyone provide examples of scan jobs from the other programs supported by cclib? Would anyone be willing to discuss the most suitable way of storing this data within the ccData type? I wouldn't want to miss any conventions I don't know about.
>> 
>> -- 
>> written by Karol M. Langner
>> Thu Dec 15 20:24:17 CET 2011
>