Re: [cclib-devel] Analysis of scan results

Hi Karol, 

I think there must have been a problem with the attachments I sent. For ease, I'll just include the relevant sections for each case.

The simplest case is the rigid scan.

*****************Rigid scan section from Gaussian output file**********************
 Summary of the potential surface scan:
   N       A          SCF
 ----  ---------  -----------
    1   109.0000    -76.43373
    2   119.0000    -76.43011
    3   129.0000    -76.42311
    4   139.0000    -76.41398
    5   149.0000    -76.40420
    6   159.0000    -76.39541
    7   169.0000    -76.38916
    8   179.0000    -76.38664
    9   189.0000    -76.38833
   10   199.0000    -76.39391
   11   209.0000    -76.40231
 ----  ---------  -----------
***************************************************************************************

This is the simplest example, as it only varies one value. More complex scans have additional columns like A.

I think this can be parsed fairly simply using code such as the following (it handles an arbitrary number of columns like A):

***************************************************************************************
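#!/usr/bin/env python3
"""Print the rigid scan summary table found in a Gaussian output file."""

import sys

filename = sys.argv[1]
scanres = []

try:
    inputfile = open(filename, "r")
except IOError:
    # Stop here: there is nothing to parse if the file cannot be read.
    sys.exit("file cannot be opened")

for line in inputfile:
    if "Summary of the potential surface scan:" in line:
        colmNames = next(inputfile)
        hyphens = next(inputfile)

        # Rows look like "N  A [B ...]  SCF"; a second hyphen line ends the table.
        line = next(inputfile)
        while line.strip() != hyphens.strip():
            broken = line.split()
            # Store (energy, (scanned parameters in column order)) as floats.
            energy = float(broken[-1])
            parms = tuple(float(p) for p in broken[1:-1])
            scanres.append((energy, parms))
            line = next(inputfile)

inputfile.close()

# One tab-separated row per scan point: energy first, then the parameters.
for res in scanres:
    print("\t".join(str(value) for value in (res[0],) + res[1]))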
***************************************************************************************

I haven't written a patch for cclib to do something similar, as I thought it worthwhile to first discuss the data structure used to hold this data. A list of tuples, each containing a float and a tuple, doesn't seem like the most elegant solution!
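
As a starting point for that discussion, here is a minimal sketch of one alternative layout: parallel lists for the column names, the scanned parameter values and the energies. The function name parse_rigid_scan and the dictionary keys "names", "parms" and "energies" are made up for illustration; they are not existing cclib attributes.

```python
def parse_rigid_scan(lines):
    """Collect a rigid scan summary into parallel lists (hypothetical layout)."""
    lines = iter(lines)
    result = {"names": [], "parms": [], "energies": []}
    for line in lines:
        if "Summary of the potential surface scan:" in line:
            # Header row is "N  A [B ...]  SCF"; keep only the scanned names.
            result["names"] = next(lines).split()[1:-1]
            next(lines)  # skip the hyphen line under the header
            for row in lines:
                fields = row.split()
                if not fields or fields[0].startswith("-"):
                    break  # the closing hyphen line ends the table
                result["parms"].append([float(v) for v in fields[1:-1]])
                result["energies"].append(float(fields[-1]))
    return result


# First two rows of the rigid scan summary shown earlier in this message.
sample = """\
 Summary of the potential surface scan:
   N       A          SCF
 ----  ---------  -----------
    1   109.0000    -76.43373
    2   119.0000    -76.43011
 ----  ---------  -----------
""".splitlines()

scan = parse_rigid_scan(sample)
print(scan["names"], scan["parms"], scan["energies"])
```

Keeping the names separate from the values means a scan over two or more variables only adds entries to each inner list, rather than changing the shape of the structure.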

The other type of scan is the relaxed scan. 

*****************Relaxed scan section from Gaussian output file**********************
 Summary of Optimized Potential Surface Scan
                           1         2         3         4         5
     EIGENVALUES --   -76.43381 -76.43364 -76.43343 -76.43318 -76.43289
           R1           0.96360   0.96297   0.96250   0.96204   0.96157
           R2           0.96360   0.96297   0.96250   0.96204   0.96157
           A1         109.00000 110.00000 111.00000 112.00000 113.00000
                           6         7         8         9        10
     EIGENVALUES --   -76.43256 -76.43219 -76.43178 -76.43134 -76.43086
           R1           0.96112   0.96067   0.96022   0.95978   0.95934
           R2           0.96112   0.96067   0.96022   0.95978   0.95934
           A1         114.00000 115.00000 116.00000 117.00000 118.00000
                          11
     EIGENVALUES --   -76.43035
           R1           0.95891
           R2           0.95891
           A1         119.00000
***************************************************************************************

As you can see, this case will be much more complicated to parse, as Gaussian outputs the optimised geometries as a series of z-matrix variables.
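
Reading the summary table itself is still manageable. The following is only a rough sketch, assuming the table always has the layout shown above (blocks of columns, each headed by a row of step numbers); the function name parse_relaxed_scan and the returned dictionary are hypothetical. It gathers the values per row label but does not attempt the z-matrix to xyz conversion.

```python
def parse_relaxed_scan(lines):
    """Return {row label: [value per scan step]} from the summary table."""
    lines = iter(lines)
    table = {}
    for line in lines:
        if "Summary of Optimized Potential Surface Scan" not in line:
            continue
        for row in lines:
            # Drop the "--" separator that follows the EIGENVALUES label.
            fields = [f for f in row.split() if f != "--"]
            if fields and all(f.isdigit() for f in fields):
                continue  # row of step numbers heading a block of columns
            try:
                values = [float(v) for v in fields[1:]]
            except ValueError:
                values = []
            if not values:
                break  # anything else is treated as the end of the table
            table.setdefault(fields[0], []).extend(values)
    return table


# First two column blocks of the relaxed scan summary shown above.
sample = """\
 Summary of Optimized Potential Surface Scan
                           1         2         3         4         5
     EIGENVALUES --   -76.43381 -76.43364 -76.43343 -76.43318 -76.43289
           R1           0.96360   0.96297   0.96250   0.96204   0.96157
           R2           0.96360   0.96297   0.96250   0.96204   0.96157
           A1         109.00000 110.00000 111.00000 112.00000 113.00000
                           6         7         8         9        10
     EIGENVALUES --   -76.43256 -76.43219 -76.43178 -76.43134 -76.43086
           R1           0.96112   0.96067   0.96022   0.95978   0.95934
           R2           0.96112   0.96067   0.96022   0.95978   0.95934
           A1         114.00000 115.00000 116.00000 117.00000 118.00000
""".splitlines()

table = parse_relaxed_scan(sample)
print(table["A1"])
```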

My thought regarding this is to cheat somewhat. Instead of trying to convert the z-matrix, we could search through the structures already found by cclib to find the optimised structures.
This is a very "hacky" way to solve the problem, and I would be interested to know if you can think of a better way of doing it.
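
To make the idea concrete, one possible reading of "search through the structures" is to match the energies in the scan summary against the energies of all geometries in the file, in order. This is only a sketch under that assumption: the function match_scan_steps, the tolerance and the intermediate energies in the example are invented for illustration, and both lists are assumed to be in the same units.

```python
def match_scan_steps(scan_energies, all_energies, tol=1e-5):
    """For each scan energy, find the index of the first later geometry
    whose energy agrees within tol; None if there is no such geometry."""
    matches = []
    start = 0
    for target in scan_energies:
        for index in range(start, len(all_energies)):
            if abs(all_energies[index] - target) <= tol:
                matches.append(index)
                start = index + 1  # later scan steps come later in the file
                break
        else:
            matches.append(None)
    return matches


# The scan energies are the first two EIGENVALUES from the summary above.
scan_energies = [-76.43381, -76.43364]
# Made-up energies for every geometry in the file, including invented
# intermediate optimisation steps; real values would come from the parser.
all_energies = [-76.43102, -76.433807, -76.43350, -76.433642]

matches = match_scan_steps(scan_energies, all_energies)
print(matches)
```

The tolerance matters because the summary rounds energies to five decimal places, so an unconverged step that happens to lie within the tolerance could be picked up by mistake.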

Yours

Ed

On 15 Dec 2011, at 19:28, Karol M. Langner wrote:

> Hi Ed,
> 
> Sorry for not answering sooner, I was bogged down with work.
> 
> On Nov 07 2011, Edward Holland wrote:
>> I managed to run two fairly simple and representative jobs over the weekend. They aren't very interesting jobs, but they correctly illustrate the type of data that will need to be parsed.
>> The key parts are headed with "Summary of Optimized Potential Surface Scan" or "Summary of the potential surface scan".
>> As you will be able to see, the non-optimised jobs will be much simpler to parse, as there is no need to store a complete set of geometry data for each step.
>> One complication I can see is translating the z-matrix variables given by the output into the xyz coordinates normally expected within cclib.
> 
> So, could you provide the output file(s) you would like parsed for this? With something to work with, it could be easy to implement what you want.
> 
>> Another idea I've had is a boolean that stores whether an output file has terminated at an expected point or been cut short.
>> I suspect this would be useful for people hoping to make their job submission scripts more intelligent.
> 
> I'm afraid I don't understand this idea, and you need to explain in more detail what you mean.
> 
> Best regards,
> Karol
> 
>> On 3 Nov 2011, at 13:08, Noel O'Boyle wrote:
>> 
>>> Hi Ed,
>>> 
>>> Sounds like an interesting project.
>>> 
>>> Regarding the scans, you're right - currently we don't support this.
>>> Can you provide an example? (Preferably a small test case.) Then we'll
>>> be able to discuss this a bit more.
>>> 
>>> - Noel
>>> 
>>> On 3 November 2011 12:44, Edward Holland <hol...@ca...> wrote:
>>>> Hi all,
>>>> 
>>>>       I thought that before I got down to the juice of the subject I should introduce myself. I'm Ed Holland, and I'm currently studying for a PhD under Prof Barry K Carpenter in Cardiff on physical organic chemistry. Specifically, we are researching novel H transfer reactions in amine radical cations and their applications to renewable energy. I've come across your project, and it's something I would very much like to be involved with in any way I can help. I am primarily a chemist and my programming skills are not as advanced as I would like, but I hope I can still contribute in some way!
>>>> 
>>>>       Now down to the real core of the message. I've been running a number of scan jobs recently, and I would find it incredibly useful to automate the analysis of such jobs. Currently (as far as I can tell) cclib doesn't support this at all. Having had a read through some of the code, I think this would be fairly simple to implement, for the Gaussian parser at least.
>>>> 
>>>>       Has anyone thought about doing this before? Can anyone provide examples of scan jobs from the other programs supported by cclib? Would anyone be willing to discuss the most suitable way of storing this data within the ccData type? I wouldn't want to miss any conventions I don't know about.
> 
> -- 
> written by Karol M. Langner
> Thu Dec 15 20:24:17 CET 2011