Thread: [cclib-devel] Turbomole parser

Brought to you by: atenderholt, baoilleach, langner

[cclib-devel] Turbomole parser

From: Christopher R. <cro...@uo...> - 2007-07-09 18:25:00

Hi,
 
I'm interested in using cclib with turbomole. Is there a parser in
development? If not, I might have the time to write one and contribute
it.
 
Thanks,
Christopher Rowley
Ph.D Candidate
Department of Chemistry
University of Ottawa

[cclib-devel] Turbomole parser

From: Noel O'B. <bao...@gm...> - 2007-08-11 17:45:25

Chris,

The Turbomole parser seems to be coming along well. I've added a column
to the wiki for keeping track of progress:
http://cclib.sourceforge.net/wiki/index.php/Development_parsed_data#Details_of_current_implementation

Could you send me (off-list) a username and password for the wiki and
I'll create an account for you.

Noel

[cclib-devel] Parsing multiple files (Re: Turbomole parser)

From: Karol L. <kar...@kn...> - 2007-08-13 17:12:53

On Saturday 11 August 2007 13:45, Noel O'Boyle wrote:
> Chris,
>
> The Turbomole parser seems to be coming along well. I've added a column
> to the wiki for keeping track of progress:
> http://cclib.sourceforge.net/wiki/index.php/Development_parsed_data#Details_of_current_implementation

Yes, I agree. I'd like to expand here on parsing multiple files, since 
Turbomole output is the prime example for this. After updating the files with 
the code for parsing multiple output files, I'd like to demonstrate how that 
works. This is from my working Turbomole branch, with the dvb_sp data files 
copied in manually...

langner@slim:~/cclib/branches/turbomoleparser/src/cclib/parser$ python
Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
[GCC 3.4.6 (Debian 3.4.6-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from utils import ccopen

All the files were concatenated into turbo.out, I presume...
>>> ccopen("turbo.out").parse()
[Turbomole turbo.out INFO] Creating attribute natom: 20
[Turbomole turbo.out INFO] Creating attribute aonames[]
[Turbomole turbo.out INFO] Creating attribute nbasis: 60
[Turbomole turbo.out INFO] Creating attribute nmo: 60
[Turbomole turbo.out INFO] Creating attribute homos[]
[Turbomole turbo.out INFO] Creating attribute aooverlaps[]
[Turbomole turbo.out INFO] Creating attribute atomcoords[]
[Turbomole turbo.out INFO] Creating attribute atomnos[]
[Turbomole turbo.out INFO] Creating attribute moenergies[]
[Turbomole turbo.out INFO] Creating attribute mocoeffs[]
[Turbomole turbo.out INFO] Creating attribute coreelectrons[]
<cclib.data.ccData object at 0xb63a1dec>

Passing just the basis stuff doesn't get anything parsed:
>>> ccopen("basis").parse()
<cclib.data.ccData object at 0xb63a1e6c>

To parse two files sequentially, pass them as a list:
>>> ccopen(["basis","control"]).parse()
[Turbomole ['basis', 'control'] INFO] Creating attribute natom: 20
[Turbomole ['basis', 'control'] INFO] Creating attribute aonames[]
[Turbomole ['basis', 'control'] INFO] Creating attribute nbasis: 60
[Turbomole ['basis', 'control'] INFO] Creating attribute nmo: 60
[Turbomole ['basis', 'control'] INFO] Creating attribute homos[]
[Turbomole ['basis', 'control'] INFO] Creating attribute coreelectrons[]
<cclib.data.ccData object at 0xb63a484c>

This will be equivalent to parsing turbo.out in terms of parsed attributes:
>>> ccopen(["basis","control","coord","energy","mos"]).parse()
	...
<cclib.data.ccData object at 0xb63a42cc>

By the way, there is no need to add extra lines for Turbomole to be
recognized: ccopen can do it with the condition
(line[0] == "$" and line[1].islower()), which at present is unique to
Turbomole among all the cclib parsers.
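
A minimal sketch of the detection condition quoted above, written as a standalone function. The function name and the sample lines are made up for illustration and are not part of cclib; the length guard is an addition, since indexing line[1] on a one-character line would raise an IndexError.

```python
def looks_like_turbomole(lines):
    """Return True if any line starts with '$' followed by a lowercase letter."""
    for line in lines:
        # Check the length first so that line[1] is always a valid index.
        if len(line) > 1 and line[0] == "$" and line[1].islower():
            return True
    return False


# Hypothetical sample lines, used only to exercise the function.
sample = ["some header text\n", "$coord\n", "$end\n"]
print(looks_like_turbomole(sample))
```

Scanning every line rather than only the first is a choice made for this sketch; how many lines ccopen actually inspects is not stated in the thread.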

In my opinion, we should do without concatenated files such as turbo.out, and 
parse the tests using multiple files where required. What do you think?

There are caveats that need to be dealt with: passing the files in the wrong
order crashes the parser. That is equivalent to concatenating in the
wrong order, so it still requires the user to do something right :)
>>> ccopen(["control","basis"]).parse()
[Turbomole ['control', 'basis'] INFO] Creating attribute natom: 20
[Turbomole ['control', 'basis'] INFO] Creating attribute aonames[]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "logfileparser.py", line 140, in parse
    self.extract(inputfile, line)
  File "turbomoleparser.py", line 214, in extract
    for i in range(0, len(self.basis_lib), 1):
AttributeError: 'Turbomole' object has no attribute 'basis_lib'

One more issue is how to get ccget to parse multiple files into one data
object. What I mean is something equivalent to passing a concatenated
file (turbo.out, for example) to ccget. I propose adding an option to
ccget that will do that. What do you all think?

Cheers,
Karol

-- 
written by Karol Langner
Mon Aug 13 18:49:58 EDT 2007

Re: [cclib-devel] Parsing multiple files (Re: Turbomole parser)

From: Christopher R. <cro...@uo...> - 2007-08-13 17:18:07

Yes, I really like this solution too. I think it's worthwhile to
preserve the merge_turbo feature for now. I think it's actually a
preferable way to preserve the output of a job, rather than keeping
tarballs of each directory required for a computation, although as
you've demonstrated, it's no longer necessary.

Chris

Re: [cclib-devel] Parsing multiple files (Re: Turbomole parser)

From: Karol L. <kar...@kn...> - 2007-08-17 16:08:24

On Monday 13 August 2007 13:17, Christopher Rowley wrote:
> Yes, I really like this solution too. I think it's worthwhile to
> preserve the merge_turbo feature for now. I think it's actually a
> preferable way to preserve the output of a job, rather than keeping
> tarballs of each directory required for a computation, although as
> you've demonstrated, it's no longer necessary.
>
> Chris

For testing purposes it doesn't matter how the data is stored, as long as
cclib can read both forms, but keeping the cclib package small (excluding
regression tests) is an important incentive. So when the Turbomole parser
goes into the trunk we should not duplicate the data files; we should keep
only one copy (the concatenated file or a directory).

Another thought: I bet users will generally provide Turbomole output in the
wrong order and break the parser, either by giving the list of files in the
wrong order or by concatenating the files in the wrong order. In the first
case, cclib can potentially fix this by reordering the files based on their
names, provided the names have not been changed. If everything is in one file
in the wrong order, that is a lot harder to do. That is the main advantage I
see in choosing not to concatenate if a package provides output in multiple
files.
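
The reordering idea could be sketched as follows. This is an illustration only: the function name is hypothetical, and the order list is simply the one used in the working ccopen call earlier in this thread, not an order the parser is known to guarantee.

```python
import os

# Order copied from the working example in this thread; treat it as an assumption.
KNOWN_ORDER = ["basis", "control", "coord", "energy", "mos"]


def reorder_turbomole_files(paths):
    """Sort paths by their base names if all names are known, else leave them as given."""
    names = [os.path.basename(path) for path in paths]
    if any(name not in KNOWN_ORDER for name in names):
        # A renamed file gives us nothing to go on, so do not guess.
        return list(paths)
    return sorted(paths, key=lambda path: KNOWN_ORDER.index(os.path.basename(path)))
```

With this, reorder_turbomole_files(["control", "basis"]) gives ["basis", "control"], which is the order that parsed successfully above. A list containing a renamed file is returned unchanged, so the user still has to get the order right in that case.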

Karol

-- 
written by Karol Langner
Fri Aug 17 17:48:48 EDT 2007

Re: [cclib-devel] Turbomole parser

From: Karol L. <kar...@kn...> - 2007-07-09 19:25:49

On Monday 09 July 2007 14:24, Christopher Rowley wrote:
> Hi,
>
> I'm interested in using cclib with turbomole. Is there a parser in
> development? If not, I might have the time to write one and contribute
> it.
>
> Thanks,
> Christopher Rowley
> Ph.D Candidate
> Department of Chemistry
> University of Ottawa

Hi Chris!

No, there is no Turbomole parser in development at present. You are surely
welcome to work on one and contribute, although you need to wait for Noel
O'Boyle's reply to your post as he is the main developer here. For my part,
I can help a bit with it and generate test files, but not before mid-August
when I get back to work.

Cheers,
Karol

-- 
written by Karol Langner
Mon Jul  9 21:21:16 EDT 2007

Re: [cclib-devel] Turbomole parser

From: Christopher R. <cro...@uo...> - 2007-07-10 01:41:54

Ok, I'll try to set aside some time to do it once there's approval. I
have a project that needs automated CDA analysis with turbomole, so I'd be
looking to get a parser for the MO coefficients working first.

I took a look at the other parsers and I don't think it will be that
difficult to get turbomole working. The only significant problem is that
turbomole doesn't put all its output in a single file. I generally keep
a separate directory for each turbomole calculation. The simplest way to
do it would be to pass the path of a directory containing the output of
a turbomole job to the parser instead of the filename of the output
file.

This would be inconsistent with all the other parsers, so it's a little
unattractive. The other route I can see is to have a separate utility to
merge the various turbomole output files into a single output file that
could be read in by the parser.

Chris

Re: [cclib-devel] Turbomole parser

From: Adam T. <a-t...@st...> - 2007-07-11 01:01:18

> Ok, I'll try to set aside some time to do it once there's approval. I
> have a project that needs automated CDA analysis with turbomole, so I'd be
> looking to get a parser for the MO coefficients working first.

I too generally focus on MO coeffs, so if you want help, let me know.  
One of the first steps you could take is to have a look at our basic  
test datafiles, and start running calculations with those.  
Specifically, you should run dvb_sp and dvb_un_sp. Both are  
calculations on di-vinylbenzene, and I believe you can find xyz  
coordinates in the input files of other calculations. The SP calc is  
a restricted single-point calc with no net charge and the UN_SP calc  
is an unrestricted single-point with a positive charge and a  
multiplicity of 2.

> I took a look at the other parsers and I don't think it will be that
> difficult to get turbomole working. The only significant problem is that
> turbomole doesn't put all its output in a single file. I generally keep
> a separate directory for each turbomole calculation. The simplest way to
> do it would be to pass the path of a directory containing the output of
> a turbomole job to the parser instead of the filename of the output
> file.
>
> This would be inconsistent with all the other parsers, so it's a little
> unattractive. The other route I can see is to have a separate utility to
> merge the various turbomole output files into a single output file that
> could be read in by the parser.

I'd suggest using cat to combine all of the files into one, although  
this probably isn't the best option for our windows-using friends.  
Perhaps we should handle zip or tar files (we already handle gz and  
bzip2, as I recall). Noel, Karol, any comments? I'm willing to help  
add any logfiles or code to a branch in the svn tree during the next  
few days, so if you have anything ready, let me know.
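
For readers without cat, a portable way to combine the files might look like the sketch below. The function name is made up for this example and is not the merge utility discussed elsewhere in the thread; it only uses the Python standard library.

```python
import shutil


def concatenate(paths, destination):
    """Append the contents of each file in `paths`, in the order given, to `destination`."""
    with open(destination, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                # Copy in chunks so large output files are not read into memory at once.
                shutil.copyfileobj(src, out)


# Example call (file names as used later in this thread):
# concatenate(["basis", "control", "coord", "energy", "mos"], "turbo.out")
```

The files are copied byte for byte, so a file that does not end in a newline will run into the first line of the next one.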

Adam

Re: [cclib-devel] Turbomole parser

From: Noel O'B. <bao...@gm...> - 2007-07-11 06:53:33

Hello Chris,

Good to hear from you and welcome to cclib. We welcome all the help we
can get so there's really no question about approval. Our only
standard is that the code works before we release it, and we have a
number of tests that try to ensure this is the case.

On 11/07/07, Adam Tenderholt <a-t...@st...> wrote:
> > Ok, I'll try to set aside some time to do it once there's approval. I
> > have a project that needs automated CDA analysis with turbomole, so I'd be
> > looking to get a parser for the MO coefficients working first.
>
> I too generally focus on MO coeffs, so if you want help, let me know.
> One of the first steps you could take is to have a look at our basic
> test datafiles, and start running calculations with those.
> Specifically, you should run dvb_sp and dvb_un_sp. Both are
> calculations on di-vinylbenzene, and I believe you can find xyz
> coordinates in the input files of other calculations. The SP calc is
> a restricted single-point calc with no net charge and the UN_SP calc
> is an unrestricted single-point with a positive charge and a
> multiplicity of 2.
>
> > I took a look at the other parsers and I don't think it will be that
> > difficult to get turbomole working. The only significant problem is that
> > turbomole doesn't put all its output in a single file. I generally keep
> > a separate directory for each turbomole calculation. The simplest way to
> > do it would be to pass the path of a directory containing the output of
> > a turbomole job to the parser instead of the filename of the output
> > file.
> >
> > This would be inconsistent with all the other parsers, so it's a little
> > unattractive. The other route I can see is to have a separate utility to
> > merge the various turbomole output files into a single output file that
> > could be read in by the parser.
>
> I'd suggest using cat to combine all of the files into one, although
> this probably isn't the best option for our windows-using friends.
> Perhaps we should handle zip or tar files (we already handle gz and
> bzip2, as I recall). Noel, Karol, any comments? I'm willing to help
> add any logfiles or code to a branch in the svn tree during the next
> few days, so if you have anything ready, let me know.

Perhaps Chris, you could describe in detail the typical output from a
Turbomole calculation; that is, what files are created, what do they
contain (in general terms), are they ASCII files or binary. It might
make sense if you send to this list a .zip file of results of a small
example calculation.

One possibility is that we have several parsers for several files. At
first this may seem messy, but we are moving towards separating the
parsers and the results, and it will be trivial for the user to add
the results together. But let's take a look at the actual output first
before we think about this too much.

Some bookkeeping now. Can you create an account on SourceForge and
send me the name? I will need this to make you a developer.


Re: [cclib-devel] Turbomole parser

From: Karol L. <kar...@kn...> - 2007-07-11 07:08:26

On Wednesday 11 July 2007 02:53, Noel O'Boyle wrote:
> > > I took a look at the other parsers and I don't think it will be that
> > > difficult to get turbomole working. The only significant problem is that
> > > turbomole doesn't put all its output in a single file. I generally keep
> > > a separate directory for each turbomole calculation. The simplest way to
> > > do it would be to pass the path of a directory containing the output of
> > > a turbomole job to the parser instead of the filename of the output
> > > file.
> > >
> > > This would be inconsistent with all the other parsers, so it's a little
> > > unattractive. The other route I can see is to have a separate utility to
> > > merge the various turbomole output files into a single output file that
> > > could be read in by the parser.
> >
> > I'd suggest using cat to combine all of the files into one, although
> > this probably isn't the best option for our windows-using friends.
> > Perhaps we should handle zip or tar files (we already handle gz and
> > bzip2, as I recall). Noel, Karol, any comments? I'm willing to help
> > add any logfiles or code to a branch in the svn tree during the next
> > few days, so if you have anything ready, let me know.
>
> Perhaps Chris, you could describe in detail the typical output from a
> Turbomole calculation; that is, what files are created, what do they
> contain (in general terms), are they ASCII files or binary. It might
> make sense if you send to this list a .zip file of results of a small
> example calculation.
>
> One possibility is that we have several parsers for several files. At
> first this may seem messy, but we are moving towards separating the
> parsers and the results, and it will be trivial for the user to add
> the results together. But let's take a look at the actual output first
> before we think about this too much.

A comment. There already is a 'support parsing multiple log files' point on 
the wiki progress page, and this problem is not specific to Turbomole, since 
both ADF and GAMESS have two or more output files. In terms of programming 
this is not a big problem: pass more arguments to the parser, iterate over 
them. There will be some logistical dangers, though. The order of the files 
will matter and some information can be duplicated. In GAMESS, for instance, 
the .dat file also contains MO coefficients but with higher precision than 
the .out file, which is another advantage of doing this...
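
To illustrate why the order matters when information is duplicated, here is a small sketch that merges per-file results so that later files overwrite earlier ones. The function name, the dictionary representation and the numbers are all invented for this example; they are not cclib's data model or real GAMESS output.

```python
def merge_attributes(per_file_attributes):
    """Combine attribute dictionaries in order; later files overwrite earlier ones."""
    merged = {}
    for attributes in per_file_attributes:
        merged.update(attributes)
    return merged


# Illustrative values only, not taken from a real calculation.
from_out = {"natom": 20, "mocoeffs": [0.12, 0.35]}
from_dat = {"mocoeffs": [0.123456, 0.345678]}

print(merge_attributes([from_out, from_dat])["mocoeffs"])
```

Passing the higher-precision source last keeps its coefficients; passing the files the other way round would silently keep the lower-precision values.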

-- 
written by Karol Langner
Wed Jul 11 08:58:25 EDT 2007

Re: [cclib-devel] Turbomole parser

From: Christopher R. <cro...@uo...> - 2007-07-11 16:29:25

I think this would make sense, but there are a couple of complications:
There is the option within turbomole to use non-standard filenames for
some of the output files, but I think it's ok to ignore this situation.
Also, turbomole could have as many as 9 files that need to be read, so
we'd have to pass the parser a large number of arguments.

In turbomole, the file names of the other files related to the
calculation are lines in the control file. In principle, the parser
could be passed the path of the control file, and then other files are
read as necessary based on it. 
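
A rough sketch of that idea: scan the lines of the control file for file references and collect them. The "$group file=name" layout and the sample lines are assumptions made for this example, as are the function and variable names; the thread only says that the file names appear as lines in the control file.

```python
import re

# Assumed layout: "$group    file=name" at the start of a line.
FILE_REFERENCE = re.compile(r"^\$(\w+)\s+file=(\S+)")


def referenced_files(control_lines):
    """Map each data group name to the file named by its file= entry."""
    found = {}
    for line in control_lines:
        match = FILE_REFERENCE.match(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found


# Hypothetical control file contents, used only to exercise the function.
sample_control = ["$title\n", "$coord    file=coord\n", "$scfmo   file=mos\n", "$end\n"]
print(referenced_files(sample_control))
```

A parser given only the control file path could then open the referenced files relative to the control file's directory, which would also cover non-standard file names.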

The only option that wouldn't require modifications to cclib is to merge
the turbomole output files first and then run the parser on the whole
thing. At least for the time being, I'm inclined to use that.
