From: Christopher R. <cro...@uo...> - 2007-07-09 18:25:00
|
Hi, I'm interested in using cclib with turbomole. Is there a parser in development? If not, I might have the time to write one and contribute it. Thanks, Christopher Rowley Ph.D Candidate Department of Chemistry University of Ottawa |
From: Noel O'B. <bao...@gm...> - 2007-08-11 17:45:25
|
Chris, The Turbomole parser seems to be coming along well. I've added a column to the wiki for keeping track of progress: http://cclib.sourceforge.net/wiki/index.php/Development_parsed_data#Details_of_current_implementation Could you send me (off-list) a username and password for the wiki and I'll create an account for you. Noel |
From: Karol L. <kar...@kn...> - 2007-08-13 17:12:53
|
On Saturday 11 August 2007 13:45, Noel O'Boyle wrote:
> Chris,
>
> The Turbomole parser seems to be coming along well. I've added a column
> to the wiki for keeping track of progress:
> http://cclib.sourceforge.net/wiki/index.php/Development_parsed_data#Details_of_current_implementation

Yes, I agree. I'd like to expand here on parsing multiple files, since Turbomole output is the prime example for this. After updating the files with the code for parsing multiple output files, I'd like to demonstrate how that works. This is from my working Turbomole branch, with the dvb_sp data files copied in manually...

langner@slim:~/cclib/branches/turbomoleparser/src/cclib/parser$ python
Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
[GCC 3.4.6 (Debian 3.4.6-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from utils import ccopen

All the files were concatenated into turbo.out, I presume...

>>> ccopen("turbo.out").parse()
[Turbomole turbo.out INFO] Creating attribute natom: 20
[Turbomole turbo.out INFO] Creating attribute aonames[]
[Turbomole turbo.out INFO] Creating attribute nbasis: 60
[Turbomole turbo.out INFO] Creating attribute nmo: 60
[Turbomole turbo.out INFO] Creating attribute homos[]
[Turbomole turbo.out INFO] Creating attribute aooverlaps[]
[Turbomole turbo.out INFO] Creating attribute atomcoords[]
[Turbomole turbo.out INFO] Creating attribute atomnos[]
[Turbomole turbo.out INFO] Creating attribute moenergies[]
[Turbomole turbo.out INFO] Creating attribute mocoeffs[]
[Turbomole turbo.out INFO] Creating attribute coreelectrons[]
<cclib.data.ccData object at 0xb63a1dec>

Passing just the basis stuff doesn't get anything parsed:

>>> ccopen("basis").parse()
<cclib.data.ccData object at 0xb63a1e6c>

To parse two files sequentially, pass them as a list:

>>> ccopen(["basis","control"]).parse()
[Turbomole ['basis', 'control'] INFO] Creating attribute natom: 20
[Turbomole ['basis', 'control'] INFO] Creating attribute aonames[]
[Turbomole ['basis', 'control'] INFO] Creating attribute nbasis: 60
[Turbomole ['basis', 'control'] INFO] Creating attribute nmo: 60
[Turbomole ['basis', 'control'] INFO] Creating attribute homos[]
[Turbomole ['basis', 'control'] INFO] Creating attribute coreelectrons[]
<cclib.data.ccData object at 0xb63a484c>

This will be equivalent to parsing turbo.out in terms of parsed attributes:

>>> ccopen(["basis","control","coord","energy","mos"]).parse()
...
<cclib.data.ccData object at 0xb63a42cc>

By the way, there is no need to add extra lines for Turbomole to be recognized - ccopen can do it with the condition (line[0] == "$" and line[1].islower()), which, at least for now, is unique to Turbomole among all the cclib parsers.

In my opinion, we should do without concatenated files such as turbo.out, and parse the tests using multiple files where required. What do you think?

There are caveats that need to be dealt with - passing the files in the wrong order crashes the parser. And that is equivalent to concatenating in the wrong order, so it still requires the user to do something right :)

>>> ccopen(["control","basis"]).parse()
[Turbomole ['control', 'basis'] INFO] Creating attribute natom: 20
[Turbomole ['control', 'basis'] INFO] Creating attribute aonames[]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "logfileparser.py", line 140, in parse
    self.extract(inputfile, line)
  File "turbomoleparser.py", line 214, in extract
    for i in range(0, len(self.basis_lib), 1):
AttributeError: 'Turbomole' object has no attribute 'basis_lib'

One more issue is how to get ccget to parse multiple files into one data object. What I mean is something that will be equivalent to passing a concatenated file (turbo.out, for example) to ccget. I propose to add an option to ccget that will do that. What do you all think?

Cheers,
Karol

--
written by Karol Langner Mon Aug 13 18:49:58 EDT 2007 |
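The recognition test quoted in the message above can be tried out in isolation. The following is only a minimal sketch, not the actual ccopen code: the function name `looks_like_turbomole` and the sample lines are hypothetical, and only the condition `line[0] == "$" and line[1].islower()` is taken from the message.

```python
def looks_like_turbomole(lines):
    """Return True if any line starts with '$' followed by a lowercase letter.

    Hypothetical helper: only the condition itself comes from the thread;
    how ccopen really applies it is not shown there.
    """
    for line in lines:
        # Guard against short lines before indexing the second character.
        if len(line) > 1 and line[0] == "$" and line[1].islower():
            return True
    return False


# Made-up sample lines, for illustration only.
turbomole_like = ["$coord", "    0.0  0.0  0.0  c", "$end"]
other_output = [" Some other program banner", "$DATA"]

print(looks_like_turbomole(turbomole_like))  # True
print(looks_like_turbomole(other_output))    # False
```

The length check is an addition for safety: indexing `line[1]` on an empty or one-character line would otherwise raise an IndexError.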
From: Christopher R. <cro...@uo...> - 2007-08-13 17:18:07
|
Yes, I really like this solution too. I think it's worthwhile to preserve the merge_turbo feature for now. I think it's actually a preferable way to preserve the output of a job, rather than keeping tarballs of each directory required for a computation, although as you've demonstrated, it's no longer necessary. Chris |
From: Karol L. <kar...@kn...> - 2007-08-17 16:08:24
|
On Monday 13 August 2007 13:17, Christopher Rowley wrote:
> Yes, I really like this solution too. I think it's worthwhile to
> preserve the merge_turbo feature for now. I think it's actually a
> preferable way to preserve the output of a job, rather than keeping
> tarballs of each directory required for a computation, although as
> you've demonstrated, it's no longer necessary.
>
> Chris

It doesn't matter for testing purposes how the data is stored if cclib can read both things, but keeping the cclib package small (excluding regression tests) is an important incentive. So when the Turbomole parser goes into the trunk we should not duplicate the data files, and should keep only one copy (the concatenated file or a directory).

Another thought: I bet users will generally provide Turbomole output in the wrong order and break the parser - give the list of files in the wrong order or concatenate the files in the wrong order. In the first case, cclib can potentially fix this by reordering them based on the file names, provided these have not been changed. If everything is in one file in the wrong order, that is a lot harder to do. That is the main advantage I see in choosing not to concatenate if a package provides output in multiple files.

Karol

--
written by Karol Langner Fri Aug 17 17:48:48 EDT 2007 |
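The reordering idea from the message above could be sketched roughly as follows. This is an illustration under stated assumptions, not cclib code: the function name `reorder_turbomole_files` is hypothetical, and the order in `CANONICAL_ORDER` is simply the order that worked in the interpreter session posted in this thread; whether other orders also work is not stated.

```python
import os

# Order taken from the working example in this thread; treating it as
# the required order is an assumption.
CANONICAL_ORDER = ["basis", "control", "coord", "energy", "mos"]


def reorder_turbomole_files(paths):
    """Sort recognized Turbomole file names into CANONICAL_ORDER.

    Files with unrecognized names are placed last, in their input order.
    """
    def rank(path):
        name = os.path.basename(path)
        if name in CANONICAL_ORDER:
            return CANONICAL_ORDER.index(name)
        return len(CANONICAL_ORDER)

    # sorted() is stable, so files with equal rank keep their input order.
    return sorted(paths, key=rank)


print(reorder_turbomole_files(["control", "basis"]))
# ['basis', 'control']
```

As the message notes, this only helps while the default file names are unchanged; a renamed file falls into the "unrecognized" group and is left where the user put it relative to other unknown files.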
From: Karol L. <kar...@kn...> - 2007-07-09 19:25:49
|
On Monday 09 July 2007 14:24, Christopher Rowley wrote:
> Hi,
>
> I'm interested in using cclib with turbomole. Is there a parser in
> development? If not, I might have the time to write one and contribute
> it.
>
> Thanks,
> Christopher Rowley
> Ph.D Candidate
> Department of Chemistry
> University of Ottawa

Hi Chris! No, there is no Turbomole parser in development at present. You are certainly welcome to work on one and contribute, although you need to wait for Noel O'Boyle's reply to your post as he is the main developer here. For my part, I can help a bit with it and generate test files, but not before mid-August when I get back to work.

Cheers,
Karol

--
written by Karol Langner Mon Jul 9 21:21:16 EDT 2007 |
From: Christopher R. <cro...@uo...> - 2007-07-10 01:41:54
|
Ok, I'll try to set aside some time to do it once there's approval. I have a project that needs automated CDA analysis with turbomole, so I'd be looking to get a parser for the MO coefficients working first.

I took a look at the other parsers and I don't think it will be that difficult to get turbomole working. The only significant problem is that turbomole doesn't put all its output in a single file. I generally keep a separate directory for each turbomole calculation. The simplest way to do it would be to pass the path of a directory containing the output of a turbomole job to the parser instead of the filename of the output file.

This would be inconsistent with all the other parsers, so it's a little unattractive. The other route I can see is to have a separate utility to merge the various turbomole output files into a single output file that could be read in by the parser.

Chris |
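The directory-based option described above might look something like the sketch below. Everything here is an assumption made for illustration: the function name `files_in_job_directory` is hypothetical, and the list of file names is borrowed from the example session elsewhere in this thread rather than from any Turbomole documentation.

```python
import os
import tempfile

# File names as they appear in the example session in this thread;
# a real job directory may contain other or differently named files.
KNOWN_FILES = ["basis", "control", "coord", "energy", "mos"]


def files_in_job_directory(directory):
    """Return paths of the known output files that exist in the directory."""
    candidates = [os.path.join(directory, name) for name in KNOWN_FILES]
    return [path for path in candidates if os.path.isfile(path)]


# Demonstration with a throwaway directory holding two empty files.
with tempfile.TemporaryDirectory() as workdir:
    for name in ("coord", "control"):
        open(os.path.join(workdir, name), "w").close()
    found = [os.path.basename(p) for p in files_in_job_directory(workdir)]

print(found)  # ['control', 'coord']
```

A helper like this would let the caller keep passing a list of files to the parser, so the parser interface itself stays consistent with the other parsers.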
From: Adam T. <a-t...@st...> - 2007-07-11 01:01:18
|
> Ok, I'll try to set aside some time to do it once there's approval. I
> have a project that needs automated CDA analysis with turbomole, so I'd be
> looking to get a parser for the MO coefficients working first.

I too generally focus on MO coeffs, so if you want help, let me know. One of the first steps you could take is to have a look at our basic test datafiles, and start running calculations with those. Specifically, you should run dvb_sp and dvb_un_sp. Both are calculations on di-vinylbenzene, and I believe you can find xyz coordinates in the input files of other calculations. The SP calc is a restricted single-point calc with no net charge and the UN_SP calc is an unrestricted single-point with a positive charge and a multiplicity of 2.

> I took a look at the other parsers and I don't think it will be that
> difficult to get turbomole working. The only significant problem is that
> turbomole doesn't put all its output in a single file. I generally keep
> a separate directory for each turbomole calculation. The simplest way to
> do it would be to pass the path of a directory containing the output of
> a turbomole job to the parser instead of the filename of the output
> file.
>
> This would be inconsistent with all the other parsers, so it's a little
> unattractive. The other route I can see is to have a separate utility to
> merge the various turbomole output files into a single output file that
> could be read in by the parser.

I'd suggest using cat to combine all of the files into one, although this probably isn't the best option for our Windows-using friends. Perhaps we should handle zip or tar files (we already handle gz and bzip2, as I recall). Noel, Karol, any comments? I'm willing to help add any logfiles or code to a branch in the svn tree during the next few days, so if you have anything ready, let me know.

Adam |
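Since cat is not available everywhere, the concatenation step suggested above could also be done in a few lines of Python. This is a minimal sketch, not the merge_turbo utility mentioned elsewhere in the thread (its code is not shown there); the function name `merge_files` and the file contents are made up for the example.

```python
import os
import tempfile


def merge_files(input_paths, output_path):
    """Concatenate the input files, in the order given, into output_path."""
    with open(output_path, "w") as merged:
        for path in input_paths:
            with open(path) as part:
                text = part.read()
            merged.write(text)
            # Make sure the next file starts on a fresh line.
            if text and not text.endswith("\n"):
                merged.write("\n")


# Demonstration with two small made-up files.
with tempfile.TemporaryDirectory() as workdir:
    basis = os.path.join(workdir, "basis")
    control = os.path.join(workdir, "control")
    merged_path = os.path.join(workdir, "turbo.out")
    with open(basis, "w") as handle:
        handle.write("$basis\n$end")  # no trailing newline on purpose
    with open(control, "w") as handle:
        handle.write("$title\n$end\n")
    merge_files([basis, control], merged_path)
    with open(merged_path) as handle:
        merged_text = handle.read()

print(merged_text)
```

The caller still has to supply the files in an order the parser accepts; the sketch does nothing to check or correct that.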
From: Noel O'B. <bao...@gm...> - 2007-07-11 06:53:33
|
Hello Chris,

Good to hear from you and welcome to cclib. We welcome all the help we can get so there's really no question about approval. Our only standard is that the code works before we release it, and we have a number of tests that try to ensure this is the case.

On 11/07/07, Adam Tenderholt <a-t...@st...> wrote:
> I'd suggest using cat to combine all of the files into one, although
> this probably isn't the best option for our windows-using friends.
> Perhaps we should handle zip or tar files (we already handle gz and
> bzip2, as I recall). Noel, Karol, any comments? I'm willing to help
> add any logfiles or code to a branch in the svn tree during the next
> few days, so if you have anything ready, let me know.

Perhaps Chris, you could describe in detail the typical output from a Turbomole calculation; that is, what files are created, what do they contain (in general terms), are they ASCII files or binary. It might make sense if you send to this list a .zip file of results of a small example calculation.

One possibility is that we have several parsers for several files. At first this may seem messy, but we are moving towards separating the parsers and the results, and it will be trivial for the user to add the results together. But let's take a look at the actual output first before we think about this too much.

Some bookkeeping now. Can you create an account on SourceForge and send me the name? I will need this to make you a developer. |
From: Karol L. <kar...@kn...> - 2007-07-11 07:08:26
|
On Wednesday 11 July 2007 02:53, Noel O'Boyle wrote:
> Perhaps Chris, you could describe in detail the typical output from a
> Turbomole calculation; that is, what files are created, what do they
> contain (in general terms), are they ASCII files or binary. It might
> make sense if you send to this list a .zip file of results of a small
> example calculation.
>
> One possibility is that we have several parsers for several files. At
> first this may seem messy, but we are moving towards separating the
> parsers and the results, and it will be trivial for the user to add
> the results together. But let's take a look at the actual output first
> before we think about this too much.

A comment. There already is a 'support parsing multiple log files' point on the wiki progress page, and this problem is not specific to Turbomole, since both ADF and GAMESS have two or more output files. In terms of programming this is not a big problem: pass more arguments to the parser, iterate over them. There will be some logistical dangers, though. The order of the files will matter and some information can be duplicated. In GAMESS, for instance, the .dat file also contains MO coefficients but with higher precision than the .out file, which is another advantage of doing this...

--
written by Karol Langner Wed Jul 11 08:58:25 EDT 2007 |
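To make the "iterate over them" idea and the two dangers mentioned above (file order and duplicated information) concrete, here is a toy sketch. It is not how cclib parses anything: the `name = value` line format, the function names and the file contents are all invented for the example.

```python
import os
import tempfile


def iter_lines(paths):
    """Yield (path, line) for every line of every file, in the order given."""
    for path in paths:
        with open(path) as handle:
            for line in handle:
                yield path, line


def parse_values(paths):
    """Toy parser for 'name = value' lines; later files override earlier ones."""
    values = {}
    for _path, line in iter_lines(paths):
        if "=" in line:
            name, value = line.split("=", 1)
            values[name.strip()] = value.strip()
    return values


# Made-up data: the second file repeats 'coeff' with more digits.
with tempfile.TemporaryDirectory() as workdir:
    low_precision = os.path.join(workdir, "job.out")
    high_precision = os.path.join(workdir, "job.dat")
    with open(low_precision, "w") as handle:
        handle.write("natom = 20\ncoeff = 0.12\n")
    with open(high_precision, "w") as handle:
        handle.write("coeff = 0.123456\n")
    in_order = parse_values([low_precision, high_precision])
    swapped = parse_values([high_precision, low_precision])

print(in_order["coeff"])  # 0.123456
print(swapped["coeff"])   # 0.12
```

With a simple "last one wins" rule, swapping the file order silently keeps the lower-precision value, which is one way the ordering problem described above can show up even when nothing crashes.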
From: Christopher R. <cro...@uo...> - 2007-07-11 16:29:25
|
I think this would make sense, but there are a couple of complications: There is the option within turbomole to use non-standard filenames for some of the output files, but I think it's ok to ignore this situation. Also, turbomole could have as many as 9 files that need to be read, so we'd have to pass the parser a large number of arguments. In turbomole, the file names of the other files related to the calculation are lines in the control file. In principle, the parser could be passed the path of the control file, and then other files are read as necessary based on it. The only option that wouldn't require modifications to cclib is to merge the turbomole output files first and then run the parser on the whole thing. At least for the time being, I'm inclined to use that. |
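The idea of starting from the control file and following its references could be sketched as below. The exact layout of those lines is not shown in this thread, so the `$keyword file=name` form, the sample text and the function name `referenced_files` are all assumptions made for illustration.

```python
import re

# Assumed layout of a reference line: "$keyword   file=name".
FILE_REFERENCE = re.compile(r"^\$(\w+)\s+file=(\S+)")


def referenced_files(control_text):
    """Map each '$keyword' that carries a 'file=' entry to its file name."""
    references = {}
    for line in control_text.splitlines():
        match = FILE_REFERENCE.match(line)
        if match:
            references[match.group(1)] = match.group(2)
    return references


# Made-up control file contents following the assumed layout.
sample_control = """$title
$coord    file=coord
$basis    file=basis
$scfmo    file=mos
$end
"""

print(referenced_files(sample_control))
# {'coord': 'coord', 'basis': 'basis', 'scfmo': 'mos'}
```

An approach along these lines would also cope with the non-standard file names mentioned above, since the names are read from the control file rather than assumed.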