From: Karol L. <kar...@kn...> - 2007-08-13 17:12:53
On Saturday 11 August 2007 13:45, Noel O'Boyle wrote:
> Chris,
>
> The Turbomole parser seems to be coming along well. I've added a column
> to the wiki for keeping track of progress:
> http://cclib.sourceforge.net/wiki/index.php/Development_parsed_data#Details_of_current_implementation

Yes, I agree. I'd like to expand here on parsing multiple files, since Turbomole output is the prime example for this.

After updating the files with the code for parsing multiple output files, I'd like to demonstrate how that works. This is from my working Turbomole branch, with the dvb_sp data files copied in manually...

langner@slim:~/cclib/branches/turbomoleparser/src/cclib/parser$ python
Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
[GCC 3.4.6 (Debian 3.4.6-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from utils import ccopen

All the files were concatenated into turbo.out, I presume...

>>> ccopen("turbo.out").parse()
[Turbomole turbo.out INFO] Creating attribute natom: 20
[Turbomole turbo.out INFO] Creating attribute aonames[]
[Turbomole turbo.out INFO] Creating attribute nbasis: 60
[Turbomole turbo.out INFO] Creating attribute nmo: 60
[Turbomole turbo.out INFO] Creating attribute homos[]
[Turbomole turbo.out INFO] Creating attribute aooverlaps[]
[Turbomole turbo.out INFO] Creating attribute atomcoords[]
[Turbomole turbo.out INFO] Creating attribute atomnos[]
[Turbomole turbo.out INFO] Creating attribute moenergies[]
[Turbomole turbo.out INFO] Creating attribute mocoeffs[]
[Turbomole turbo.out INFO] Creating attribute coreelectrons[]
<cclib.data.ccData object at 0xb63a1dec>

Passing just the basis file doesn't get anything parsed:

>>> ccopen("basis").parse()
<cclib.data.ccData object at 0xb63a1e6c>

To parse two files sequentially, pass them as a list:

>>> ccopen(["basis","control"]).parse()
[Turbomole ['basis', 'control'] INFO] Creating attribute natom: 20
[Turbomole ['basis', 'control'] INFO] Creating attribute aonames[]
[Turbomole ['basis', 'control'] INFO] Creating attribute nbasis: 60
[Turbomole ['basis', 'control'] INFO] Creating attribute nmo: 60
[Turbomole ['basis', 'control'] INFO] Creating attribute homos[]
[Turbomole ['basis', 'control'] INFO] Creating attribute coreelectrons[]
<cclib.data.ccData object at 0xb63a484c>

This will be equivalent to parsing turbo.out in terms of parsed attributes:

>>> ccopen(["basis","control","coord","energy","mos"]).parse()
...
<cclib.data.ccData object at 0xb63a42cc>

By the way, no need to add extra lines for Turbomole to be recognized - ccopen can do it with the condition (line[0] == "$" and line[1].islower()), which, at least at present, is unique to Turbomole among all the cclib parsers.

In my opinion, we should do without concatenated files such as turbo.out, and parse the tests using multiple files where required. What do you think?

There are caveats that need to be dealt with - passing the files in the wrong order crashes the parser. That is equivalent to concatenating in the wrong order, so it still requires the user to do something right :)

>>> ccopen(["control","basis"]).parse()
[Turbomole ['control', 'basis'] INFO] Creating attribute natom: 20
[Turbomole ['control', 'basis'] INFO] Creating attribute aonames[]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "logfileparser.py", line 140, in parse
    self.extract(inputfile, line)
  File "turbomoleparser.py", line 214, in extract
    for i in range(0, len(self.basis_lib), 1):
AttributeError: 'Turbomole' object has no attribute 'basis_lib'

One more issue is how to get ccget to parse multiple files into one data object. What I mean is something equivalent to passing a concatenated file (turbo.out, for example) to ccget. I propose to add an option to ccget that will do that. What do you all think?

Cheers,
Karol

-- 
written by Karol Langner
Mon Aug 13 18:49:58 EDT 2007
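
For reference, the recognition condition quoted in the message, (line[0] == "$" and line[1].islower()), can be sketched as a standalone check. This is only an illustration under stated assumptions, not the actual ccopen code: the function names looks_like_turbomole and guess_turbomole are hypothetical, and the length guard is an addition so that empty or one-character lines do not raise an IndexError.

```python
def looks_like_turbomole(line):
    """Return True if the line starts with '$' followed by a lowercase letter.

    Hypothetical helper; it only mirrors the condition quoted in the message.
    """
    # The length guard is an assumption added here: indexing line[1]
    # on a line shorter than two characters would raise IndexError.
    return len(line) > 1 and line[0] == "$" and line[1].islower()


def guess_turbomole(lines):
    """Return True if any line in the iterable matches the condition."""
    return any(looks_like_turbomole(line) for line in lines)


if __name__ == "__main__":
    # Made-up sample lines, used purely to exercise the check.
    sample = ["# a comment line", "$coord", "    0.0  0.0  0.0  c"]
    print(guess_turbomole(sample))
```

Whether ccopen inspects every line or only the first few before deciding is not stated in the message, so the any() scan over all lines is a simplification.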