From: Karol L. <kar...@kn...> - 2007-05-02 21:50:03
|
Some thoughts about more refactoring to the parser...

If you take a look at the parsers after the recent refactoring, it is now more evident that they are quite inefficient. That isn't a problem, since cclib isn't about efficiency, but it would be nice. For example, even something as simple as putting a 'return' statement at the end of each parsing block would speed things up (the following conditions are not evaluated). Anyway, this already suggests that it would be useful to break up the extract() method into pieces, one for each block of parsed output.

I've been hovering around this subject for some time, turning it around in my mind. A dictionary of functions seems appropriate (with regexps or something as keys), and easier to manage than the current "long" function. I don't think we can do away with the functions, since sometimes pretty complicated operations are done with the parsed output. The problem I see is where to define all these functions (30-40 separate parsed blocks)?

How about this: the functions would be defined in a different class, not LogFile. What I'm suggesting is to separate out, from the class that represents a parsed log file, a class that represents the parser. Currently, they are one. An instance of the parser class would be an attribute of the log file class, say "_parser". This object would hold all the parsing functions, a dict used by the parse() method of LogFile, and any other stuff needed for parsing. An additional advantage is that the parser becomes less visible to the casual user, leaving only parsed attributes in the log file object.

Summarizing, I propose two layers of classes:
  LogFile - subclasses Gaussian, GAMESS, ...
  LogFileParser - subclasses GaussianParser, GAMESSParser, ...
The first remains as is (at least for the user), except that everything related to parsing is put in the second. Of course, instances of the latter class should be attributes of the instances of the former.

Waiting to hear what you think about this idea,
Karol

--
written by Karol Langner
Thu May 3 01:20:44 CEST 2007
|
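A minimal sketch of the early-return idea raised above; the trigger strings and slice offsets here are invented placeholders, not actual cclib code:

    def extract(self, inputfile, line):
        if line[1:7] == "Charge":
            # ... parse the charge block here ...
            return
        if line[1:9] == "SCF Done":
            # ... parse the SCF energy block here ...
            return
        # without the 'return' statements, every remaining test below
        # would still be evaluated even after a block has matched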
From: Noel O'B. <bao...@gm...> - 2007-05-03 08:23:33
|
If you are going to think about efficiency, I'd like to see some timings. That means something like timing the parsing of a particular very large file (several times, taking the minimum). I don't like to speculate too much on whether certain changes will make everything more efficient. For instance, what was the effect of the recent change where you avoided calling extract() when the line was empty? It seems reasonable that this would speed things up, but did it in fact? What's the fastest way of testing whether a line is empty (must be cross-platform)? And so on.

How we test each line has a large effect on efficiency. I point out again that using line[x:y]=="jklj" is much faster than using "word in line" or line.find(), and so these should be some of the first targets for improving efficiency.

On 03/05/07, Karol Langner <kar...@kn...> wrote:
> Some thoughts about more refactoring to the parser...
>
> If you take a look at the parsers after the recent refactoring, it is now
> more evident that they are quite inefficient. That isn't a problem, since
> cclib isn't about efficiency, but it would be nice. For example, even
> something as simple as putting a 'return' statement at the end of each
> parsing block would speed things up (the following conditions are not
> evaluated). Anyway, this already suggests that it would be useful to
> break up the extract() method into pieces, one for each block of parsed
> output.

I think that there is one case where the block shouldn't return, but in general it would be fine. However, it wouldn't speed things up that much, so I feel it is not worth doing. If you think about it, most lines don't match any of the 'if' statements. If each block is executed once, and there are 10 blocks, then the number of wasteful 'if' statements will be 9 + 8 + 7 + ... + 1 = 45.

> I've been hovering around this subject for some time, turning it around
> in my mind. A dictionary of functions seems appropriate (with regexps or
> something as keys), and easier to manage than the current "long"
> function. I don't think we can do away with the functions, since
> sometimes pretty complicated operations are done with the parsed output.
> The problem I see is where to define all these functions (30-40 separate
> parsed blocks)?

I don't think defining the functions is the problem - just define them in the Gaussian parser, for example. We could do this already without affecting anything, and leave the dictionary of functions idea till a later date.

> How about this: the functions would be defined in a different class, not
> LogFile. What I'm suggesting is to separate out, from the class that
> represents a parsed log file, a class that represents the parser.
> Currently, they are one. An instance of the parser class would be an
> attribute of the log file class, say "_parser". This object would hold
> all the parsing functions, a dict used by the parse() method of LogFile,
> and any other stuff needed for parsing. An additional advantage is that
> the parser becomes less visible to the casual user, leaving only parsed
> attributes in the log file object.
>
> Summarizing, I propose two layers of classes:
>   LogFile - subclasses Gaussian, GAMESS, ...
>   LogFileParser - subclasses GaussianParser, GAMESSParser, ...
> The first remains as is (at least for the user), except that everything
> related to parsing is put in the second. Of course, instances of the
> latter class should be attributes of the instances of the former.
I think you'll have to explain this some more... I'm not sure what the advantage is in doing this. I guess I don't have enough time right now to think this through fully...

> Waiting to hear what you think about this idea,

I think I need some more time....

Noel
|
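For concreteness, the three line tests Noel compares would look like this inside an extract()-style method; the sample line and offsets are invented, and the relative costs are Noel's claim, which the timings later in this thread refine:

    line = " SCF Done:  E(RHF) =  -76.4  A.U. after   9 cycles"

    if line[1:9] == "SCF Done":      # fixed slice: cost independent of line length
        pass
    if "SCF Done" in line:           # scans the line until a match (or the end)
        pass
    if line.find("SCF Done") >= 0:   # a method call on top of the same scan
        pass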
From: Karol L. <kar...@kn...> - 2007-05-03 11:11:28
|
On Thursday 03 May 2007 10:23, Noel O'Boyle wrote:
> For instance, what was the effect of the recent change where you
> avoided calling extract() when the line was empty? It seems reasonable
> that this would speed things up, but did it in fact? What's the
> fastest way of testing whether a line is empty (must be
> cross-platform)? And so on.

Below, "parse_slower" is the same method as "parse" from the trunk, but without the condition that checks if the line is empty.

langner@slim:~/tmp/python/cclib/trunk/data/Gaussian/Gaussian03$ python
Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
[GCC 3.4.6 (Debian 3.4.6-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cclib
>>> a = cclib.parser.ccopen("chn1.log.gz")
>>> import timeit
>>> t = timeit.Timer("a.clean(); a.parse()", "from __main__ import a")
>>> min(t.repeat(repeat=10, number=5))
.... logger output ....
0.92677688598632812
>>> t_slower = timeit.Timer("a.clean(); a.parse_slower()", "from __main__ import a")
>>> min(t_slower.repeat(repeat=10, number=5))
... logger output ...
0.92177586353772345

I tried a bigger file and it also had no visible effect. So... what seemed reasonable to me was wrong. I guess that revision can be reverted :)

> How we test each line has a large effect on efficiency. I point out
> again that using line[x:y]=="jklj" is much faster than using "word in
> line" or line.find(), and so these should be some of the first
> targets for improving efficiency.

Good point, confirmed by a profiling run:

langner@slim:~/tmp/python/cclib/trunk/data/GAMESS/basicGAMESS-US$ python
Python 2.5 (r25:51908, Apr 30 2007, 15:03:13)
[GCC 3.4.6 (Debian 3.4.6-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cclib
>>> a = cclib.parser.ccopen("C_bigbasis.out")
>>> import profile
>>> profile.run("a.parse()", "parse.prof")
>>> import pstats
>>> s = pstats.Stats("parse.prof")
>>> s.sort_stats("time")
>>> s.print_stats(.12)
Thu May 3 14:43:04 2007    parse.prof

         199815 function calls in 9.069 CPU seconds

   Ordered by: internal time
   List reduced from 96 to 12 due to restriction <0.12>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     8581    4.548    0.001    8.625    0.001 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/gamessparser.py:90(extract)
   137355    3.080    0.000    3.080    0.000 :0(find)
    20310    0.480    0.000    0.480    0.000 :0(len)
        1    0.316    0.316    9.069    9.069 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:165(parse)
     8600    0.184    0.000    0.184    0.000 :0(rstrip)
     2143    0.140    0.000    0.140    0.000 :0(split)
     2055    0.124    0.000    0.124    0.000 :0(range)
     9145    0.076    0.000    0.076    0.000 :0(strip)
     8868    0.060    0.000    0.060    0.000 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:375(updateprogress)
      370    0.016    0.000    0.016    0.000 :0(append)
      218    0.004    0.000    0.004    0.000 :0(replace)
       31    0.004    0.000    0.032    0.001 /home/langner/apps/python/lib/python2.5/site-packages/cclib/parser/logfileparser.py:153(__setattr__)

> On 03/05/07, Karol Langner <kar...@kn...> wrote:
> > Some thoughts about more refactoring to the parser...
> > [...] For example, even something as simple as putting a 'return'
> > statement at the end of each parsing block would speed things up (the
> > following conditions are not evaluated). Anyway, this already suggests
> > that it would be useful to break up the extract() method into pieces,
> > one for each block of parsed output.
>
> I think that there is one case where the block shouldn't return, but
> in general it would be fine. However, it wouldn't speed things up that
> much, so I feel it is not worth doing. If you think about it, most
> lines don't match any of the 'if' statements. If each block is
> executed once, and there are 10 blocks, then the number of wasteful
> 'if' statements will be 9 + 8 + 7 + ... + 1 = 45.

There are also the lines between blocks that don't match any condition, but in principle you're right, it's not worth it.

> > I've been hovering around this subject for some time [...] The problem
> > I see is where to define all these functions (30-40 separate parsed
> > blocks)?
>
> I don't think defining the functions is the problem - just define them
> in the Gaussian parser, for example. We could do this already without
> affecting anything, and leave the dictionary of functions idea till a
> later date.

What do you mean by 'Gaussian parser' - the file gaussianparser.py or the class? I think I didn't make myself clear - my worry is that if we define all these functions in the parser class, then when you go "a = ccopen("....").parse(); print dir(a)" you will get flooded by function names.

> > How about this: the functions would be defined in a different class,
> > not LogFile. [...] Summarizing, I propose two layers of classes:
> >   LogFile - subclasses Gaussian, GAMESS, ...
> >   LogFileParser - subclasses GaussianParser, GAMESSParser, ...
>
> I think you'll have to explain this some more... I'm not sure what the
> advantage is in doing this. I guess I don't have enough time right now
> to think this through fully...

Let me sketch out the idea. Snippets of the parser class (second layer):

class GaussianParser(LogFileParser):
    (...)
    def parse_charge(self, inputfile, line):
        super(GaussianParser, self).charge = (...)
    def parse_scfenergy(self, inputfile, line):
        super(GaussianParser, self).scfenergies.append(...)
    (...)
    self.parse_dict = {
        <regexp_charge>: self.parse_charge,
        <regexp_scfenergy>: self.parse_scfenergy,
        (...)
    }

Now the first layer, the log file class:

class Gaussian(LogFile):
    self._parser = GaussianParser(...)
    (...)
    def parse(self, ...):
        (...)
        for line in inputfile:
            for regexp in self._parser.parse_dict:
                if re.match(regexp, line):
                    self._parser.parse_dict[regexp](line, inputfile)

I hope that's clearer.

> > Waiting to hear what you think about this idea,
>
> I think I need some more time....

No problem, I'm just brainstorming on a holiday.

- Karol

--
written by Karol Langner
Thu May 3 14:15:39 CEST 2007
|
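To make the sketch above concrete, here is a self-contained toy version of the dispatch idea. The class names follow the proposal, but the regexps, token offsets and attribute handling are illustrative assumptions, not working cclib code:

import re

class GaussianParser(object):
    """Second layer: holds the parsing functions and the dispatch dict."""
    def __init__(self, logfile):
        self.logfile = logfile
        # map compiled regexps to handler methods
        self.parse_dict = {
            re.compile("Charge ="): self.parse_charge,
            re.compile("SCF Done"): self.parse_scfenergy,
        }
    def parse_charge(self, inputfile, line):
        # e.g. " Charge =  0 Multiplicity = 1"
        self.logfile.charge = int(line.split()[2])
    def parse_scfenergy(self, inputfile, line):
        # e.g. " SCF Done:  E(RHF) =  -76.0  A.U. after 10 cycles"
        self.logfile.scfenergies.append(float(line.split()[4]))

class Gaussian(object):
    """First layer: the parsed log file; only parsed attributes are visible."""
    def __init__(self):
        self.scfenergies = []
        self._parser = GaussianParser(self)
    def parse(self, inputfile):
        for line in inputfile:
            for regexp, handler in self._parser.parse_dict.items():
                if regexp.search(line):
                    handler(inputfile, line)
                    break

So g = Gaussian(); g.parse(open("somefile.log")) would leave g.charge and g.scfenergies set, with the parser itself tucked away in g._parser. Note search() rather than the match() in the sketch, since these trigger phrases rarely start at column zero; and with 30-40 patterns tried against every line, this inner loop is exactly where the efficiency questions above come back in.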
From: Noel O'B. <bao...@gm...> - 2007-05-03 11:21:30
|
On 03/05/07, Karol Langner <kar...@kn...> wrote:
> On Thursday 03 May 2007 10:23, Noel O'Boyle wrote:
> > For instance, what was the effect of the recent change where you
> > avoided calling extract() when the line was empty? It seems reasonable
> > that this would speed things up, but did it in fact? What's the
> > fastest way of testing whether a line is empty (must be
> > cross-platform)? And so on.
>
> Below, "parse_slower" is the same method as "parse" from the trunk, but
> without the condition that checks if the line is empty.
>
> [timeit session snipped - see previous message]
>
> I tried a bigger file and it also had no visible effect. So... what
> seemed reasonable to me was wrong. I guess that revision can be
> reverted :)

Maybe there's a quicker way of testing for an empty line. Or even better, skip all lines of less than 4 characters or something (although clearly we need to be careful not to skip important lines).

> > How we test each line has a large effect on efficiency. I point out
> > again that using line[x:y]=="jklj" is much faster than using "word in
> > line" or line.find(), and so these should be some of the first
> > targets for improving efficiency.
>
> Good point, confirmed by a profiling run:
>
> [profiler output snipped - see previous message]

I've never used the profiler. Can you interpret this for me in simple language?
> [...]
>
> > I don't think defining the functions is the problem - just define them
> > in the Gaussian parser, for example. We could do this already without
> > affecting anything, and leave the dictionary of functions idea till a
> > later date.
>
> What do you mean by 'Gaussian parser' - the file gaussianparser.py or
> the class? I think I didn't make myself clear - my worry is that if we
> define all these functions in the parser class, then when you go "a =
> ccopen("....").parse(); print dir(a)" you will get flooded by function
> names.

Ah, OK, I see. I think I would prefer users to use help(a) rather than dir(a). I think that if you use function names starting with _ they shouldn't appear in this list, although I haven't tested this.

> > > How about this: the functions would be defined in a different class,
> > > not LogFile. [...] Summarizing, I propose two layers of classes:
> > >   LogFile - subclasses Gaussian, GAMESS, ...
> > >   LogFileParser - subclasses GaussianParser, GAMESSParser, ...
> >
> > I think you'll have to explain this some more... I'm not sure what the
> > advantage is in doing this. I guess I don't have enough time right now
> > to think this through fully...
>
> Let me sketch out the idea. Snippets of the parser class (second layer):
>
> [code sketch snipped - see previous message]
>
> I hope that's clearer.

OK, thanks for that. As I said, it needs more time...

> No problem, I'm just brainstorming on a holiday.

If you want to do something neat, you can see if you can integrate cclib with PyVib2, or whether you can get cclib to run under Jython (there has been some interest in this, more later...). I would hate for you not to have something to do on your holiday :-)

Noel
|
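One way to settle the empty-line question above would be a quick timeit comparison; the candidate tests below are guesses at plausible contenders, and the setup assumes lines keep a trailing newline (with universal-newline reading, so a blank line is just '\n'):

import timeit

setup = "line = '\\n'"   # a blank line, as returned by file iteration
for stmt in ["line == '\\n'", "not line.strip()",
             "len(line) == 1", "line.isspace()"]:
    t = timeit.Timer(stmt, setup)
    print stmt, min(t.repeat(repeat=5, number=1000000))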
From: Karol L. <kar...@kn...> - 2007-05-07 08:28:09
|
On Thursday 03 May 2007 13:21, Noel O'Boyle wrote:
> > > How we test each line has a large effect on efficiency. I point out
> > > again that using line[x:y]=="jklj" is much faster than using "word in
> > > line" or line.find(), and so these should be some of the first
> > > targets for improving efficiency.
> >
> > Good point, confirmed by a profiling run:
> >
> > [profiler output snipped - see earlier message]
>
> I've never used the profiler. Can you interpret this for me in simple
> language?

The profiler measures the time used for function calls when executing a command. In the columns you have:

  ncalls        - number of times a function was called
  tottime       - time spent in the given function (excluding time in sub-functions)
  percall       - tottime/ncalls
  cumtime       - time spent in the function including sub-functions (from invocation to exit)
  percall (2nd) - cumtime/ncalls

Now that I think about all this, though, statements such as "word in line" and "line[i:j] == word" are not measured here, since they are not function calls (their time is accumulated into the time of extract()). A simple little test shows that find() is in fact the worst, but "word in line" is at least comparable to "line[i:j] == word":

>>> import timeit
>>> t1 = timeit.Timer("'a' in 'abcdefg'")
>>> t2 = timeit.Timer("'abcdefg'[:1] == 'a'")
>>> t3 = timeit.Timer("'abcdefg'.find('a')")
>>> min(t1.repeat(repeat=100, number=1000000))
0.18727612495422363
>>> min(t2.repeat(repeat=100, number=1000000))
0.3044281005859375
>>> min(t3.repeat(repeat=100, number=1000000))
0.7338860034942627

- Karol

--
written by Karol Langner
Mon May 7 11:47:55 CEST 2007
|
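A two-function toy run may make the tottime/cumtime distinction in the explanation above easier to see; the function names are invented and the timings will vary:

import profile

def inner():
    return sum(range(100000))

def outer():
    # tottime for outer excludes the time spent inside inner(),
    # while cumtime for outer includes it
    return inner()

profile.run("outer()")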
From: Noel O'B. <bao...@gm...> - 2007-05-07 08:45:27
|
> A simple little test shows that find() is in fact the worst, but "word in
> line" is at least comparable to "line[i:j] == word":
>
> >>> import timeit
> >>> t1 = timeit.Timer("'a' in 'abcdefg'")
> >>> t2 = timeit.Timer("'abcdefg'[:1] == 'a'")
> >>> t3 = timeit.Timer("'abcdefg'.find('a')")
> >>> min(t1.repeat(repeat=100, number=1000000))
> 0.18727612495422363
> >>> min(t2.repeat(repeat=100, number=1000000))
> 0.3044281005859375
> >>> min(t3.repeat(repeat=100, number=1000000))
> 0.7338860034942627

I was surprised by this, but it's the same for me. However, my earlier experiments showed 'in' to be the worst, and that is because most lines don't match the expression. I will show some timings for a large log file when I get a chance.

Noel
|
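A sketch of what such a timing might look like; the sample line is invented, but it shows why 'in' should fare worse when the word is absent (it must scan the whole line), while a fixed slice costs the same either way:

import timeit

setup = "line = ' SCF Done:  E(RHF) =  -76.4  A.U. after   9 cycles'"
tests = ["'SCF Done' in line",       # matches near the start: fast
         "'XYZZY' in line",          # no match: scans the whole line
         "line[1:9] == 'SCF Done'",  # fixed slice: same cost either way
         "line[1:9] == 'XYZZY'"]
for stmt in tests:
    t = timeit.Timer(stmt, setup)
    print stmt, min(t.repeat(repeat=5, number=1000000))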