[cclib-devel] parser-refactoring revisited

Hi,

I've already started, but I'd like to briefly summarize the changes I'm about
to make in the trunk related to refactoring the parser, in the order they will
proceed. Ideally, these changes won't break anything, but you never know, so
be on the lookout! The goal is to get certain things working automatically
and to move as many elements of the parsing process as possible to the
generic class LogFile, which in turn should make writing and modifying the
child parsers easier in the future.

1) Move all calls to logger.info() for newly created attributes to the method
LogFile.__setattr__(), which is called automatically whenever an attribute is
set. It logs the message only if the attribute is new and present in
LogFile.attrlist.

2) Up till now, we have refrained from setting temporary attributes while
parsing in order to keep things tidy. With an additional list attribute
("__nodelete"), these can be used (a lot, if needed) and will be deleted after
parsing (so the user does not see that they existed). Interestingly, this
also requires that we keep LogFile.attrlist up-to-date :-)

3) THE MAJOR CHANGE - moving the "for line in inputfile" loop to
LogFile.parse(). This may seem like just moving one line, but there are some
extra things to change in order to keep things parsing (like using temporary
attributes), and a number of things to consider. Note a paradigm shift in the
way the parsers work: the method extract() will now be called many times
(once per line) instead of once. In my opinion, this is a natural step that
makes the parsers more flexible and opens interesting possibilities for
further enhancements.

4) Making fupdate and cupdate attributes of the class - this is reasonable, as 
then they won't have to be passed around to extract() in each call.

5) Numeric->numpy transition. This isn't strictly related to the refactoring, 
but it might as well be done at once. The related changes seem to be trivial.

6) A few other minor changes I might not remember at the moment, but will 
comment on.
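
To make items 1-3 a bit more concrete, here is a minimal sketch of how the
pieces could fit together. It is not the actual trunk code: the class below is
a stripped-down stand-in for LogFile, the ToyParser child and its "NAtoms" /
"SCF Done" line format are made up for illustration, and the keep-list is
spelled "_nodelete" (single underscore) only to sidestep Python's name
mangling inside a class body.

```python
import logging


class LogFile:
    """Hypothetical, stripped-down stand-in for the generic parser class."""

    # Attributes that count as parsed results (illustrative subset).
    attrlist = ["natom", "scfenergies"]

    def __init__(self, lines):
        # Bypass the custom __setattr__ for internal bookkeeping.
        object.__setattr__(self, "logger", logging.getLogger("LogFile"))
        object.__setattr__(self, "inputfile", lines)
        # Names that must survive the cleanup at the end of parse().
        object.__setattr__(self, "_nodelete",
                           ["logger", "inputfile", "_nodelete"])

    def __setattr__(self, name, value):
        # Item 1: log only the first time a known attribute is set.
        if name in self.attrlist and not hasattr(self, name):
            self.logger.info("Creating attribute %s", name)
        object.__setattr__(self, name, value)

    def parse(self):
        # Item 3: the loop over lines lives in the generic class,
        # so extract() is called once per line.
        for line in self.inputfile:
            self.extract(line)
        # Item 2: drop temporary attributes that were set while parsing.
        for name in list(self.__dict__):
            if name not in self.attrlist and name not in self._nodelete:
                delattr(self, name)
        return self

    def extract(self, line):
        raise NotImplementedError("child parsers implement extract()")


class ToyParser(LogFile):
    """Made-up child parser for a made-up output format."""

    def extract(self, line):
        if line.startswith("NAtoms"):
            self.natom = int(line.split()[1])
        if line.startswith("SCF Done"):
            # Temporary counter; it is removed again after parsing.
            self.scfcount = getattr(self, "scfcount", 0) + 1
            if not hasattr(self, "scfenergies"):
                self.scfenergies = []
            self.scfenergies.append(float(line.split()[-1]))
```

With this layout a child parser only implements extract() for a single line,
and something like ToyParser(lines).parse() leaves natom and scfenergies on
the object while the temporary scfcount is gone.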

Any further changes to the parser structure really need to be discussed
thoroughly so as not to waste development time. For example, as mentioned by
Noel, we could choose to use a dictionary of functions that parse separate
blocks of output (best if the keys are regexps). I like the idea, and the
present changes are a step in that direction. It would be pretty, optimal,
and well-structured, all of which makes writing new parsers easier and more
fun. The main question is - where would all these functions be defined? If
they are to be class methods, things can get messy, since there will be quite
a lot of them. But first things first.
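
For the sake of discussion, the dictionary idea could look roughly like the
sketch below. Everything in it is an assumption: the handler names, the
regexps, the toy line format, and the choice to use plain module-level
functions (just one possible answer to the "where do they live" question).

```python
import re


def parse_natom(data, match):
    # Store the number of atoms captured by the regexp.
    data["natom"] = int(match.group(1))


def parse_scf(data, match):
    # Append each SCF energy to a growing list.
    data.setdefault("scfenergies", []).append(float(match.group(1)))


# Keys are compiled regexps, values are the functions handling a match.
HANDLERS = {
    re.compile(r"^NAtoms\s+(\d+)"): parse_natom,
    re.compile(r"^SCF Done\s+(-?\d+\.\d+)"): parse_scf,
}


def parse(lines):
    """Run every line past every pattern and dispatch on a match."""
    data = {}
    for line in lines:
        for pattern, handler in HANDLERS.items():
            match = pattern.search(line)
            if match:
                handler(data, match)
    return data
```

Adding support for a new block of output would then mean writing one function
and adding one dictionary entry, without touching the loop itself.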

Hope this clarifies the upcoming commits,
Karol

-- 
written by Karol Langner
Thu Apr 26 01:24:12 CEST 2007