Re: [cclib-devel] Refactoring

On Friday 02 of February 2007 12:57, you wrote:
> On 02/02/07, Karol Langner <kar...@kn...> wrote:
> > Thanks... it might be worthwhile to move some more things to
> > LogFile.parse, but I need your input on these.
> >
> > 1)  The stuff before the loop (inputfile.seek, initialize self.progress,
> > etc.). For this, inputfile, nstep, and oldstep would need to be
> > attributes of the class. However, since they are temporary (parsing is
> > never aborted before it's finished, I think), this may not be a good
> > idea.
>
> Do they have to be attributes of the class? Can they be local
> variables in parse() which are passed into extract() (or maybe this is
> too messy)? Alternatively, they can be 'weak' attributes, with names
> like self._nstep. AFAIK, this is a convention to indicate 'private'
> variables of a class, although it doesn't do much else.

Yes, probably better to pass them as arguments, for now at least.
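
A rough sketch of what passing them as arguments might look like. The class layout, the extract() signature and the LineCounter subclass are assumptions made for illustration, not the current cclib code:

```python
import io


class LogFile:
    """Hypothetical stand-in for the parser base class (illustration only)."""

    def __init__(self, filename):
        self.filename = filename

    def parse(self, inputfile):
        # Temporary state stays in local variables rather than on self.
        inputfile.seek(0, 2)        # move to the end to measure the file
        nstep = inputfile.tell()    # total length, the progress maximum
        inputfile.seek(0)
        oldstep = 0

        line = inputfile.readline()
        while line:
            # extract() receives the state and returns the updated oldstep.
            oldstep = self.extract(inputfile, line, nstep, oldstep)
            line = inputfile.readline()

    def extract(self, inputfile, line, nstep, oldstep):
        raise NotImplementedError


class LineCounter(LogFile):
    """Toy subclass: counts lines instead of parsing real output."""

    def __init__(self, filename):
        super().__init__(filename)
        self.nlines = 0

    def extract(self, inputfile, line, nstep, oldstep):
        self.nlines += 1
        return inputfile.tell()     # current position becomes the new oldstep


parser = LineCounter("example.log")
parser.parse(io.StringIO("first\nsecond\nthird\n"))
```

This way nothing temporary is left on the instance once parse() returns, at the cost of a longer extract() signature.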

> > 2) The code that uses cupdate, fupdate, nstep, and oldstep (condition "if
> > self.progress and random.random() < fupdate") is repeated multiple times,
> > but the same question arises as the above.
>
> Well, I think that this code can be moved into a function.

Is there a difference in the meaning of the numbers cupdate and fupdate, 
beyond what I can see in the code?
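
Whatever the answer, the repeated check could probably be moved into a function along these lines. This is only a sketch: the name maybe_update, the ProgressRecorder class and the update(step, message) call are hypothetical, and I am assuming cupdate and fupdate are simply two update probabilities that would be passed in as the threshold:

```python
import io
import random


class ProgressRecorder:
    """Hypothetical progress object; it just records the updates it receives."""

    def __init__(self):
        self.updates = []

    def update(self, step, message):
        self.updates.append((step, message))


def maybe_update(progress, inputfile, oldstep, threshold, message):
    """Report the file position with probability `threshold`.

    Returns the (possibly unchanged) value to use as oldstep next time.
    """
    if progress is not None and random.random() < threshold:
        step = inputfile.tell()
        if step != oldstep:         # skip redundant updates at the same position
            progress.update(step, message)
            oldstep = step
    return oldstep


stream = io.StringIO("line one\nline two\n")
stream.readline()
recorder = ProgressRecorder()
# A threshold of 1.0 always triggers, since random.random() is below 1.0.
oldstep = maybe_update(recorder, stream, 0, 1.0, "Parsing")
```

Each place that now repeats the condition would then shrink to a single call with either cupdate or fupdate as the threshold.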

> > Could you perhaps explain the full purpose of LogFile.progress?
>
> Its main purpose is for GUI applications to be able to display a
> progress bar showing how near to completion the parsing is. This is
> because parsing can take more than 20 seconds for large files
> containing population data (which is what both myself and Adam were
> interested in). You should try out PyMOlyze to see this in action.
> There is some cost in seconds in including the progress code, but
> blindingly fast parsing is not the main goal of cclib. For instance,
> it is possible to rewrite our multiple 'if' statements in other forms
> that would parse large files quicker. If you care about this, it might
> be worth thinking about, bearing in mind that "premature optimization
> is the root of all evil" or something.

I don't care about optimization a bit, at least at this point. I do hope, 
though, that some amount of refactoring will make writing new parsers and 
extending the present ones a little easier.

I'm going to try out PyMOlyze and GaussSum soon, when I find a bit more free 
time.

> > Maybe there
> > would be some advantages in making the logfile file object an attribute
> > of LogFile (as in self.inputfile)?
>
> IMO the fewer attributes the better. Since it's a 'derived attribute'
> (i.e. it can be recreated from self.filename), and is only used within
> extract(), I don't see how it can be useful. If you can think of a
> useful reason for doing this though, go ahead.

No, I don't :)

Karol

-- 
written by Karol Langner
Fri Feb  2 16:29:18 CET 2007