Thread: [cclib-devel] Separation of parsers and data

Brought to you by: atenderholt, baoilleach, langner

This project can now be found here.

cclib-devel

[cclib-devel] Separation of parsers and data

From: Noel O'B. <bao...@gm...> - 2007-07-11 12:34:19

Following on the discussion on the list, are we all in favour of
making this separation go ahead?

I am, and I'd like to see it happen as soon as possible, so that we
have time to get everything in shape before the next release.

+1

Noel

Re: [cclib-devel] Separation of parsers and data

From: Adam T. <a-t...@st...> - 2007-07-11 15:53:40

> Following on the discussion on the list, are we all in favour of
> making this separation go ahead?
>
> I am, and I'd like to see it happen as soon as possible, so that we
> have time to get everything in shape before the next release.

I'm mostly in favor, but as always, am a bit cautious. So I have a  
handful of questions:

Can you (again) describe what changes will be necessary?
Will this happen in trunk, or in a separate branch?
How is this going to affect parsers that are being developed in a  
different branch?
How long do you expect this to take?

Adam

Re: [cclib-devel] Separation of parsers and data

From: Noel O'B. <bao...@gm...> - 2007-07-11 15:56:14

On 11/07/07, Adam Tenderholt <a-t...@st...> wrote:
> > Following on the discussion on the list, are we all in favour of
> > making this separation go ahead?
> >
> > I am, and I'd like to see it happen as soon as possible, so that we
> > have time to get everything in shape before the next release.
>
> I'm mostly in favor, but as always, am a bit cautious. So I have a
> handful of questions:
>
> Can you (again) describe what changes will be necessary?
I think Karol's email of the 9th of this month actually has the
changes as an attachment. I think they are pretty minimal.

> Will this happen in trunk, or in a separate branch?
> How is this going to affect parsers that are being developed in a
> different branch?
> How long do you expect this to take?
I would hope for it to be all done quick and nasty on the trunk within
a single day or so, and then merged to the branches, and then
completed for the parsers on the branches. We should start using
svnmerge.py for the branches to make it easy to keep track of what has
been merged, and in what direction.

> Adam
>
>
>

Re: [cclib-devel] Separation of parsers and data

From: Karol L. <kar...@kn...> - 2007-07-12 01:06:16

On Wednesday 11 July 2007 11:56, Noel O'Boyle wrote:
> On 11/07/07, Adam Tenderholt <a-t...@st...> wrote:
> > Can you (again) describe what changes will be necessary?

There is a new class - cclibData (or whatever we rename it to). It contains 
the attribute lists and handles the checking of types in its own __setattr__ 
method - and should do everything concerned with data after it is parsed 
instead of Logfile.

The biggest change is that Logfile.parse() returns something, that is an 
instance of cclibData, which should hold the parsed attributes upen return. 
It is the biggest, because it changes the way cclib will be used. Now you 
need to write something like "data = ccopen(...).parse()" and data will hold 
the parsed data.

Changes are presently contained to logfileparser.py to make things clear and 
simple. However, it might be better to accomodate them in the parsers 
themselves later (that is, in the extract method) - currently attributes are 
set to Logfile as before and moved to the new cclibData object after the 
actual parsing (look at the new code). I don't know if that will raise any 
problems, but it has the potential :)

> > Will this happen in trunk, or in a separate branch?
> > How is this going to affect parsers that are being developed in a
> > different branch?
> > How long do you expect this to take?
>
> I think Karol's email of the 9th of this month actually has the
> changes as an attachment. I think they are pretty minimal.
>
> I would hope for it to be all done quick and nasty on the trunk within
> a single day or so, and then merged to the branches, and then
> completed for the parsers on the branches. We should start using
> svnmerge.py for the branches to make it easy to keep track of what has
> been merged, and in what direction.

Done. I had to add a few more things to the file I sent before to get things 
working. Seems to be OK, test statistics are unchanged.

Concerning Calculation Methods in cclib: they need to accept cclibData objects 
now. And a comment/idea: since cclibData objects are to hold cclib data in 
general, perhaps they should also contain the results of methods (MPA, etc.). 
It annoys me a little that after doing a population analysis I have a two 
objects, both containing data that concern the same calculation! On the other 
hand, other methods need multiple cclibData objects (CDA). Maybe a good 
alternative is to provide functions for some methods that add the method 
results to the same cclibData object passed to them, instead of creating a 
new object. I'm not sure if that is clear... do you have any comments?

-- 
written by Karol Langner
Thu Jul 12 02:30:47 EDT 2007

Re: [cclib-devel] Separation of parsers and data

From: Karol L. <kar...@kn...> - 2007-07-12 07:27:57

On Thursday 12 July 2007 03:03, Karol Langner wrote:
> The biggest change is that Logfile.parse() returns something, that is an
> instance of cclibData, which should hold the parsed attributes upen return.
> It is the biggest, because it changes the way cclib will be used. Now you
> need to write something like "data = ccopen(...).parse()" and data will
> hold the parsed data.

Maybe a function like ccparse(arg) = ccopen(arg).parse() would make a good 
shortcut?

-- 
written by Karol Langner
Thu Jul 12 09:24:22 EDT 2007

Re: [cclib-devel] Separation of parsers and data

From: Noel O'B. <bao...@gm...> - 2007-07-12 08:11:06

On 12/07/07, Karol Langner <kar...@kn...> wrote:
> On Thursday 12 July 2007 03:03, Karol Langner wrote:
> > The biggest change is that Logfile.parse() returns something, that is an
> > instance of cclibData, which should hold the parsed attributes upen return.
> > It is the biggest, because it changes the way cclib will be used. Now you
> > need to write something like "data = ccopen(...).parse()" and data will
> > hold the parsed data.
>
> Maybe a function like ccparse(arg) = ccopen(arg).parse() would make a good
> shortcut?

I feel like we're starting to overengineer here, and likely to confuse
users who just want a single way to parse files. We separated parse()
on purpose, so that people could have access to the settings before
parsing. If you feel that *this* is not necessary, I would be
interested in removing that.

> --
> written by Karol Langner
> Thu Jul 12 09:24:22 EDT 2007
>

Re: [cclib-devel] Separation of parsers and data

From: Noel O'B. <bao...@gm...> - 2007-07-12 08:19:46

On 12/07/07, Karol Langner <kar...@kn...> wrote:
> On Wednesday 11 July 2007 11:56, Noel O'Boyle wrote:
> > On 11/07/07, Adam Tenderholt <a-t...@st...> wrote:
> > > Can you (again) describe what changes will be necessary?
>
> There is a new class - cclibData (or whatever we rename it to). It contains
> the attribute lists and handles the checking of types in its own __setattr__
> method - and should do everything concerned with data after it is parsed
> instead of Logfile.
>
> The biggest change is that Logfile.parse() returns something, that is an
> instance of cclibData, which should hold the parsed attributes upen return.
> It is the biggest, because it changes the way cclib will be used. Now you
> need to write something like "data = ccopen(...).parse()" and data will hold
> the parsed data.
>
> Changes are presently contained to logfileparser.py to make things clear and
> simple. However, it might be better to accomodate them in the parsers
> themselves later (that is, in the extract method) - currently attributes are
> set to Logfile as before and moved to the new cclibData object after the
> actual parsing (look at the new code). I don't know if that will raise any
> problems, but it has the potential :)

I think it would be better to put it into the parsers. Otherwise the
code is misleading. That is, the subclasses set the attributes of the
LogfileParser, but the LogfileParser moves these attributes into a
cclibData instance. I agree though, that this is a very good first
step, and will allow us to deal with the changes incrementally.

> > > Will this happen in trunk, or in a separate branch?
> > > How is this going to affect parsers that are being developed in a
> > > different branch?
> > > How long do you expect this to take?
> >
> > I think Karol's email of the 9th of this month actually has the
> > changes as an attachment. I think they are pretty minimal.
> >
> > I would hope for it to be all done quick and nasty on the trunk within
> > a single day or so, and then merged to the branches, and then
> > completed for the parsers on the branches. We should start using
> > svnmerge.py for the branches to make it easy to keep track of what has
> > been merged, and in what direction.
>
> Done. I had to add a few more things to the file I sent before to get things
> working. Seems to be OK, test statistics are unchanged.
>
> Concerning Calculation Methods in cclib: they need to accept cclibData objects
> now. And a comment/idea: since cclibData objects are to hold cclib data in
> general, perhaps they should also contain the results of methods (MPA, etc.).
> It annoys me a little that after doing a population analysis I have a two
> objects, both containing data that concern the same calculation! On the other
> hand, other methods need multiple cclibData objects (CDA). Maybe a good
> alternative is to provide functions for some methods that add the method
> results to the same cclibData object passed to them, instead of creating a
> new object. I'm not sure if that is clear... do you have any comments?

I'm not so sure about this. Personally, after a population analysis I
don't want an object at all. I just want some data structures, e.g.
the charges on the atoms, and whatever else. Similarily, I don't think
algorithms should be called with cclibData objects. They should just
be fed with whatever information they need (explicit > implicit).
Otherwise, people who want to use these algorithms independently of
cclibData are going to have to jump through some hoops. My 2c.

> --
> written by Karol Langner
> Thu Jul 12 02:30:47 EDT 2007
>

Re: [cclib-devel] Separation of parsers and data

From: Karol L. <kar...@kn...> - 2007-07-12 08:34:22

On Thursday 12 July 2007 04:19, Noel O'Boyle wrote:
> > Concerning Calculation Methods in cclib: they need to accept cclibData
> > objects now. And a comment/idea: since cclibData objects are to hold
> > cclib data in general, perhaps they should also contain the results of
> > methods (MPA, etc.). It annoys me a little that after doing a population
> > analysis I have a two objects, both containing data that concern the same
> > calculation! On the other hand, other methods need multiple cclibData
> > objects (CDA). Maybe a good alternative is to provide functions for some
> > methods that add the method results to the same cclibData object passed
> > to them, instead of creating a new object. I'm not sure if that is
> > clear... do you have any comments?
>
> I'm not so sure about this. Personally, after a population analysis I
> don't want an object at all. I just want some data structures, e.g.
> the charges on the atoms, and whatever else. Similarily, I don't think
> algorithms should be called with cclibData objects. They should just
> be fed with whatever information they need (explicit > implicit).
> Otherwise, people who want to use these algorithms independently of
> cclibData are going to have to jump through some hoops. My 2c.

What I mean is that until now Logfile objects were passed to the methods. Now 
cclibData objects will be passed. By "I don't want an object at all" do you 
mean that you would like the methods to return list, arrays, or whatever, 
instead of setting them as attributes (currently to an the method class 
instance)?

-- 
written by Karol Langner
Thu Jul 12 10:26:27 EDT 2007

Re: [cclib-devel] Separation of parsers and data

From: Noel O'B. <bao...@gm...> - 2007-07-12 08:47:20

On 12/07/07, Karol Langner <kar...@kn...> wrote:
> On Thursday 12 July 2007 04:19, Noel O'Boyle wrote:
> > > Concerning Calculation Methods in cclib: they need to accept cclibData
> > > objects now. And a comment/idea: since cclibData objects are to hold
> > > cclib data in general, perhaps they should also contain the results of
> > > methods (MPA, etc.). It annoys me a little that after doing a population
> > > analysis I have a two objects, both containing data that concern the same
> > > calculation! On the other hand, other methods need multiple cclibData
> > > objects (CDA). Maybe a good alternative is to provide functions for some
> > > methods that add the method results to the same cclibData object passed
> > > to them, instead of creating a new object. I'm not sure if that is
> > > clear... do you have any comments?
> >
> > I'm not so sure about this. Personally, after a population analysis I
> > don't want an object at all. I just want some data structures, e.g.
> > the charges on the atoms, and whatever else. Similarily, I don't think
> > algorithms should be called with cclibData objects. They should just
> > be fed with whatever information they need (explicit > implicit).
> > Otherwise, people who want to use these algorithms independently of
> > cclibData are going to have to jump through some hoops. My 2c.
>
> What I mean is that until now Logfile objects were passed to the methods. Now
> cclibData objects will be passed. By "I don't want an object at all" do you
> mean that you would like the methods to return list, arrays, or whatever,
> instead of setting them as attributes (currently to an the method class
> instance)?

Exactly.

> --
> written by Karol Langner
> Thu Jul 12 10:26:27 EDT 2007
>

Re: [cclib-devel] Separation of parsers and data

From: Karol L. <kar...@kn...> - 2007-07-12 09:04:06

On Thursday 12 July 2007 04:47, Noel O'Boyle wrote:
> On 12/07/07, Karol Langner <kar...@kn...> wrote:
> > On Thursday 12 July 2007 04:19, Noel O'Boyle wrote:
> > > > Concerning Calculation Methods in cclib: they need to accept
> > > > cclibData objects now. And a comment/idea: since cclibData objects
> > > > are to hold cclib data in general, perhaps they should also contain
> > > > the results of methods (MPA, etc.). It annoys me a little that after
> > > > doing a population analysis I have a two objects, both containing
> > > > data that concern the same calculation! On the other hand, other
> > > > methods need multiple cclibData objects (CDA). Maybe a good
> > > > alternative is to provide functions for some methods that add the
> > > > method results to the same cclibData object passed to them, instead
> > > > of creating a new object. I'm not sure if that is clear... do you
> > > > have any comments?
> > >
> > > I'm not so sure about this. Personally, after a population analysis I
> > > don't want an object at all. I just want some data structures, e.g.
> > > the charges on the atoms, and whatever else. Similarily, I don't think
> > > algorithms should be called with cclibData objects. They should just
> > > be fed with whatever information they need (explicit > implicit).
> > > Otherwise, people who want to use these algorithms independently of
> > > cclibData are going to have to jump through some hoops. My 2c.
> >
> > What I mean is that until now Logfile objects were passed to the methods.
> > Now cclibData objects will be passed. By "I don't want an object at all"
> > do you mean that you would like the methods to return list, arrays, or
> > whatever, instead of setting them as attributes (currently to an the
> > method class instance)?
>
> Exactly.

I didn't think of that... and the idea appeals to me :) so I'm +1 on that. 
This can be done right away, since the methods need to be fixed due to the 
parser-data separation anyway.

-- 
written by Karol Langner
Thu Jul 12 10:59:29 EDT 2007

Re: [cclib-devel] Separation of parsers and data

From: Noel O'B. <bao...@gm...> - 2007-07-12 09:11:45

On 12/07/07, Karol Langner <kar...@kn...> wrote:
> On Thursday 12 July 2007 04:47, Noel O'Boyle wrote:
> > On 12/07/07, Karol Langner <kar...@kn...> wrote:
> > > On Thursday 12 July 2007 04:19, Noel O'Boyle wrote:
> > > > > Concerning Calculation Methods in cclib: they need to accept
> > > > > cclibData objects now. And a comment/idea: since cclibData objects
> > > > > are to hold cclib data in general, perhaps they should also contain
> > > > > the results of methods (MPA, etc.). It annoys me a little that after
> > > > > doing a population analysis I have a two objects, both containing
> > > > > data that concern the same calculation! On the other hand, other
> > > > > methods need multiple cclibData objects (CDA). Maybe a good
> > > > > alternative is to provide functions for some methods that add the
> > > > > method results to the same cclibData object passed to them, instead
> > > > > of creating a new object. I'm not sure if that is clear... do you
> > > > > have any comments?
> > > >
> > > > I'm not so sure about this. Personally, after a population analysis I
> > > > don't want an object at all. I just want some data structures, e.g.
> > > > the charges on the atoms, and whatever else. Similarily, I don't think
> > > > algorithms should be called with cclibData objects. They should just
> > > > be fed with whatever information they need (explicit > implicit).
> > > > Otherwise, people who want to use these algorithms independently of
> > > > cclibData are going to have to jump through some hoops. My 2c.
> > >
> > > What I mean is that until now Logfile objects were passed to the methods.
> > > Now cclibData objects will be passed. By "I don't want an object at all"
> > > do you mean that you would like the methods to return list, arrays, or
> > > whatever, instead of setting them as attributes (currently to an the
> > > method class instance)?
> >
> > Exactly.
>
> I didn't think of that... and the idea appeals to me :) so I'm +1 on that.
> This can be done right away, since the methods need to be fixed due to the
> parser-data separation anyway.

Before you do anything, you should realise that this code is tightly
linked to PyMOlyze, so  unless Adam agrees...

> --
> written by Karol Langner
> Thu Jul 12 10:59:29 EDT 2007
>

Re: [cclib-devel] Separation of parsers and data

From: Karol L. <kar...@kn...> - 2007-07-12 09:43:08

On Thursday 12 July 2007 05:11, Noel O'Boyle wrote:
> Before you do anything, you should realise that this code is tightly
> linked to PyMOlyze, so  unless Adam agrees...

Yes, of course, I try not to make any commits that change the current 
functionality without everyone's apporval.

-- 
written by Karol Langner
Thu Jul 12 11:39:33 EDT 2007

Re: [cclib-devel] Separation of parsers and data

From: Adam T. <a-t...@st...> - 2007-07-12 20:27:59

> Before you do anything, you should realise that this code is tightly
> linked to PyMOlyze, so  unless Adam agrees...

I agree that it makes sense to pass cclibData (or even attributes of  
cclibData) to the calculation methods instead of the parsers. Right  
now, I think it would be best to pass cclibData instead of attributes  
so that the interface for the methods remain as similar as possible.  
For instance: CSPA doesn't need overlaps, while most other  
populations analyses do.

Also, will the methods add the calculated attributes to the  
cclibData? The current implementations allow a FragmentAnalysis  
object to be passed directly to any of the population analysis  
methods, which is quite convenient for determining which fragment MOs  
contribute to the MOs of the entire molecule.

Adam