From: Peter Murray-R. <pm...@ca...> - 2003-12-02 21:10:11
|
Hope this isn't off topic... I am creating code for accessing CML DOM and would like guidance on how to get() primitive quantities that may not have been specifically initialised and for which there is no default. I Is there a standard way for doing this?. For example to get formalCharge (where no value is disallowed specifically) I can see three possibilities int getFormalCharge() throws Exception; // here an exception would be thrown if the value had not been initialised int getFormalCharge(); // an agreed meaningless value (e.g. -Integer.MAXVALUE) would be returned Integer getFormalCharge(); // a null Integer would be returned if no initialisation. I do not like 2 as it is easy to forget to check and the value may get passed into algorithms. I have personally been using 3 but it may suffer from performance overheads. Does CDK have a policy on non-initialised quantities? Or are there defaults for everything? P. Unilever Centre for Molecular Informatics Chemistry Department, Cambridge University Lensfield Road, CAMBRIDGE, CB2 1EW, UK Tel: +44-1223-763069 |
From: Peter Murray-R. <pm...@ca...> - 2003-12-02 21:10:19
|
Hope this isn't off topic... I am creating code for accessing CML DOM and would like guidance on how to get() primitive quantities that may not have been specifically initialised and for which there is no default. I Is there a standard way for doing this?. For example to get formalCharge (where no value is disallowed specifically) I can see three possibilities int getFormalCharge() throws Exception; // here an exception would be thrown if the value had not been initialised int getFormalCharge(); // an agreed meaningless value (e.g. -Integer.MAXVALUE) would be returned Integer getFormalCharge(); // a null Integer would be returned if no initialisation. I do not like 2 as it is easy to forget to check and the value may get passed into algorithms. I have personally been using 3 but it may suffer from performance overheads. Does CDK have a policy on non-initialised quantities? Or are there defaults for everything? P. Unilever Centre for Molecular Informatics Chemistry Department, Cambridge University Lensfield Road, CAMBRIDGE, CB2 1EW, UK Tel: +44-1223-763069 |
From: Peter Murray-R. <pm...@ca...> - 2003-12-02 21:10:21
|
Hope this isn't off topic... I am creating code for accessing CML DOM and would like guidance on how to get() primitive quantities that may not have been specifically initialised and for which there is no default. I Is there a standard way for doing this?. For example to get formalCharge (where no value is disallowed specifically) I can see three possibilities int getFormalCharge() throws Exception; // here an exception would be thrown if the value had not been initialised int getFormalCharge(); // an agreed meaningless value (e.g. -Integer.MAXVALUE) would be returned Integer getFormalCharge(); // a null Integer would be returned if no initialisation. I do not like 2 as it is easy to forget to check and the value may get passed into algorithms. I have personally been using 3 but it may suffer from performance overheads. Does CDK have a policy on non-initialised quantities? Or are there defaults for everything? P. Unilever Centre for Molecular Informatics Chemistry Department, Cambridge University Lensfield Road, CAMBRIDGE, CB2 1EW, UK Tel: +44-1223-763069 |
From: Miguel H. <mt...@mt...> - 2003-12-02 21:31:50
|
To date I haven't used CDK much, so my perspective is that of an outsider looking in. > int getFormalCharge() throws Exception; > // here an exception would be thrown if the value had not been > initialised I think that the exception mechanism is heavily overused in CDK. We expect to have entries with no formal charge. Therefore, it is inappropriate to use the exception mechanism. > int getFormalCharge(); > // an agreed meaningless value (e.g. -Integer.MAXVALUE) would be > returned I'm not sure what the expected range is for this attribute. (You should use Integer.MIN_VALUE instead of -Integer.MAX_VALUE ... they are not the same) I would probably define my own constant for this, something like CHARGE_UNSPECIFIED. > Integer getFormalCharge(); > // a null Integer would be returned if no initialisation. I think this is a lot of unnecessary overhead. > I do not like 2 as it is easy to forget to check and the value may get > passed into algorithms. I suppose that the other two options do force the end-user (or the system) to do a little more checking. But I think the cost is pretty high. I would: 1. define my own constant ... CHARGE_UNSPECIFIED ... or something 2. supply a predicate for testing public boolean hasFormalCharge() 3. go with option 2 public int getFormalCharge() Miguel |
From: Peter Murray-R. <pm...@ca...> - 2003-12-03 19:43:18
|
At 22:30 02/12/2003 +0100, Miguel Howard wrote: >To date I haven't used CDK much, so my perspective is that of an outsider >looking in. Thanks for helping > > int getFormalCharge() throws Exception; > > // here an exception would be thrown if the value had not been > > initialised > >I think that the exception mechanism is heavily overused in CDK. Interesting. I don't feel the same way - in fact there are cases where I think more exceptions should be thrown. Is there a general philosophy of how exceptions should be used? Is this a performance problem? > We expect >to have entries with no formal charge. Therefore, it is inappropriate to >use the exception mechanism. The point is that it can be made an exception if a file has no information on formal charge. In which case we need some way of discovering that the field has not been set. > > int getFormalCharge(); > > // an agreed meaningless value (e.g. -Integer.MAXVALUE) would be > > returned >I'm not sure what the expected range is for this attribute. There isn't one. (In practice it is likely to be low integers though proteins can amass quite large values of charge.) >(You should use Integer.MIN_VALUE instead of -Integer.MAX_VALUE ... they >are not the same) Thanks. >I would probably define my own constant for this, something like >CHARGE_UNSPECIFIED. Agreed - that was implied > > Integer getFormalCharge(); > > // a null Integer would be returned if no initialisation. >I think this is a lot of unnecessary overhead. Understood > > I do not like 2 as it is easy to forget to check and the value may get > > passed into algorithms. > >I suppose that the other two options do force the end-user (or the system) >to do a little more checking. But I think the cost is pretty high. Do you have any figures on this (other than increased code verbosity). >I would: > 1. define my own constant ... CHARGE_UNSPECIFIED ... or something > 2. supply a predicate for testing > public boolean hasFormalCharge() I like this idea. > 3. go with option 2 > public int getFormalCharge() So : if (atom.hasFormalCharge()) return atom.getFormalCharge(), else ??? P. Peter Murray-Rust Unilever Centre for Molecular Informatics Chemistry Department, Cambridge University Lensfield Road, CAMBRIDGE, CB2 1EW, UK Tel: +44-1223-763069 |
From: Egon W. <eg...@sc...> - 2003-12-05 23:30:16
|
On Wednesday 03 December 2003 20:43, Peter Murray-Rust wrote: > At 22:30 02/12/2003 +0100, Miguel Howard wrote: > >To date I haven't used CDK much, so my perspective is that of an outsider > >looking in. > > Thanks for helping > > > > int getFormalCharge() throws Exception; > > > // here an exception would be thrown if the value had not been > > > initialised > > > >I think that the exception mechanism is heavily overused in CDK. > > Interesting. I don't feel the same way - in fact there are cases where I > think more exceptions should be thrown. Is there a general philosophy of > how exceptions should be used? Is this a performance problem? I don't think there is a clear policy yet, at least, there is no RFC ;) The CDK, in general, tries to separate storage classes and algorithms as much as possible. That taken as a first directive, we migth consider use of Exception's separately for those too... For algorithms, I think, CDK uses Exception to signal the fail of an algorithm caused by bad input... Or some other problem, like IO problems, caused deeper inside the JVM... This should be formalized in a RFC... The core classes tend not to use Exception's, but use null values in cases information is not available. But, clearly, this does not work with signatures like: int getX2D(). This is inconsistent, and, I think, must be fixed. Thanx for bringing this up. I'll come back to this, after I have inventorized the core classes to see how wide spread this problem is... Egon BTW, there is an article about use of Exception's here: http://www.onjava.com/pub/a/onjava/2003/11/19/exceptions.html |
From: Egon W. <eg...@sc...> - 2003-12-06 13:34:43
|
On Saturday 06 December 2003 02:54, Egon Willighagen wrote: > I'll come back to this, after I have inventorized the core classes to see > how wide spread this problem is... To summarize the problem: there is no well defined way to see if fields are defined for which the get() method returns a native type (int, double, etc). For other fields, CDK tends to use a null value default for unset fields. However, this is not formalized and not every core class uses this mechanism: not all fields are defaulted to null! Class Field is set? Atom point2D getPoint2D() != null point3D getPoint3D() != null fractionalPoint3D hydrogenCount stereoParity charge formalCharge problematic: fractionalPoint3D, hydrogenCount, stereoParity, charge and formalCharge Bond order atomCount atoms getAtom[].length > 0 stereo problematic: order, atomCount, stereo AtomType maxBondOrder bondOrderSum vanderwaalsRadius covalentRadius id problematic: all Well, I won't bother continuing this list... it is clear that the problem is wide spread... So here's the RFC: ------------------------------------------------------------------------------ RFC: Dealing with unset fields in the core classes Proposer: Egon Willighagen (Miguel, Peter, if you want to second this proposal, then I'll add you to the list) Date: 2003-12-06 PROPOSAL Core classes must have a mechanism to check wether a field has been set or not. Class fields (String AtomType.id, Point3d Atom.point3d, etc) must have a default null value signaling the unset state. Fields with native types (double Atom.charge, double Bond.order, etc) must have a method named hasX() where X represents the field name, e.g. Bond.hasOrder() matching Bond.getOrder(). REASON Recent discussion resulted in the notice that the core classes do not have a mechanism in common that allow identifying unset fields; some classes use a null value default for unset values using Class types, e.g. String, but not even consistenly. A quick survey of three core classes (Atom, Bond and AtomType) revealed that for only 3 out of 16 fields it was clear how to detect wether it was set or not. A few alternatives were proposed: 1. throw an Exception when an unset fiels was get()-ed 2. use an akward value as a constant (e.g. -MAX) 3. use Class types everywere, e.g. Integer instead of int, with a null default 4. a hasX() method for fields with native types (int, double, etc) Some developers thought 1. was misuse of Exceptions, which should only signal problems in the normal process of the algorithm (though the Java community in general does not seem to have reached a general view on this...) Option 2. has the downside that it cannot lead to consistent behaviour between classes and fields. Some fields are principally, but probably practically not, unlimited, such as an atomic charge, and can be both positive and negative. The third option is expected to give a too big overhead, and disregarded for that reason. The dicussion on cdk-devel seemed to favor the last option, and is thus used in this proposal. ------------------------------------------------------------------------------ |
From: Peter Murray-R. <pm...@ca...> - 2003-12-06 15:12:19
|
At 16:59 06/12/2003 +0100, Egon Willighagen wrote: >On Saturday 06 December 2003 02:54, Egon Willighagen wrote: > > I'll come back to this, after I have inventorized the core classes to see > > how wide spread this problem is... > >To summarize the problem: >there is no well defined way to see if fields are defined >for which the get() method returns a native type (int, double, etc). >For other fields, CDK tends to use a null value default for >unset fields. However, this is not formalized and not every >core class uses this mechanism: not all fields are defaulted to >null! > >Class Field is set? > >Atom point2D getPoint2D() != null > point3D getPoint3D() != null > fractionalPoint3D > hydrogenCount > stereoParity > charge > formalCharge > >problematic: fractionalPoint3D, hydrogenCount, stereoParity, > charge and formalCharge > >Bond order > atomCount > atoms getAtom[].length > 0 > stereo > >problematic: order, atomCount, stereo > >AtomType maxBondOrder > bondOrderSum > vanderwaalsRadius > covalentRadius > id > >problematic: all > >Well, I won't bother continuing this list... it is clear that the problem >is wide spread... So here's the RFC: > >------------------------------------------------------------------------------ > >RFC: Dealing with unset fields in the core classes > >Proposer: Egon Willighagen >(Miguel, Peter, if you want to second this proposal, then I'll add you to the > list) I'm happy to second this. >Date: 2003-12-06 > >PROPOSAL > >Core classes must have a mechanism to check wether a field has been set or >not. >Class fields (String AtomType.id, Point3d Atom.point3d, etc) must have a >default >null value signaling the unset state. Fields with native types (double >Atom.charge, double Bond.order, etc) must have a method named hasX() where >X represents the field name, e.g. Bond.hasOrder() matching Bond.getOrder(). > >REASON > >Recent discussion resulted in the notice that the core classes do not have a >mechanism in common that allow identifying unset fields; some classes use a >null value default for unset values using Class types, e.g. String, but not >even consistenly. > >A quick survey of three core classes (Atom, Bond and AtomType) revealed that >for only 3 out of 16 fields it was clear how to detect wether it was set or >not. Comments on proposals >A few alternatives were proposed: > >1. throw an Exception when an unset fiels was get()-ed I liked the article on Java exceptions which distinguished between checked (i.e. must catch) exceptions and unchecked (only caught at toplevel = main). Following the argument of that article checked exceptions are thrown when the thrower (== CDK) assumes the client knows exactly what is going on and must take care of it. Examples (can't remember exact names): readMolecule(filename) throws FileNotFoundException. Here the client should have code which instructs the user to select another filename generate2DCoordinates() throws RuntimeException() this is likely to be either a pathological molecule or a programming bug. The client could probably have code like: for (int i = 0; i < nmols; i++) { try { //... generate2DCoordinates(); //.. predictNMRShifts(); // catch (RuntimeException rte) { // error message and stack trace } // skip to next molecule } >2. use an akward value as a constant (e.g. -MAX) >3. use Class types everywere, e.g. Integer instead of int, with a null default >4. a hasX() method for fields with native types (int, double, etc) > >Some developers thought 1. was misuse of Exceptions, which should only signal >problems in the normal process of the algorithm (though the Java community in >general does not seem to have reached a general view on this...) I shall probably put exceptions in CMLDOM... >Option 2. has the downside that it cannot lead to consistent behaviour between >classes and fields. Some fields are principally, but probably practically not, >unlimited, such as an atomic charge, and can be both positive and negative. > Don't like it. >The third option is expected to give a too big overhead, and disregarded for >that reason. Agreed >The dicussion on cdk-devel seemed to favor the last option, and is thus used >in this proposal. I'm happy to support this for CDK - will anyhow use HasX() in my own stuff. P. >------------------------------------------------------------------------------ Peter Murray-Rust Unilever Centre for Molecular Informatics Chemistry Department, Cambridge University Lensfield Road, CAMBRIDGE, CB2 1EW, UK Tel: +44-1223-763069 |