From: Egon W. <ego...@gm...> - 2010-10-21 10:54:09
|
On Thu, Oct 21, 2010 at 12:33 PM, Jules Kerssemakers <jul...@go...> wrote: > After a good night's sleep over this issue, I indeed think refactoring > Isotope to a field rather than a parent class is the best approach. > As for Egon's suggested new interface: Shouldn't IElement be a part of > IIsotope, rather than a seperate atom field? Good question... but sometimes the isotope info is undefined... more commonly, we define a carbon than a 13C or so... and a plain carbon does not imply 12C... > You seem to be suggesting > > atomA.setElement("Carbon"); > atomA.setIsotope(13); No, instead (redundant casting just for clarity): atomA.setElement((IElement)Elements.CARBON) atomA.setIsotope((IIsotope)Isotopes.CARBON13) Though, I can very well imagine that setIsotope() calls setElement via: public void setIsotop(IIsotope isotope) { this.isotope = isotope; setElement((IElement)isotope.getElement()); } > which would by extent allow the faulty construct of > > atomA.setElement("Hydrogen") which would be solved with the above implementation... In general, scientists are scholarly good at doing it wrong... the current CDK does not prevent that, and actually, it is our opinion it should not... the data model should allow to represent the broken chemistry we have around very abundantly... it would not help if we could not represent KEGG data as is :) > without touching the isotope/massnumber, resulting in a Hydrogen-13. I > can easily imagine this happening, and it just seems wrong to me to > allow this in the datamodel. > Since mass number and Element are very intimately linked, I believe > they should be one single class, with fields for element (=proton > number) and isotope/mass number (=proton+neutron number, "mass number" > is the preffered term). Indeed.... Elements.CARBON would have those values preset... and if we throw in a proper immutable pattern (which we should), then those will be linked forever... > If I misunderstand and you plan to put element in the isotope class, > it still shouldn't be duplicated in IAtom. Duplication of information > allows for contradiction of information, and that should be avoided. That's effectively what we would be doing, but for a different purpose: allowing atoms without isotope details... Alternatively, we define an UndefinedIsotope class, which formalizes that we do not know the isotope, and another class for MixedIsotope (for natural abundance mixtures, or so), etc, etc... > As an added bonus, this allows us to keep track of properties that > differ between isotopes of the same element, such as exact mass, > boiling point, radioactive half-life etc. If you split isotopes in two > classes (element and isotope), there's nowhere to put this stuff. I think we can do this currently as well... <snip> > Additionally this allows us to build constructors that check if the > values entered make sense, if I type in "new Isotope("H", 13)" it > would be nice to get slapped on the wrist for making an impossible > isotope. Which is what I am against... see above. > (Of course it may be nice to make an override contructor > "Isotope(String symbol, Integer mass-number, boolean > IKnowWhatImDoingOverride)" for when the CDK eventually gets adopted > into the field of High-energy physics, where strange stuff like that > could probably happen ;-) ) The current CDK design is to decouple validation from the data model... so, we have validation classes... reasons include: - make the data model faster, as no checking is needed - people may disagree on what is valid ("IKnowWhatImDoingOverride") - it allows for representing uncommon stuff > And a thought on internationalization: > Should the Isotope class contain a field for "longName" in addition to > its symbol? so "carbon" for "C", if yes, which language should this > be? (As a Dutchie, i'd like "koolstof" better ;-) ) Or would this be > better off with a lookup function elementSymbolToName(symbol, > language)? i18n is something uncovered indeed... > I'm also pondering if the element-symbols should get their own enum. > They are fairly rigid, but not completely immutable; new elements DO > get discovered, and they are also renamed, Roentgenium used to be > UnUnUniium before it was confirmed. Indeed... and we likely need to support both... think literature, think historic databases... > Some sense-checking is required somewhere, but I guess this could also > happen in the Isotope class setters/constructors. See above. > Singletons could still work, but then we would need a helluva lot of > classes for every element-massnumber combination. Which we can autogenerate :) > class carbon8 extends isotope {..} > class carbon9 extends isotope {..} > .. > class carbon22 extends isotope {..} > > That's 14 hard-coded classes just for carbon! I have no problem with that. The two do not have to exclude each other: a custom, mutable Isotope implementation, and immutable singleton classes... Egon -- Dr E.L. Willighagen Postdoctoral Research Associate University of Cambridge Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers |