Hello everyone,

Interesting discussion, let me add my 2 cents.

I agree with Egon's last statement: the CDK isn't responsible for all the crap out there, but we can at least do our best to support it.
Sadly, we're not in a dominant enough position to 'force' users to mend their wicked ways; all we'd achieve is alienating them. (Though we could always spam the logfiles with warnings like "This molecule is chemically WRONG, improve your input, you silly chemist", or something along those lines ;-) )

I would be against Rajarshi's convenience method if the results can be inconsistent.
One method suggests one algorithm, so a method that can return two different, potentially inconsistent results (partial vs. formal charge) depending on a non-obvious condition (namely whether some obscure atom property happens to be set for every atom) is, in my opinion, a bad idea.
Even more annoying is that the end user won't be able to tell which algorithm was actually used; you just get a number back. (Not unless he iterates over all atoms first and checks their properties, which entirely defeats the purpose of a so-called 'convenience' method.)
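A minimal Java sketch of the fallback behavior being objected to, assuming a hypothetical `Atom` class (a stand-in, not CDK's real `IAtom` interface) whose partial charge is `null` when unset. The charge values are made up for illustration; the point is that leaving a single partial charge unset silently switches the algorithm, while the caller still just gets a bare number back.

```java
import java.util.List;

// Hypothetical sketch: Atom is a minimal stand-in for a CDK atom, not the real API.
public class FallbackChargeSketch {

    static final class Atom {
        final Double partialCharge; // null when no partial charge has been assigned
        final int formalCharge;

        Atom(Double partialCharge, int formalCharge) {
            this.partialCharge = partialCharge;
            this.formalCharge = formalCharge;
        }
    }

    // The proposed convenience method: sum partial charges if every atom has one,
    // otherwise silently fall back to summing formal charges.
    static double totalCharge(List<Atom> atoms) {
        boolean allPartial = true;
        for (Atom a : atoms) {
            if (a.partialCharge == null) {
                allPartial = false;
                break;
            }
        }
        double sum = 0.0;
        for (Atom a : atoms) {
            if (allPartial) {
                sum += a.partialCharge;
            } else {
                sum += a.formalCharge;
            }
        }
        return sum; // the caller cannot tell which branch produced this value
    }

    public static void main(String[] args) {
        // Same (made-up) molecule twice; the second copy lacks one partial charge.
        List<Atom> complete = List.of(new Atom(0.5, 0), new Atom(-0.75, 0));
        List<Atom> incomplete = List.of(new Atom(0.5, 0), new Atom(null, 0));
        System.out.println(totalCharge(complete));   // -0.25 (partial charges)
        System.out.println(totalCharge(incomplete)); // 0.0 (formal charges)
    }
}
```

Run over a large dataset, molecules like these two would be charged by different algorithms without any visible sign in the result.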

If I'm working with a large dataset, I'd want consistent results over the entire set, not an algorithm that changes from molecule to molecule.
I know that I could always "fall back" on getTotalFormalCharge() for internal consistency, but not every user will know this, leading to a lot of accidental misuse-due-to-stupidity.

As an analogy: I'd rather not give users a double-bladed sword with all the opportunities for accidental self-stabbing, when there exist two normal, non-accident-prone swords too.

If the user wants a result so badly, she can always use the getTotalFormalCharge(), or write the tiny 4-line "if not null .. else ..." wrapper herself (which is what the entire getTotalCharge() would end up being anyway, if I read correctly).
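For comparison, a sketch of the small "if not null .. else .." wrapper a user could write herself. `Atom`, `Source` and `Charge` are hypothetical names, and `totalFormalCharge` here only mimics the idea behind `getTotalFormalCharge()` (summing formal charges); it is not the CDK implementation. Because the caller owns the wrapper, it can also report which branch it took.

```java
import java.util.List;

// Hypothetical user-side wrapper; none of these types are part of the CDK API.
public class ExplicitChargeWrapper {

    static final class Atom {
        final Double partialCharge; // null when no partial charge has been assigned
        final int formalCharge;

        Atom(Double partialCharge, int formalCharge) {
            this.partialCharge = partialCharge;
            this.formalCharge = formalCharge;
        }
    }

    enum Source { PARTIAL, FORMAL }

    // Result plus the algorithm that produced it, so nothing is hidden.
    static final class Charge {
        final double value;
        final Source source;

        Charge(double value, Source source) {
            this.value = value;
            this.source = source;
        }
    }

    // Mimics the idea of getTotalFormalCharge(): sum of formal charges.
    static int totalFormalCharge(List<Atom> atoms) {
        int sum = 0;
        for (Atom a : atoms) {
            sum += a.formalCharge;
        }
        return sum;
    }

    // Use partial charges only if every atom has one, else fall back to formal charges.
    static Charge totalCharge(List<Atom> atoms) {
        double partialSum = 0.0;
        for (Atom a : atoms) {
            if (a.partialCharge == null) {
                return new Charge(totalFormalCharge(atoms), Source.FORMAL);
            }
            partialSum += a.partialCharge;
        }
        return new Charge(partialSum, Source.PARTIAL);
    }

    public static void main(String[] args) {
        List<Atom> mixed = List.of(new Atom(0.5, 0), new Atom(null, -1));
        Charge c = totalCharge(mixed);
        System.out.println(c.value + " via " + c.source); // -1.0 via FORMAL
    }
}
```

The logic is the same as the proposed library method; the difference is that the fallback is a visible, deliberate choice in the user's own code.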

(Ok, that may have ended up being 3 cents worth of typing, but yeah...)

On 6 May 2011 22:32, Egon Willighagen <egon.willighagen@gmail.com> wrote:
On Fri, May 6, 2011 at 7:02 PM, Rajarshi Guha <rajarshi.guha@gmail.com> wrote:
> And that'd be a bug, for which the CDK is not responsible (though
> a check could always be made on our side)

People have been convincing me very hard to support all sorts of crap
with MDL molfiles, SMILES, and so forth... consensus has always been
that the CDK is not responsible, but should not be incompatible

> I'd still argue for a single function to get total charge. If you want
> specifics go down to atom level accessors

If no one objects to removing the current algorithmic approaches,
I won't object either...


Dr E.L. Willighagen
Postdoctoral Researcher
Institutet för miljömedicin
Karolinska Institutet (http://ki.se/imm)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers

Cdk-devel mailing list