Thanks. Your reply is actually very helpful. Yes, I only care whether there is an overall charge. If the whole compound is neutral, I do not need to do anything with it. The case I mentioned comes from a compound database. I need to make sure that each compound is neutral before my program goes on to the next step. In your last example, I believe either output would be good enough for me.

My knowledge of InChI is very limited. Could you please help me verify that if an InChI has no layer starting with /p or /q, it must be neutral?

I am also confused by the following part:

My guess would be the correct way would be to order the neutralisation of charges using pKa? In which case you need a pKa predictor, again, not simple [1].

Firstly, what is pKa?

Secondly, what does [1] refer to?

Yes, I did try RDKit and got help from its developers. It looks like their solution may also have problems with multiply charged cases. I hope the number of cases with this problem will be reduced after I add a function that calculates the overall charge.

Again, thank you very much.

Merry Christmas and Happy New Year!


On Mon, Dec 23, 2013 at 12:49 PM, John May <> wrote:
Hi Yingfeng,

In short, no. I don't think it's easy to provide a comprehensive solution for neutralisation. However, approximations such as the RDKit SMARTS you've tried offer a good approach for most cases.

What might be easier is to understand why you need to neutralise the compounds?

Anyways, I'm not a chemist but I'll try my best to answer as to why it's not simple. Firstly, in an InChI string you can tell if there is a charge when a layer starts with /p or /q.
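As a rough illustration of that layer check, here is a minimal Java sketch that uses plain string handling only, not the CDK or the InChI library. The class and method names are made up for this example. It only reports whether a /q or /p layer is present; it does not try to work out the net charge from the values in those layers.

```java
public class InchiChargeLayers {

    /**
     * Returns true if the InChI string contains a layer that starts
     * with 'q' (charge) or 'p' (protons added or removed).
     * Hypothetical helper for illustration only.
     */
    static boolean hasChargeLayer(String inchi) {
        // Layers are separated by '/'. Segment 0 is the "InChI=1S" prefix,
        // so start at segment 1. Formula segments begin with an upper-case
        // element symbol and therefore never match lower-case 'q' or 'p'.
        String[] layers = inchi.split("/");
        for (int i = 1; i < layers.length; i++) {
            if (layers[i].startsWith("q") || layers[i].startsWith("p")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasChargeLayer("InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)"));      // acetic acid
        System.out.println(hasChargeLayer("InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)/p-1"));  // acetate
    }
}
```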


You could also check the atoms where the formal charge is not 0. I guess what you really mean by charged is whether there is no overall charge.

C[C@@H](O)[C@H]([NH3+])C([O-])=O uncharged
C[C@@H]([O-])[C@H]([NH3+])C([O-])=O charged

Again a simple procedure of summing all the charges in a connected structure will tell you this:

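int sum = 0;
for (IAtom a : m.atoms()) {
    Integer q = a.getFormalCharge(); // may be null if the charge is unset
    if (q != null) sum += q;
}
boolean charged = sum != 0; // a non-zero total means an overall charge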
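If you want to try the idea on the two SMILES above without setting up the CDK, here is a small stand-alone sketch. It is not a real SMILES parser and not CDK code: it just scans bracket atoms with a regular expression and adds up the charges written there, and it sums over the whole string even when it has several dot-separated components. The class and method names are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SmilesNetCharge {

    // A bracket atom such as [NH3+] or [O-]; group 1 is the text inside.
    private static final Pattern BRACKET = Pattern.compile("\\[([^\\]]*)\\]");
    // The charge part: a run of '+' or '-' optionally followed by digits,
    // e.g. "+", "-", "+2", "--".
    private static final Pattern CHARGE = Pattern.compile("(\\++|-+)(\\d*)");

    /** Sums the formal charges written in the bracket atoms of a SMILES string. */
    static int netCharge(String smiles) {
        int sum = 0;
        Matcher atoms = BRACKET.matcher(smiles);
        while (atoms.find()) {
            Matcher c = CHARGE.matcher(atoms.group(1));
            if (c.find()) {
                String signs = c.group(1);
                int sign = signs.charAt(0) == '+' ? 1 : -1;
                // "+2" means magnitude 2; "++" (no digits) also means magnitude 2.
                int magnitude = c.group(2).isEmpty()
                        ? signs.length()
                        : Integer.parseInt(c.group(2));
                sum += sign * magnitude;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(netCharge("C[C@@H](O)[C@H]([NH3+])C([O-])=O"));
        System.out.println(netCharge("C[C@@H]([O-])[C@H]([NH3+])C([O-])=O"));
    }
}
```

For real work the CDK loop is the better choice, since it operates on parsed atoms rather than on text.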

As for neutralising, that's more tricky. There may be something in a dusty corner of the CDK code but I'm not aware of it. Neutralising a single atom is easy: add or remove protons, or break or make bonds. However, when there are multiple charges it is non-trivial, because it involves a decision. Consider the example from earlier:

C[C@@H]([O-])[C@H]([NH3+])C([O-])=O charged

How do we decide which neutralised form is correct? Both of these have no overall charge:

C[C@@H](O)[C@H]([NH3+])C([O-])=O uncharged
C[C@@H]([O-])[C@H]([NH3+])C(O)=O uncharged

My guess would be the correct way would be to order the neutralisation of charges using pKa? In which case you need a pKa predictor, again, not simple [1].

Neutralisation reduces to finding the ionisation state at a given pH (i.e. finding the pH where the compound is neutral). ChemAxon offers this functionality, but I have been told of examples where, given two ionisation states of the same compound (one > desired pH, one < desired pH), the tool produces different output.
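To make the pKa idea a bit more concrete: the pKa of an acidic group is the pH at which half of it has given up its proton, and the Henderson-Hasselbalch relation gives the deprotonated fraction at any other pH as 1 / (1 + 10^(pKa - pH)). The sketch below just evaluates that formula. The class name and the two pKa values in main (2.1 for the carboxylic acid, 9.1 for the ammonium group) are illustrative assumptions, not predictions for the compound above. Getting reliable pKa values is exactly the hard part.

```java
public class HendersonHasselbalch {

    /**
     * Fraction of an acidic group that is deprotonated at the given pH,
     * from the Henderson-Hasselbalch relation.
     */
    static double fractionDeprotonated(double pKa, double pH) {
        return 1.0 / (1.0 + Math.pow(10.0, pKa - pH));
    }

    public static void main(String[] args) {
        double pH = 7.0;
        // Assumed, illustrative pKa values (not computed for any real compound).
        double carboxylPKa = 2.1;
        double ammoniumPKa = 9.1;

        // A group with a low pKa is mostly deprotonated at pH 7,
        // a group with a high pKa mostly keeps its proton.
        System.out.println("COOH deprotonated: " + fractionDeprotonated(carboxylPKa, pH));
        System.out.println("NH3+ deprotonated: " + fractionDeprotonated(ammoniumPKa, pH));
    }
}
```

Ordering groups by pKa then amounts to asking which one picks up or loses a proton first as the pH moves.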

Sorry I can't be of more help.


[1] Lee and Crippen, Predicting pKa

On 21 Dec 2013, at 14:05, Yingfeng Wang <> wrote:

I have a compound with InChI


First off, is there a way to know whether it is charged?

Secondly, is CDK able to neutralize it if it is charged?


