While trying to fix bug #1579231, "Smiles parser failure: 239-64-5", some
questions came up that I would like to discuss with you.
The problem is the very slow and ultimately incorrect parsing of the SMILES
"c1ccc4c(c1)ccc5c3ccc2ccccc2c3nc45" (CAS #239-64-5). More specifically,
the problem lies in an aromatic 5-ring containing a nitrogen.
From my debugging, I suspect that the actual problem is the call to
The aromatic nitrogen in the 5-ring does not get a hydrogen atom added to
it. Because of this, the call to:
cannot succeed, since the aromatic nitrogen needs to carry a hydrogen.
As the simple case of imidazole, "C1=NC=CN1", shows, aromatic
nitrogens in 5-rings may or may not carry a hydrogen. If there are
two nitrogens in a 5-ring, one carries a hydrogen and the other
does not. If there is only one nitrogen in a 5-ring, it should be
hydrogenated. Similar considerations apply to oxygen, sulfur and
phosphorus, and to rings containing any mix of the four.
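To make the rule above concrete, here is a minimal Java sketch under a deliberately simplified model: a neutral ring, no charges, no exocyclic double bonds, and no heteroatoms shared between two fused rings. The class `FiveRingDonor` and its method `donorCandidates` are hypothetical and not part of CDK. The method takes the element symbols of one aromatic 5-ring in ring order and returns the positions that could act as the lone-pair donor. If the donor is N or P, it is the atom that carries the hydrogen. An O or S can only donate, so it fixes the choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper (not a CDK class): for one aromatic 5-membered ring,
 * list the ring positions that could be the "pyrrole-type" lone-pair donor.
 * Simplified model: neutral ring, no charges, no exocyclic double bonds,
 * no heteroatoms at ring fusions.
 */
final class FiveRingDonor {
    // O and S are divalent: in a neutral aromatic ring they can only donate.
    private static final Set<String> DONOR_ONLY = Set.of("O", "S");
    // N and P can either donate (and then carry an H) or be pyridine-type.
    private static final Set<String> EITHER = Set.of("N", "P");

    static List<Integer> donorCandidates(String[] ring) {
        if (ring.length != 5) {
            throw new IllegalArgumentException("expected a 5-membered ring");
        }
        List<Integer> fixed = new ArrayList<>();
        List<Integer> optional = new ArrayList<>();
        for (int i = 0; i < ring.length; i++) {
            if (DONOR_ONLY.contains(ring[i])) {
                fixed.add(i);
            } else if (EITHER.contains(ring[i])) {
                optional.add(i);
            }
        }
        // Two O/S atoms cannot both donate in a neutral aromatic 5-ring.
        if (fixed.size() > 1) {
            return List.of();
        }
        // A single O or S is the donor; every N/P is then pyridine-type (no H).
        if (fixed.size() == 1) {
            return fixed;
        }
        // Otherwise exactly one of the N/P atoms donates, i.e. carries the H.
        return optional;
    }

    public static void main(String[] args) {
        // imidazole, C1=NC=CN1: either nitrogen may carry the H
        System.out.println(donorCandidates(new String[] {"C", "N", "C", "C", "N"})); // [1, 4]
        // pyrrole-like ring: the single nitrogen must carry the H
        System.out.println(donorCandidates(new String[] {"C", "C", "C", "C", "N"})); // [4]
        // oxazole: oxygen donates, so the nitrogen gets no H
        System.out.println(donorCandidates(new String[] {"C", "O", "C", "N", "C"})); // [1]
    }
}
```

A result with more than one entry, as for imidazole, is exactly the kind of alternative that the hydrogenation step would have to handle.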
Hence, the ValencyChecker should be able to work with alternatives when
it comes to hydrogenation. Currently, the ValencyChecker requires
implicit hydrogen counts to be set before it can saturate a molecule.
I think this needs to change.
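One way a checker could "work with alternatives" is a depth-first search over the per-ring choices. It picks one hydrogen-bearing atom per aromatic 5-ring, runs a validity check on the complete assignment, and backtracks on failure. This is only a sketch. The class `AlternativeHydrogens` is hypothetical, and the `isValid` predicate stands in for whatever valency or saturation test would actually be used. It is not a CDK method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/**
 * Hypothetical sketch (not a CDK API): choose one hydrogen-bearing atom per
 * aromatic 5-ring, depth-first, and return the first complete assignment
 * accepted by the supplied check.
 */
final class AlternativeHydrogens {

    static Optional<List<Integer>> firstValid(List<List<Integer>> candidatesPerRing,
                                              Predicate<List<Integer>> isValid) {
        return search(candidatesPerRing, 0, new ArrayList<>(), isValid);
    }

    private static Optional<List<Integer>> search(List<List<Integer>> rings, int ring,
                                                  List<Integer> chosen,
                                                  Predicate<List<Integer>> isValid) {
        if (ring == rings.size()) {
            // Complete assignment: one hydrogen-bearing atom per ring.
            return isValid.test(chosen) ? Optional.of(List.copyOf(chosen)) : Optional.empty();
        }
        for (int atom : rings.get(ring)) {
            chosen.add(atom);
            Optional<List<Integer>> found = search(rings, ring + 1, chosen, isValid);
            if (found.isPresent()) {
                return found;
            }
            chosen.remove(chosen.size() - 1); // backtrack: undo this ring's choice
        }
        return Optional.empty(); // no candidate for this ring leads to a valid assignment
    }

    public static void main(String[] args) {
        // Two rings: atoms 1 or 4 in the first, atoms 7 or 9 in the second.
        List<List<Integer>> candidates = List.of(List.of(1, 4), List.of(7, 9));
        // Placeholder check: pretend any assignment using atom 1 fails.
        Optional<List<Integer>> result = firstValid(candidates, a -> !a.contains(1));
        System.out.println(result.orElse(List.of())); // [4, 7]
    }
}
```

The number of combinations is the product of the per-ring candidate counts. For larger fused systems, a real implementation would probably need to prune after each ring rather than only checking complete assignments.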
I know that Egon is working on a new atom type perception. Might that
help in finding a solution?
I think that if we find a solution to the above problem, quite a few
related bugs can be closed.
Looking forward to your input.
PhD student in the research group of Prof. Dr. Gisbert Schneider
Johann Wolfgang Goethe University
Beilstein-endowed Chair for Chemoinformatics
Building B - 3rd floor
60323 Frankfurt am Main
Tel.: +49 69 798 24879
Fax: +49 69 798 24880