From: Geoffrey H. <ge...@ge...> - 2006-12-01 00:43:52
On Nov 28, 2006, at 8:55 PM, Craig A. James wrote:

> CC(=O)[O-].[Na+]
> In nature, the two oxygens are symmetrically equivalent. But the
> valence model of chemistry has no way to represent half charges, so
> Open Babel represents this as an asymmetrical molecule.

Well, the valence model of chemistry has no problem with half charges. Computer representations of the valence model? That's a different story. I think most chemists are happy to draw "1.5" bond orders and such. To some degree, that's the valence bond definition of conjugated bonds.

Yes, it's sometimes hard to put that in a computer. The valence bond model can become "tricky" because it's really a way of qualitatively describing quantum mechanics. Your C(=O)[O-] example is only one case. Nitro groups are another: N(=O)=O vs. [N+]([O-])=O. Any other conjugated system can cause similar problems.

> However, in OB's symmetry analysis, OBMol::GetGIDVector(), OB
> considers the two oxygen atoms of this molecule to be symmetrically
> equivalent. ... The result is two atoms that are declared
> "identical" when they plainly are not.

You said this was a "philosophical" discussion, so let's stick to philosophy. Clearly the two oxygen atoms in this ion are *chemically* identical. The carboxylate is, in fact, symmetric. IMHO, GetGIDVector() is returning the chemically correct answer.

> There is a disastrous practical consequence of this
> "philosophical" debate. Many algorithms "walk the graph" of a
> molecule to find features, and use a symmetry analysis for
> efficiency to cut down on redundant traversal. Imagine, for
> example, a fingerprinting algorithm that enumerates short paths.

OK, but in my humble philosophical opinion (IMHPO), if your algorithm is supposed to fingerprint molecules and it doesn't have some element of a chemical expert system, it may in fact make chemically incorrect statements.

> It walks down the C-C=O path and adds it to the fingerprint.
> Then it looks at the other oxygen and says, "hey, that's identical to
> the one I just looked at, so I'll skip it!" Then, the next time
> this substructure shows up, it might just happen that the
> fingerprinter walks down the "C-C-[O-]" path first, and skips the
> "C-C=O" path. You have two identical functional groups, but two
> different fingerprints.

While I agree that using symmetry analysis can improve efficiency, you've hit on a key point. Sometimes forcing bonds to have integer orders and two end points results in asymmetric *bond* representations of symmetric molecules. (This is why most QM programs ignore the concept of bonds altogether.)

Now if you're saying that the OBBond class is really limited, I'd agree. (Let's not get into multi-center bonds, hydrogen bonds, agostic interactions...) If you're saying that the integer formal charges in OBMol are limited, I'd agree. But these are also common limitations of chemical formats (SMILES and SDF spring to mind, although they are hardly alone).

If you want my personal philosophy, it's that GetGIDVector() is, in fact, representing the correct chemistry, and your fingerprint algorithm needs to be careful. For example, it could perform an initial pass to put carboxylate, nitro, and similar groups into a canonical form.

Just my $0.02, but it's a good discussion to continue.

Cheers,
-Geoff
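For readers following the thread, here is a toy Python sketch of the failure mode Craig describes, plus one possible version of the canonicalization pass suggested above. Everything in it is hypothetical: the (element, charge) / bond-order molecule model and the functions symmetry_classes(), fingerprint(), and delocalize_carboxylates() are illustrative stand-ins, not Open Babel's OBMol::GetGIDVector() or any real fingerprinter. The symmetry classes come from a simple Morgan-style refinement that, like the behavior described in the thread, ignores bond order and formal charge. The path enumerator skips a neighbor whose class it has already expanded at the same step, and the normalization (two 1.5 bonds with -0.5 charge on each oxygen) is just one assumed canonical form for a carboxylate.

```python
# Toy model (hypothetical, not OBMol): atoms are (element, formal_charge),
# bonds map an atom-index pair (i, j) to a bond order, which may be 1.5.
BOND_SYMBOL = {1: "-", 1.5: ":", 2: "="}


def acetate(oxo_first=True):
    """Heavy-atom graph of acetate, CC(=O)[O-], with either oxygen listed first."""
    if oxo_first:
        atoms = [("C", 0), ("C", 0), ("O", 0), ("O", -1)]
        bonds = {(0, 1): 1, (1, 2): 2, (1, 3): 1}
    else:
        atoms = [("C", 0), ("C", 0), ("O", -1), ("O", 0)]
        bonds = {(0, 1): 1, (1, 2): 1, (1, 3): 2}
    return atoms, bonds


def adjacency(bonds, n):
    adj = {i: [] for i in range(n)}
    for (a, b), order in bonds.items():
        adj[a].append((b, order))
        adj[b].append((a, order))
    return adj


def rank(keys):
    """Replace each key with its index among the sorted distinct keys."""
    index = {k: i for i, k in enumerate(sorted(set(keys)))}
    return [index[k] for k in keys]


def symmetry_classes(atoms, bonds):
    """Morgan-style refinement on element and connectivity only (no bond
    orders, no charges), so both acetate oxygens end up in one class."""
    n = len(atoms)
    adj = adjacency(bonds, n)
    classes = rank([el for el, _ in atoms])
    for _ in range(n):
        classes = rank([(classes[i], tuple(sorted(classes[j] for j, _ in adj[i])))
                        for i in range(n)])
    return classes


def atom_symbol(atom):
    el, charge = atom
    return f"[{el}{charge:+g}]" if charge else el


def fingerprint(atoms, bonds, max_bonds=2, prune=True):
    """Set of linear path strings of up to max_bonds bonds from every atom."""
    adj = adjacency(bonds, len(atoms))
    classes = symmetry_classes(atoms, bonds)
    paths = set()

    def walk(path, text):
        paths.add(text)
        if len(path) > max_bonds:
            return
        expanded = set()
        for nbr, order in adj[path[-1]]:
            if nbr in path:
                continue
            if prune and classes[nbr] in expanded:
                continue  # "identical to the one I just looked at": skip it
            expanded.add(classes[nbr])
            walk(path + [nbr], text + BOND_SYMBOL[order] + atom_symbol(atoms[nbr]))

    for start in range(len(atoms)):
        walk([start], atom_symbol(atoms[start]))
    return paths


def delocalize_carboxylates(atoms, bonds):
    """Hypothetical canonical form: C(=O)[O-] becomes two 1.5 bonds, -0.5 per O."""
    atoms, bonds = list(atoms), dict(bonds)
    adj = adjacency(bonds, len(atoms))
    for c, (el, _) in enumerate(atoms):
        if el != "C":
            continue
        oxygens = [(o, order) for o, order in adj[c]
                   if atoms[o][0] == "O" and len(adj[o]) == 1]
        if (sorted(order for _, order in oxygens) == [1, 2]
                and sorted(atoms[o][1] for o, _ in oxygens) == [-1, 0]):
            for o, _ in oxygens:
                atoms[o] = ("O", -0.5)
                bonds[(c, o) if (c, o) in bonds else (o, c)] = 1.5
    return atoms, bonds


if __name__ == "__main__":
    a, b = acetate(oxo_first=True), acetate(oxo_first=False)
    print(symmetry_classes(*a))  # [0, 1, 2, 2]: both oxygens share class 2
    print(fingerprint(*a) == fingerprint(*b))  # False: pruning is order-dependent
    print(fingerprint(*a, prune=False) == fingerprint(*b, prune=False))  # True
    print(fingerprint(*delocalize_carboxylates(*a))
          == fingerprint(*delocalize_carboxylates(*b)))  # True
```

With pruning, the two atom orderings of the same ion give different path sets (one contains C-C=O, the other C-C-[O-1]). Without pruning, or after the carboxylate has been normalized, they agree. Nitro groups would need an analogous rule, which this sketch does not implement.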