I tried to read certain ChEBI molfiles that were created with Marvin. This failed; it turned out that these molfiles contain bond types > 4.
An example is this appropriately Christmas tree-like compound http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:51730
According to the CTFile specification, the bond types go up to 8:
1 = Single
2 = Double
3 = Triple
4 = Aromatic
5 = Single or Double
6 = Single or Aromatic
7 = Double or Aromatic
8 = Any
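For reference, the eight codes above could be captured in a small lookup type. This is a hypothetical helper (the class `MdlBondTypes` and its nested `Type` enum are made up for illustration, not part of the CDK), sketched under the assumption that any code outside 1..8 should be rejected outright:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, NOT part of the CDK: the eight V2000 bond type codes as an enum. */
public class MdlBondTypes {

    enum Type {
        SINGLE(1, "Single"),
        DOUBLE(2, "Double"),
        TRIPLE(3, "Triple"),
        AROMATIC(4, "Aromatic"),
        SINGLE_OR_DOUBLE(5, "Single or Double"),
        SINGLE_OR_AROMATIC(6, "Single or Aromatic"),
        DOUBLE_OR_AROMATIC(7, "Double or Aromatic"),
        ANY(8, "Any");

        final int code;
        final String label;

        // Lookup table from file code to constant, filled once when the enum class is loaded.
        private static final Map<Integer, Type> BY_CODE = new HashMap<>();

        static {
            for (Type t : values()) {
                BY_CODE.put(t.code, t);
            }
        }

        Type(int code, String label) {
            this.code = code;
            this.label = label;
        }

        /** Returns the constant for a bond type code; throws for codes outside 1..8. */
        static Type fromCode(int code) {
            Type t = BY_CODE.get(code);
            if (t == null) {
                throw new IllegalArgumentException("Unknown MDL bond type: " + code);
            }
            return t;
        }
    }
}
```

Failing on unknown codes, rather than returning null, is the behaviour asked for further down in this thread.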
However, MDLV2000Reader stops at 4, and you end up with null bonds in the atom container if the bond type is 5-8.
Is it okay if I create a patch for this? I think the CDK should support these peculiar bond types, so we'd need some new IBond.Order values.
The patch should be easy, but I don't know whether there are objections, since these bond types look somewhat odd.
If the CDK doesn't accept the bonds, a clear error should be thrown.
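A minimal sketch of the "clear error" option, assuming a reader that parses the fixed-width V2000 bond line itself. This is not the actual MDLV2000Reader code; `StrictBondTypeParser`, its local `Order` enum and `ParsedBond` are hypothetical stand-ins, and storing type 4 as SINGLE plus an aromatic flag is an assumption of the sketch:

```java
/**
 * Hypothetical sketch, NOT the CDK's MDLV2000Reader: read the bond type from a
 * V2000 bond line and fail loudly on codes 5-8 instead of producing a null bond.
 */
public class StrictBondTypeParser {

    /** Local stand-in for a bond order; deliberately not the CDK's IBond.Order. */
    enum Order { SINGLE, DOUBLE, TRIPLE }

    /** One parsed bond: atom indices, an order, and a separate aromatic flag. */
    static final class ParsedBond {
        final int atom1, atom2;
        final Order order;
        final boolean aromatic;

        ParsedBond(int atom1, int atom2, Order order, boolean aromatic) {
            this.atom1 = atom1;
            this.atom2 = atom2;
            this.order = order;
            this.aromatic = aromatic;
        }
    }

    static ParsedBond parseBondLine(String line) {
        if (line.length() < 9) {
            throw new IllegalArgumentException("Bond line too short: '" + line + "'");
        }
        // Fixed-width fields: columns 1-3 first atom, 4-6 second atom, 7-9 bond type.
        int atom1 = Integer.parseInt(line.substring(0, 3).trim());
        int atom2 = Integer.parseInt(line.substring(3, 6).trim());
        int type = Integer.parseInt(line.substring(6, 9).trim());
        switch (type) {
            case 1: return new ParsedBond(atom1, atom2, Order.SINGLE, false);
            case 2: return new ParsedBond(atom1, atom2, Order.DOUBLE, false);
            case 3: return new ParsedBond(atom1, atom2, Order.TRIPLE, false);
            // Assumption: aromatic is kept as a flag, with SINGLE as a placeholder order.
            case 4: return new ParsedBond(atom1, atom2, Order.SINGLE, true);
            case 5: case 6: case 7: case 8:
                throw new UnsupportedOperationException("Query bond type " + type
                        + " between atoms " + atom1 + " and " + atom2 + " is not supported");
            default:
                throw new IllegalArgumentException("Invalid bond type " + type);
        }
    }
}
```

Whether this should be an exception or only a warning is exactly the open question in this thread; the sketch just picks the exception.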
I think the patch looks simpler than it will really be: you will have to examine all CDK code and identify every situation where IBond.Order is used.
I am pretty sure there are many algorithms that currently depend on there being four IBond.Order values. This is not meant to discourage you from adding these types, but you should realize that this is a major API change which in itself will break dependent CDK code. And you cannot rely on unit tests here, because much of the code is not unit tested yet.
As for the MDLV2000Reader: it should indeed throw an error, or at least a warning, when these bonds are encountered.
BTW, the MDL molfile specification reserves bond types 4-8 for queries. So a file with such bond types should really be parsed into an IQueryAtomContainer... but we currently do not have any support for reading MDL molfiles with SSS entries.
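If a reader were to route such files to a query container, it would first need to notice the query bonds. A rough sketch, assuming the file is already split into lines and follows the V2000 layout (three header lines, counts line, atom block, bond block). `QueryBondDetector` is hypothetical, and it only flags codes 5-8, on the assumption that type 4 keeps being read as a plain aromatic bond even though the spec groups it with the query types:

```java
import java.util.List;

/** Hypothetical sketch: does a V2000 molfile contain bond types 5-8? */
public class QueryBondDetector {

    static boolean needsQueryContainer(List<String> molfileLines) {
        // Line 4 is the counts line: columns 1-3 atom count, 4-6 bond count.
        String counts = molfileLines.get(3);
        int nAtoms = Integer.parseInt(counts.substring(0, 3).trim());
        int nBonds = Integer.parseInt(counts.substring(3, 6).trim());
        // The bond block starts right after the atom block.
        int firstBond = 4 + nAtoms;
        for (int i = firstBond; i < firstBond + nBonds; i++) {
            // Columns 7-9 of a bond line hold the bond type.
            int type = Integer.parseInt(molfileLines.get(i).substring(6, 9).trim());
            if (type >= 5 && type <= 8) {
                return true;
            }
        }
        return false;
    }
}
```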
Oh, and I'll bring up the problem that the MDL molfile for the ChEBI entry is not a structure but a search request.
Filed a clarification request at:
https://sourceforge.net/tracker/?func=detail&aid=2914036&group_id=125463&atid=702608
Thanks for the comment, Egon. You're right, it won't be that simple. The MDL spec lists the eight bond types (see above under 'Details'), 1 to 8, as a coherent group. However, the CDK uses IBond.Order for single through quadruple, while aromaticity is modeled with a boolean flag. So modeling, for example, type 6 (single or aromatic) does not fit the CDK design at all.
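One way to see the mismatch: a concrete bond is in exactly one state, while types 5-8 each describe a set of acceptable states, i.e. a predicate to match against. The sketch below is hypothetical (`QueryBondSemantics` and `BondState` are made up, and folding the aromatic flag into a single AROMATIC state is a simplification), but it shows why only types 1-4 map onto one bond, and the matching behaviour a query container would need instead:

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch: MDL bond types as sets of acceptable concrete bond states. */
public class QueryBondSemantics {

    /** Simplified concrete states; AROMATIC stands for "aromatic flag set". */
    enum BondState { SINGLE, DOUBLE, TRIPLE, QUADRUPLE, AROMATIC }

    static Set<BondState> allowedStates(int mdlBondType) {
        switch (mdlBondType) {
            case 1: return EnumSet.of(BondState.SINGLE);
            case 2: return EnumSet.of(BondState.DOUBLE);
            case 3: return EnumSet.of(BondState.TRIPLE);
            case 4: return EnumSet.of(BondState.AROMATIC);
            case 5: return EnumSet.of(BondState.SINGLE, BondState.DOUBLE);
            case 6: return EnumSet.of(BondState.SINGLE, BondState.AROMATIC);
            case 7: return EnumSet.of(BondState.DOUBLE, BondState.AROMATIC);
            case 8: return EnumSet.allOf(BondState.class);
            default: throw new IllegalArgumentException("Invalid MDL bond type: " + mdlBondType);
        }
    }

    /** True only if the type pins down exactly one state, i.e. fits a single concrete bond. */
    static boolean fitsSingleConcreteBond(int mdlBondType) {
        return allowedStates(mdlBondType).size() == 1;
    }

    /** Query-style check: does a concrete bond state satisfy the given MDL type? */
    static boolean matches(int mdlBondType, BondState actual) {
        return allowedStates(mdlBondType).contains(actual);
    }
}
```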
Marking this as a feature request: reading such files into an IQueryAtomContainer is interesting.
Patch pending:
https://sourceforge.net/tracker/?func=detail&aid=3154364&group_id=20024&atid=320024