When reading an sdf into a ChemFile with MDLReader, the first property for each molecule is skipped (this does not happen with MDLV2000Reader).
It appears that MDLReader.readMolecule() does not consume the final "M END" line from the MOL block, and confuses the code in MDLReader.readChemFile() as it reads the properties.
The attached code demonstrates the issue. When run with a MDLV2000Reader, the output looks like:
Properties of first molecule:
{cdk:Title=Molecule_Title, cdk:Remark=null, first=first, second=second}
But the output with MDLReader is:
Properties of first molecule:
{cdk:Title=Molecule_Name, second=second}
(Not sure which Milestone to report against, this is using CDK 1.5.10 from the maven repo)
Hi Jessen,
IRRC properties are deliberately not part of the MDLReader, in fact properties are really only in the SDfile. The distinction between the MDLReader and MDLV2000Reader is historic and so thin (I actually think it's only that the former ignored properties and 'M CHG' etc) that frankly I would just remove the old one to avoid the obvious confusion.
ps. you should avoid ChemFile in favour of the following idiom
J
Last edit: John May 2015-01-15
On Thu, Jan 15, 2015 at 10:20 AM, John May jwmay@users.sf.net wrote:
MDLReader file (so, the format before V2000) has less columns in the
Atom Block... that was of the original reasons to split it out.
Egon
--
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
ORCID: 0000-0001-7542-0286
ImpactStory: https://impactstory.org/EgonWillighagen
V2000 will read variable column numbers.
On Thu, Jan 15, 2015 at 12:51 PM, John May jwmay@users.sf.net wrote:
Yes, but there is also the issue of reporting errors while reading in
a validation mode...
Egon
--
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
ORCID: 0000-0001-7542-0286
ImpactStory: https://impactstory.org/EgonWillighagen
I see this mistake so often though. There was another one the other say on the mailing list.
Although as you show it is getting the 'second=...' out which is inconsistent. Will fix, thanks for the report.
Resolved - patch https://github.com/cdk/cdk/pull/117