Menu

#1356 MdlReader skips first property when reading sdfs

cdk-1.0.x
closed
nobody
None
1
2015-02-11
2015-01-14
Jessen
No

When reading an sdf into a ChemFile with MDLReader, the first property for each molecule is skipped (this does not happen with MDLV2000Reader).

It appears that MDLReader.readMolecule() does not consume the final "M END" line from the MOL block, and confuses the code in MDLReader.readChemFile() as it reads the properties.

The attached code demonstrates the issue. When run with a MDLV2000Reader, the output looks like:
Properties of first molecule:
{cdk:Title=Molecule_Title, cdk:Remark=null, first=first, second=second}

But the output with MDLReader is:
Properties of first molecule:
{cdk:Title=Molecule_Name, second=second}

(Not sure which Milestone to report against, this is using CDK 1.5.10 from the maven repo)

2 Attachments

Discussion

  • John May

    John May - 2015-01-15

    Hi Jessen,

    IRRC properties are deliberately not part of the MDLReader, in fact properties are really only in the SDfile. The distinction between the MDLReader and MDLV2000Reader is historic and so thin (I actually think it's only that the former ignored properties and 'M CHG' etc) that frankly I would just remove the old one to avoid the obvious confusion.

    ps. you should avoid ChemFile in favour of the following idiom

    MDLV2000Reader reader = new MDLV2000Reader(...);
    IAtomContainer container;
    while ((container = reader.read(new AtomContainer(0,0,0,0)) != null) {
      // process or add to a List etc
    }
    

    J

     

    Last edit: John May 2015-01-15
    • Egon Willighagen

      On Thu, Jan 15, 2015 at 10:20 AM, John May jwmay@users.sf.net wrote:

      IRRC properties are deliberately not part of the MDLReader, in fact properties are really only in the SDfile. The distinction between the MDLReader and MDLV2000Reader is historic and so thin (I actually think it's only that the former ignored properties and 'M CHG' etc) that frankly I would just remove the old one to avoid the obvious confusion.

      MDLReader file (so, the format before V2000) has less columns in the
      Atom Block... that was of the original reasons to split it out.

      Egon

      --
      E.L. Willighagen
      Department of Bioinformatics - BiGCaT
      Maastricht University (http://www.bigcat.unimaas.nl/)
      Homepage: http://egonw.github.com/
      LinkedIn: http://se.linkedin.com/in/egonw
      Blog: http://chem-bla-ics.blogspot.com/
      PubList: http://www.citeulike.org/user/egonw/tag/papers
      ORCID: 0000-0001-7542-0286
      ImpactStory: https://impactstory.org/EgonWillighagen

       
      • John May

        John May - 2015-01-15

        V2000 will read variable column numbers.

         
        • Egon Willighagen

          On Thu, Jan 15, 2015 at 12:51 PM, John May jwmay@users.sf.net wrote:

          V2000 will read variable column numbers.

          Yes, but there is also the issue of reporting errors while reading in
          a validation mode...

          Egon

          --
          E.L. Willighagen
          Department of Bioinformatics - BiGCaT
          Maastricht University (http://www.bigcat.unimaas.nl/)
          Homepage: http://egonw.github.com/
          LinkedIn: http://se.linkedin.com/in/egonw
          Blog: http://chem-bla-ics.blogspot.com/
          PubList: http://www.citeulike.org/user/egonw/tag/papers
          ORCID: 0000-0001-7542-0286
          ImpactStory: https://impactstory.org/EgonWillighagen

           
          • John May

            John May - 2015-02-11

            I see this mistake so often though. There was another one the other say on the mailing list.

             
  • John May

    John May - 2015-01-15

    Although as you show it is getting the 'second=...' out which is inconsistent. Will fix, thanks for the report.

     
  • John May

    John May - 2015-02-11
     
  • John May

    John May - 2015-02-11
    • status: open --> closed