#1090 MDLV3000Reader Problem(s)

  • Info:
    Checked out the project on: 2010-08-25
    If I am mistaken please tell me, so I can adapt my code.
    MDLV3000Reader was used for reading .sdf files in the V3000 Molfile format, then the SSSRFinder was used to detect rings; it threw the exception "no such vertex in graph". I tracked down the problems and want to report them here.
    Since I know too little about the project and don't have the time to take a closer look, I won't submit any source code (unless you explicitly ask for it).

  • Description of the problems:

  • 1) Method: readAtomBlock:
    A new atom is created.
    The identifier of the atom is set.
    The element token is read.
    A second atom is created from the element token and replaces the first >> the identifier information is lost.
  • 2) Method: readSGroup
    If the atom is not already a pseudo atom, it is converted into one, the label is set, and the original atom is replaced by the PseudoAtom in the AtomContainer (readData.atoms()).
    However, the pseudo atom gets a new ID; the old ID is somehow lost when the PseudoAtom is created. Since the bonds are not updated, this can cause severe problems because bonds can still point to the original atom. Hence, if for example the SSSRFinder is called, the graph cannot be built correctly.
  • 3) Note: The AtomTypeName is never set in the reader. I guessed it was meant for distinguishing between, e.g., super-atoms and decorated atoms. I had to distinguish between S-Groups and R-Groups, but in both cases a PseudoAtom is used to represent them. I think it would be good to set the AtomTypeName for the different types.
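Problem 1) can be sketched with a minimal stand-in. The `Atom` class and both methods below are hypothetical; they are not the CDK classes or the actual reader code and only mirror the order of steps described above, assuming the identifier is stored on the atom object itself.

```java
// Hypothetical stand-ins for illustration only; these are NOT the CDK classes.
final class AtomBlockSketch {

    static final class Atom {
        String id;      // identifier taken from the V3000 atom line
        String symbol;  // element symbol
        Atom() {}
        Atom(String symbol) { this.symbol = symbol; }
    }

    // Mirrors the reported flow: the ID is set on one object, then a second
    // object is created from the element token and used instead.
    static Atom readAtomLosingId(String id, String elementToken) {
        Atom atom = new Atom();
        atom.id = id;
        atom = new Atom(elementToken); // the first object, and its ID, is discarded
        return atom;
    }

    // One possible fix: create the atom once from the element token, then set the ID.
    static Atom readAtomKeepingId(String id, String elementToken) {
        Atom atom = new Atom(elementToken);
        atom.id = id;
        return atom;
    }

    public static void main(String[] args) {
        System.out.println(readAtomLosingId("7", "Cl").id);  // prints null
        System.out.println(readAtomKeepingId("7", "Cl").id); // prints 7
    }
}
```

Copying the identifier onto the second object would work as well; creating the atom only once just avoids the throwaway instance.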

  • Possible ways to solve them:
    Since I don't have the time to do that, I will just report what I thought about possible solutions and attach a file where I marked several parts with TODOs.

  • for 2): There are several ways to solve this. I used a quite dirty (but quick) solution, since I have to produce some statistical information asap: instead of replacing the old atom, I just add the pseudo atom to the list of atoms. This is not a good solution because the atom is effectively duplicated; however, for the purpose of detecting rings it's ok, since the bonds will point to existing atoms.
    Another solution would be to delete the bond references to the replaced atom.
    I think the best solution would be to really pass all of the atom's information to the pseudo atom. I guess some comparator for an ID is used to compare atoms, so this should solve the problem (assuming there is a unique identifier for each atom that is used for comparison).
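A minimal sketch of a replacement that keeps the bonds valid, using hypothetical stand-in classes (not the CDK types). It assumes atoms are matched by object identity rather than by an ID comparator; under that assumption, copying the ID alone would not be enough and the bonds have to be rewired as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the CDK classes.
final class ReplaceAtomSketch {

    static class Atom {
        final String id;
        final String symbol;
        Atom(String id, String symbol) { this.id = id; this.symbol = symbol; }
    }

    static final class PseudoAtom extends Atom {
        final String label;
        // Copies the original atom's ID and symbol instead of starting from scratch.
        PseudoAtom(Atom original, String label) {
            super(original.id, original.symbol);
            this.label = label;
        }
    }

    static final class Bond {
        Atom begin;
        Atom end;
        Bond(Atom begin, Atom end) { this.begin = begin; this.end = end; }
    }

    static final class Molecule {
        final List<Atom> atoms = new ArrayList<>();
        final List<Bond> bonds = new ArrayList<>();

        // Assumption: atoms are matched by object identity, so a bond is valid
        // only if both endpoints are the very objects stored in the atom list.
        boolean bondsAreConsistent() {
            for (Bond bond : bonds) {
                if (!holds(bond.begin) || !holds(bond.end)) return false;
            }
            return true;
        }

        private boolean holds(Atom atom) {
            for (Atom candidate : atoms) {
                if (candidate == atom) return true;
            }
            return false;
        }

        // Swaps the atom in the list and rewires every bond that referenced it.
        void replaceAtom(Atom oldAtom, Atom newAtom) {
            for (int i = 0; i < atoms.size(); i++) {
                if (atoms.get(i) == oldAtom) atoms.set(i, newAtom);
            }
            for (Bond bond : bonds) {
                if (bond.begin == oldAtom) bond.begin = newAtom;
                if (bond.end == oldAtom) bond.end = newAtom;
            }
        }
    }

    public static void main(String[] args) {
        Molecule mol = new Molecule();
        Atom c1 = new Atom("1", "C");
        Atom cl7 = new Atom("7", "Cl");
        mol.atoms.add(c1);
        mol.atoms.add(cl7);
        mol.bonds.add(new Bond(c1, cl7));

        PseudoAtom sup = new PseudoAtom(cl7, "ClNH3");

        // What the report describes: only the atom list is updated.
        mol.atoms.set(1, sup);
        System.out.println(mol.bondsAreConsistent()); // false, the bond still points to cl7

        // Undo, then replace the atom and rewire the bonds.
        mol.atoms.set(1, cl7);
        mol.replaceAtom(cl7, sup);
        System.out.println(mol.bondsAreConsistent()); // true
    }
}
```

The atom and label values are taken from the example test file below (atom 7, LABEL=ClNH3); everything else is illustrative.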

  • Example test-file (sdf):
    BLDK-I23498738429 1 0.034568 0.000000 0
    BlaBla some header
    0 0 0 0 0 999 V3000
    M V30 COUNTS 8 8 1 0 0
    M V30 1 C -72.71 6.325 0 0
    M V30 2 C -74.0437 5.555 0 0
    M V30 3 C -74.0437 4.015 0 0
    M V30 4 C -72.71 3.245 0 0
    M V30 5 C -71.3763 4.015 0 0
    M V30 6 C -71.3763 5.555 0 0
    M V30 7 Cl -71.94 7.6587 0 0
    M V30 8 N -69.795 7.645 0 0
    M V30 END ATOM
    M V30 1 1 1 2
    M V30 2 2 2 3
    M V30 3 1 3 4
    M V30 4 2 4 5
    M V30 5 1 5 6
    M V30 6 2 1 6
    M V30 7 1 1 7
    M V30 8 1 7 8
    M V30 END BOND
    M V30 1 SUP 0 ATOMS=(2 7 8) XBONDS=(1 7) LABEL=ClNH3 ESTATE=E
    M V30 END CTAB
    M END


  • DaKa

    DaKa - 2010-08-27

    Changed MDLV3000Reader.java

  • Egon Willighagen

    There is this helper method for this issue:

    AtomContainerManipulator.replaceAtomByAtom(container, atom, atom)