#254 MDL readers to deal with D and T atom symbols

Mark Rijnbeek
master (162)
Mark Rijnbeek

Currently the MDL readers parse the atom symbols D (deuterium) and T (tritium) into pseudo atoms with the label "D" or "T".
This patch fixes that: D and T now result in heavy hydrogen atoms with mass number 2 or 3, respectively.


  • Mark, how does that relate to the 'interpretHydrogenIsotopes' IO setting?

    The MDLV2000Reader converts the pseudo atoms into real atoms when that parameter is set, around lines 842-868...

  • Mark Rijnbeek

    Hi Egon, sorry, I missed that IO setting. It seems strange, though, that I get pseudo atoms even though interpretHydrogenIsotopes is true by default. I will take a further look and then alter or drop this patch.

  • Mark Rijnbeek

    Attached is a new patch, reworked to use the IO setting. In my case the setting did not work because ChEMBL puts "D" and "T" symbols in its molfiles without a corresponding "M ISO" line. I added two of these ChEMBL molfiles to the patch, along with unit tests.
    The method fixHydrogenIsotopes is now a bit more lenient and sets the mass number itself for D and T, using an IsotopeFactory.

  • Mark, you should use the IChemObjectBuilder pattern instead of instantiating a particular interface implementation directly. Also, the assert() convention is the expected value first, then the tested value. Please review the two patches attached.

    Also, in the two new tests, you set the IO property for one but not the other. Why is that?

  • Rajarshi Guha

    applied and pushed
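
For readers following the thread, the behaviour discussed above can be sketched roughly as follows. This is a simplified illustration only, not the patch itself and not CDK code: the class HeavyHydrogenSketch, the SimpleAtom type, and the interpretSymbol method are hypothetical stand-ins, and the boolean parameter merely mimics the interpretHydrogenIsotopes IO setting. It assumes that, with the setting enabled, a "D" or "T" symbol becomes a hydrogen with mass number 2 or 3 even when the molfile has no "M ISO" line, and that with the setting disabled the symbol stays a pseudo atom carrying its label.

```java
public class HeavyHydrogenSketch {

    /** Hypothetical minimal atom record; a stand-in, not the CDK IAtom interface. */
    public static final class SimpleAtom {
        public final String symbol;      // element symbol, or the label of a pseudo atom
        public final Integer massNumber; // null when no mass number is set
        public final boolean pseudo;     // true when the symbol was kept as a pseudo atom label

        public SimpleAtom(String symbol, Integer massNumber, boolean pseudo) {
            this.symbol = symbol;
            this.massNumber = massNumber;
            this.pseudo = pseudo;
        }
    }

    /** Returns 2 for "D", 3 for "T", and null for any other symbol. */
    public static Integer heavyHydrogenMass(String symbol) {
        if ("D".equals(symbol)) return 2;
        if ("T".equals(symbol)) return 3;
        return null;
    }

    /**
     * Interprets one atom symbol from a molfile atom block.
     * The flag is an assumption modelled on the interpretHydrogenIsotopes setting.
     */
    public static SimpleAtom interpretSymbol(String symbol, boolean interpretHydrogenIsotopes) {
        Integer mass = heavyHydrogenMass(symbol);
        if (mass == null) {
            // Simplification: every other symbol is passed through as an ordinary atom.
            return new SimpleAtom(symbol, null, false);
        }
        if (interpretHydrogenIsotopes) {
            // Lenient path: set the mass number directly, no "M ISO" line required.
            return new SimpleAtom("H", mass, false);
        }
        // Setting disabled: keep D/T as a labelled pseudo atom.
        return new SimpleAtom(symbol, null, true);
    }

    public static void main(String[] args) {
        SimpleAtom d = interpretSymbol("D", true);
        System.out.println(d.symbol + " " + d.massNumber + " pseudo=" + d.pseudo);
    }
}
```

In the actual reader the mass number would come from an isotope lookup rather than hard-coded constants, and atoms would be created through the builder rather than a concrete class, as requested in the review comments.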