Learn how easy it is to sync an existing GitHub or Google Code repo to a SourceForge project! See Demo

Close

#591 Deuterium / Tritium fix in MDLV2000

Needs_Revision
closed
nobody
None
master
1
2013-09-24
2012-11-28
SBeisken
No

The "fixHydrogenIsotopes" method in the MDLV2000 reader does not set the atomic number of the isotopes (D, T). In the reader, the atom number is set for all other atoms that are not pseudo atoms. I have provided additional arguments in the constructor to fix this issue. There might be a more elegant solution to the problem but I could not find the correct factory method to configure the isotopes based on the mass number and element symbol only.

1 Attachments

Discussion

  • John May
    John May
    2012-11-29

    Looks good - all tests pass. Would it be possible to add a bit more description to the commit in future :-).

    Thanks,

    master-accepted

     
  • John May
    John May
    2012-11-29

    • milestone: Needs_Review --> Accepted
     
    • milestone: Accepted --> Needs_Revision
     
  • OK, thanx for the fix. Important one.

    John, I'll overrule your 'accepted' here, on the basis of a "best practice" we adopted that people from independent research groups should do the review to give the final 'accept'. But, I like to stress, that I am really happy you review (support) it still!

    Comments I like to see fixed:

    1. a central paradigm is that no information not in the file ends up in the data model. Thus, it should not set atomic masses...

      • IIsotope isotope = molecule.getBuilder().newInstance(IIsotope.class, 1, "H", 2, 2.014101778, 0.0115);
      • isotopeFactory.configure(newAtom, isotope);

    Can this be updated to something like:

    IIsotope isotope = molecule.getBuilder().newInstance(IIsotope.class,"H", 2);
    

    isotope.setMassNumber(1);

    for both D and T.

    1. Please use assertEquals, which is more informative in the error report may it ever break again.

      Assert.assertTrue(1 == deuterium.getAtomicNumber());
      868
      Assert.assertTrue(2.014101778 == deuterium.getExactMass());
      869

     
    Last edit: Egon Willighagen 2012-12-02
  • John May
    John May
    2012-12-02

    John, I'll overrule your 'accepted' here, on the basis of a "best practice" we adopted that people from independent research groups should do the review to give the final 'accept'. But, I like to stress, that I am really happy you review (support) it still!

    That's fine, but I did not know about these patches until Stephan posted them on the tracker. If it's a new feature/something I knew about I do not review it (hence I've left Stephan's rendering patches and the broken comparators). This case I would consider it a defect/unexpected behaviour which I would think was important to get fixed ASAP.

    a central paradigm is that no information not in the file ends up in the data model. Thus, it should not set atomic masses...

    true but don't these symbol explicitly indicate hydrogen isotopes? the symbols are also MDL specific - i.e. although D is a common symbol for Deuterium it's not in the CDK PeriodicTable. Hence to correctly convert in to something that can be handled elsewhere you'd have to set the symbol to 'H'. If it is left as 'D' you need to add another conditional somewhere else - probably more then one place (i.e. AtomTyper and IsotopFactory) that would be more work to maintain then simply converting the MDL representation to the CDK representation of the corresponding hydrogen isotope.

    Perhaps I'm missing something but in this case.. the correct representation of D/T in the CDK data model is a hydrogen isotope?

     
    • This case I would consider it a defect/unexpected behaviour which I would think was important to get fixed ASAP.

      Yeah, and again, it is good you reviewed it! It really helps.

      BTW, if it really is a serious bug, we should get it applied to cdk-1.4.x, or?

      true but don't these symbol explicitly indicate hydrogen isotopes?

      Yes, but the isotope masses info is not implied by the molfile specs.

      Oh... or do you mean masses are currently set for any atom already? then it would be consistent! Wrong for all atoms, but consistent!

       
  • John May
    John May
    2012-12-02

    I think the trouble is that Stephan has a separate branch for Knime which is a fork from master.

    Oh... or do you mean masses are currently set for any atom already? then it would be consistent! Wrong for all atoms, but consistent!

    Kind of... you could take the stance to not set the mass number for anything, but, then it wouldn't be very nice to use. It's a very thin line between doing to much and doing too little. I've also got a patch which sets the implicit hydrogen count based on the valency from the MDL. This is actually a feature of the spec but should we ignore the it and not set the implicit h count? The way I'm thinking of it, is of the problems it causes down steam of reading a file. Setting the mass/implicit hydrogen count doesn't cause many/any? problems once the file is read. In fact I think you said in a previous patch the atom typer can use implicit hydrogen count from MDL (which aren't currently being read).

    With the case of adding a hydrogen to chiral centres in the read this can cause problems because the hydrogen has null coordinates etc. This then means the user has to do extra work to place the hydrogens and ensure the values are configured correctly. With this case - setting the isotope mass just means less work later on, is relatively safe and very useful.

     
  • you could take the stance to not set the mass number for anything

    No, mass number I am fine with. That, I think, is part of the specs... I meant the actual mass (2.0xxxx for D)... that should not be set, IMHO. (And if it must, then it must at least take the values from the IsotopeFactory, which takes the data from the Blue Obelisk Data Repository... and not have a hardcoded value in the reader)

    I've also got a patch which sets the implicit hydrogen count based on the valency
    from the MDL

    That is also specified in the specs, so that is supposed to be done, because that would be implicitly in the file format.

     
  • John May
    John May
    2012-12-02

    Okay I think see now... it just shouldn't set the absolute/exact mass. I think I got confused by the 'isotope.setMassNumber(1);' in your original response - was that meant to be atomic number?

     
  • John May
    John May
    2012-12-02

    Actually perhaps a better solution would be to make the Element(String) constructor lookup the atomic number in the PeriodicTable as the Atom(String) constructor does?

     
  • Yes, the atomic number is the number of protons, the mass number is the sum of neutrons an protons. That is OK to be set. But I have problems with the exact mass.

    Autosetting that atomic number has been considered in the past. The PeriodicTable, however, is a heavy beast, and in the past depended on the data module... However, maybe we can use Elements, extend that with all elements to have the atomic number set? That class nowadays has a local custom immutable IElement impl...

     
  • John May
    John May
    2012-12-02

    That's fine I thought you were saying the atomic number should be left as null :/. Basically just remove the exact mass attribute.

    Autosetting that atomic number has been considered in the past.

    The Atom constructor already does that but the isotope doesn't. Moving the logic to autoset the atomic number to the 'Element' constructor would then mean all instances would behave the same. As you say this could be using the element implementations in 'Elements' instead of the PeriodicTable.

    I'm surprised periodic table is that heavy - chemicalElements.xml is only 83Kb if it takes longer then a couple of ms to pass then there's probably something wrong with the parser.

     
  • John May
    John May
    2013-09-23

    Okay this is easy to fix - will do this tonight.

     
  • John May
    John May
    2013-09-23

    This was already applied - but there was a few tweaks needed.
    - isotope factory used instead of hard coded values
    - atomic number set
    - MDL Valence model should be applied after fixing the hydrogens (ensures non-null implicit hydrogen count).

    branch :patches/p592+603 - note did on same branch as another as the other one was tiny

     
  • John May
    John May
    2013-09-24

    • status: open --> closed