Currently the MDL readers parse atom symbols D (Deuterium) and T (Tritium) into Pseudo atoms with label "D" or "T".
This patch fixes that, D and T resulting in heavy hydrogens with mass 2 or 3.
Mark, how does that relate to the 'interpretHydrogenIsotopes' IO settings?
The MDLV2000Reader converts the pseudo atoms into real atoms when that parameter is set, around lines 842-868...
hi Egon, sorry I missed that IO setting. It seems strange though that interpretHydrogenIsotopes is true by default, but I do get Pseudo atoms anyway. I will have a further look, I will alter/ditch this patch.
Attached a new patch, reworked to use the IO setting. In my case the setting did not work because Chembl puts "D" and "T" symbols in their molfiles without a corresponding "M ISO" line. I added two of these Chembl molfiles to the patch and unit tests.
The method fixHydrogenIsotopes is now a bit more lenient and sets mass number itself for D and T using an IsoptopeFactory.
Mark, you should use the IChemObjectBuilder pattern, instead of instantiating a particular interface implementations directly. Also, the assert() pattern is first the expected value, then the tested value. Please review the two patches attached.
Also, in the two news tests, you set the IO property for one, not the other... why is that?
Based on top of 0001-MDLV2000-reader-interprets-D-and-T-without-M-ISO-lin.patch.
applied and pushed
Log in to post a comment.
Sign up for the SourceForge newsletter:
You seem to have CSS turned off.
Please don't fill out this field.