Menu

#697 Hydrogen erroneously added when converting to InChI

open
nobody
5
2012-10-23
2011-01-13
Daniel Lowe
No

When doing some round tripping testing in OpenBabel 2.3.0 I came across the following cases where hydrogen was erroneously added to atoms with explicitly no hydrogen in the conversion from SMILES to InChI

Group 1/2 metals and In/Sn/Sb/Tl/Pb/Bi/Po/At.

e.g. [Li] -->InChI=1S/Li.H instead of InChI=1S/Li
[Bi] --> InChI=1S/Bi.3H instead of InChI=1S/Bi

Possibly relatedly [BiH4] gives InChI=1S/Bi.5H instead of InChI=1S/Bi.4H

Discussion

  • Daniel Lowe

    Daniel Lowe - 2011-01-13

    I(=O)=O also ends up with an extra hydrogen on conversion to InChI

     
  • Daniel Lowe

    Daniel Lowe - 2011-01-13

    S(C)(C)C exhibits a fault in SMILES reading, the S is interpreted as having 0 rather than 1 hydrogen.

     
  • Daniel Lowe

    Daniel Lowe - 2011-01-13

    [SeH-] seems to erroneously pick up H+ when converted to InChI

     
  • Daniel Lowe

    Daniel Lowe - 2011-01-13

    [Se]=P([O-])([O-])[O-] produces InChI=1S/O3PSe/c1-4(2,3)5/q-3 but should produce InChI=1S/H3O3PSe/c1-4(2,3)5/h(H3,1,2,3,5)/p-3
    Not too sure what's gone wrong in that case. There were a few other similar cases.

     
  • Noel O'Boyle

    Noel O'Boyle - 2011-01-13

    At least some of these problems don't exist with the current dev version. Can you try these with the dev version?

     
  • Noel O'Boyle

    Noel O'Boyle - 2011-01-13

    BTW, how exactly are you finding these problem cases? If we could monitor our performance on the dataset with time we could ensure we don't cause regressions....

     
  • Daniel Lowe

    Daniel Lowe - 2011-01-13

    I will give it a try on the latest version and report back. I suppose really I should have a bash at fixing the problems rather than giving you work :-P

    The set I was using was based on the ChEBI database as of about 9 months ago. What I was actually hoping to round trip was not OpenBabel, but a SMILES writer I have just written. The hope being to convert from SMILES to InChI and check that its output is equivalent to the InChI output from my program. I did indeed find a few corner cases my SMILES writer failed which I've now fixed but I haven't gone through all the discrepancies....mainly as most of them were coming from OpenBabel :-P

    Such a set could be created directly from ChEBI although I have a feeling that for some records the InChI and SMILES don't quite correspond to each other e.g. differences like connected vs disconnected [H+] on acids, which may give false failures.

     
  • Daniel Lowe

    Daniel Lowe - 2011-01-13

    The problems with atoms having explicitly no hydrogen has been fixed.
    The following issues remain:
    [BiH4] is interpreted as [BiH5]. The reason is on line 344 of inchiformat.cpp InChI is being allowed to implicitly add hydrogens.

    The [Se]=P([O-])([O-])[O-] , [SeH-] and S(C)(C)C problems remain.

     
  • Noel O'Boyle

    Noel O'Boyle - 2012-09-04

    [BiH4] nows gives InChI=1S/Bi.4H
    [SeH-] can be roundtripped through InChI

    This leaves only [Se]=P([O-])([O-])[O-] (see below).
    The S(C)(C)C I will file as a separate bug as it has nothing to do with all of this.