
StructureIDs

The assignment of structure IDs is always a tricky issue in structure databases and is closely related to the even trickier issue of structure identity.

The policy in nmrshiftdb2 is not to change structure IDs. Structure IDs are shown in the headers of the details display, can be entered on the front page, and can be used for external linking. A structure cannot be changed as such, because it is always bound to a spectrum. If somebody edits the structure of a spectrum, the new structure is created with a new ID if it does not yet exist, and the reference of the spectrum is changed to the new structure. The old structure stays. If it has no spectrum left, it won't be found in the interface, because search is also spectrum-bound, but if a spectrum is later submitted for the old structure, the ID will "reappear". So in theory IDs should not disappear; in practice it can happen if a structure turns out to be a duplicate and is merged.
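The ID policy above can be illustrated with a small in-memory model. This is only a sketch and not the nmrshiftdb2 implementation: the class `Registry`, its method names, and the use of a plain string as the structure key are all assumptions made for illustration.

```python
from itertools import count


class Registry:
    """Toy model of the ID policy; every name here is hypothetical."""

    def __init__(self):
        self.structures = {}      # structure ID -> structure key (a plain string here)
        self.spectra = {}         # spectrum ID -> structure ID
        self._next_id = count(1)  # IDs are only ever handed out, never reused

    def _get_or_create(self, key):
        # Reuse the ID of an already known structure, otherwise assign a new one.
        for structure_id, existing_key in self.structures.items():
            if existing_key == key:
                return structure_id
        structure_id = next(self._next_id)
        self.structures[structure_id] = key
        return structure_id

    def submit(self, spectrum_id, key):
        # A spectrum always points at exactly one structure.
        structure_id = self._get_or_create(key)
        self.spectra[spectrum_id] = structure_id
        return structure_id

    def edit_structure(self, spectrum_id, new_key):
        # Editing never modifies the old structure; the spectrum is re-pointed.
        return self.submit(spectrum_id, new_key)

    def searchable_ids(self):
        # Search is spectrum-bound: only structures with a spectrum are found.
        return set(self.spectra.values())


registry = Registry()
old_id = registry.submit("spectrum-1", "structure A")
new_id = registry.edit_structure("spectrum-1", "structure B")
hidden = old_id not in registry.searchable_ids()   # old ID exists but is not found
reappeared_id = registry.submit("spectrum-2", "structure A")
```

In this model the old ID stays in `structures` after the edit, is hidden from `searchable_ids()` while no spectrum refers to it, and comes back unchanged once a spectrum is submitted for the old structure. Merging duplicates, the one case where an ID can disappear, is deliberately left out.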

This is where the issue of structure identification kicks in. It is a problem that is not completely solved, and an extensive solution is out of the scope of nmrshiftdb2. In my experience, the following issues make things difficult:

  • Tautomers: Probably possible to solve
  • Stereoisomers (wedge bonds): Tricky, but in theory possible
  • E/Z configurations: Impossible, in my experience, since it is unclear what is meant in drawings. For wedge stereochemistry, there is an undefined drawing (the one without wedges). Many issues remain, but at least if people draw wedges, they mean something. With E/Z, if people draw atoms around a double bond in certain positions, it may be by accident, so you do not even know whether they mean something, let alone what (according to IUPAC, there is an "undefined" for E/Z, namely the wiggly bond, but this is rarely used).

Because of all this, the policy is to generate an InChI and a stereo SMILES (via CDK) when something is submitted. A structure is considered a duplicate only if both match. If manual inspection later finds duplicates, they may be merged. This policy may leave duplicates in the database, but we have decided this is better than merging structures that people do not want to see merged. Undefined stereochemistry counts as a different structure in nmrshiftdb2, so we might have, e.g., cis-2-butene, trans-2-butene and "undefined"-2-butene with three IDs. There are no hierarchies between such structures; that would require integrating an ontology or something similar.
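The "both must match" rule can be sketched as follows. This is an illustration under assumptions, not the nmrshiftdb2 code: the names `Identifiers`, `is_duplicate` and `assign_ids` are hypothetical, and the InChI and SMILES strings for the three 2-butene variants are written by hand here rather than generated by CDK, so the exact strings CDK produces may differ.

```python
from typing import NamedTuple


class Identifiers(NamedTuple):
    """Identifier pair for one submitted structure (hypothetical container)."""
    inchi: str
    stereo_smiles: str


def is_duplicate(a, b):
    # Duplicate only if BOTH identifiers match; one match alone is not enough.
    return a.inchi == b.inchi and a.stereo_smiles == b.stereo_smiles


def assign_ids(submissions):
    """Give each submission the ID of a known duplicate, or else a new ID."""
    known = []   # list of (structure ID, Identifiers)
    result = []
    for submitted in submissions:
        for structure_id, existing in known:
            if is_duplicate(submitted, existing):
                result.append(structure_id)
                break
        else:
            structure_id = len(known) + 1
            known.append((structure_id, submitted))
            result.append(structure_id)
    return result


# Hand-written identifiers for illustration only.
CIS = Identifiers("InChI=1S/C4H8/c1-3-4-2/h3-4H,1-2H3/b4-3-", "C/C=C\\C")
TRANS = Identifiers("InChI=1S/C4H8/c1-3-4-2/h3-4H,1-2H3/b4-3+", "C/C=C/C")
UNDEFINED = Identifiers("InChI=1S/C4H8/c1-3-4-2/h3-4H,1-2H3", "CC=CC")

ids = assign_ids([CIS, TRANS, UNDEFINED, CIS])
```

With these inputs the three variants receive three different IDs, and the second cis submission is recognised as a duplicate of the first. Requiring both identifiers to agree errs on the side of keeping structures apart, which matches the stated preference for possible duplicates over unwanted merges.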


Related

Wiki: StructureDatabase
Wiki: TechnicalTopics
