From: D. T. <dtc...@ni...> - 2005-07-29 15:56:41
At 02:28 AM 7/29/2005, Reinhard Dunkel wrote:
>Greetings,
>
>I would like to restrict the InChI generated descriptions. The InChI
>string creation options allow some control over which description layers
>to include. The rI.sed and rSP3.sed InChI-1.zip scripts allow me to remove
>the isotope and sp3 (while keeping the sp2) stereochemistry.

-DT: You should be careful about removing the sp3 (/t/m/s) segments. The remaining sp2 (/b) layer may differ from that of a stereoisomer (here I mean stereoisomer, not a geometric(al) isomer). This happens, for example, when two fragments connected by single bonds to atom A in a structure >A=B< are (a) mirror images of each other or (b) identical.

Consider the smallest example I could create, 2-ethylidene-1,3-dithietane 1,3-dioxide (naming by ACD/ChemSketch):

       O
       ||
       S       CH3
      / \     /
   CH2   C=CH
      \ /
       S
       ||
       O

SMILES: O=S1CS(=O)C1=CC

InChI treats the S atoms in this structure as possibly stereogenic. Here I would like to discuss 3 cases.

1. If both O atoms are above the plane of the double bond >C=C<, or both are below it (two (R,S)-configurations), then these two enantiomers have

(1a) InChI=1/C4H6O2S2/c1-2-4-7(5)3-8(4)6/h2H,3H2,1H3/b4-2-/t7-,8+/m1/s1
(1b) InChI=1/C4H6O2S2/c1-2-4-7(5)3-8(4)6/h2H,3H2,1H3/b4-2-/t7-,8+/m0/s1

Rotating CH3 around the axis of the >C=C< double bond by 180 degrees converts (1a) into (1b) and back.

2. If one O is above the plane of the double bond >C=C< and the other is below it (which may be described as two (R,R)-configurations), then rotating CH3 around the double bond's axis by 180 degrees does not change the configuration of the structure. Therefore, the double bond is not stereogenic. As a result, the /b segment is not present, and the two enantiomers have

(2a) InChI=1/C4H6O2S2/c1-2-4-7(5)3-8(4)6/h2H,3H2,1H3/t7-,8-/m1/s1
(2b) InChI=1/C4H6O2S2/c1-2-4-7(5)3-8(4)6/h2H,3H2,1H3/t7-,8-/m0/s1

3.
If none of the tetrahedral (sp3) stereo is marked (flat 2D drawing), then

(3) InChI=1/C4H6O2S2/c1-2-4-7(5)3-8(4)6/h2H,3H2,1H3

and no stereo is present.

Conclusions. Applying rSP3.sed to (1a) or (1b) produces the same result; this result is different from that of applying rSP3.sed to (2a) or (2b), although these four structures differ only in tetrahedral stereo. Therefore, in the general case, rSP3.sed does not completely eliminate all ramifications of tetrahedral (sp3) stereo.

One more thing I would like to emphasize: stripping the /t/m/s segments using rSP3.sed does NOT necessarily create a valid InChI (I assume that a "valid" InChI means that there exists a structure that would produce it). For example, there is no structure for

InChI=1/C4H6O2S2/c1-2-4-7(5)3-8(4)6/h2H,3H2,1H3/b4-2-

(this InChI was obtained by applying rSP3.sed to (1a) or (1b)).

> My remaining problems are:
>
>How can I remove the radical information (PubChem CID 138083 should give
>the same InChI string as 3776)?

-DT: One may argue that the compounds are different and behave in chemically different ways. The alcohol may be considered quite different from the unstable radical known in chemical kinetics. In principle, you may try to compare

1) the chemical formulas (C3H8O vs. C3H7O) and find that the difference is one H;
2) the connection tables (/c1-3(2)4) and find that they are identical;
3) the /h segments (/h3-4H,1-2H3 vs. /h3H,1-2H3) and find that the only difference is one H on atom 4.

It is up to you what conclusion may be drawn from these facts.

>How can I get an unknown cis/trans configuration around a double bond
>ignored (CID 7644 should give the same output as 637520)? (When the
>cis/trans information of a double bond is not specified, it should be
>derived from the MOL file atom coordinates.)

-DT: The Molfile conventions allow double bond stereo to be marked as "unknown". When the stereo of all double bonds is unknown and/or undefined, then by default the /b segment is not included in InChI.
This is what happened to PUBCHEM_COMPOUND_CID 7644. You may force /b to be included with option /SUU. When the double bond is not marked as "unknown" in the Molfile, the stereo is derived from the coordinates if possible. This is what happened to PUBCHEM_COMPOUND_CID 637520.

If you want to exclude all stereo from InChI, use option /SNON or use the appropriate sed script (rS.sed) to remove all stereo. InChI does not have an option to assign a "known" double bond stereo from the atom coordinates given in a Molfile when the double bond is marked as having "unknown" stereo.

>Are there ways to not create such InChI information?

-DT: /SNON suppresses all stereo.

> Are there scripts to delete such information from generated InChI strings?

-DT: rS.sed removes all stereo from InChI.

> Or would modifying the affected PubChem MOL files be likely easier than
> modifying their resulting InChI strings?

-DT: To remove the radical or "unknown stereo" marking you need to edit the source. In this way you may uniformly mark all possibly stereogenic atoms as having "Either" stereo. You may also feed the structures to InChI through its API. In this case you may make more changes, effectively creating your own pre-normalization. However, in this case I would advise you to call the identifiers obtained in this way by any name you want except InChI.

Regards
Dmitrii

>ThanX,
>Reinhard
>
>_______________________________________________
>InChI-discuss mailing list
>InC...@li...
>https://lists.sourceforge.net/lists/listinfo/inchi-discuss
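
For readers who want to reproduce the two observations in this message (that stripping /t/m/s leaves different strings for cases 1 and 2, and that the alcohol and the radical differ only in the formula and /h layers), here is a minimal Python sketch. It is not rSP3.sed and not part of the InChI software: strip_sp3, split_layers and differing_layers are hypothetical helpers, and the regular expression only approximates what a segment-stripping script might do. The two C3 strings are assembled from the formula, /c and /h segments quoted in the message.

```python
import re

# Assumption: a tetrahedral-stereo segment is "/t", "/m" or "/s" followed by
# everything up to the next "/". This only approximates what rSP3.sed does.
_SP3_SEGMENT = re.compile(r"/[tms][^/]*")


def strip_sp3(inchi: str) -> str:
    """Return the InChI string with all /t, /m and /s segments removed."""
    return _SP3_SEGMENT.sub("", inchi)


def split_layers(inchi: str) -> dict:
    """Split an InChI string into version, formula and prefixed layers."""
    version, formula, *segments = inchi.split("/")
    layers = {"version": version, "formula": formula}
    for segment in segments:
        if segment:
            # Key each layer by its one-letter prefix; keep the first one seen.
            layers.setdefault(segment[0], segment[1:])
    return layers


def differing_layers(a: str, b: str) -> list:
    """List the layer keys whose contents differ between two InChI strings."""
    la, lb = split_layers(a), split_layers(b)
    return sorted(k for k in la.keys() | lb.keys() if la.get(k) != lb.get(k))


# Strings (1a), (1b), (2a), (2b) and (3) as given in the message.
BASE = "InChI=1/C4H6O2S2/c1-2-4-7(5)3-8(4)6/h2H,3H2,1H3"
CASE_1A = BASE + "/b4-2-/t7-,8+/m1/s1"
CASE_1B = BASE + "/b4-2-/t7-,8+/m0/s1"
CASE_2A = BASE + "/t7-,8-/m1/s1"
CASE_2B = BASE + "/t7-,8-/m0/s1"

# Assembled from the segments quoted in the message (alcohol vs. radical).
ALCOHOL = "InChI=1/C3H8O/c1-3(2)4/h3-4H,1-2H3"
RADICAL = "InChI=1/C3H7O/c1-3(2)4/h3H,1-2H3"

if __name__ == "__main__":
    for name, s in [("1a", CASE_1A), ("1b", CASE_1B),
                    ("2a", CASE_2A), ("2b", CASE_2B)]:
        print(name, strip_sp3(s))
    print(differing_layers(ALCOHOL, RADICAL))
```

Stripping leaves the /b4-2- segment in place for (1a) and (1b) but not for (2a) and (2b), so the stripped strings still separate the two pairs, which is the caveat raised above.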