Re: [Rdkit-discuss] Request for Assistance: Understanding InChI to Mol Conversion Issue in RDKit
From: Jan H. J. <ja...@bi...> - 2023-12-13 07:22:04
You can also cross-check with standard InChI to see if this is an RDKit issue or a more general InChI issue. To convert InChI strings (and optionally AuxInfo) to SDF format with the standard inchi-1 executable, put the InChI string and AuxInfo into a text file and convert it like this:

P:\Projects\RInChI\INCHI-1-BIN__V1.06\windows\32bit>type test.txt
InChI=1/Ca.2H
AuxInfo=1/0/N:1;2;3/rA:3Ca0H0H0/rB:;;/rC:;;;

P:\Projects\RInChI\INCHI-1-BIN__V1.06\windows\32bit>inchi-1.exe /InChI2Struct /OutputSDF test.txt
InChI version 1, Software v. 1.06 (inchi-1 executable)
Windows 32-bit Build (MS VS 2015) of Dec 18 2020 20:45:14
Opened log file 'test.txt.log'
Opened input file 'test.txt'
Opened output file 'test.txt.txt'
Opened problem file 'test.txt.prb'
The command line used:
"inchi-1.exe /InChI2Struct /OutputSDF test.txt"
Converting InChI(s) to structure(s) in MOL format
Output SDfile only without stereochemical information and atom coordinates
Input format: InChI (plain identifier)
Output format: SDfile only (without stereochemical info and atom coordinates)
Timeout per structure: 60000 msec
Up to 1024 atoms per structure
Finished processing 1 structure: 0 errors, processing time 0:00:00.00
Elapsed walltime: 15 msec.

P:\Projects\RInChI\INCHI-1-BIN__V1.06\windows\32bit>type test.txt.txt
Structure #1.
  InChIV10

  3  0  0  0  0  0  0  0  0  0  1 V2000
    0.0000    0.0000    0.0000 Ca  0  0  0  0 15  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0 15  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0 15  0  0  0  0
M  END
$$$$

P:\Projects\RInChI\INCHI-1-BIN__V1.06\windows\32bit>

Cheers
-- Jan

On 2023-12-12 07:59, S Joshua Swamidass wrote:
> Perhaps provide some examples where this failure happens.
>
> Sent from Gmail Mobile
>
>
> On Tue, Nov 28, 2023 at 7:35 PM 李大舟 <lid...@sy...> wrote:
>
> Dear RDKit Developers and Maintainers,
>
> I hope this email finds you well. My name is Dr. Dazhou Li, and I
> am a researcher working on the development of a tool for
> extracting chemical compound structures recognized by OCR (Optical
> Character Recognition) technology. I have been using the RDKit
> library for a crucial step in this process, specifically the
> rdkit.Chem.inchi.MolFromInchi() function, to convert InChI-format
> strings into Mol format representations.
>
> Firstly, I would like to express my gratitude for the excellent
> work you have done in developing and maintaining the RDKit
> library, which has been an invaluable resource in my research. The
> library has consistently delivered high-quality results in various
> aspects of chemical informatics, and I appreciate your dedication
> to its development.
>
> However, I have encountered a specific issue with the
> rdkit.Chem.inchi.MolFromInchi() function that I hope you can help
> me understand and resolve. When attempting to convert InChI-format
> strings generated by my tool, some of them fail with an error
> message reporting "NaN." Since the rdkit.Chem.inchi.MolFromInchi()
> function calls C++ code, I am unable to directly inspect its
> execution or source code to diagnose the issue.
>
> My primary request is for assistance in understanding the internal
> workings of the rdkit.Chem.inchi.MolFromInchi() function,
> specifically the checking process or generation step that leads to
> the "NaN" error when certain InChI-format strings are processed.
> It is crucial for my research to determine at which point in the
> execution of this function my generated InChI-format strings
> are considered unreasonable, as this information will help me
> refine my tool's output to be compatible with RDKit.
>
> I understand that the RDKit library is a complex and comprehensive
> toolkit, and I appreciate the complexity involved in diagnosing
> such issues. However, any insights or guidance you can provide
> regarding the problematic cases and the internal processes of the
> rdkit.Chem.inchi.MolFromInchi() function would be immensely
> valuable to me and would help me ensure the compatibility of my
> tool with RDKit.
>
> If possible, I would be grateful for access to relevant
> documentation or insights into the specific error conditions that
> may lead to the "NaN" result. Additionally, any suggestions or
> best practices for generating InChI-format strings that are more
> likely to be successfully processed by RDKit would be greatly
> appreciated.
>
> Thank you for your time and consideration. I look forward to your
> response and hope that we can collaborate to resolve this issue
> and enhance the compatibility of my tool with the RDKit library.
>
> Please feel free to reach out to me if you require any additional
> information or if there are specific details about my tool or the
> InChI-format strings that would aid in diagnosing the issue.
>
> Best regards,
>
> Dr. Dazhou Li
> Shenyang University of Chemical Technology
> _______________________________________________
> Rdkit-discuss mailing list
> Rdk...@li...
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
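One way to see why a particular string fails, instead of only getting None back from MolFromInchi, is to call the lower-level rdinchi.InchiToMol directly. In RDKit's Python layer (rdkit/Chem/inchi.py), MolFromInchi wraps this call, which returns the InChI library's return code, message and log alongside the molecule. The sketch below assumes an RDKit build with InChI support; diagnose_inchi is a hypothetical helper name, not part of RDKit.

```python
from rdkit import Chem
from rdkit.Chem import rdinchi


def diagnose_inchi(inchi_str, sanitize=True, remove_hs=True):
    """Convert an InChI string and keep the InChI library's diagnostics.

    Hypothetical helper (not an RDKit API). Chem.MolFromInchi only returns
    the molecule (or None) and sends the library's message to the RDKit
    logger; rdinchi.InchiToMol, which it calls internally, also returns
    the return code, message and log.
    """
    try:
        mol, retcode, message, log = rdinchi.InchiToMol(inchi_str, sanitize, remove_hs)
    except ValueError as exc:
        # MolFromInchi treats a ValueError from this call as a failure and
        # returns None; keep the exception text so the cause is visible.
        return {"ok": False, "retcode": None, "message": str(exc),
                "log": "", "smiles": None}
    return {
        # 0 is a clean conversion; MolFromInchi logs 1 as a warning and
        # any other nonzero code as an error.
        "ok": mol is not None and retcode == 0,
        "retcode": retcode,
        "message": message,
        "log": log,
        "smiles": Chem.MolToSmiles(mol) if mol is not None else None,
    }


if __name__ == "__main__":
    for s in ["InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3", "not an InChI"]:
        result = diagnose_inchi(s)
        print(s, "->", result["ok"], result["retcode"], result["message"])
```

If a string fails here, running it through the standalone inchi-1 /InChI2Struct conversion Jan describes shows whether the InChI library itself rejects the string or whether the problem is on the RDKit side.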