Re: [Rdkit-discuss] Failed Expression: pick >= 0
From: Andrew D. <da...@da...> - 2012-05-02 11:35:05
Hi George,

> This is probably not going to solve the problem at hand but it may be useful to you or others in the future:
> ChEMBLdb maintains a molecular hierarchy table where you can retrieve the parent (=desalted - using Pipeline Pilot) structures for each molecular entity.
> You may try something like this:
>
> select distinct cs.molregno, cs.molfile, cs.canonical_smiles
> from compound_structures cs, molecule_hierarchy mh
> where cs.molregno = mh.parent_molregno

I confess pure ignorance here. While I've worked with databases, they are far from the list of things I know well. Reading the ERD is not simple for me, I don't have MySQL or Oracle installed on my machines, and I don't know how to browse through the schema and tables the way I've seen more database-proficient people do. So while I have an idea of what you are talking about, it's not something I can easily put into place.

But as you say, it's not the problem, because RDKit's failure exception occurs even when using the original, unprocessed/un-de-salted record.

Since you're here -- how come ChEMBL doesn't put an identifier on the first line of the SD record? Nearly all of them are blank; the exceptions are a dozen with mostly useless titles like:

  Acetic acid 6-(1-phenyl-ethyl)-6-aza-bicyclo[3.2.1]oct-3-yl ester
  4-(4-Fluoro-phenyl)-2-methylsulfanyl-thiophene-3-carbonitrile
  6-amino-9-(5-{[(1,2,3,3-tetrahydroxy-1,2,3-trioxidotriphosphanyl)oxy]methyl}tetr
  2-Methyl-2,3-dihydro-benzofuran-7-carboxylic acid 8-methyl-8-aza-bicyclo[3.2.1]o
  (S)-N-((S)-1,6-diamino-1-oxohexan-2-yl)-1-((S)-5-guanidino-2-((2S,3S)-2-((S)-5-g
  Acetic acid 6-(1-phenyl-ethyl)-6-aza-bicyclo[3.2.1]oct-3-yl ester

I end up doing a

  mol.SetProp("_Name", mol.GetProp("chembl_id"))

so that my output SMILES have an identifier tied to them, and that seems like a needless extra step.

Andrew
da...@da...
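
For anyone who wants the same effect as the mol.SetProp workaround without loading the file into RDKit, here is a rough sketch that copies the chembl_id data item onto the blank title line of each SD record as plain text. It is only an illustration under a few assumptions: the function names fill_titles and _retitle are made up for this example, the tag is taken to be named chembl_id as in the message, each record is taken to end with a "$$$$" line, and the sample record and its "CHEMBL_EXAMPLE" value are placeholders rather than real ChEMBL data.

```python
def _retitle(record, marker):
    """Return the record's lines with a blank title replaced by the tag value."""
    if record[0].strip():
        return record  # an existing title is left alone
    for i, line in enumerate(record[:-1]):
        # Data headers look like "> <chembl_id>", possibly with extra fields.
        if line.startswith(">") and marker in line:
            return [record[i + 1].strip()] + record[1:]
    return record  # tag not present: nothing to copy


def fill_titles(sd_text, tag="chembl_id"):
    """Fill blank SD title lines from the data item named `tag` (hypothetical helper)."""
    marker = "<%s>" % tag
    out, record = [], []
    for line in sd_text.splitlines():
        record.append(line)
        if line.strip() == "$$$$":  # end of one SD record
            out.extend(_retitle(record, marker))
            record = []
    out.extend(record)  # any trailing partial record passes through unchanged
    return "".join(line + "\n" for line in out)


# Placeholder record with a blank first line and an empty connection table.
sample = (
    "\n"
    "  sketch\n"
    "\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\n"
    "M  END\n"
    "> <chembl_id>\n"
    "CHEMBL_EXAMPLE\n"
    "\n"
    "$$$$\n"
)

if __name__ == "__main__":
    print(fill_titles(sample), end="")
```

Records that already carry a title are passed through untouched, so the dozen records with names on the first line would keep them; whether that is the desired behaviour is a judgment call.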