[Rdkit-devel] PostgreSQL cartridge: Function to check whether a SMILES string is valid or not in RD
From: Adrian S. <am...@ca...> - 2010-10-28 09:34:13
Hi Greg,

I moved to PostgreSQL 9.0 recently and installed the RDKit cartridge, which was easy to compile and install. It is also one of the fastest cartridges I have used so far, definitely a great extension of RDKit!

At the moment I am trying to create RDKit molecules and fingerprints for the latest version of the ChEMBL database. There is, however, one problem that could be solved easily. Since I already had the ChEMBL database (including the SMILES strings), I added tables to hold the mol and fp types for every compound and tried to populate them through "insert into... select mol_in(ism::cstring)...". This works nicely until it hits a SMILES string it cannot parse, such as this one:

"CCCC1234B567B89%10B%11%12%13B8%14%15B%11%16%17B%12%18%19B59%13B16%18C2%16%19(c%20ccc(Oc%21ccc(cc%21)C(=O)O)cc%20)B3%14%17B47%10%15" (ChEBI: http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI%3A105310)

The whole query then fails, and the only way to populate these tables is to download the SMILES and filter them manually in Python.

Is it possible to have a boolean function (is_valid(cstring)?) in the cartridge that simply checks whether a SMILES can be parsed by RDKit or not? That would make it possible to add the check to a where clause in a query and, as a result, make the creation of mol types much easier.

Cheers

Adrian
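For illustration, the manual Python filtering mentioned above could look roughly like the sketch below. It relies on RDKit's `Chem.MolFromSmiles`, which returns `None` when a SMILES string cannot be parsed. The helper names (`rdkit_parse`, `split_valid`) and the `(compound_id, smiles)` row layout are assumptions made for this example and are not part of the cartridge or of ChEMBL.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Row = Tuple[int, str]  # hypothetical layout: (compound_id, smiles)


def rdkit_parse(smiles: str) -> Optional[object]:
    """Parse a SMILES with RDKit; the result is None if parsing fails."""
    # Imported lazily so split_valid() can also be used with another parser.
    from rdkit import Chem
    return Chem.MolFromSmiles(smiles)


def split_valid(
    rows: Iterable[Row],
    parse: Callable[[str], Optional[object]] = rdkit_parse,
) -> Tuple[List[Row], List[Row]]:
    """Split rows into (parsable, unparsable) according to `parse`."""
    valid: List[Row] = []
    invalid: List[Row] = []
    for compound_id, smiles in rows:
        # Empty strings are treated as invalid without calling the parser.
        if smiles and parse(smiles) is not None:
            valid.append((compound_id, smiles))
        else:
            invalid.append((compound_id, smiles))
    return valid, invalid
```

Only the rows in the first list would then be loaded back into the database and passed to `mol_in`, so a single unparsable SMILES no longer aborts the whole insert. A boolean function inside the cartridge would make this round trip unnecessary.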