Re: [Rdkit-devel] PostgreSQL cartridge: Function to check whether a SMILES string is valid or not i
From: Greg L. <gre...@gm...> - 2010-11-02 06:44:35
Dear Adrian,

Quick one: the svn version of the cartridge code now has the function is_valid_smiles().

On Fri, Oct 29, 2010 at 5:08 PM, Adrian Schreyer <ma...@ad...> wrote:
>
> Creating the mol types is actually not that slow I thought, compared
> to creating the indexes (based on the latest version of chembl).

The index generation is certainly slower than loading the molecules.

> Simplifying the database creation is definitely worth it, and those
> costly operations are done only once. Another suggestion I had was to
> refactor the naming of the functions in the cartridge to make them
> more similar to the underlying functions in the Python/C++ libraries,
> e.g. mol_in ~ mol_from_smiles / mol_out ~ mol_to_smiles.

I believe that the current naming scheme is more consistent with the way postgresql does things. That's not an automatic "it has to be done that way", but I think it is an argument for maintaining the current naming.

>
> Given the context, maybe it is possible to reduce the number of
> instances where a smiles string cannot be parsed. From what I have
> seen so far, this happens in three cases: the smiles string contains a
> non-daylight extension (not rdkit's fault), exotic inorganic chemistry
> (negligible) or the valence system in rdkit is violated. The last case
> is something where I often encounter problems, for example bromic acid
>
> Chem.MolFromSmiles('OBr(=O)=O')
> [15:21:29] Explicit valence for atom # 1 Br, 5, is greater than permitted
>
> Is there a definition for the valence model used in rdkit somewhere in
> the source tree? I assume the valence model is nothing that can be
> manually changed by the user without breaking other aspects of the
> software.

If you want to allow additional valence states for atoms, you can edit $RDBASE/Code/GraphMol/atomic_data.cpp, but that's about the only way to do changes at the moment.

The topic of supporting other valences, like 5 for Br, has come up in the past.
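As a rough illustration of what "explicit valence ... is greater than permitted" means for bromic acid, and of why adding an entry to a per-element valence table makes the error go away, here is a minimal pure-Python sketch. It is not RDKit's implementation: the table `DEFAULT_VALENCES` and the helpers `explicit_valence` and `valence_ok` are hypothetical, and the table entries are illustrative rather than copied from atomic_data.cpp. It assumes the check compares an atom's explicit valence (the sum of its bond orders) against the largest valence permitted for its element.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified stand-in for a per-element valence table.
# Entries are illustrative only; they are NOT copied from
# $RDBASE/Code/GraphMol/atomic_data.cpp.
DEFAULT_VALENCES: Dict[str, Tuple[int, ...]] = {
    "O": (2,),
    "Cl": (1,),
    "Br": (1,),
}


def explicit_valence(bond_orders: List[int]) -> int:
    """Explicit valence: the sum of the orders of the bonds drawn to the atom."""
    return sum(bond_orders)


def valence_ok(element: str, bond_orders: List[int],
               table: Dict[str, Tuple[int, ...]]) -> bool:
    """Accept the atom if its explicit valence does not exceed the
    largest valence permitted for its element in `table`."""
    allowed = table.get(element)
    if allowed is None:
        return True  # element not in this toy table: no constraint applied
    return explicit_valence(bond_orders) <= max(allowed)


if __name__ == "__main__":
    # Br in bromic acid, O=Br(=O)O: one single bond and two double bonds.
    br_bonds = [1, 2, 2]
    print(explicit_valence(br_bonds))                     # 5
    print(valence_ok("Br", br_bonds, DEFAULT_VALENCES))   # False

    # Extending the table, analogous in spirit to editing atomic_data.cpp
    # (the (1, 3, 5) entry is a hypothetical choice, not an RDKit default).
    extended = {**DEFAULT_VALENCES, "Br": (1, 3, 5)}
    print(valence_ok("Br", br_bonds, extended))           # True
```

The sketch only models the upper-bound check on one atom; it ignores implicit hydrogens, charges and aromaticity, all of which a real valence model has to handle.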
If we can identify a limited number of additional valences that are required, I'd be willing to add them to the standard RDKit parameters.[1] I'm happy to hear suggestions about what those additional valences should be. We should probably do this in a separate thread on the -discuss list.

Best Regards,
-greg

[1] Within reason... I will balk at changing anything in the first row of the periodic table and will probably resist changing anything in the second row other than chlorine (to allow chlorate anions).