Re: [Rdkit-discuss] Substructure search issue with aliphatic/aromatic bonds
From: theozh <th...@gm...> - 2020-05-19 16:01:30
Hi Paolo,

thank you very much for your detailed answer.
I tried to reproduce your last suggestion (although I don't have Jupyter Notebook). However, my bonds are still SINGLE and DOUBLE instead of UNSPECIFIED.
Could this depend on the RDKit version? I have 2019.03...
Maybe I should update and investigate further.

Theo.

On 19.05.2020 at 16:44, Paolo Tosco wrote:
> Hi Theo,
>
> the lack of match is due to different aromaticity flags on atoms and bonds in the larger molecule.
>
> This gist provides some explanation and a possible solution:
>
> https://gist.github.com/ptosco/e410e45278b94e8f047ff224193d7788
>
> Cheers,
> p.
>
> On 19/05/2020 14:13, theozh wrote:
>> Dear RDKit-users,
>>
>> I would like to do a very simple substructure search.
>> Chapter 3.5 "Substructure Searching" in the RDKit Documentation (2019.09.1) is pretty short and doesn't point to a solution. So far, I've learned that you can create your search pattern via Chem.MolFromSmiles() or Chem.MolFromSmarts().
>>
>> In the copy&paste minimal example below, I want to use the first SMILES in the list as the search pattern. I expect 2 matches but I get either 1 or 0 matches. So, I'm doing something wrong. What am I missing?
>> Is it about implicit/explicit aromatic and aliphatic bonds or some explicit/implicit hydrogens?
>> How can I find the first structure in both SMILES?
>>
>> Thank you for any hints,
>> Theo.
>>
>> ### simple substructure search (but doesn't find what is expected)
>> from rdkit import Chem
>>
>> smiles_strings = '''
>> C12=CC=CN1NCCC2
>> C12=CC=CC(C=C3)=C1N3NCC2
>> '''
>> smiles_list = smiles_strings.splitlines()[1:]
>> print(smiles_list)
>>
>> pattern = Chem.MolFromSmiles(smiles_list[0]) # MolFromSmiles
>> matches = [x for x in smiles_list if Chem.MolFromSmiles(x).HasSubstructMatch(pattern)]
>> print(len(matches)) # result: 1, why not 2?
>>
>> pattern = Chem.MolFromSmarts(smiles_list[0]) # MolFromSmarts
>> matches = [x for x in smiles_list if Chem.MolFromSmiles(x).HasSubstructMatch(pattern)]
>> print(len(matches)) # result: 0, why not 2?
>> ### end of code
>>
>>
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdk...@li...
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
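
The two results reported in the example can be traced to how each parser treats aromaticity. Chem.MolFromSmiles() perceives aromaticity during sanitization, so the pyrrole ring of the pattern is stored as aromatic, while the bond from the neighbouring CH2 group to that ring is a single bond. In the larger molecule the corresponding bond belongs to the benzene ring and is aromatic, and a molecule used as a query compares bond types exactly. Chem.MolFromSmarts() reads uppercase C and N as aliphatic atoms, so that pattern cannot match the aromatic ring even in the molecule it was written from.

What follows is a minimal sketch of one possible workaround, and not necessarily the approach taken in the gist linked in the thread: a SMARTS pattern that specifies only elements ([#6], [#7]) and accepts any bond (~). The string GENERIC_SMARTS and the helper count_matches() are hand-written for this illustration and are assumptions, not part of the original posts.

```python
from rdkit import Chem

# The two SMILES from the original post.
smiles_list = ["C12=CC=CN1NCCC2", "C12=CC=CC(C=C3)=C1N3NCC2"]
mols = [Chem.MolFromSmiles(s) for s in smiles_list]


def count_matches(pattern):
    """Return how many molecules in `mols` contain `pattern` (hypothetical helper)."""
    return sum(m.HasSubstructMatch(pattern) for m in mols)


# Strict patterns, built exactly as in the original post.
n_smiles = count_matches(Chem.MolFromSmiles(smiles_list[0]))
n_smarts = count_matches(Chem.MolFromSmarts(smiles_list[0]))

# Relaxed pattern (assumption, written by hand for this sketch):
# [#6]/[#7] match carbon/nitrogen whether aromatic or aliphatic,
# and "~" matches any bond, so aromaticity flags no longer matter.
GENERIC_SMARTS = "[#6]12~[#6]~[#6]~[#6]~[#7]~1~[#7]~[#6]~[#6]~[#6]~2"
n_generic = count_matches(Chem.MolFromSmarts(GENERIC_SMARTS))

print("MolFromSmiles pattern:", n_smiles)
print("MolFromSmarts pattern:", n_smarts)
print("generic SMARTS pattern:", n_generic)
```

Because ~ accepts any bond, this pattern also matches saturated or otherwise differently bonded analogues of the same ring skeleton. If that is too permissive, individual bonds can be tightened again, for example by leaving only the one bond that differs between the two molecules as ~.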