Re: [Rdkit-discuss] Substructure search issue with aliphatic/aromatic bonds
From: Paolo T. <pao...@gm...> - 2020-05-19 14:44:29
Hi Theo,

the lack of match is due to different aromaticity flags on atoms and bonds in the larger molecule.
This gist provides some explanation and a possible solution:
https://gist.github.com/ptosco/e410e45278b94e8f047ff224193d7788

Cheers,
p.

On 19/05/2020 14:13, theozh wrote:
> Dear RDKit-users,
>
> I would like to do a very simple substructure search.
> The chapter 3.5 "Substructure Searching" in RDKit Documentation (2019.09.1) is pretty short and doesn't point to a solution. So far, I've learned that you can create your search pattern via Chem.MolFromSmiles() or Chem.MolFromSmarts().
>
> In the below copy&paste minimal example, I want to use the first SMILES in the list as search pattern. I expect 2 matches but I get either 1 or 0 matches. So, I'm doing something wrong. What am I missing?
> Is it about implicit/explicit aromatic and aliphatic bonds or some explicit/implicit hydrogen?
> How to find the first structure in both SMILES?
>
> thank you for any hints,
> Theo.
>
> ### simple substructure search (but doesn't find what is expected)
> from rdkit import Chem
>
> smiles_strings = '''
> C12=CC=CN1NCCC2
> C12=CC=CC(C=C3)=C1N3NCC2
> '''
> smiles_list = smiles_strings.splitlines()[1:]
> print(smiles_list)
>
> pattern = Chem.MolFromSmiles(smiles_list[0]) # MolFromSmiles
> matches = [x for x in smiles_list if Chem.MolFromSmiles(x).HasSubstructMatch(pattern)]
> print(len(matches)) # result: 1, why not 2?
>
> pattern = Chem.MolFromSmarts(smiles_list[0]) # MolFromSmarts
> matches = [x for x in smiles_list if Chem.MolFromSmiles(x).HasSubstructMatch(pattern)]
> print(len(matches)) # result: 0, why not 2?
> ### end of code
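
The aromaticity mismatch mentioned in the reply can be made visible with a short sketch. It is not necessarily the approach taken in the linked gist. The helper `count_matches` and the hand-written element-only SMARTS (`generic`) are assumptions introduced here for illustration. The idea: the bond that closes the six-membered ring onto the pyrrole carbon is a single bond in the small molecule but an aromatic bond in the larger one, so a pattern that fixes bond types or atom aromaticity cannot match both.

```python
from rdkit import Chem

smiles_list = [
    "C12=CC=CN1NCCC2",
    "C12=CC=CC(C=C3)=C1N3NCC2",
]
mols = [Chem.MolFromSmiles(s) for s in smiles_list]


def count_matches(pattern):
    """Count how many of the target molecules contain the pattern."""
    return sum(m.HasSubstructMatch(pattern) for m in mols)


# Pattern built as in the original post: bond types are taken from the
# sanitized (aromatized) small molecule and must match exactly.
strict_smiles = Chem.MolFromSmiles(smiles_list[0])

# Same string parsed as SMARTS: an uppercase C means *aliphatic* carbon,
# so the aromatic pyrrole carbons of the targets are rejected.
strict_smarts = Chem.MolFromSmarts(smiles_list[0])

# Hand-written generic query (an assumption, not from the thread):
# atoms are matched by element only ([#6], [#7]), chain bonds by "~"
# (any bond); the two ring-closure bonds use the SMARTS default,
# "single or aromatic".
generic = Chem.MolFromSmarts(
    "[#6]12~[#6]~[#6]~[#6]~[#7]1~[#7]~[#6]~[#6]~[#6]2"
)

# The ring-closure bond that differs between the two molecules
# (atom indices follow the order of atoms in each SMILES string).
pattern_bond = strict_smiles.GetBondBetweenAtoms(0, 8).GetBondType()
target_bond = mols[1].GetBondBetweenAtoms(0, 7).GetBondType()
print(pattern_bond, target_bond)     # SINGLE AROMATIC

print(count_matches(strict_smiles))  # 1, as reported in the question
print(count_matches(strict_smarts))  # 0, as reported in the question
print(count_matches(generic))        # 2
```

An element-only query like this is deliberately permissive: it ignores bond order and aromaticity altogether, so it may return hits that a chemist would not regard as the same scaffold. Whether that trade-off is acceptable depends on the search.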