[Rdkit-discuss] Replace matched patterns with wildcards
From: Marawan H. <mar...@ya...> - 2023-12-21 17:55:55
Hi,

I am trying to use RDKit to replace matched SMARTS patterns in a molecule with a wildcard (*) and return a SMARTS string such that the original molecule is an instance of the returned SMARTS. I tried the following:

########
from rdkit import Chem

def generate_modified_smarts(smiles, smarts_patterns, num_patterns_to_replace):
    molecule = Chem.MolFromSmiles(smiles)
    patterns_replaced = 0
    for smarts in smarts_patterns:
        if patterns_replaced >= num_patterns_to_replace:
            break
        pattern = Chem.MolFromSmarts(smarts)
        while molecule.HasSubstructMatch(pattern) and patterns_replaced < num_patterns_to_replace:
            match_indices = molecule.GetSubstructMatch(pattern)
            # Extract segments before and after the match
            before_match, after_match = "", ""
            if match_indices[0] > 0:
                before_match = Chem.MolFragmentToSmarts(molecule, atomsToUse=list(range(match_indices[0])))
            if match_indices[-1] < molecule.GetNumAtoms() - 1:
                after_match = Chem.MolFragmentToSmarts(molecule, atomsToUse=list(range(match_indices[-1] + 1, molecule.GetNumAtoms())))
            # Combine parts with a wildcard
            modified_smarts = before_match + '*' + after_match
            molecule = Chem.MolFromSmarts(modified_smarts)
            patterns_replaced += 1
    return Chem.MolToSmarts(molecule)

example_smiles = "CCOC1=C(C=C2C(=C1)N=CC(=C2NC3=CC(=C(C=C3)OCC4=CC=CC=N4)Cl)C#N)NC(=O)C=CCN(C)C"
smarts_patterns = ["C=O", "C#N"]
num_patterns_to_replace = 2
modified_smarts = generate_modified_smarts(example_smiles, smarts_patterns, num_patterns_to_replace)
print(f"Modified molecule SMARTS pattern: {modified_smarts}")
########

While it seems to work for C=O, it does not work for C#N: the connectivity is messed up, even if I use C#N alone, i.e. without the carbonyl. The matched patterns could be anywhere in the molecule and could be more complex than this; I only tried some simple cases to see how robust this approach is. It worked for "CCO", but did not work when I tried "Cl".

I am wondering if this is something you can help with.

Marawan
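For comparison, here is a minimal sketch of a different approach, not a confirmed fix for the code above. Instead of gluing SMARTS fragments together by atom-index ranges (which need not correspond to connected pieces of the molecular graph), it edits the molecule in place and swaps each matched atom for a '*' query atom. Because '*' in SMARTS matches exactly one atom, this sketch assumes that one wildcard per matched atom is acceptable (so C=O becomes two wildcards joined by a double bond) rather than one wildcard per matched group. The helper name wildcard_matched_atoms and its parameter max_replacements are hypothetical; the RDKit calls used are Chem.RWMol, RWMol.ReplaceAtom, Chem.AtomFromSmarts and Chem.MolToSmarts.

```python
from rdkit import Chem


def wildcard_matched_atoms(smiles, smarts_patterns, max_replacements):
    """Hypothetical helper: replace every atom of up to max_replacements
    non-overlapping pattern matches with a '*' query atom, keeping all
    bonds, and return the result as a SMARTS string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")

    editable = Chem.RWMol(mol)
    already_replaced = set()  # atom indices that are already wildcards
    replaced = 0

    for smarts in smarts_patterns:
        if replaced >= max_replacements:
            break
        pattern = Chem.MolFromSmarts(smarts)
        if pattern is None:
            raise ValueError(f"could not parse SMARTS: {smarts}")
        # Match against the untouched molecule: no atoms are added or
        # removed, so the indices stay valid for the editable copy.
        for match in mol.GetSubstructMatches(pattern):
            if replaced >= max_replacements:
                break
            if already_replaced.intersection(match):
                continue  # skip matches overlapping an earlier one
            for idx in match:
                # AtomFromSmarts('*') returns an "any atom" query atom
                editable.ReplaceAtom(idx, Chem.AtomFromSmarts("*"))
            already_replaced.update(match)
            replaced += 1

    return Chem.MolToSmarts(editable)


example_smiles = (
    "CCOC1=C(C=C2C(=C1)N=CC(=C2NC3=CC(=C(C=C3)OCC4=CC=CC=N4)Cl)C#N)"
    "NC(=O)C=CCN(C)C"
)
modified = wildcard_matched_atoms(example_smiles, ["C=O", "C#N"], 2)
print(f"Modified molecule SMARTS pattern: {modified}")

# Check the stated goal: the original molecule should match the result.
query = Chem.MolFromSmarts(modified)
print(Chem.MolFromSmiles(example_smiles).HasSubstructMatch(query))
```

Since the graph topology is unchanged and each replaced atom only becomes more general, the original molecule should remain a match for the returned SMARTS, including for single-atom patterns such as "Cl". Collapsing a whole matched group into a single '*' would need extra graph editing and would only match the original when all bonds leaving the group come from one atom.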