Re: [Rdkit-discuss] Doing substructure search as quickly as possible...
From: Maciek W. <ma...@wo...> - 2020-02-10 19:39:32
Alexis,

I believe that `DataStructs.AllProbeBitsMatch(query_fp,mol_fp)` is the function you are looking for here. You can find more advanced usage and code snippets in the RDKit blog post that Greg has put together:
https://rdkit.blogspot.com/2013/11/fingerprint-based-substructure.html

Best,
Maciek

----
Best regards,
Maciek Wójcikowski
ma...@wo...

On Mon, 10 Feb 2020 at 16:10, Alexis Parenty <ale...@gm...> wrote:

> Dear Rdkiters,
>
> I am interested in doing substructure searches between many thousands of
> structures and many thousands of fragments, as quickly as possible, with
> reasonable accuracy (> 0.95)...
>
> I did read Greg's excellent post on that subject:
>
> http://rdkit.blogspot.com/2019/07/a-couple-of-substructure-search-topics.html
>
> I was using the RDKit pattern fingerprint approach to filter out any
> fragments that have no chance of matching the bigger structure, so that
> only the remaining pairs go through the slower and more accurate
> molecular graph approach, saving a lot of time.
>
> However, I realized that this RDKit pattern fingerprint approach only
> works well if we compare SMILES with SMILES:
>
> from rdkit import Chem
>
> def frag_is_a_substructure_of_structure_via_pfp(frag, smiles):
>     pfp_frag = Chem.PatternFingerprint(Chem.MolFromSmiles(frag))
>     pfp_structure = Chem.PatternFingerprint(Chem.MolFromSmiles(smiles))
>
>     frag_bits = set(pfp_frag.GetOnBits())
>     structure_bits = set(pfp_structure.GetOnBits())
>
>     if frag_bits.issubset(structure_bits):
>         return True
>     else:
>         return False
>
> Unfortunately, some of my fragments are SMARTS that are not valid SMILES.
> Using Chem.MolFromSmarts(smarts) gives really poor results (many false
> positives, leading to poor specificity). Interestingly, there are no false
> negatives, leading to a sensitivity of 1!
>
> def frag_is_a_substructure_of_structure_via_pfp(frag, smiles):
>     pfp_frag = Chem.PatternFingerprint(Chem.MolFromSmarts(frag))
>     pfp_structure = Chem.PatternFingerprint(Chem.MolFromSmiles(smiles))
>
>     frag_bits = set(pfp_frag.GetOnBits())
>     structure_bits = set(pfp_structure.GetOnBits())
>
>     if frag_bits.issubset(structure_bits):
>         return True
>     else:
>         return False
>
> Is there a way to use the pattern fingerprint (or another method) for
> substructure searches independently of the SMILES/SMARTS format of the
> fragments? If not, is mol_struct.HasSubstructMatch(mol_frag) the only way
> I am left with?
>
> Many thanks,
>
> Alexis
> _______________________________________________
> Rdkit-discuss mailing list
> Rdk...@li...
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
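
The two-stage idea discussed in this thread (a cheap fingerprint screen first, then the exact graph match only on the survivors) can be sketched without any cheminformatics library. This is a minimal illustration under stated assumptions, not RDKit's implementation: fingerprints are modelled as plain Python integers, and the names `bits_to_mask`, `all_probe_bits_match`, `screened_matches`, the toy bit indices, and the toy truth table are all hypothetical. The point is that fingerprints are computed once per molecule rather than once per pair, and that a screen with no false negatives loses no real hits even when it lets false positives through.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def bits_to_mask(on_bits: Iterable[int]) -> int:
    """Pack a collection of on-bit indices into a single integer bitmask."""
    mask = 0
    for bit in on_bits:
        mask |= 1 << bit
    return mask


def all_probe_bits_match(probe_mask: int, target_mask: int) -> bool:
    """Return True when every bit set in the probe is also set in the target."""
    return (probe_mask & target_mask) == probe_mask


def screened_matches(
    frag_masks: Dict[str, int],
    structure_masks: Dict[str, int],
    exact_match: Callable[[str, str], bool],
) -> List[Tuple[str, str]]:
    """Run the expensive exact_match only on pairs that pass the bit screen."""
    hits = []
    for frag_id, frag_mask in frag_masks.items():
        for struct_id, struct_mask in structure_masks.items():
            if not all_probe_bits_match(frag_mask, struct_mask):
                continue  # cannot be a substructure: skip the slow check
            if exact_match(frag_id, struct_id):
                hits.append((frag_id, struct_id))
    return hits


# Toy data (hypothetical bit indices, precomputed once per molecule).
frag_masks = {"fragA": bits_to_mask([1, 5]), "fragB": bits_to_mask([2])}
structure_masks = {
    "mol1": bits_to_mask([1, 5, 9]),
    "mol2": bits_to_mask([1, 2, 9]),
}

# Toy ground truth: fragB/mol2 passes the screen but is a false positive.
truth = {("fragA", "mol1")}
calls = []


def toy_exact_match(frag_id: str, struct_id: str) -> bool:
    """Stand-in for the slow graph match; records which pairs reach it."""
    calls.append((frag_id, struct_id))
    return (frag_id, struct_id) in truth


hits = screened_matches(frag_masks, structure_masks, toy_exact_match)
print(hits)        # [('fragA', 'mol1')]
print(len(calls))  # 2 of the 4 possible pairs reached the exact check
```

With RDKit, the masks would correspond to `Chem.PatternFingerprint(...)` objects stored per molecule, the screen to `DataStructs.AllProbeBitsMatch(query_fp, mol_fp)`, and `exact_match` to `mol_struct.HasSubstructMatch(mol_frag)`, all of which are named in the thread. Because the screen reported above has a sensitivity of 1, the extra false positives from SMARTS queries only cost time in the second stage; they do not change the final result.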