From: Yannick .D. <y.d...@gm...> - 2016-05-19 18:41:41
Hi John,

Thanks a lot for the quick answer. I will be switching to the new tools.
I tried both SmartsPattern and VentoFoggia, and both work for me. If I
understood correctly, VentoFoggia is the more suitable choice for running
substructure matching on a large scale. Is that right?

I also realized that some of my SMARTS patterns should be modified a bit.
It is cumbersome when they are generated with one tool and tested/used
with another.

Thanks again.

Best,
Yannick

On Thu, May 19, 2016 at 1:51 AM, John M <joh...@gm...> wrote:

> Hi Yannick,
>
> This should be much simpler now. First off, you're using some old APIs.
> SQT (SMARTSQueryTool) still works, but it's now preferred to go through
> 'Pattern'. SmartsPattern does all the setup needed; other implementations
> can be faster and more customisable (see later) if you have many SMARTS
> against one molecule, but they are only recommended if needed.
> The SMSD classes are specific to SMSD, so unless you need MCS don't use
> them.
>
> I've attached the code below, but I think the real problem here is that
> the SMARTS really don't match the molecule under Daylight's aromaticity
> model. All ring atoms there are aromatic, and an explicit '=' in SMARTS
> doesn't match an aromatic bond ('=,:' is the way to do that).
>
> You can try out SMARTS on CDKDepict:
> http://cdkdepict-openchem.rhcloud.com/depict.html
>
>> COC1=C(O)C=C2OC=C(C(=O)C2=C1)C3=CC=C(O)C=C3
>> [O;X1]=[#6;R1]-,:1-,:[#6;R1](=,:[#6;R1]-,:[#8]-,:c2ccccc-,:12)-[c;R1]1[c;R1][c;R1][c;R1][c;R1][c;R1]1
>> Correct SMARTS
>
> Doubly confirmed with OpenBabel:
>
>> [sovereign ~/Downloads]: obgrep '[O;X1]=[#6;R1]-1-[#6;R1](=[#6;R1]-[#8]-c2ccccc-12)-[c;R1]1[c;R1][c;R1][c;R1][c;R1][c;R1]1' glycitein.sdf
>> [sovereign ~/Downloads]: obgrep '[O;X1]=[#6;R1]-1-[#6;R1](=[#6;R1]-[#8]-[#6]-2=[#6]-[#6]=[#6]-[#6]=[#6]-1-2)-[#6;R1]-1=[#6;R1]-[#6;R1]=[#6;R1]-[#6;R1]=[#6;R1]-1' glycitein.sdf
>
> Here would be the normal code if the SMARTS were changed. SmartsPattern
> does aromaticity automatically.
>
>> IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
>> Pattern ptrn1 = SmartsPattern.create("[O;X1]=[#6;R1]-1-[#6;R1](=[#6;R1]-[#8]-c2ccccc-12)-[c;R1]1[c;R1][c;R1][c;R1][c;R1][c;R1]1", null);
>> Pattern ptrn2 = SmartsPattern.create("[O;X1]=[#6;R1]-1-[#6;R1](=[#6;R1]-[#8]-[#6]-2=[#6]-[#6]=[#6]-[#6]=[#6]-1-2)-[#6;R1]-1=[#6;R1]-[#6;R1]=[#6;R1]-[#6;R1]=[#6;R1]-1", null);
>>
>> try (MDLV2000Reader mrdr = new MDLV2000Reader(new FileReader("/Users/john/Downloads/glycitein.sdf"))) {
>>     IAtomContainer mol;
>>     while ((mol = mrdr.read(bldr.newInstance(IAtomContainer.class, 0, 0, 0, 0))) != null) {
>>         // SmartsPattern handles aromaticity itself, so no extra setup is needed
>>         System.err.println("p1: " + ptrn1.matches(mol));
>>         System.err.println("p2: " + ptrn2.matches(mol));
>>     }
>> }
>
> Here's the code where we use a different aromaticity model. This is lower
> level, hence some more setup is needed.
>
>> IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
>> Pattern ptrn1 = VentoFoggia.findSubstructure(SMARTSParser.parse("[O;X1]=[#6;R1]-1-[#6;R1](=[#6;R1]-[#8]-c2ccccc-12)-[c;R1]1[c;R1][c;R1][c;R1][c;R1][c;R1]1", null));
>> Pattern ptrn2 = VentoFoggia.findSubstructure(SMARTSParser.parse("[O;X1]=[#6;R1]-1-[#6;R1](=[#6;R1]-[#8]-[#6]-2=[#6]-[#6]=[#6]-[#6]=[#6]-1-2)-[#6;R1]-1=[#6;R1]-[#6;R1]=[#6;R1]-[#6;R1]=[#6;R1]-1", null));
>> Aromaticity arom = new Aromaticity(ElectronDonation.piBonds(), Cycles.all(6));
>>
>> try (MDLV2000Reader mrdr = new MDLV2000Reader(new FileReader("/Users/john/Downloads/glycitein.sdf"))) {
>>     IAtomContainer mol;
>>     while ((mol = mrdr.read(bldr.newInstance(IAtomContainer.class, 0, 0, 0, 0))) != null) {
>>         // extra setup: apply the chosen aromaticity model, then prepare the molecule for matching
>>         arom.apply(mol);
>>         SmartsMatchers.prepare(mol, true);
>>         System.err.println("p1: " + ptrn1.matches(mol));
>>         System.err.println("p2: " + ptrn2.matches(mol));
>>     }
>> }
>
> Regards,
> John W May
> joh...@gm...
>
> On 19 May 2016 at 06:55, Yannick .Djoumbou <y.d...@gm...> wrote:
>
>> Hi all,
>>
>> I am having some issues with the CDK library.
>>
>> I have the molecule "glycitein" in the attached file (glycitein.sdf). I
>> am running the SMARTSQueryTool to perform structure search. The SMARTS
>> patterns are the following:
>>
>> P1: [O;X1]=[#6;R1]-1-[#6;R1](=[#6;R1]-[#8]-c2ccccc-12)-[c;R1]1[c;R1][c;R1][c;R1][c;R1][c;R1]1
>>
>> P2: [O;X1]=[#6;R1]-1-[#6;R1](=[#6;R1]-[#8]-[#6]-2=[#6]-[#6]=[#6]-[#6]=[#6]-1-2)-[#6;R1]-1=[#6;R1]-[#6;R1]=[#6;R1]-[#6;R1]=[#6;R1]-1
>>
>> For each of those, the query tool returns false, which is really
>> surprising. I imagine it still has to do with the aromaticity detection
>> or a related issue. I have tried many things and it seems that they do
>> not always work as they should.
>>
>> 1) I therefore preprocessed the molecule using the code below (from a
>> previous chat I had on a forum):
>>
>>     SMSDNormalizer.percieveAtomTypesAndConfigureAtoms(molecule);
>>     CDKHydrogenAdder.getInstance(molecule.getBuilder())
>>                     .addImplicitHydrogens(molecule);
>>     for (IBond bond : molecule.bonds()) {
>>         if (bond.getFlag(CDKConstants.SINGLE_OR_DOUBLE)) {
>>             bond.setFlag(CDKConstants.ISAROMATIC, true);
>>             bond.getAtom(0).setFlag(CDKConstants.ISAROMATIC, true);
>>             bond.getAtom(1).setFlag(CDKConstants.ISAROMATIC, true);
>>         }
>>     }
>>     SMSDNormalizer.aromatizeMolecule(molecule);
>>
>> I attached the resulting structure in SDF format as returned by CDK
>> (glycitein_processed.sdf), which in most editors is shown as in the
>> attached picture. It seems that all the aromatic bonds (marked as 4) in
>> the SDF are perceived as single bonds.
>>
>> Therefore, the result of the structure search is still "FALSE".
>>
>> By the way, trying a combination of AtomContainerManipulator (to
>> perceive atom types) and Aromaticity
>> <http://cdk.github.io/cdk/1.5/docs/api/org/openscience/cdk/aromaticity/Aromaticity.html>
>> did not help either.
>>
>> 2) Instead of aromatizing, I removed the SMSDNormalizer lines and added
>> the following:
>>
>>     AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(molecule);
>>     Kekulization.kekulize(molecule);
>>
>> The SDF of the resulting molecule is the same, and so is the result.
>>
>> How can I process these molecules efficiently?
>>
>> I am writing a function that will take SDF files and run the
>> SMARTSQueryTool to match certain patterns. Therefore, I need an
>> efficient way to preprocess these molecules.
>>
>> Can someone help me out here?
>>
>> Thank you in advance.
>>
>> Best,
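
The point in the thread about an explicit '=' not matching aromatic ring bonds, while '=,:' does, can be sketched with a small self-contained Java toy. This is not CDK code and does not reflect how CDK implements matching: the class BondExprDemo, the enum BondKind, and the methods primitive and matches are hypothetical names introduced only for illustration. The sketch assumes the molecule's ring bonds have already been perceived as aromatic (as under the Daylight model mentioned above) and covers only single-character bond primitives joined by ',' (logical OR).

```java
import java.util.EnumSet;
import java.util.Set;

/** Toy model of SMARTS bond primitives; an illustration only, not the CDK implementation. */
class BondExprDemo {

    /** How a bond in the target molecule looks after aromaticity perception. */
    enum BondKind { SINGLE, DOUBLE, TRIPLE, AROMATIC }

    /** Bond kinds accepted by a single SMARTS bond primitive. */
    static Set<BondKind> primitive(char symbol) {
        switch (symbol) {
            case '-': return EnumSet.of(BondKind.SINGLE);
            case '=': return EnumSet.of(BondKind.DOUBLE);
            case '#': return EnumSet.of(BondKind.TRIPLE);
            case ':': return EnumSet.of(BondKind.AROMATIC);
            case '~': return EnumSet.allOf(BondKind.class);
            default:
                throw new IllegalArgumentException("unsupported bond primitive: " + symbol);
        }
    }

    /** Evaluates a comma-separated (logical OR) bond expression such as "=,:". */
    static boolean matches(String expr, BondKind bond) {
        for (String alt : expr.split(",")) {
            if (alt.length() != 1) {
                throw new IllegalArgumentException("only single-character primitives are supported: " + expr);
            }
            if (primitive(alt.charAt(0)).contains(bond)) {
                return true; // one alternative is enough for an OR expression
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // An explicit '=' is not satisfied by a bond that was perceived as aromatic.
        System.out.println("'='   vs aromatic: " + matches("=", BondKind.AROMATIC));
        // '=,:' accepts either a double or an aromatic bond.
        System.out.println("'=,:' vs aromatic: " + matches("=,:", BondKind.AROMATIC));
        System.out.println("'=,:' vs double:   " + matches("=,:", BondKind.DOUBLE));
    }
}
```

Under these assumptions, writing ring bonds as '-,:' or '=,:' (as in the "Correct SMARTS" above) keeps a pattern usable whether the target is stored in Kekulé or aromatic form.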