From: John M. <joh...@gm...> - 2025-07-09 09:23:26
|
Hi Uli,

For computing the maximum common substructure there is UniversalIsomorphismTester or the Small Molecule Subgraph Detector, SMSD (in cdk-legacy). There is a newer version of SMSD in a separate repo (https://github.com/asad/SMSD) which was a rewrite. We were going to integrate it, but there were too many API changes and differences (test failures) for that to make sense.

Edmund Duesbury also wrote one during his PhD which I think does pretty well and has some nice properties/tunability - https://pubmed.ncbi.nlm.nih.gov/25602464/ - I think there is code in Knime to use.

Personally I try to avoid using MCS as there is usually a better way to solve the problem, which is also one of the reasons I've not written one. What do you need MCS for - Reaction Atom Mapping?

Thanks,
John
|
From: Uli F. <ul...@pe...> - 2025-07-07 15:37:00
|
Hi,

I am looking for a method to find the maximum common subgraph of two IAtomContainer. For each solution, I would like to get the mappings of matched atoms for both input structures.

The UniversalIsomorphismTester looked like an obvious choice. There is a method to find the maximum common subgraph (getOverlaps), but it does not look obvious to me how to then get the atom mappings.

Looking through the code of UniversalIsomorphismTester, it also does not inspire a lot of confidence (e.g., usage of clone(), the javadoc comment 'this implementation of the algorithm has not been optimized for speed' in RGraph) so I am wondering if I am looking in the right place.

Any pointer in the right direction is much appreciated.

Best
Uli
|
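The question above asks for atom mappings, not just the common substructure itself. As a minimal sketch of what such output could look like, the plain-Java toy below enumerates every maximum-size mapping between two tiny labelled graphs. It is not UniversalIsomorphismTester, SMSD, or any other CDK code: the class name ToyMcs, the label-array plus adjacency-matrix representation, and the int[] mapping format are all assumptions made for illustration. It searches for a maximum common induced subgraph by backtracking, ignores bond orders, allows disconnected matches, and runs in exponential time, so it is only suitable for very small inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy maximum common induced subgraph search. Hypothetical helper, not part of CDK. */
final class ToyMcs {
    private final String[] labelsA, labelsB; // element symbol per atom
    private final boolean[][] adjA, adjB;    // adjacency matrices
    private final int[] current;             // current[i] = B atom mapped to A atom i, or -1
    private final boolean[] usedB;           // B atoms already taken by the current mapping
    private final List<int[]> solutions = new ArrayList<>();
    private int best = 0;                    // size of the largest mapping found so far

    private ToyMcs(String[] labelsA, boolean[][] adjA, String[] labelsB, boolean[][] adjB) {
        this.labelsA = labelsA;
        this.adjA = adjA;
        this.labelsB = labelsB;
        this.adjB = adjB;
        this.current = new int[labelsA.length];
        Arrays.fill(current, -1);
        this.usedB = new boolean[labelsB.length];
    }

    /** Builds a symmetric adjacency matrix from {begin, end} atom index pairs. */
    static boolean[][] adjacency(int atomCount, int[][] bonds) {
        boolean[][] adj = new boolean[atomCount][atomCount];
        for (int[] bond : bonds) {
            adj[bond[0]][bond[1]] = true;
            adj[bond[1]][bond[0]] = true;
        }
        return adj;
    }

    /** Returns every maximum-size mapping; element i is the B atom index for A atom i, or -1. */
    static List<int[]> search(String[] labelsA, boolean[][] adjA,
                              String[] labelsB, boolean[][] adjB) {
        ToyMcs mcs = new ToyMcs(labelsA, adjA, labelsB, adjB);
        mcs.extend(0, 0);
        return mcs.solutions;
    }

    private void extend(int i, int size) {
        if (i == labelsA.length) {
            if (size > best) {
                best = size;
                solutions.clear();
            }
            if (size == best && size > 0) solutions.add(current.clone());
            return;
        }
        if (size + (labelsA.length - i) < best) return; // cannot reach the best size any more
        for (int j = 0; j < labelsB.length; j++) {
            if (!usedB[j] && labelsA[i].equals(labelsB[j]) && consistent(i, j)) {
                current[i] = j;
                usedB[j] = true;
                extend(i + 1, size + 1);
                current[i] = -1;
                usedB[j] = false;
            }
        }
        extend(i + 1, size); // also try leaving A atom i unmapped
    }

    /** A pair (i, j) is allowed only if bonds to already-mapped atoms agree in both graphs. */
    private boolean consistent(int i, int j) {
        for (int k = 0; k < i; k++) {
            if (current[k] >= 0 && adjA[i][k] != adjB[j][current[k]]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Heavy atoms of ethanol (C-C-O) versus methanol (C-O).
        String[] a = {"C", "C", "O"};
        String[] b = {"C", "O"};
        boolean[][] adjA = adjacency(3, new int[][] {{0, 1}, {1, 2}});
        boolean[][] adjB = adjacency(2, new int[][] {{0, 1}});
        for (int[] map : search(a, adjA, b, adjB)) {
            System.out.println(Arrays.toString(map)); // one line per maximum mapping
        }
    }
}
```

For the ethanol/methanol pair the only maximum mapping is [-1, 0, 1]: the first carbon of ethanol is unmatched, and its second carbon and oxygen map to atoms 0 and 1 of methanol. Symmetric inputs yield one entry per equivalent mapping, which is the "for each solution" behaviour asked about.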
From: Andrew D. <da...@da...> - 2025-06-25 14:05:10
|
Hi John,

> On Jun 25, 2025, at 10:15, John Mayfield <joh...@gm...> wrote:
> Even if you don't add any listeners there is an overhead of dispatching the edit events so it is better to avoid this.

I will, to use language I learned from WWII submarine fiction, rig for silent running.

> Molecule Standard Form
>
> We (CDK) try to impose very little automation/sanitisation by default, rather than Daylight's dt_mod on/off and RDKit's sanitization it is more similar to OEChem in that the molecule comes out of the readers as they were described in the input.

I can appreciate that. As I recall (it's been years since I looked at the OEChem docs), the OEChem docs listed the recommended set of operations for those using the low-level API. For example, my code used to do

    OEParseSmiles(mol, content, canon, strict)
    OEAssignAromaticFlags(mol, aromaticity_model)

They later added a single function call variant:

    OEReadMolFromBytes(mol, oeformat, flavor, gzip, content)

which handles the appropriate steps. This simplified my code as I don't need that flexibility.

> We go a little further and don't even do ring perception (is in ring: true/false). Most common formats (SMILES/MOLfile/InChI/CML) will set the hydrogen counts for you but some older formats (PDB/XYZ) will not.

Is there documentation for the needed steps? I want to make sure I support the primary formats correctly.

As for the less common formats, when I added CDK support back in 2021 I tried to support the XYZ format, but ended up noting "I can't figure out how to read an XYZ file and assign the correct bond types (RebondTool only assigns single bonds and FixBondOrdersTool doesn't add them)." I also noted "can't get mol2 to create a SMILES so only do basic tests".

That said, I don't think mol2 or XYZ format support is all that useful. I haven't come across anyone using either format for a long time. As I recall, Greg Landrum's viewpoint is that people should use Open Babel to convert to a more mainstream format. There are also readers I don't even touch, like Mopac7Reader or ShelXReader. :)

> A Pattern for matching a single SMARTS query against multiple target compounds. The class can be used for efficiently matching many queries against a single target if setPrepare(boolean) is disabled (prepare(IAtomContainer)) should be called manually once for each molecule.

Yes, now that I know what I'm looking at, I can see that getBitFingerprint() for both PubchemFingerprinter and MACCSFingerprinter call:

    SmartsPattern.prepare(container);

If I follow the code correctly this means SMARTS-based fingerprinting always triggers aromaticity re-perception. For example, if I use the same molecule to generate both MACCS and Pubchem fingerprints then both will do:

    Cycles.markRingAtomsAndBonds(target);
    Aromaticity.apply(Aromaticity.Model.Daylight, target);

even if input processing has already done this step.

It also means that if input processing uses a different model, like Aromaticity.Model.Mdl (picking one available from that class), then I need to pass a copy to the fingerprinter if I don't want the assignments to possibly change.

> If you have multiple patterns to match what you want to do is something like this:
>
> 0. patterns <- load SMARTS/prepare patterns, set prepare false
> 1. Read Molecule (mol)
> 2. Set ring flags
> 3. Set aromaticity
> 4. for pat in patterns: pat.match(mol)
>
> Steps 2/3 can be replaced with prepare. If you have pre-calculated and stored aromaticity (e.g. in SMILES) then you can skip step 3 as the input aromaticity flags will be preserved.

Because of the chemfp design, my input reader doesn't know if the created molecules will be used for fingerprinting or for format conversion, so I need to always do 2 and 3. I also don't have a way to distinguish between the built-in CDK fingerprint types, which always prepare, and my own fingerprint types, which expect prepared molecules.

I think this means, at least for chemfp, that I should always prepare the molecules as I read them, using the Daylight model, so that my own fingerprint types can assume the inputs are always properly prepared.

> Sorry I meant if you knew the steps to reproduce/which aromaticity model did you use..? The standard Daylight model used by the SMARTS matcher would find the external porphyrin ring aromatic hence I'm not sure how you would get that unless you used a different aromaticity model (e.g. tighter ring set) before writing to SMILES.

The problem is that I didn't use any explicit aromaticity perception. Here's my reproducible:

=============
import jpype  # Must install JPype to interface to the CDK jar
import jpype.imports  # configure the import hooks
import jpype.nio

jpype.startJVM(None, '-Djava.awt.headless=true')

from org.openscience import cdk
from org.openscience.cdk.smiles import (
    SmilesParser, SmilesGenerator, SmiFlavor)

smiles = (
    "OCCO[P+]1(OCCO)n2c3ccc2/C(c2ccccc2)=C2/C=CC(=N2)/C(c2ccccc2)"
    r"=c2/cc/c(n21)=C(\c1ccccc1)C1=NC(=C3c2ccccc2)C=C1 CHEMBL2369103")

_default_builder = cdk.DefaultChemObjectBuilder.getInstance()
smiles_parser = SmilesParser(_default_builder)
mol = smiles_parser.parseSmiles(smiles)

if 0:  # Missing perception
    from org.openscience.cdk.graph import Cycles
    from org.openscience.cdk.aromaticity import Aromaticity
    Cycles.markRingAtomsAndBonds(mol)
    Aromaticity.apply(Aromaticity.Model.Daylight, mol)

for flavor_name, flavor in (
        ("Default", SmiFlavor.Default),
        ("Default|UseAromaticSymbols", SmiFlavor.Default | SmiFlavor.UseAromaticSymbols),
        ):
    smiles_generator = SmilesGenerator(flavor)
    out_smiles = str(smiles_generator.create(mol))
    print(f"-- {flavor_name}:")
    print(out_smiles)
    print()
=============

The above prints

    -- Default:
    OCCO[P+]1(OCCO)N2C3=CC=C2/C(/C4=CC=CC=C4)=C\5/C=CC(=N5)C(C6=CC=CC=C6)=C7C=CC(N71)=C(C8=CC=CC=C8)C9=NC(=C3C%10=CC=CC=C%10)C=C9

    -- Default|UseAromaticSymbols:
    OCCO[P+]1(OCCO)n2c3ccc2/C(/c4ccccc4)=C\5/C=CC(=N5)C(c6ccccc6)=c7ccc(n71)=C(c8ccccc8)C9=NC(=C3c%10ccccc%10)C=C9

With the missing perception step enabled (change the "if 0:" to "if 1:") I get what I expected from using CDK Depict.

    -- Default:
    OCCO[P+]1(OCCO)N2C3=CC=C2/C(/C4=CC=CC=C4)=C\5/C=CC(=N5)C(C6=CC=CC=C6)=C7C=CC(N71)=C(C8=CC=CC=C8)C9=NC(=C3C%10=CC=CC=C%10)C=C9

    -- Default|UseAromaticSymbols:
    OCCO[P+]1(OCCO)n2c3ccc2c(-c4ccccc4)c5C=Cc(n5)c(-c6ccccc6)c7ccc(n71)c(-c8ccccc8)c9nc(c3-c%10ccccc%10)C=C9

> Hopefully that covers everything but let me know if you have any more questions/thoughts.

I think it does. Thanks!

Andrew
da...@da...
|
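The two UseAromaticSymbols outputs in the message above differ in how many nitrogens are written as aromatic "n", which is easy to miscount by eye in strings this long. A small sketch of a cross-check follows. It is a crude character count in plain Java, not a SMILES parser and not CDK code, and the class name NitrogenSymbolCount is hypothetical. It only looks at atoms written outside square brackets, which is enough here because the only bracket atom in these strings is [P+]; a bracket nitrogen such as [nH] or [N+] would not be counted.

```java
import java.util.Arrays;

/** Crude count of 'n' versus 'N' atom symbols in a SMILES string. Hypothetical helper, not CDK. */
final class NitrogenSymbolCount {

    /** Returns {aromatic 'n' count, aliphatic 'N' count} for atoms written outside brackets. */
    static int[] count(String smilesLine) {
        // Drop a trailing title such as "CHEMBL2369103".
        String smiles = smilesLine.trim().split("\\s+")[0];
        int aromatic = 0;
        int aliphatic = 0;
        boolean inBracket = false;
        for (char c : smiles.toCharArray()) {
            if (c == '[') {
                inBracket = true;
            } else if (c == ']') {
                inBracket = false;
            } else if (!inBracket && c == 'n') {
                aromatic++;  // outside brackets, lowercase n can only be aromatic nitrogen
            } else if (!inBracket && c == 'N') {
                aliphatic++; // outside brackets, uppercase N can only be nitrogen
            }
        }
        return new int[] {aromatic, aliphatic};
    }

    public static void main(String[] args) {
        // UseAromaticSymbols output without the perception step, copied from the message above.
        String unperceived = "OCCO[P+]1(OCCO)n2c3ccc2/C(/c4ccccc4)=C\\5/C=CC(=N5)C(c6ccccc6)"
                + "=c7ccc(n71)=C(c8ccccc8)C9=NC(=C3c%10ccccc%10)C=C9";
        // UseAromaticSymbols output with ring marking and Daylight aromaticity applied.
        String perceived = "OCCO[P+]1(OCCO)n2c3ccc2c(-c4ccccc4)c5C=Cc(n5)c(-c6ccccc6)"
                + "c7ccc(n71)c(-c8ccccc8)c9nc(c3-c%10ccccc%10)C=C9";
        System.out.println(Arrays.toString(count(unperceived))); // [2, 2]
        System.out.println(Arrays.toString(count(perceived)));   // [4, 0]
    }
}
```

The counts match the description in the thread: two aromatic and two aliphatic nitrogens when the input flags are carried through unchanged, and four aromatic nitrogens once ring marking and the Daylight model have been applied.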
From: John M. <joh...@gm...> - 2025-06-25 08:16:26
|
Hi Andrew,

*Default vs Silent*

From an end user perspective there is little difference between Default and Silent. Silent is what you want, and in CDK v3.0 it will become the new default/standard. Silent used to be called NoNotify. Internally, Default allows you to add listeners which will be notified for every update to the molecule and/or its atoms - this was useful for JChemPaint I believe (but actually it doesn't use it any more). Even if you don't add any listeners there is an overhead of dispatching the edit events so it is better to avoid this.

*Molecule Standard Form*

We (CDK) try to impose very little automation/sanitisation by default, rather than Daylight's dt_mod on/off and RDKit's sanitization it is more similar to OEChem in that the molecule comes out of the readers as they were described in the input. We go a little further and don't even do ring perception (is in ring: true/false). Most common formats (SMILES/MOLfile/InChI/CML) will set the hydrogen counts for you but some older formats (PDB/XYZ) will not.

Since standard SMARTS has expressions which require ring flags (true/false) and aromaticity to function correctly, we err on the side of caution and will do these automatically unless asked not to (the molecule is prepared for matching). For a single pattern it is a bit smarter and will inspect the expressions and work out if ring flags or aromaticity is needed - something like *[#6]~Cl* does not need these prepared, for example. If you have a whole bunch of patterns to run this is obviously inefficient, so it is better to prepare the molecule once and then match each pattern. That is where the static function SmartsPattern.prepare() comes in - it is just a convenience utility which does ring finding + Daylight aromaticity.

The SmartsPattern <https://cdk.github.io/cdk/latest/docs/api/org/openscience/cdk/smarts/SmartsPattern.html> is the higher level API, which is why it does these things automatically. You can also load a SMARTS pattern and use a normal substructure matcher.

*A Pattern <https://cdk.github.io/cdk/latest/docs/api/org/openscience/cdk/isomorphism/Pattern.html> for matching a single SMARTS query against multiple target compounds. The class can be used for efficiently matching many queries against a single target if setPrepare(boolean) <https://cdk.github.io/cdk/latest/docs/api/org/openscience/cdk/smarts/SmartsPattern.html#setPrepare(boolean)> is disabled (prepare(IAtomContainer) <https://cdk.github.io/cdk/latest/docs/api/org/openscience/cdk/smarts/SmartsPattern.html#prepare(org.openscience.cdk.interfaces.IAtomContainer)>) should be called manually once for each molecule.*

I will update this documentation to make it more explicit when it is/isn't needed and why it is done. Here is an example of using the lower level APIs, which require manually preparing for the pattern.

    IAtomContainer query = SilentChemObjectBuilder.getInstance().newAtomContainer();
    if (!Smarts.parse(query, "C=C-C=N")) {
        // bad pattern
    }
    Pattern pat = Pattern.findSubstructure(query);
    IAtomContainer mol = ...;
    SmartsPattern.prepare(mol);
    if (pat.matches(mol)) {
        // is a match
    }

*Standard Workflow*

If you have multiple patterns to match what you want to do is something like this:

0. patterns <- load SMARTS/prepare patterns, set prepare false
1. Read Molecule (mol)
2. Set ring flags
3. Set aromaticity
4. for pat in patterns: pat.match(mol)

Steps 2/3 can be replaced with prepare. If you have pre-calculated and stored aromaticity (e.g. in SMILES) then you can skip step 3 as the input aromaticity flags will be preserved.

> I'm not sure how you got that output:
> Because I was confused when I wrote the code in the first place?

Sorry I meant if you knew the steps to reproduce/which aromaticity model did you use..? The standard Daylight model used by the SMARTS matcher would find the external porphyrin ring aromatic hence I'm not sure how you would get that unless you used a different aromaticity model (e.g. tighter ring set) before writing to SMILES.

Hopefully that covers everything but let me know if you have any more questions/thoughts. It's always a tough balance between doing too much/little automatically and in this case we want simple inputs (e.g. kekulé benzene) to be handled correctly by novice users - the side effect is that there are obviously molecules where the aromaticity is in debate/opinion and it can be confusing since the input in your case wasn't the same as what was actually matched on. Fortunately these are relatively rare.

P.S. I am considering moving the ring flag setting to the IO readers for CDK v3.0, which is more akin to what OEChem does - this is only now possible since it's much faster than it used to be.

Best,
John
|
From: Andrew D. <da...@da...> - 2025-06-24 21:56:12
|
Thank you John and Jonas for your answers.

One big issue is I still don't have a good grasp of how CDK does things. The second is that I'm doing it through Python and the JPype bridge. The third is that I last looked at this part of the code over a year ago, and wrote most of the code about 4 years ago.

> On Jun 24, 2025, at 17:34, John Mayfield <joh...@gm...> wrote:
> First off for the SMARTS matcher you can turn off the "prepare" or use the lower level APIs and work on the input aromaticity.
>
> IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();

Is there a reason for using SilentChemObjectBuilder instead of what I use, which is:

    cdk.DefaultChemObjectBuilder.getInstance()

? I see cinfony does what you suggest, as does Jonas:

> SmartsPattern pat = SmartsPattern.create("C=CC=N");
> pat.setPrepare(false); // turn off auto ring+arom perception

Jonas also mentioned the prepare method:

> //prevent the SMARTS pattern from perceiving aromaticity
> pattern.setPrepare(false);

I've never used this method. With the default of true, does each SMARTS match re-perceive aromaticity each time?

John:
> Cycles.markRingAtomsAndBonds(mol);
> Aromaticity.apply(Aromaticity.Model.Daylight, mol);

Hmmm. It looks like I don't understand who is supposed to be in charge of doing perception, or what the processing steps are to get a fully prepared structure.

What I've been doing is using SmilesParser(_default_builder).parseSmiles() and assuming the molecule was in the right state. I then use one of the fingerprinters, or do the SMARTS matches for a couple of my own fingerprint types.

Am I always supposed to perceive rings and aromaticity if I use SmilesParser? Is there any reason to not use the same aromaticity perception steps as in CDK Depict, using Daylight aromaticity?

What about if I use MDLV2000Reader/MDLV3000Reader? Or IteratingSDFReader or IteratingSMILESReader with hasNext()/next() to get the molecules? Do I need to perceive those too?

Also, I'm looking at SubstructureFingerprinter.java and see:

    SmartsPattern.prepare(atomContainer)

Do I need this too? Jonas wrote "SmartsPattern.matchAll() is called in the web app, which internally calls SmartsPattern.prepare", so I don't think I need it.

John:
> I'm not sure how you got that output:

Because I was confused when I wrote the code in the first place?

I can spend some time pulling the CDK-specific code out of chemfp to get a stand-alone reproducible, but it's probably a better use of my time to just get the processing steps done correctly.

Andrew
da...@da...
|
From: <jon...@gm...> - 2025-06-24 15:41:53
|
Hi Andrew,

if I dug this up correctly in the CDK Depict and CDK code, the web application always applies the CDK Daylight aromaticity model to the input structure to prepare it for SMARTS matching, and this way overrides all existing aromaticity flags. See here:
https://github.com/cdk/depict/blob/21169bbe14668a947331164d0cd17a72f46c620e/cdkdepict-lib/src/main/java/org/openscience/cdk/app/DepictController.java#L1212
and here:
https://github.com/cdk/cdk/blob/ffa903da9e44ea03e4c29fe1831eaeda3be8e9ac/tool/smarts/src/main/java/org/openscience/cdk/smarts/SmartsPattern.java#L112
SmartsPattern.matchAll() is called in the web app, which internally calls SmartsPattern.prepare(), which applies the aromaticity perception; this can be turned off, but not in the web app, as far as I can see.

If I do this explicitly in CDK code, I also get the four aromatic n:

    SmilesParser smiPar = new SmilesParser(SilentChemObjectBuilder.getInstance());
    IAtomContainer mol = smiPar.parseSmiles("OCCO[P+]1(OCCO)n2c3ccc2/C(c2ccccc2)=C2/C=CC(=N2)/C(c2ccccc2)=c2/cc/c(n21)=C(\\c1ccccc1)C1=NC(=C3c2ccccc2)C=C1");
    Cycles.markRingAtomsAndBonds(mol);
    Aromaticity.apply(Aromaticity.Model.Daylight, mol);
    SmilesGenerator smiGen = new SmilesGenerator(SmiFlavor.Canonical | SmiFlavor.UseAromaticSymbols);
    System.out.println(smiGen.create(mol));

Output:

    OCCO[P+]1(OCCO)n2c3ccc2c(c4nc(C=C4)c(-c5ccccc5)c6ccc(c(c7nc(C=C7)c3-c8ccccc8)-c9ccccc9)n61)-c%10ccccc%10

> Given a molecule, how do I generate a SMILES which reflects the internal aromaticity used?

Not sure I understand this correctly, but what you are doing towards the end, parsing and re-generating the SMILES code using CDK code (without aromaticity perception, so basically without line 4 in my example - I assume) with the SmiFlavor.UseAromaticSymbols, reproduces exactly the aromaticity information given in the input SMILES code in the output as well (I depicted your input and output with aromaticity display turned on and compared them; they appear to be the same).

When it comes to SMARTS matching, you can turn the aromaticity perception off in CDK code, e.g.:

    SmilesParser smiPar = new SmilesParser(SilentChemObjectBuilder.getInstance());
    IAtomContainer mol = smiPar.parseSmiles("OCCO[P+]1(OCCO)n2c3ccc2/C(c2ccccc2)=C2/C=CC(=N2)/C(c2ccccc2)=c2/cc/c(n21)=C(\\c1ccccc1)C1=NC(=C3c2ccccc2)C=C1");
    //no cycle and aromaticity perception
    SmilesGenerator smiGen = new SmilesGenerator(SmiFlavor.Canonical | SmiFlavor.UseAromaticSymbols);
    SmartsPattern pattern = SmartsPattern.create("C=CC=N");
    //prevent the SMARTS pattern from perceiving aromaticity
    pattern.setPrepare(false);
    System.out.println(pattern.matchAll(mol).count());

Output:

    2

Does this help you? I guess the bad news is that you cannot use the CDK Depict web app, but the good news is that it is possible via code.

Kind regards,
Jonas

________________
Dr Jonas Schaub
jon...@un...
http://orcid.org/0000-0003-1554-6666
https://github.com/JonasSchaub
https://www.researchgate.net/profile/Jonas-Schaub
Postdoctoral Researcher
Steinbeck Research Group
Friedrich Schiller University Jena, Germany
http://cheminf.uni-jena.de
Institute for Inorganic and Analytical Chemistry
Lessingstr. 8
07743 Jena
|
From: John M. <joh...@gm...> - 2025-06-24 15:34:55
|
Hi Andrew,

First off for the SMARTS matcher you can turn off the "prepare" or use the lower level APIs and work on the input aromaticity.

    IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
    SmartsPattern pat = SmartsPattern.create("C=CC=N");
    pat.setPrepare(false); // turn off auto ring+arom perception
    IAtomContainer mol = new SmilesParser(bldr).parseSmiles("OCCO[P+]1(OCCO)n2c3ccc2/C(c2ccccc2)=C2/C=CC(=N2)/C(c2ccccc2)=c2/cc/c(n21)=C(\\c1ccccc1)C1=NC(=C3c2ccccc2)C=C1 CHEMBL2369103");
    Cycles.markRingAtomsAndBonds(mol); // we need to do this manually because
    System.err.println(pat.matchAll(mol).count());

I'm not sure how you got that output:

    Aromaticity.apply(Aromaticity.Model.Daylight, mol);
    System.err.println(new SmilesGenerator(SmiFlavor.Default + SmiFlavor.UseAromaticSymbols).create(mol));

Gives me:

    OCCO[P+]1(OCCO)n2c3ccc2c(-c4ccccc4)c5C=Cc(n5)c(-c6ccccc6)c7ccc(n71)c(-c8ccccc8)c9nc(c3-c%10ccccc%10)C=C9
|
From: Andrew D. <da...@da...> - 2025-06-24 12:08:43
|
Hi all,

Given a molecule, how do I generate a SMILES which reflects the internal aromaticity used?

I'm cross-comparing some work using RDKit with CDK. The differences appear to be due to differences in aromaticity perception, as expected.

I'm trying to figure out how to verify these differences. Consider the following input SMILES:

    OCCO[P+]1(OCCO)n2c3ccc2/C(c2ccccc2)=C2/C=CC(=N2)/C(c2ccccc2)=c2/cc/c(n21)=C(\c1ccccc1)C1=NC(=C3c2ccccc2)C=C1 CHEMBL2369103

and SMARTS:

    C=CC=N

While the SMARTS seems like it would match the "C=CC(=N2)" in the SMILES, toolkits of course can perceive their own aromaticity. Testing with CDK Depict shows CDK perceives all four nitrogens as aromatic.

A SMARTS which does match is C=C-c:n, and using "a" for the SMARTS verifies that all nitrogens are aromatic.

I wanted to verify this by visual inspection of the SMILES. When I generate the SMILES with the default flavor I get, as I should have expected, a Kekule form:

    C1=CC=C(C=C1)/C/2=C/3\\C=CC(=N3)C(=C4C=CC5=C(C6=CC=CC=C6)C7=NC(=C(C8=CC=CC=C8)C9=CC=C2N9[P+](N45)(OCCO)OCCO)C=C7)C%10=CC=CC=C%10

When I remembered to add UseAromaticSymbols to the flavor I get:

    c1ccc(cc1)/C/2=C/3\C=CC(=N3)C(=c4ccc5=C(c6ccccc6)C7=NC(=C(c8ccccc8)c9ccc2n9[P+](n45)(OCCO)OCCO)C=C7)c%10ccccc%10

This shows two aromatic nitrogens and two aliphatic nitrogens, where I expected four "n" terms.

This SMILES contains "C=CC(=N3)" which I would expect to match the SMARTS "C=CC=N", so I can't use this approach for manual verification.

I didn't see any other relevant flavors to add. Is there something else I should do?

Cheers,

Andrew
da...@da...
|
From: Christoph S. <chr...@un...> - 2025-02-26 13:14:23
|
Dear all, the International Workshop on Open Molecular Informatics (IWOMI) is a small, non-predatory [1] and annual workshop dealing with topics around open data, open standards and open source. This year’s IWOMI (12.-16.05.25) will focus on the curation of open molecular data using multimodal large language models (LLM). You’ll find more information on the concept, in particular the requirement for active participation, at https://iwomi.net If you are interested, please register at https://www.iwomi.net/?page_id=1123. Kind regards, Chris [1] non-predatory in some sense. Most participants, however, show successful predatory behaviour in the context of fulfilling basic human needs such as good food, joy, social interactions, curiosity and learning, etc, during the workshop. — Prof. Dr. Christoph Steinbeck Analytical Chemistry - Cheminformatics and Chemometrics Friedrich-Schiller-University Jena, Germany Phone Team Assistant: +49-3641-948171 http://cheminf.uni-jena.de http://orcid.org/0000-0001-6966-0814 What is man but that lofty spirit - that sense of enterprise. ... Kirk, "I, Mudd," stardate 4513.3.. |
From: John M. <joh...@gm...> - 2025-01-28 16:50:23
|
The other option is you need to use the molecule to get the bond list (which is always awkward hence I added these many years ago but it needed a slow introduction). public static void main(String[] args) throws Exception { MDLV2000Reader parser = new MDLV2000Reader( new FileInputStream("../data/example.mol"), IChemObjectReader.Mode.RELAXED); IAtomContainer mol = parser.read(SilentChemObjectBuilder.getInstance().newAtomContainer()); IAtom atom = mol.getAtom(0); List<IBond> bonds = mol.getConnectedBondList(atom); } On Tue, 28 Jan 2025 at 16:47, John Mayfield <joh...@gm...> wrote: > Hi Tim, > > Which version are you using? That should work in CDK 2.10 but before that > you need to go via the builder. We changed the internals so atoms can know > about the molecule they belong to... but an atom can still appear in > multiple molecule at once. > > public class MolTest { > > public static void main(String[] args) throws Exception { > > MDLV2000Reader parser = new MDLV2000Reader( > new FileInputStream("../data/example.mol"), > IChemObjectReader.Mode.RELAXED); > IAtomContainer mol = > parser.read(SilentChemObjectBuilder.getInstance().newAtomContainer()); > IAtom atom = mol.getAtom(0); > Iterable<IBond> bonds = atom.bonds(); > > } > > more info: https://github.com/cdk/cdk/wiki/AtomContainer2 (since 7/8 > years ago :p) > > On Tue, 28 Jan 2025 at 11:31, Tim Dudgeon <tdu...@gm...> wrote: > >> I've hit a strange problem with org.openscience.cdk.silent.Atom in that >> several methods just throw a UnsupportedOperationException. This doesn't >> make much sense to me. >> I'm reading a molfile which generates that type of Atom, but if lots of >> its methods are unsupported then that molecule isn't much use. What am I >> missing? >> This is using CDK 2.9. 
>> >> package org.squonk.cdk.depict; >> >> import org.openscience.cdk.interfaces.IAtom; >> import org.openscience.cdk.interfaces.IAtomContainer; >> import org.openscience.cdk.interfaces.IBond; >> import org.openscience.cdk.io.IChemObjectReader; >> import org.openscience.cdk.io.MDLV2000Reader; >> import org.openscience.cdk.silent.AtomContainer; >> >> import java.io.FileInputStream; >> >> public class MolTest { >> >> public static void main(String[] args) throws Exception { >> >> MDLV2000Reader parser = new MDLV2000Reader( >> new FileInputStream("../data/example.mol"), >> IChemObjectReader.Mode.RELAXED); >> IAtomContainer mol = parser.read(new AtomContainer()); >> IAtom atom = mol.getAtom(0); >> Iterable<IBond> bonds = atom.bonds(); >> >> } >> } >> _______________________________________________ >> Cdk-user mailing list >> Cdk...@li... >> https://lists.sourceforge.net/lists/listinfo/cdk-user >> > |
From: John M. <joh...@gm...> - 2025-01-28 16:48:06
|
Hi Tim, Which version are you using? That should work in CDK 2.10 but before that you need to go via the builder. We changed the internals so atoms can know about the molecule they belong to... but an atom can still appear in multiple molecules at once. public class MolTest { public static void main(String[] args) throws Exception { MDLV2000Reader parser = new MDLV2000Reader( new FileInputStream("../data/example.mol"), IChemObjectReader.Mode.RELAXED); IAtomContainer mol = parser.read(SilentChemObjectBuilder.getInstance().newAtomContainer()); IAtom atom = mol.getAtom(0); Iterable<IBond> bonds = atom.bonds(); } } more info: https://github.com/cdk/cdk/wiki/AtomContainer2 (since 7/8 years ago :p) On Tue, 28 Jan 2025 at 11:31, Tim Dudgeon <tdu...@gm...> wrote: > I've hit a strange problem with org.openscience.cdk.silent.Atom in that > several methods just throw a UnsupportedOperationException. This doesn't > make much sense to me. > I'm reading a molfile which generates that type of Atom, but if lots of > its methods are unsupported then that molecule isn't much use. What am I > missing? > This is using CDK 2.9. > > package org.squonk.cdk.depict; > > import org.openscience.cdk.interfaces.IAtom; > import org.openscience.cdk.interfaces.IAtomContainer; > import org.openscience.cdk.interfaces.IBond; > import org.openscience.cdk.io.IChemObjectReader; > import org.openscience.cdk.io.MDLV2000Reader; > import org.openscience.cdk.silent.AtomContainer; > > import java.io.FileInputStream; > > public class MolTest { > > public static void main(String[] args) throws Exception { > > MDLV2000Reader parser = new MDLV2000Reader( > new FileInputStream("../data/example.mol"), > IChemObjectReader.Mode.RELAXED); > IAtomContainer mol = parser.read(new AtomContainer()); > IAtom atom = mol.getAtom(0); > Iterable<IBond> bonds = atom.bonds(); > > } > } > _______________________________________________ > Cdk-user mailing list > Cdk...@li... > https://lists.sourceforge.net/lists/listinfo/cdk-user > |
From: Tim D. <tdu...@gm...> - 2025-01-28 11:30:58
|
I've hit a strange problem with org.openscience.cdk.silent.Atom in that several methods just throw a UnsupportedOperationException. This doesn't make much sense to me. I'm reading a molfile which generates that type of Atom, but if lots of its methods are unsupported then that molecule isn't much use. What am I missing? This is using CDK 2.9. package org.squonk.cdk.depict; import org.openscience.cdk.interfaces.IAtom; import org.openscience.cdk.interfaces.IAtomContainer; import org.openscience.cdk.interfaces.IBond; import org.openscience.cdk.io.IChemObjectReader; import org.openscience.cdk.io.MDLV2000Reader; import org.openscience.cdk.silent.AtomContainer; import java.io.FileInputStream; public class MolTest { public static void main(String[] args) throws Exception { MDLV2000Reader parser = new MDLV2000Reader( new FileInputStream("../data/example.mol"), IChemObjectReader.Mode.RELAXED); IAtomContainer mol = parser.read(new AtomContainer()); IAtom atom = mol.getAtom(0); Iterable<IBond> bonds = atom.bonds(); } } |
From: Egon W. <ego...@gm...> - 2025-01-22 20:51:02
|
Hi everyone, here is the overdue update. I have set up a webpage for the meeting ( https://cdk.github.io/nwo-openscience-2024/, with the first three talks), a rough schedule for the two days, some initial travel information (questions most welcome), and a registration form (so that we know how many (vega) lunches need to be ordered). The form allows you to indicate if you want to present on the first day your own CDK-using work and on which days you will participate (both, I hope). Please contact me if you have any questions, with kind regards, Egon On Sun, 3 Nov 2024 at 20:20, Egon Willighagen <ego...@gm...> wrote: > Hi everyone, > > mark your calendar: 10-11 March 2025 in Maastricht, The Netherlands we > will hold a Chemistry Development Kit User Group Meeting. One day will have > presentations from CDK developers and CDK users, while the second day is > unconference/hackathon style. > > Please email me if you are interested in presenting your CDK-based tools > or research. Maastricht is a beautiful city [0], one of the two oldest > cities in The Netherlands. > > More details will follow soon. > > Egon > > 0.https://en.wikipedia.org/wiki/Maastricht > > -- > Dr E.L. Willighagen > Department of Translational Genomics > NUTRIM Institute of Nutrition and Translational Research in Metabolism > Maastricht University > Blog: https://chem-bla-ics.linkedchemistry.info/ > Mastodon: https://social.edu.nl/@egonw > PubList: https://orcid.org/0000-0001-7542-0286 > -- -- E.L. Willighagen Department of Translational Genomics NUTRIM Institute of Nutrition and Translational Research in Metabolism Maastricht University Blog: https://chem-bla-ics.linkedchemistry.info/ Mastodon: https://social.edu.nl/@egonw PubList: https://orcid.org/0000-0001-7542-0286 |
From: Christoph S. <chr...@un...> - 2025-01-10 13:35:11
|
Fantastic progress! Thanks to everyone who contributed and especially to Dr. Who. Kind regards, Chris — Prof. Dr. Christoph Steinbeck Analytical Chemistry - Cheminformatics and Chemometrics Friedrich-Schiller-University Jena, Germany Phone Team Assistant: +49-3641-948171 http://cheminf.uni-jena.de http://orcid.org/0000-0001-6966-0814 What is man but that lofty spirit - that sense of enterprise. ... Kirk, "I, Mudd," stardate 4513.3.. > On 10. Jan 2025, at 10:42, John Mayfield <joh...@gm...> wrote: > > Dear CDK Users, > > A new release of CDK 2.10 is available! You can read the full release notes here: https://github.com/cdk/cdk/releases/tag/cdk-2.10 > > In summary some of the key/new features are: > - SMIRKS > - RInChI > - RDfile reading > - FunctionalGroupFinder > > Bests wishes and Happy New Year > John > _______________________________________________ > Cdk-user mailing list > Cdk...@li... > https://lists.sourceforge.net/lists/listinfo/cdk-user |
From: John M. <joh...@gm...> - 2025-01-10 09:43:21
|
Dear CDK Users, A new release of CDK 2.10 is available! You can read the full release notes here: https://github.com/cdk/cdk/releases/tag/cdk-2.10 In summary some of the key/new features are: - SMIRKS - RInChI - RDfile reading - FunctionalGroupFinder Best wishes and Happy New Year John |
From: Christoph S. <chr...@un...> - 2024-12-04 08:19:18
|
Dear all, we are organising a week-long retreat on open databases in chemistry with a focus on using multimodal LLM for their curation as part of a workshop series called International Workshop on Open Molecular Informatics (IWOMI) from 12.-16. May 2025. The meeting place is located in the mountains next to Bolzano in Italy and a wonderful place to retreat and enjoy hands-on tutorials, hackathons and scientific and non-scientific conversations. The IWOMI is organised as an unconference. The content and the hackathon sessions are decided by the participants before and during the workshop. Please be prepared for a week of activity rather than passive consumption of talks :) If you are interested, please register at https://www.iwomi.net/ and let me know if you would like to contribute a talk, tutorial, or an idea for a hackathon session. The registration deadline is the end of February. Kind regards, Chris — Prof. Dr. Christoph Steinbeck Vice President for Digitalisation of the Friedrich-Schiller-University Jena Analytical Chemistry - Cheminformatics and Chemometrics Friedrich-Schiller-University Jena, Germany Phone Team Assistant: +49-3641-948171 http://cheminf.uni-jena.de http://orcid.org/0000-0001-6966-0814 What is man but that lofty spirit - that sense of enterprise. ... Kirk, "I, Mudd," stardate 4513.3.. |
From: John M. <joh...@gm...> - 2024-11-25 10:52:51
|
There is an option to recompute the hydrogens... (RECOMPUTE_HYDROGENS) https://github.com/cdk/cdk/blob/main/tool/smarts/src/main/java/org/openscience/cdk/smirks/SmirksOption.java#L80 Note the explicit [H] is better than setting H1 because unless you specify H0 you may have invalid valence, e.g. [OH-] => [OH] would become. The way the explicit H works is efficient in that it +1 to hydrogen count rather than create an explicit atom. On Mon, 25 Nov 2024 at 00:30, Uli Fechner <ul...@pe...> wrote: > There is a fair bit of information associated with the original PR: > https://github.com/cdk/cdk/pull/916 > > It's my understanding that hydrogens are not automatically adjusted, that > is, modifying the charge on an atom requires adjusting the hydrogen count. > > In this comment are references to more information regarding the different > matching modes: > https://github.com/cdk/cdk/pull/916#issuecomment-1273164172 > > Best > Uli > > On Mon, Nov 25, 2024 at 4:59 AM Jonas Schaub via Cdk-user < > cdk...@li...> wrote: > >> Hi Egon, >> >> >> >> I also started playing with SMIRKS only a few days ago but I think you >> need to specify the neutral charge and hydrogen saturation in the product >> explicitly. I tried “[O-:1]>>[O;+0;H1:1]” and it produced the output you >> are looking for (O=C(O)C). >> >> >> >> > let's say we have multiple carboxylic acids and amine groups, how would >> one enumerate all possible charge states? Is that possible with just SMIRKS? >> >> >> >> If you use the transform mode “Unique”, you will get as many structures >> returned as there are matches to your SMIRKS in the molecule. So in each >> returned structure, one carboxylic acid would be neutralised. But I guess >> this is only the first step towards what you are looking for. I am actually >> faced with a similar problem right now, i.e. enumerating all possible >> transformation combinations for one molecule and multiple SMIRKS… >> >> >> >> Hope this was of help. 
>> >> >> >> Kind regards, >> >> Jonas >> >> >> >> *From:* Egon Willighagen <ego...@gm...> >> *Sent:* Saturday, November 23, 2024 6:52 PM >> *To:* CDK users list <cdk...@li...> >> *Subject:* [Cdk-user] SMIRKS in 2.10-SNAPSHOT >> >> >> >> >> Hi John, all, >> >> >> >> I am playing with SMIRKS but it's pretty new to me. >> >> >> >> Transform neutralAcid = Smirks.compile("[O-:1]>>[O:1]"); >> >> IAtomContainer cdkStruct = parser.parseSmiles("CC(=O)[O-]"); >> Iterable<IAtomContainer> iterable = neutralAcid.apply(cdkStruct, >> Transform.Mode.Exclusive); >> for (IAtomContainer neutral : iterable) { >> String neutralSmiles = generator.createSMILES(neutral); >> >> System.out.println(neutralSmiles); >> } >> >> >> >> (I hope I did not make any copy/paste typos) >> >> >> >> Why am I not getting this as output: CC(=O)O ? >> >> >> >> (Actually, let's assume I can get that working, let's say we have >> multiple carboxylic acids and amine groups, how would one enumerate all >> possible charge states? Is that possible with just SMIRKS?) >> >> >> >> Egon >> >> >> >> -- >> >> [NL] WikiPathways in actie voor MetaKids: We zamelen geld in voor >> MetaKids met een actie rond WikiPathways, zie >> https://sr24.wikipathways.org/. Doneer via >> https://www.npo3fm.nl/kominactie/acties/wikipathways-in-actie-voor-metakids >> >> >> >> [EN] WikiPathways in action for MetaKids: We are fundraising for the >> Dutch charity MetaKids by improving metabolic disorder pathways, see >> https://sr24.wikipathways.org/. Donate at >> https://www.npo3fm.nl/kominactie/acties/wikipathways-in-actie-voor-metakids >> >> >> >> -- >> >> E.L. 
Willighagen >> Department of Translational Genomics >> >> NUTRIM Institute of Nutrition and Translational Research in Metabolism >> Maastricht University >> Blog: https://chem-bla-ics.linkedchemistry.info/ >> Mastodon: https://social.edu.nl/@egonw >> PubList: https://orcid.org/0000-0001-7542-0286 >> _______________________________________________ >> Cdk-user mailing list >> Cdk...@li... >> https://lists.sourceforge.net/lists/listinfo/cdk-user >> > _______________________________________________ > Cdk-user mailing list > Cdk...@li... > https://lists.sourceforge.net/lists/listinfo/cdk-user > |
From: Uli F. <ul...@pe...> - 2024-11-25 00:29:26
|
There is a fair bit of information associated with the original PR: https://github.com/cdk/cdk/pull/916 It's my understanding that hydrogens are not automatically adjusted, that is, modifying the charge on an atom requires adjusting the hydrogen count. In this comment are references to more information regarding the different matching modes: https://github.com/cdk/cdk/pull/916#issuecomment-1273164172 Best Uli On Mon, Nov 25, 2024 at 4:59 AM Jonas Schaub via Cdk-user < cdk...@li...> wrote: > Hi Egon, > > > > I also started playing with SMIRKS only a few days ago but I think you > need to specify the neutral charge and hydrogen saturation in the product > explicitly. I tried “[O-:1]>>[O;+0;H1:1]” and it produced the output you > are looking for (O=C(O)C). > > > > > let's say we have multiple carboxylic acids and amine groups, how would > one enumerate all possible charge states? Is that possible with just SMIRKS? > > > > If you use the transform mode “Unique”, you will get as many structures > returned as there are matches to your SMIRKS in the molecule. So in each > returned structure, one carboxylic acid would be neutralised. But I guess > this is only the first step towards what you are looking for. I am actually > faced with a similar problem right now, i.e. enumerating all possible > transformation combinations for one molecule and multiple SMIRKS… > > > > Hope this was of help. > > > > Kind regards, > > Jonas > > > > *From:* Egon Willighagen <ego...@gm...> > *Sent:* Saturday, November 23, 2024 6:52 PM > *To:* CDK users list <cdk...@li...> > *Subject:* [Cdk-user] SMIRKS in 2.10-SNAPSHOT > > > > > Hi John, all, > > > > I am playing with SMIRKS but it's pretty new to me. 
> > > > Transform neutralAcid = Smirks.compile("[O-:1]>>[O:1]"); > > IAtomContainer cdkStruct = parser.parseSmiles("CC(=O)[O-]"); > Iterable<IAtomContainer> iterable = neutralAcid.apply(cdkStruct, > Transform.Mode.Exclusive); > for (IAtomContainer neutral : iterable) { > String neutralSmiles = generator.createSMILES(neutral); > > System.out.println(neutralSmiles); > } > > > > (I hope I did not make any copy/paste typos) > > > > Why am I not getting this as output: CC(=O)O ? > > > > (Actually, let's assume I can get that working, let's say we have multiple > carboxylic acids and amine groups, how would one enumerate all possible > charge states? Is that possible with just SMIRKS?) > > > > Egon > > > > -- > > [NL] WikiPathways in actie voor MetaKids: We zamelen geld in voor MetaKids > met een actie rond WikiPathways, zie https://sr24.wikipathways.org/. > Doneer via > https://www.npo3fm.nl/kominactie/acties/wikipathways-in-actie-voor-metakids > > > > [EN] WikiPathways in action for MetaKids: We are fundraising for the Dutch > charity MetaKids by improving metabolic disorder pathways, see > https://sr24.wikipathways.org/. Donate at > https://www.npo3fm.nl/kominactie/acties/wikipathways-in-actie-voor-metakids > > > > -- > > E.L. Willighagen > Department of Translational Genomics > > NUTRIM Institute of Nutrition and Translational Research in Metabolism > Maastricht University > Blog: https://chem-bla-ics.linkedchemistry.info/ > Mastodon: https://social.edu.nl/@egonw > PubList: https://orcid.org/0000-0001-7542-0286 > _______________________________________________ > Cdk-user mailing list > Cdk...@li... > https://lists.sourceforge.net/lists/listinfo/cdk-user > |
From: Jonas S. <jon...@gm...> - 2024-11-24 17:59:24
|
Hi Egon, I also started playing with SMIRKS only a few days ago but I think you need to specify the neutral charge and hydrogen saturation in the product explicitly. I tried “[O-:1]>>[O;+0;H1:1]” and it produced the output you are looking for (O=C(O)C). > let's say we have multiple carboxylic acids and amine groups, how would one enumerate all possible charge states? Is that possible with just SMIRKS? If you use the transform mode “Unique”, you will get as many structures returned as there are matches to your SMIRKS in the molecule. So in each returned structure, one carboxylic acid would be neutralised. But I guess this is only the first step towards what you are looking for. I am actually faced with a similar problem right now, i.e. enumerating all possible transformation combinations for one molecule and multiple SMIRKS… Hope this was of help. Kind regards, Jonas From: Egon Willighagen <ego...@gm...> Sent: Saturday, November 23, 2024 6:52 PM To: CDK users list <cdk...@li...> Subject: [Cdk-user] SMIRKS in 2.10-SNAPSHOT Hi John, all, I am playing with SMIRKS but it's pretty new to me. Transform neutralAcid = Smirks.compile("[O-:1]>>[O:1]"); IAtomContainer cdkStruct = parser.parseSmiles("CC(=O)[O-]"); Iterable<IAtomContainer> iterable = neutralAcid.apply(cdkStruct, Transform.Mode.Exclusive); for (IAtomContainer neutral : iterable) { String neutralSmiles = generator.createSMILES(neutral); System.out.println(neutralSmiles); } (I hope I did not make any copy/paste typos) Why am I not getting this as output: CC(=O)O ? (Actually, let's assume I can get that working, let's say we have multiple carboxylic acids and amine groups, how would one enumerate all possible charge states? Is that possible with just SMIRKS?) Egon -- [NL] WikiPathways in actie voor MetaKids: We zamelen geld in voor MetaKids met een actie rond WikiPathways, zie https://sr24.wikipathways.org/.
Doneer via https://www.npo3fm.nl/kominactie/acties/wikipathways-in-actie-voor-metakids [EN] WikiPathways in action for MetaKids: We are fundraising for the Dutch charity MetaKids by improving metabolic disorder pathways, see https://sr24.wikipathways.org/. Donate at https://www.npo3fm.nl/kominactie/acties/wikipathways-in-actie-voor-metakids -- E.L. Willighagen Department of Translational Genomics NUTRIM Institute of Nutrition and Translational Research in Metabolism Maastricht University Blog: https://chem-bla-ics.linkedchemistry.info/ Mastodon: https://social.edu.nl/@egonw PubList: https://orcid.org/0000-0001-7542-0286 |
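Editorial sketch, not part of the original message: the reason the product side needs both "+0" and "H1" can be shown with simple valence arithmetic. The class below is a hypothetical, standard-library-only illustration for oxygen; it assumes the textbook rule that oxygen's target valence is 2 plus its formal charge, and it is not CDK's valence model or its SMIRKS implementation.

```java
/** Simplified hydrogen bookkeeping for oxygen (teaching sketch, not CDK's valence model). */
final class OxygenHydrogens {

    /** Hydrogens an oxygen needs so that its valence equals 2 + formal charge. */
    static int hydrogensNeeded(int formalCharge, int bondOrderSum) {
        int targetValence = 2 + formalCharge; // O-: 1, neutral O: 2, O+: 3
        return Math.max(0, targetValence - bondOrderSum);
    }

    public static void main(String[] args) {
        // Carboxylate oxygen in CC(=O)[O-]: charge -1, one single bond to carbon.
        System.out.println(hydrogensNeeded(-1, 1)); // prints 0
        // Same oxygen after the charge is reset to 0: one hydrogen is now missing.
        System.out.println(hydrogensNeeded(0, 1));  // prints 1
    }
}
```

A transform that only resets the charge leaves the hydrogen count at 0, one short of what neutral oxygen needs, which is consistent with the advice above to state the hydrogen count in the product.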
From: Egon W. <ego...@gm...> - 2024-11-23 17:52:54
|
Hi John, all, I am playing with SMIRKS but it's pretty new to me. Transform neutralAcid = Smirks.compile("[O-:1]>>[O:1]"); IAtomContainer cdkStruct = parser.parseSmiles("CC(=O)[O-]"); Iterable<IAtomContainer> iterable = neutralAcid.apply(cdkStruct, Transform.Mode.Exclusive); for (IAtomContainer neutral : iterable) { String neutralSmiles = generator.createSMILES(neutral); System.out.println(neutralSmiles); } (I hope I did not make any copy/paste typos) Why am I not getting this as output: CC(=O)O ? (Actually, let's assume I can get that working, let's say we have multiple carboxylic acids and amine groups, how would one enumerate all possible charge states? Is that possible with just SMIRKS?) Egon -- [NL] WikiPathways in actie voor MetaKids: We zamelen geld in voor MetaKids met een actie rond WikiPathways, zie https://sr24.wikipathways.org/. Doneer via https://www.npo3fm.nl/kominactie/acties/wikipathways-in-actie-voor-metakids [EN] WikiPathways in action for MetaKids: We are fundraising for the Dutch charity MetaKids by improving metabolic disorder pathways, see https://sr24.wikipathways.org/. Donate at https://www.npo3fm.nl/kominactie/acties/wikipathways-in-actie-voor-metakids -- E.L. Willighagen Department of Translational Genomics NUTRIM Institute of Nutrition and Translational Research in Metabolism Maastricht University Blog: https://chem-bla-ics.linkedchemistry.info/ Mastodon: https://social.edu.nl/@egonw PubList: https://orcid.org/0000-0001-7542-0286 |
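Editorial sketch, not part of the original message: the closing question about enumerating all charge states is, combinatorially, an enumeration of every subset of the ionisable sites (2^n combinations for n sites). The class below shows only that bookkeeping with the Java standard library. The site indices are hypothetical placeholders; mapping each subset onto actual SMIRKS applications in CDK is not shown and would need the API discussed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Enumerates every subset of site indices; each subset stands for one charge state. */
final class ChargeStateEnumerator {

    /** Returns all 2^siteCount subsets of {0, ..., siteCount-1}; assumes siteCount is 0 to 30. */
    static List<List<Integer>> allSubsets(int siteCount) {
        List<List<Integer>> subsets = new ArrayList<>();
        for (int mask = 0; mask < (1 << siteCount); mask++) {
            List<Integer> chosen = new ArrayList<>();
            for (int site = 0; site < siteCount; site++) {
                // Bit 'site' set in the mask means this site is transformed in this state.
                if ((mask & (1 << site)) != 0) {
                    chosen.add(site);
                }
            }
            subsets.add(chosen);
        }
        return subsets;
    }

    public static void main(String[] args) {
        // Two acid groups and one amine would give three sites, so eight states.
        for (List<Integer> state : allSubsets(3)) {
            System.out.println("transform sites " + state);
        }
    }
}
```

The number of states doubles with each site, so for molecules with many ionisable groups some pruning rule would be needed before applying transforms.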
From: Egon W. <ego...@gm...> - 2024-11-03 19:21:20
|
Hi everyone, mark your calendar: 10-11 March 2025 in Maastricht, The Netherlands we will hold a Chemistry Development Kit User Group Meeting. One day will have presentations from CDK developers and CDK users, while the second day is unconference/hackathon style. Please email me if you are interested in presenting your CDK-based tools or research. Maastricht is a beautiful city [0], one of the two oldest cities in The Netherlands. More details will follow soon. Egon 0.https://en.wikipedia.org/wiki/Maastricht -- Dr E.L. Willighagen Department of Translational Genomics NUTRIM Institute of Nutrition and Translational Research in Metabolism Maastricht University Blog: https://chem-bla-ics.linkedchemistry.info/ Mastodon: https://social.edu.nl/@egonw PubList: https://orcid.org/0000-0001-7542-0286 |
From: John M. <joh...@gm...> - 2024-09-04 22:21:35
|
What does your code look like - John On 4 Sep 2024, at 21:03, Velusamy Velu <koo...@gm...> wrote: > Hi John: > > I appreciate your feedback. I'm using version 2.9 of CDK. Our requirement needs to construct the Porphyrin molecule programmatically as instance of the IAtomContainer and use the depictor. The >N-H decision is a matter of choice. When I replaced the >N-H as you have done (>NH) the result I received was still with the second lines of double bonds out. > > [image: image.png] > > Thanks for the pointers you provided, I will review those documents. > > Velusamy K. Velu > 614-323-9649 > <https://peruselab.com/> <https://www.linkedin.com/in/vkvelu/> > <https://twitter.com/PeruseLab> <https://www.facebook.com/PeruseLab/> > > On Tue, Sep 3, 2024 at 2:33 AM John Mayfield <joh...@gm...> wrote: > >> It's true we do still push the hydrogens out like that (see below), it's a bit tricky to fix as is due the macrocycle processor. >> However since you had that it hints perhaps you have just found some old documentation on how to depict molecules are using the wrong classes (generators). >> >> There is lots of info here: https://github.com/cdk/cdk/wiki/Standard-Generator but start here: https://cdk.github.io/cdk/latest/docs/api/org/openscience/cdk/depict/DepictionGenerator.html >> >> [image: image.png] >> >> On Tue, 3 Sept 2024 at 07:27, John Mayfield <joh...@gm...> wrote: >> >>> You are probably using a very ancient version. >>> >>> Pasting the SMILES for Porphyrin on Wikipedia: C1=CC2=CC5=CC=C(C=C4C=CC(C=C3C=CC(=CC1=N2)N3)=N4)N5 >>> in here: https://www.simolecule.com/cdkdepict/depict.html >>> I get: >>> >>> [image: image.png] >>> >>> On Tue, 3 Sept 2024 at 04:36, Velusamy Velu <vv...@pe...> wrote: >>> >>>> Hi I ran into an issue with the image of the porphyrin molecule. >>>> >>>> The original image from Wikipedia is: >>>> [image: image.png] >>>> I drew it as below: >>>> [image: image.png] >>>> Then I let the CDK layout the image, which did a good job except for the placement of the second lines of double bonds, as shown below. >>>> [image: image.png] >>>> I thought the explicit bond between H & N was the cause, so I replaced those N-H instances with NH, then tried again. This time it still put those second lines out, as below. >>>> [image: image.png] >>>> Any idea how to fix this? >>>> >>>> Thanks >>>> >>>> Velusamy K. Velu >>>> (614) 323-9649 >>>> <https://peruselab.com> <https://www.linkedin.com/company/peruselab/> >>>> <https://twitter.com/PeruseLab> <https://www.facebook.com/PeruseLab/> >>>> _______________________________________________ >>>> Cdk-user mailing list >>>> Cdk...@li... >>>> https://lists.sourceforge.net/lists/listinfo/cdk-user >>>> >>> >> _______________________________________________ >> Cdk-user mailing list >> Cdk...@li... >> https://lists.sourceforge.net/lists/listinfo/cdk-user >> |
From: John M. <joh...@gm...> - 2024-09-03 06:32:48
|
It's true we do still push the hydrogens out like that (see below); it's a bit tricky to fix, as it is due to the macrocycle processor.
However, since you had that, it hints that perhaps you have just found some old documentation on how to depict molecules and are using the wrong classes (generators).

There is lots of info here: https://github.com/cdk/cdk/wiki/Standard-Generator
but start here: https://cdk.github.io/cdk/latest/docs/api/org/openscience/cdk/depict/DepictionGenerator.html

[image: image.png]

On Tue, 3 Sept 2024 at 07:27, John Mayfield <joh...@gm...> wrote:

> You are probably using a very ancient version.
>
> Pasting the SMILES for Porphyrin on
> Wikipedia: C1=CC2=CC5=CC=C(C=C4C=CC(C=C3C=CC(=CC1=N2)N3)=N4)N5
> in here: https://www.simolecule.com/cdkdepict/depict.html
> I get:
>
> [image: image.png]
>
> On Tue, 3 Sept 2024 at 04:36, Velusamy Velu <vv...@pe...> wrote:
>
>> Hi, I ran into an issue with the image of the porphyrin molecule.
>>
>> The original image from Wikipedia is:
>> [image: image.png]
>> I drew it as below:
>> [image: image.png]
>> Then I let the CDK lay out the image, which did a good job except for the
>> placement of the second lines of double bonds, as shown below.
>> [image: image.png]
>> I thought the explicit bond between H & N was the cause, so I replaced
>> those N-H instances with NH, then tried again. This time it still put those
>> second lines out, as below.
>> [image: image.png]
>> Any idea how to fix this?
>>
>> Thanks
>>
>> Velusamy K. Velu
>> (614) 323-9649
>> <https://peruselab.com> <https://www.linkedin.com/company/peruselab/>
>> <https://twitter.com/PeruseLab> <https://www.facebook.com/PeruseLab/>
>>
> |
From: Velusamy V. <vv...@pe...> - 2024-09-03 03:36:03
|
Hi, I ran into an issue with the image of the porphyrin molecule.

The original image from Wikipedia is:
[image: image.png]
I drew it as below:
[image: image.png]
Then I let the CDK lay out the image, which did a good job except for the placement of the second lines of double bonds, as shown below.
[image: image.png]
I thought the explicit bond between H & N was the cause, so I replaced those N-H instances with NH, then tried again. This time it still put those second lines out, as below.
[image: image.png]
Any idea how to fix this?

Thanks

Velusamy K. Velu
(614) 323-9649
<https://peruselab.com> <https://www.linkedin.com/company/peruselab/>
<https://twitter.com/PeruseLab> <https://www.facebook.com/PeruseLab/> |
From: Egon W. <ego...@gm...> - 2024-09-01 14:43:19
|
Thanks for posting! I have reshared an announcement based on this on Mastodon:
https://fosstodon.org/@blueobelisk/113055702502809492

(I decided to post from the Blue Obelisk account, since this is not just for the CDK)

Egon

On Mon, 5 Aug 2024 at 10:44, Andrew Dalke <da...@da...> wrote:

> Hi everyone,
>
> I have released chemfp 4.2. The new "simarray" functionality computes the
> full comparison matrix as a NumPy array, e.g., for use in some clustering
> algorithms. It has built-in support for Tanimoto, Dice, cosine, and Hamming
> comparisons, plus an option to get the individual "a", "b", "c", and "d"
> components should you need a specialized metric. It processes roughly 100M
> comparisons per second on my laptop, which means if you had 30 TB of free
> disk space you could generate the NxN comparisons for ChEMBL in about a
> day. (I'm curious if someone will do this!)
>
> Chemfp supports the CDK, RDKit, Open Babel, and OpenEye toolkits. Some of
> the specific improvements for the chemfp/CDK interface are:
>
> - new "hydrogens" options for the SMILES and SDF readers ("as-is",
>   "make-explicit", "make-implicit", and "make-nonchiral-implicit") to change
>   between implicit and explicit hydrogens.
>
> - added support for the CDK 2.9 Pubchem fingerprint improvements
>
> - added support for jCompoundMapper fingerprints
>
> The jCompoundMapper and "hydrogens" options were added after I read
> “Effectiveness of molecular fingerprints for exploring the chemical space
> of natural products” by Boldini, Ballabio, Consonni, Todeschini, Grisoni,
> and Sieber, J. Cheminform. (2024) 16:35
> https://doi.org/10.1186/s13321-024-00830-3 and realized there were a few
> rough edges chemfp could help smooth out.
>
> For a full description of what's new in this release, see
> https://chemfp.com/docs/whats_new_in_42.html .
>
> Chemfp may be the package you’ve been looking for, if you work with binary
> cheminformatics fingerprints. Chemfp is perhaps best known for its
> high-performance fingerprint similarity search. Its Taylor/Butina
> clustering, MaxMin diversity selection, and sphere exclusion (including
> directed sphere exclusion) are equally world-class. Or, if you simply need
> a 100K by 100K distance array to pass into scikit-learn, chemfp’s simarray
> can generate that in less than a minute.
>
> The chemfp homepage is https://chemfp.com/ . To install a pre-compiled
> chemfp for Linux-based OSes:
>
>   python -m pip install chemfp -i https://chemfp.com/packages/
>
> The default installation limits or disables a few chemfp features as
> described in the base license agreement at
> https://chemfp.com/BaseLicense.txt . To request a license key, which is
> free for academic use, see https://chemfp.com/license/ .
>
> Best regards,
>
> Andrew Dalke
> da...@da...
>

--
Okay, you make FAIR. But why? We now can link FAIR maturity indicators to reuse case scenarios. You can stop asking "Is my data FAIR?" and start asking "How FAIR do I need to be to allow that reuse?" Read about it in our new paper "FAIR assessment of nanosafety data reusability with community standards", https://www.nature.com/articles/s41597-024-03324-x |
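
For readers unfamiliar with how the four comparison types in the announcement relate to per-pair bit counts, here is a minimal pure-Python sketch. It does not use chemfp and is not chemfp's API. Fingerprints are modelled as Python integers, and the four counts are given descriptive names (`only_1`, `only_2`, `both`, `neither`) because the mapping of the letters "a" to "d" onto these counts varies between conventions; the mapping chemfp uses is not assumed here. Returning 0.0 when both fingerprints are empty is also an assumption of this sketch. The last function redoes the throughput arithmetic from the announcement, assuming roughly 2.4 million ChEMBL compounds and 4 bytes per stored value; neither figure is stated in the original message.

```python
import math
from typing import Callable, List, NamedTuple


class Counts(NamedTuple):
    only_1: int   # bits set only in the first fingerprint
    only_2: int   # bits set only in the second fingerprint
    both: int     # bits set in both fingerprints
    neither: int  # bits set in neither fingerprint


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def counts(fp1: int, fp2: int, num_bits: int) -> Counts:
    """Per-pair bit counts from which the metrics below are derived."""
    both = popcount(fp1 & fp2)
    only_1 = popcount(fp1) - both
    only_2 = popcount(fp2) - both
    return Counts(only_1, only_2, both, num_bits - only_1 - only_2 - both)


def tanimoto(c: Counts) -> float:
    denom = c.only_1 + c.only_2 + c.both
    return c.both / denom if denom else 0.0  # 0.0 for two empty fingerprints (assumption)


def dice(c: Counts) -> float:
    denom = 2 * c.both + c.only_1 + c.only_2
    return 2 * c.both / denom if denom else 0.0


def cosine(c: Counts) -> float:
    denom = math.sqrt((c.both + c.only_1) * (c.both + c.only_2))
    return c.both / denom if denom else 0.0


def hamming(c: Counts) -> int:
    """Hamming distance: number of bit positions where the two differ."""
    return c.only_1 + c.only_2


def full_matrix(fps: List[int], num_bits: int,
                metric: Callable[[Counts], float] = tanimoto) -> List[List[float]]:
    """Naive NxN comparison matrix as nested lists (illustration only)."""
    return [[metric(counts(p, q, num_bits)) for q in fps] for p in fps]


def estimate(n: int, per_second: float, bytes_per_value: int):
    """Hours of compute and terabytes of storage for an n x n matrix."""
    comparisons = n * n
    return comparisons / per_second / 3600.0, comparisons * bytes_per_value / 1e12


if __name__ == "__main__":
    c = counts(0b1011, 0b0110, 8)
    print(c, tanimoto(c), dice(c), hamming(c))
    # Assumed inputs: ~2.4 million compounds, 100M comparisons/s, 4-byte values.
    print(estimate(2_400_000, 1e8, 4))
```

Under those assumptions the estimate comes to 16 hours and about 23 TB, which is in line with the "about a day" and "30 TB" figures quoted above.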