rdkit-discuss Mailing List for RDKit

Open-Source Cheminformatics and Machine Learning

Brought to you by: glandrum

rdkit-discuss — Mailing list for discussion, questions and answers.


Re: [Rdkit-discuss] 9th Advanced In Silico Drug Design Workshop in Olomouc

From: Pavel P. <pav...@uk...> - 2025-10-31 21:04:17

There is a problem with the link to the workshop page. In some countries, 
in particular the US, it resolves to the wrong page. Sorry for the 
inconvenience. We will try to resolve the issue as soon as possible.
In the meantime, you can join the Discord server for up-to-date 
information. Registration details are below.

Registration

The meeting will be in a hybrid format. Lectures and tutorial files will 
be available online, but the hands-on tutorials will be held on-site only.
Registration form: https://forms.gle/pSeF5Pd2wGP5TK6c7.
Attendance at the event is free of charge. The workshop room seats up to 
36 people and the lecture room up to 80.

Posters

Participants can present their research in an on-site poster 
session (for students, this is an exam requirement). Posters will be 
accompanied by flash talks. Registrations with poster presentations will 
be prioritised.
Poster registration form: https://forms.gle/ZYcDhbNkXQfpWWdu9.

Kind regards,
Pavel


[Rdkit-discuss] 9th Advanced In Silico Drug Design Workshop in Olomouc

From: Pavel P. <pav...@uk...> - 2025-10-31 14:33:43

Dear colleagues,

🎯 Join Us for the 9th Advanced In Silico Drug Design Workshop & Challenge!
📅 26–30 January 2026
📍 Palacký University, Olomouc, Czech Republic
🌐 https://www.kfc.upol.cz/9add,  https://discord.gg/Baf8ySAc

We invite you to attend the 9th Advanced In Silico Drug Design Workshop 
& Challenge, a hands-on event bringing together researchers, students, 
and professionals passionate about computational drug discovery.
This year’s focus areas include:
•    Machine Learning & Artificial Intelligence in Drug Design
•    Structure- and Ligand-Based Design Tools
•    Pharmacophore Modelling
•    Molecular Docking & Dynamics
•    De Novo Design
•    …and more!

Lectures & Tutorials by Leading Experts:
Prof. Thierry Langer (Austria), Prof. Alexandre Varnek (France), Prof. 
Johannes Kirchmair (Austria), Prof. Hanoch Senderowitz (Israel), and 
others from across Europe.

All lectures will be streamed online. On-site tutorials will emphasize 
the use of open-source software.

For On-Site Participants:
•    Present your poster and give a flash talk
•    Take part in the Drug Design Challenge — test your skills in 
identifying active compounds from a large structure dataset!
•    New this year: A prospective evaluation of predictions made with 
the challenge.

Participation Fee: None.

Kind regards,
Pavel

[Rdkit-discuss] ANN: chemfp 5.0

From: Andrew D. <da...@da...> - 2025-09-24 14:04:46

Hi RDKit users,

  I've released chemfp 5.0, my Python package for cheminformatics
fingerprint generation, search, and analysis. You can install it on
Linux-based OSes using:

    python -m pip install chemfp -i https://chemfp.com/packages/

(Append "--upgrade" if you have already installed it.)

For a description of the changes since 4.2 see

  https://chemfp.com/docs/whats_new_in_50.html .

The highlights are:

 • Update the FPB format to handle over 1 billion fingerprints.
 
 • New chemfp shardsearch command-line tool which does similarity
    search across multiple target files and merges the results.
   - Tested with the 977 million structures in GDB-13
 
 • New chemfp simhistogram / chemfp simhist command-line tool and
    corresponding chemfp.simhistogram() high-level API function
    to create a histogram of similarity scores.
 
 • Initial support for count fingerprints:
   - new text-based FPC format based on the FPS format
   - rdkit2fpc tool which uses RDKit's sparse fingerprint generators
   - fpc2fps tool with various methods to convert sparse count
       fingerprints to binary fingerprints
 
 • Fast implementations of the 4860-bit Klekota-Roth fingerprint
    for the OpenEye and RDKit toolkits.

Cheers,
 
				Andrew Dalke
				da...@da...
--
Have useful but old in-house cheminformatics software in need of refurbishment?
No one left knows how it works or has the time? Perhaps I can help. Contact me.

[Rdkit-discuss] Cookbook Contribution: Batch Fetch from PubChem + RDKit Visualization

From: ang Ho <hex...@gm...> - 2025-09-20 15:06:10

Hi RDKit maintainers,

I would like to contribute a new entry to the RDKit Cookbook that
demonstrates a streamlined workflow for fetching chemical data from PubChem
and visualizing it with RDKit.

This example showcases the integration between ChemInformant (a robust
PubChem data acquisition library) and RDKit, addressing a common workflow
need: efficiently converting chemical identifiers to molecular
visualizations. ChemInformant handles the complexity of PubChem API
interactions, identifier resolution, network reliability, and data
validation, while RDKit provides the powerful molecular processing and
visualization capabilities.

Key benefits of this integration:
- Demonstrates real-world data acquisition workflows
- Shows how to handle mixed identifier types (names, CIDs, SMILES)
- Illustrates robust error handling and batch processing
- Provides a complete pipeline from data fetching to visualization

The example requires ChemInformant as a dependency (pip install
ChemInformant), which I believe adds value by showing users a practical,
production-ready approach to PubChem data integration.

Here is the content in .rst format. Please let me know if any changes are
needed.

Thanks!

Best regards,
Zhiang He (HzaCode)

--- RST CONTENT BELOW ---

Batch Fetch from PubChem + RDKit Visualization
==============================================

Author: Zhiang He (HzaCode)
Original Source: https://github.com/HzaCode/ChemInformant
Index ID#: RDKitCB_41
Summary: Demonstrates a streamlined workflow for fetching chemical data
from PubChem and visualizing it with RDKit. Uses ChemInformant for robust
data acquisition, then processes molecules with RDKit for annotated
visualization.

Dependencies: This example requires ChemInformant (``pip install
ChemInformant``)

.. testcode:: RDKitCB_41

   from rdkit import Chem
   from rdkit.Chem import Draw, Descriptors
   from rdkit.Chem.Draw import IPythonConsole
   import ChemInformant as ci

   IPythonConsole.ipython_useSVG = True

   # Example compound identifiers (names, CIDs, or SMILES)
   identifiers = ["aspirin", "caffeine", "2244"]  # mixed identifier types

   # Fetch molecular data from PubChem using ChemInformant
   # This handles identifier resolution, network retries, and caching automatically
   df = ci.get_properties(
       identifiers,
       ["canonical_smiles", "molecular_weight", "iupac_name"],
   )

   print("Fetched data:")
   print(df[["input_identifier", "canonical_smiles", "molecular_weight"]].head())

   # Convert to RDKit molecules
   molecules = []
   valid_names = []

   for idx, row in df.iterrows():
       if row["status"] == "OK" and row["canonical_smiles"]:
           mol = Chem.MolFromSmiles(row["canonical_smiles"])
           if mol:
               # Add atom indices as atom map numbers for visualization
               for atom in mol.GetAtoms():
                   atom.SetAtomMapNum(atom.GetIdx())
               molecules.append(mol)
               valid_names.append(row["input_identifier"])

   # Create legends with molecular weight information
   legends = []
   for i, name in enumerate(valid_names):
       mw = Descriptors.MolWt(molecules[i])
       legends.append(f"{name}: MW={mw:.1f}")

   # Generate annotated molecular grid
   img = Draw.MolsToGridImage(molecules, legends=legends, subImgSize=(250, 250))
   img

.. testoutput:: RDKitCB_41

   Fetched data:
     input_identifier              canonical_smiles  molecular_weight
   0          aspirin      CC(=O)OC1=CC=CC=C1C(=O)O            180.16
   1         caffeine  CN1C=NC2=C1C(=O)N(C(=O)N2C)C            194.19
   2             2244      CC(=O)OC1=CC=CC=C1C(=O)O            180.16

[Rdkit-discuss] history of RDKit's count Tanimoto

From: Andrew D. <da...@da...> - 2025-09-04 16:40:13

Hi all,

RDKit implements Tanimoto similarity for count fingerprints. I only last week realized there's been a change in what "Tanimoto similarity" means for count fingerprints, and RDKit seems to be the reason for the shift. I'm curious to know the history.

* Tanimoto #1 is Σaᵢbᵢ/(Σaᵢ²+Σbᵢ²-Σaᵢbᵢ), that is, it interprets count fingerprints as a vector

The oldest citation I have is Bawden, "Browsing and Clustering of Chemical Structures" on p147 of "Chemical structures" (1988) from the first ICCS.

A more accessible citation is Willett, "Chemical Similarity Searching" JCICS (1998) 38, 983-996 available at https://web.archive.org/web/20040218213916/http://www-personal.engin.umich.edu:80/~wildd/che697/willett98.pdf . See page 987, the "formula for continuous values" under "Tanimoto Coefficient".

My literature search shows it was the main definition for almost 30 years.

* Tanimoto #2 is Σmin(aᵢ,bᵢ)/Σmax(aᵢ,bᵢ), that is, what Wikipedia calls the "weighted Jaccard similarity."

This is what RDKit uses. It was committed to Code/DataStructs/SparseIntVect.h on 2009-Jun-18, as part of adding Tversky similarity, and a couple of years after adding Dice similarity.
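
To make the difference between the two definitions concrete, here is a minimal pure-Python sketch that applies both formulas to count fingerprints stored as {feature id: count} dictionaries. The function names, the example feature ids, and the choice to return 0.0 for two empty fingerprints are assumptions made for illustration; this is not RDKit's implementation.

```python
def tanimoto_vector(a, b):
    """Tanimoto #1: sum(a*b) / (sum(a^2) + sum(b^2) - sum(a*b))."""
    dot = sum(count * b.get(feature, 0) for feature, count in a.items())
    norm_a = sum(count * count for count in a.values())
    norm_b = sum(count * count for count in b.values())
    denom = norm_a + norm_b - dot
    # Returning 0.0 for two empty fingerprints is an arbitrary convention here.
    return dot / denom if denom else 0.0


def tanimoto_minmax(a, b):
    """Tanimoto #2: sum(min(a, b)) / sum(max(a, b)), the weighted Jaccard form."""
    features = set(a) | set(b)
    num = sum(min(a.get(f, 0), b.get(f, 0)) for f in features)
    den = sum(max(a.get(f, 0), b.get(f, 0)) for f in features)
    return num / den if den else 0.0


# Hypothetical count fingerprints: {feature id: count}
fp_a = {101: 2, 202: 1}
fp_b = {101: 1, 303: 1}

print(tanimoto_vector(fp_a, fp_b))  # 0.4
print(tanimoto_minmax(fp_a, fp_b))  # 0.25
```

When every count is 0 or 1 the two functions give the same value, which is why the distinction only shows up for count fingerprints.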

I believe that as a result of RDKit's popularity, recent papers have taken to describing this as, for example, "the counted Tanimoto similarity", as in https://jcheminf.biomedcentral.com/articles/10.1186/s13321-025-01081-6 ("also known as the multiset coefficient calculation").

Does anyone here know how RDKit came to be the way it is?

In my literature search, I believe the similarity function for Tanimoto #2 was first proposed by Henry Allan Gleason, "Some Applications of the Quadrat Method", Bulletin of the Torrey Botanical Club, Vol. 47, No. 1 (Jan., 1920), pp. 21-33, starting on page 31 where he proposes adding species abundance to Jaccard's similarity. See https://archive.org/details/jstor-2480223/page/n11/mode/2up

Some people (and https://en.wikipedia.org/wiki/Jaccard_index) refer to this as Ruzicka similarity, from Ruzicka (1958), but in the Mastodon discussion at https://mstdn.science/@molecule/115142680945701031 you'll see that Wim (@mol...@ms...) got a copy of the relevant part of Ruzicka's paper, and it appears to be identical to Gleason's extension to Jaccard similarity -- not even in the cool-looking min/max formulation attributed to it in, eg, https://archive.org/details/dictionaryofdist0000deza/mode/2up?q=Ruzicka .

The first paper to apply Tanimoto #2 to fingerprints appears to be Swamidass et al., "Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity", Bioinformatics, Volume 21, Issue suppl_1, June 2005, Pages i359–i368, https://doi.org/10.1093/bioinformatics/bti1055 where they call it the "MinMax" kernel and explicitly compare it to Tanimoto #1.

Some papers since then refer to Tanimoto #2 as MinMax.

Now, I was able to find a use of (1-Tanimoto #2) as a similarity measure ("measure" used in its mathematical meaning) in Thomas Ott, Albert Kern, Ansgar Schuffenhauer, Maxim Popov, Pierre Acklin, Edgar Jacoby, and Ruedi Stoop, "Sequential Superparamagnetic Clustering for Unbiased Classification of High-Dimensional Chemical Data", J. Chem. Inf. Comput. Sci. 2004, 44, 1358-1364 available from https://tilde.ini.uzh.ch/users/tott/public_html/jcheminf.pdf but it is unnamed -- and a measure, not a similarity.

That makes me quite curious about how RDKit ended up the way it is.

To be clear, I prefer the similarity function given in #2 over that of #1, though I think having two "Tanimoto" definitions is going to be confusing. If only the Sheffield folks back in the 1980s had known. But hey, that's how we ended up with "Tanimoto" instead of "Jaccard". :)

Best regards,

Andrew
da...@da...

P.S.
If anyone knows of an older citation, please let me know. There aren't good search tools for finding this formula, so it's a lot of tedious manual work.

[Rdkit-discuss] novel fingerprint count simulation evaluation

From: Andrew D. <da...@da...> - 2025-08-22 16:15:16

Is anyone here interested in evaluating my new method to emulate count fingerprints using binary fingerprints?

I've added that feature to chemfp5.0b2, released yesterday, but I don't have the expertise to evaluate its effectiveness.

In short, for most Linux-based OSes, install chemfp, generate count fingerprints, and convert count fingerprints to binary fingerprints using the following steps:

python -m pip install chemfp==5.0b2 -i https://chemfp.com/packages/
chemfp rdkit2fpc dataset.sdf.gz -o dataset.fpc
chemfp fpc2fps dataset.fpc -o dataset.fps

then use chemfp's "simsearch" for similarity search of the FPS (or FPB) files, like:

simsearch --query 'c1ccccc1O' -k 5 --out csv dataset.fps

The "--help" output for these commands is documented at https://chemfp.com/docs/tool_help.html . The "FPC" format is my new text-based exchange format for count fingerprints, described at https://chemfp.com/fpc_format/ .

Here's some background.

RDKit supports several count fingerprints (Morgan, RDKit fingerprints, Atom Pair, and Torsion). These can be viewed as a list of (feature id, count) pairs.

By default RDKit converts these into binary fingerprints by folding the feature id, that is, setting the binary fingerprint bit i to 1, where i = (feature id) modulo fpSize. This method ignores the counts.

These fingerprint generators also implement a countSimulation method, which sets additional bits based on count thresholds. For example, if the countBounds is 1,3,9 then it sets 1 bit if the count is at least 1, two bits if the count is at least 3, and three bits if the count is at least 9. (The actual algorithm is a bit more complicated than this.)
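
The two conversions described above can be sketched in plain Python. This is a simplified illustration of the description only, not RDKit's code: the function names are made up, and the bit layout used for the simulated counts (each folded slot owning len(count_bounds) adjacent bits) is an assumption.

```python
def fold(features, fp_size):
    """Plain folding: set bit (feature_id % fp_size); the counts are ignored."""
    return {feature_id % fp_size for feature_id, count in features.items() if count > 0}


def simulate_counts(features, fp_size, count_bounds=(1, 3, 9)):
    """Simplified count simulation: one extra bit per count threshold reached.

    Assumed layout: the fingerprint is split into fp_size // len(count_bounds)
    slots, and slot s owns the adjacent bits s*n .. s*n + n - 1.
    """
    n = len(count_bounds)
    num_slots = fp_size // n
    bits = set()
    for feature_id, count in features.items():
        base = (feature_id % num_slots) * n
        for i, bound in enumerate(count_bounds):
            if count >= bound:
                bits.add(base + i)
    return bits


# Hypothetical (feature id -> count) pairs and a tiny 12-bit fingerprint
features = {5: 1, 7: 3, 9: 10}
print(sorted(fold(features, 12)))             # [5, 7, 9]
print(sorted(simulate_counts(features, 12)))  # [3, 4, 5, 9, 10]
```

In this toy example features 5 and 9 land in the same slot, so the single bit set for feature 5 is hidden by the three bits set for feature 9, which is the kind of collision any folded scheme has to live with.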

I've come up with a new method which is a cross between Calvin Mooers' superimposed coding and the Daylight RNG approach.

It's based on the observation that Morgan fingerprints are typically quite sparse, eg, for Morgan3 count fingerprints from ChEMBL 33 the average fingerprint has 71 distinct features, with an average feature count of 1.5. That means there are on average 107 distinct possible bits to set in the output binary fingerprint, assuming each count sets 1 bit, eg, that feature 2246728737 with count 2 can set 2 bits.

But how to choose those bits?

My new method uses the feature id to seed an RNG, which is then used to get `count` output bit positions, randomly chosen from the output fingerprint size.

output_fp = BinaryFingerprint(num_bits)
for feature_id, count in features:
    rng = RNG(feature_id)
    for _ in range(count):
        bitno = rng.randrange(num_bits)
        output_fp.SetOnBit(bitno)

There are a couple of tunable parameters: 1) the output fingerprint size, 2) the number of bits to set for each count, and 3) an upper bound for the feature count, so the full algorithm is a bit more complicated:

output_fp = BinaryFingerprint(num_bits)
for feature_id, count in features:
    rng = RNG(feature_id)
    for _ in range(min(count, max_count) * bits_per_count):
        bitno = rng.randrange(num_bits)
        output_fp.SetOnBit(bitno)

The reason for "bits_per_count" is to reduce the effect of collisions. Doubling the fingerprint size and doubling the number of bits per count keeps the output density roughly unchanged, but should reduce the collision rate between two (feature id, specific count) pairs.
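
For anyone who wants to experiment with the idea outside chemfp, the pseudocode above can be turned into a small runnable sketch. It assumes Python's random.Random as the seeded RNG and a set of bit positions as the output fingerprint; the function name and the default parameter values are illustrative only, and the bit positions will not match what chemfp's fpc2fps produces.

```python
import random


def superimposed_bits(features, num_bits, max_count=8, bits_per_count=1):
    """Return the set of on-bit positions for (feature_id, count) pairs.

    Each feature id seeds its own RNG, so a given feature always yields the
    same sequence of bit positions; a larger count consumes more of it.
    """
    on_bits = set()
    for feature_id, count in features:
        rng = random.Random(feature_id)  # assumption: stand-in for the RNG
        for _ in range(min(count, max_count) * bits_per_count):
            on_bits.add(rng.randrange(num_bits))
    return on_bits


# Hypothetical count fingerprint as (feature id, count) pairs
features = [(2246728737, 2), (3217380708, 1)]
print(sorted(superimposed_bits(features, num_bits=2048)))
```

One consequence of seeding by feature id is that the bits set for a feature with count 1 are always a subset of the bits set for the same feature with count 2, so molecules sharing a feature share bits even when their counts differ.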

That's my hand-waving belief, but I don't have the specific experience in evaluating fingerprint effectiveness.

I know other RDKit users do, and might be able to help.

What I know so far is it's a bit better than RDKit's count simulation at predicting MW. https://mstdn.science/@molecule/115063149386391787 :)

The "fpc2fps" command supports other methods, like "scaled", which is a cross between superimposed and the RDKit count simulation. Rather than use `count` random numbers, it takes a lookup table of count thresholds to get the actual repeat to use. See the fpc2fps --help-methods for more complete details, or contact me.

This 5.0b2 release also includes a "simhistogram" method to generate a histogram from all possible Tanimoto scores, a "shardsearch" method to search multiple target files ("shards") and merge the results, and it has a reasonably performant implementation of the 4860-bit Klekota-Roth fingerprint.

See https://chemfp.com/docs/whats_new_in_50.html to learn more.

Best regards,

Andrew
da...@da...

[Rdkit-discuss] 2D Coordinates for RNA

From: <tho...@bo...> - 2025-08-13 09:37:32

Dear all,

I am working with peptides and RNA and want to convert sequences into 2D molecules.
As we use non-natural and proprietary monomers, I cannot apply the usual workflows like MolFromHELM,
but have developed my own python code to build the macromolecules from their building blocks (basically
using Chem.CombineMols and then rdDepictor.Compute2DCoords,
see https://github.com/Boehringer-Ingelheim/pyPept/blob/master/src/pyPept/molecule.py).

While this works fine for even large peptides (>40 monomers), when doing the same for RNA I run into a problem:
after a certain size (about 12 or 13 nucleotides), the 2D embedding returns all coordinates as zeroes and all stereoinformation
is lost.

I tried the same using MolFromHELM, and there I do not see the issue: I get valid 2D coordinates up to hundreds of nucleotides
(yes, contrary to what the documentation says, RNA and DNA work, too!).
Only if I first generate the molecule and then pass it through either rdCoordGen.AddCoords or Chem.rdDepictor.Compute2DCoords
do I end up with all coordinates as zero. So I suppose MolFromHELM knows something about the general structure of the building blocks and uses that information,
whereas the all-purpose embedders cannot take that into account and subsequently fail. But then again, MolFromHELM is not an option as I need non-natural
monomers (unless there is a way to teach RDKit about non-canonical monomers, but I haven't found anything on it).

Here is the relevant code snippet:

from rdkit import Chem
from rdkit.Chem import rdCoordGen

n_nucleotides = 20

polyA = ['R(A)P'] * n_nucleotides
polyA = '.'.join(polyA)
helm = f'RNA1{{{polyA}}}$$$$V2.0'

romol = Chem.MolFromHELM(helm)
#rdCoordGen.AddCoords(romol)

mb = Chem.MolToMolBlock(romol)

print(mb[1:300])

Now everything looks fine, but as soon as I uncomment the rdCoordGen line, the coordinates are zero.
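
To find the size at which the depiction collapses without reading mol blocks by eye, a small helper along these lines may be useful. It is only a sketch: the function name is made up, and it assumes a V2000 mol block (fewer than 1000 atoms), since larger molecules are written in V3000 format, which it does not parse.

```python
def all_coords_zero(molblock):
    """Return True if every atom in a V2000 mol block sits at (0, 0, 0)."""
    lines = molblock.splitlines()
    counts_line = lines[3]  # header is 3 lines, then the counts line
    if "V2000" not in counts_line:
        raise ValueError("only V2000 mol blocks are handled in this sketch")
    num_atoms = int(counts_line[0:3])
    for line in lines[4:4 + num_atoms]:
        # x, y and z occupy three fixed-width fields of 10 characters each
        x, y, z = (float(line[i:i + 10]) for i in (0, 10, 20))
        if x != 0.0 or y != 0.0 or z != 0.0:
            return False
    return True
```

Calling it on the string returned by Chem.MolToMolBlock(romol) for increasing n_nucleotides should show where the coordinates start coming back as zero.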

Any ideas, suggestions what I could do?

Thanks,
Th.


Thomas Fox
NCE

Boehringer Ingelheim Pharma GmbH & Co. KG
Birkendorfer Str. 65 | 88397 Biberach


[Rdkit-discuss] [Postdoc opportunity] ARISE2 postdoctoral fellowship call

From: Noel O'B. <bao...@gm...> - 2025-07-29 07:29:03

Hi all,

If you are familiar with RDKit and are finishing a PhD or postdoc, I
encourage you to take a look at the call for applications for an ARISE2
postdoctoral fellowship on our blog (
https://chembl.blogspot.com/2025/07/invite-to-apply-for-arise2-postdoctoral.html).
This is a chance to work in the Chemical Biology Services team at EMBL-EBI
improving the resources that many in the community rely upon, such as
ChEMBL and SureChEMBL.

If you are interested, please get in touch.

Regards,
Noel

Re: [Rdkit-discuss] canonical fragment SMILES

From: Pavel P. <pav...@uk...> - 2025-03-28 07:57:30

Thank you, Wim. It works. An even simpler solution may be to remove all 
atoms except the required ones. I should have guessed :)
However, this is a bug in recent RDKit versions. The function 
MolFragmentToSmiles works correctly in the 2023 releases, but not in the 2024 ones.


Re: [Rdkit-discuss] canonical fragment SMILES

From: Wim D. <wim...@gm...> - 2025-03-27 23:10:53

Attachments: M2yVA0DoFomTHx83.png

Pavel,
this is a bit hacky, but you can try the below:
```
from rdkit import Chem

def get_frag_smi(mol, frag_atoms):
    """Return a SMILES for the fragment of mol made of the atoms in frag_atoms."""
    if len(frag_atoms) > 1:
        keep = set(frag_atoms)
        # bonds to break: every bond with at least one atom outside the fragment
        b2b = [
            b.GetIdx()
            for b in mol.GetBonds()
            if b.GetBeginAtomIdx() not in keep or b.GetEndAtomIdx() not in keep
        ]
        # break all bonds except those in the fragment
        fmol = Chem.FragmentOnBonds(mol, b2b, addDummies=False)
        # retain the only fragment with more than one atom in there
        for smi in Chem.MolToSmiles(fmol).split("."):
            m = Chem.MolFromSmiles(smi, sanitize=False)
            if m is not None and m.GetNumAtoms() > 1:
                return smi
        raise ValueError("no multi-atom fragment found")
    # one atom, no canonicalization needed
    return Chem.MolFragmentToSmiles(mol, frag_atoms)
```
It is based on the observation/assumption that FragmentOnBonds() followed by
MolToSmiles() canonicalizes the fragments cleanly. With your molecule,

print(get_frag_smi(mol,[1,2,3,17]))
print(get_frag_smi(mol,[9,10,11,12]))

prints `cN(c)C` twice.

best wishes,
wim


[Rdkit-discuss] canonical fragment SMILES

From: Pavel P. <pav...@uk...> - 2025-03-27 11:19:34

Attachments: frag_smi_canonization.ipynb

Hello,

   I encountered an issue with SMILES of fragments. Maybe someone can 
suggest a workaround.
   I attached the notebook, but will also reproduce some code here.

   We have a structure with two Ns. For each N atom, we take that atom 
and its adjacent atoms to make a fragment SMILES, and we get different 
results even though both SMILES represent the same pattern (only the 
order of atoms is different). I guess this happens because the 
canonicalization algorithm takes into account some additional 
information that is missing from the output SMILES (e.g. ring 
membership). For instance, if we break the saturated ring (bond 8-9), 
we get identical SMILES output.

from rdkit import Chem

mol = Chem.MolFromSmiles('CCn1c2cccc3CCn(c23)c2ccccc12')


print(Chem.MolFragmentToSmiles(mol, [1,2,3,17], canonical=True))
print(Chem.MolFragmentToSmiles(mol, [9,10,11,12], canonical=True))

cN(C)c
cN(c)C

   So, the question is how to work around this issue. We already have
millions of such patterns, so it would be enough if we were able to
canonicalize them. However, standard canonicalization does not work,
because we have to disable sanitization during SMILES parsing: it returns
the same output as the input SMILES. Any ideas are appreciated.

print(Chem.MolToSmiles(Chem.MolFromSmiles('cN(C)c', sanitize=False)))
print(Chem.MolToSmiles(Chem.MolFromSmiles('cN(c)C', sanitize=False)))

cN(C)c
cN(c)C

   This issue actually came from the code of identification of 
functional groups.

Kind regards,
Pavel

[Rdkit-discuss] talus - tools to make SMARTS-based fingerprint generators

From: Andrew D. <da...@da...> - 2024-11-04 16:35:34

Hi all,

 I've spent the last while working on some techniques to improve the performance of SMARTS-based fingerprint generators. It's called "talus" and is available at https://hg.sr.ht/~dalke/talus .

It's able to improve the performance of Klekota-Roth fingerprint generation by about a factor of 12.

These fingerprints have long been described as slow to generate. For example, "PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints" (2010) at https://onlinelibrary.wiley.com/doi/full/10.1002/jcc.21707 says "The slowest algorithms are the Klekota-Roth fingerprint and Klekota-Roth fingerprint count because they are matching 4860 SMARTS patterns for each molecule.", which they timed as taking 14x the time of MACCS-key generation.

The fastest version is available at https://hg.sr.ht/~dalke/talus/browse/KlekotaRoth/kr_filtered_atomtypes.py?rev=tip , which takes a SMILES file and generates the fingerprints in chemfp's FPS format, as a standalone file which depends only on RDKit.

## How does it work?

This effort comes from looking at the Klekota-Roth fingerprints (defined in the supplementary data for "Chemical substructures that enrich for biological activity", doi: 10.1093/bioinformatics/btn479, https://academic.oup.com/bioinformatics/article/24/21/2518/192573 and available with a few minor syntax changes in the CDK's Java sources), which contains 4,860 SMARTS strings including

[!#1][CH2][CH]([!#1])[!#1]
  -and-
OS(=O)(=O)c1ccc(NN=C(C=O)C=O)cc1

The direct translation into a set of if statements, conceptually like

  _pat123 = Chem.MolFromSmarts("[!#1][CH2][CH]([!#1])[!#1]")
     ...
  if mol.HasSubstructMatch(_pat123):
     fp.SetBit(123)

takes 417 seconds in my standard benchmark of about 30,000 SMILES strings, of which only 2.6 seconds is parsing the SMILES strings and the rest is the SMARTS matches.

I was able to speed this up by a factor of 12 using the following techniques:

1) Create a filter based on atom type counts, something like:

  _at1_pat = Chem.MolFromSmarts("[!#1]")
  _at2_pat = Chem.MolFromSmarts("[CH2]")
  _at3_pat = Chem.MolFromSmarts("[CH]")
  ...
  num_at1 = len(mol.GetSubstructMatches(_at1_pat))
  num_at2 = len(mol.GetSubstructMatches(_at2_pat))
  num_at3 = len(mol.GetSubstructMatches(_at3_pat))
  ...
  if (num_at1 >= 3 and num_at2 >= 1 and num_at3 >= 1
      and mol.HasSubstructMatch(_pat123)):
     fp.SetBit(123)

2) Analyze the atom SMARTS to recognize that, for example, both "[CH2]" and "[CH]" will always match "[!#1]", so the minimum counts can be increased to:

  if (num_at1 >= 5 and num_at2 >= 1 and num_at3 >= 1
      and mol.HasSubstructMatch(_pat123)):
     fp.SetBit(123)

3) Identify SMARTS prefixes, which provide a natural tree structure.

For example, the last two SMARTS patterns in the Klekota-Roth keys are:

SCCS
SCCS(=O)=O

There is no reason to test for "SCCS(=O)=O" if "SCCS" does not pass, in which case there's no need to repeat the check for "S" and "C" counts, resulting in something like:

  if (num_S >= 2 and num_C >= 2
       and mol.HasSubstructMatch(_pat_4858)):
      fp.SetBit(4858)
      if (num_O >= 2
           and mol.HasSubstructMatch(_pat_4859)):
          fp.SetBit(4859)

4) Improve the effectiveness of SMARTS prefixes

The SMARTS patterns are generated by Daylight's canonicalization rules then sorted ASCII-betically, but the SMARTS prefix method works better if the SMARTS starts with an unlikely chain terminal. For example, bit 3994 (key 3995) is "COc1cccc(C=NNC(=O)CO)c1", but "OCC(NN=Cc1cccc(OC)c1)=O" is an equivalent SMARTS with a longer initial chain.

5) Identify SMARTS prefixes which can be inserted as a filter.

Here's an example of what that looks like, with a bit number followed by the SMARTS pattern, where the "*" indicates that a pattern is only used for filtering:

2949 Br
3222 BrC
* BrCC filter 7 patterns
3332 BrC(C)(Br)Br
3333 BrC(C)C(=O)N
* BrCCC filter 3 patterns
3430 BrCC(C)O
2973 BrCCC=O
4683 BrCCC(NC(O)=O)=O
4227 BrC(C(N)NC=O)(Br)Br
4692 BrC(C(O)NC=O)(Br)Br

This says that "Br" is one of the keys, so bits 3222, 3332, 3333, 3430, etc. will not be tested unless Br exists.

It further notices that "BrCC" is a common prefix to 7 patterns, so on the assumption that one rejection test (which should be the usual case) saves the time needed to do 7 additional tests, it adds that extra filter.

The "BrCCC" filter provides a further refinement.
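
As a rough illustration of the filter tree listed above, here is a minimal sketch; it is not the code talus generates. The nesting is one reading of the listing (it is consistent with the "7 patterns" and "3 patterns" counts), `None` marks a filter-only node, and `walk` and `matches` are hypothetical names. With RDKit, `matches` could wrap `mol.HasSubstructMatch(Chem.MolFromSmarts(smarts))`; the demo uses a stand-in set of matching SMARTS so that it runs without a toolkit.

```python
# Each node is (bit number or None for a filter-only node, SMARTS, children).
# The nesting below is an assumed reading of the listing in the message.
BR_TREE = (
    2949, "Br", [
        (3222, "BrC", [
            (None, "BrCC", [
                (3332, "BrC(C)(Br)Br", []),
                (3333, "BrC(C)C(=O)N", []),
                (None, "BrCCC", [
                    (3430, "BrCC(C)O", []),
                    (2973, "BrCCC=O", []),
                    (4683, "BrCCC(NC(O)=O)=O", []),
                ]),
                (4227, "BrC(C(N)NC=O)(Br)Br", []),
                (4692, "BrC(C(O)NC=O)(Br)Br", []),
            ]),
        ]),
    ],
)


def walk(node, matches, bits):
    """Test one node and descend only if it matches.

    Adds the bit numbers of matching key nodes to `bits` and returns the
    number of match tests that were actually run.
    """
    bit, smarts, children = node
    if not matches(smarts):
        return 1  # one rejection test prunes the whole subtree
    if bit is not None:
        bits.add(bit)
    return 1 + sum(walk(child, matches, bits) for child in children)


# Stand-in for a real substructure matcher: the SMARTS from the tree that
# would match bromomethane (CBr).
bromomethane_hits = {"Br", "BrC"}

bits = set()
num_tests = walk(BR_TREE, bromomethane_hits.__contains__, bits)
print(sorted(bits), num_tests)
```

For this stand-in molecule the walk stops at the "BrCC" filter, so three tests are run rather than one test for each of the nine keys in the listing.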

All told, this brought Klekota-Roth fingerprint generation down to 33.5 seconds, of which 2.3 seconds (7%) was for SMILES processing so another 10x performance gain may be possible.
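
The count-based pre-filter from techniques 1 and 2 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `minimum_counts`, `passes_filter` and the `implies` table are hypothetical names, the subset relations are written by hand rather than derived by parsing the atom SMARTS, and the per-molecule counts in the demo are made-up numbers standing in for `len(mol.GetSubstructMatches(...))`.

```python
from collections import Counter


def minimum_counts(pattern_atom_types, implies):
    """Minimum number of atoms of each type a molecule needs to match.

    pattern_atom_types: the atom SMARTS of one pattern, one entry per atom.
    implies: maps a narrow atom type to the broader types it always matches.
             Chains (A implies B implies C) must be listed explicitly.
    """
    base = Counter(pattern_atom_types)
    required = dict(base)
    for narrow, broader_types in implies.items():
        for broad in broader_types:
            if narrow in base and broad in required:
                # Every pattern atom maps to a distinct molecule atom, so
                # atoms of the narrow type also count toward the broad one.
                required[broad] += base[narrow]
    return required


def passes_filter(mol_counts, required):
    """True if the molecule has enough atoms of every required type."""
    return all(mol_counts.get(t, 0) >= n for t, n in required.items())


# Atom types of "[!#1][CH2][CH]([!#1])[!#1]", in pattern order.
atom_types = ["[!#1]", "[CH2]", "[CH]", "[!#1]", "[!#1]"]
implies = {"[CH2]": {"[!#1]"}, "[CH]": {"[!#1]"}}

required = minimum_counts(atom_types, implies)
print(required)

# Hypothetical per-molecule counts; only the second one needs the full
# HasSubstructMatch test.
print(passes_filter({"[!#1]": 3, "[CH2]": 1, "[CH]": 0}, required))
print(passes_filter({"[!#1]": 6, "[CH2]": 2, "[CH]": 1}, required))
```

The required count for "[!#1]" comes out as 5, which is the raised threshold used in technique 2.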

## These gains are not necessarily portable

These impressive performance gains are possible because of how the Klekota-Roth keys were generated. For the subset of the PubChem keys which can be handled by "HasSubstructMatch" to a SMARTS pattern, the overall performance is only 2x, not 12x.

## Possible future directions

A clear direction for future improvement would be to build a decision tree based on all reasonable SMARTS subgraphs, tuned by match statistics from a representative selection of molecules. Another extension would be to handle minimum counts, like how "at least 2 rings of size 6", (expressed as "*~1~*~*~*~*~*~1" or "[R]@1@[R]@[R]@[R]@[R]@[R]@1") requires at least 7 ring atoms.

Anyone thinking further along these lines may be interested in "Efficient matching of multiple chemical subgraphs" at https://www.nextmovesoftware.com/talks/Sayle_MultipleSmarts_ICCS_201106.pdf .

I wanted a system which could generate a Python module, rather than a C/C++/Java library, resulting in different trade-offs.

## Methods to analyze atom and bond SMARTS terms

Developing this package required building a parser for the atom and bond SMARTS terms so I could tell if one atom SMARTS is a subset of another atom SMARTS. (I let RDKit handle the full SMARTS parsing, then use QueryAtom.GetSmarts() or QueryBond.GetSmarts() to get the actual SMARTS terms).
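
A minimal sketch of that first step, assuming only a standard RDKit install: parse the full SMARTS with RDKit, then read back the per-atom and per-bond SMARTS terms. The variable names are illustrative, and the analysis of each term is talus's own parser, which is not reproduced here.

```python
from rdkit import Chem

# Let RDKit parse the whole pattern.
pattern = Chem.MolFromSmarts("[#6a]=@[PH+]")

# Ask each query atom and query bond for its own SMARTS term.
atom_terms = [atom.GetSmarts() for atom in pattern.GetAtoms()]
bond_terms = [bond.GetSmarts() for bond in pattern.GetBonds()]

for i, term in enumerate(atom_terms):
    print(f"atoms[{i}]: {term}")
for i, term in enumerate(bond_terms):
    print(f"bonds[{i}]: {term}")
```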

I think it may be of broader interest for anyone working with SMARTS at a syntax level.

For example, the test driver takes a SMARTS string and gives a breakdown of the different components, and where that information came from in the SMARTS term:

% python smarts_parse.py '[#6a]=@[PH+]'
Pattern SMARTS: [#6a]=@[PH+]
 atoms[0]: [#6&a] -> [c;R;!X0]
            ^^ ^  elements: [c]
               ^  in_ring: [R]
                  connectivities: [!X0]
                   + from SMARTS topology
 atoms[1]: [P&H1&+] -> [P;H1;h0,h1;+1;!X0]
            ^       elements: [P]
              ^^    total_hcount: [H1]
              ^^    implicit_hcount: [h0,h1]
                 ^  charges: [+1]
                    connectivities: [!X0]
                     + from SMARTS topology
 bonds[0]: =&@ (between atoms 0 and 1) -> '=;@'
           ^   bondtypes: [=]
             ^ in_ring: [@]

This is able to figure out that "[#6a]" means it must be an aromatic carbon, which means it must be in a ring. It also knows from the SMARTS topology that there must be at least one bond (hence [!X0]). Were it a bit more clever, the "R" should tell it there are at least two bonds, both ring bonds, but that's for the future to fix.

It also adds some additional constraints (which I conjectured would be useful for atom typing) like how "H1" means the implicit hydrogen count must be only 0 or 1.

Some of this work dates back to a SMARTS regular-expression based tokenizer I contributed to Brian Kelly's FROWNS project back in 2001 or so! See https://frowns.sourceforge.net/ .

If you want to take this effort further, please contact me and I'll provide some help, thoughts, and advice!

					Andrew
					da...@da...

[Rdkit-discuss] 8th Advanced In Silico Drug Design workshop in Olomouc

From: Pavel P. <pav...@uk...> - 2024-10-25 13:34:58

Attachments: 8ADD_flyer.pdf

Dear colleagues,

   we are glad to invite you to the 8th Advanced In Silico Drug Design
workshop, which will take place on 27-31 January 2025 at Palacky
University in Olomouc (Czech Republic).
   This year we cover topics on:
   - virtual screening
   - machine learning and AI
   - structure- and ligand-based drug design tools
   - pharmacophore modeling
   - molecular docking and dynamics
   - de novo design
   - chemical space visualization and others
   Lectures and tutorials will be provided by experts in the field from
Austria, France, Italy, Israel and the Czech Republic, in particular
Prof. Thierry Langer, Prof. Alexandre Varnek, Prof. Johannes Kirchmair,
Prof. Hanoch Senderowitz and Prof. Alexander Domling.
   There is no fee. The workshop website is
https://www.kfc.upol.cz/8add.

Kind regards,
Pavel

[Rdkit-discuss] Standardizer.standardize() == rdMolStandardize.???

From: <ml...@li...> - 2024-09-27 07:10:03

Hello,

Recently, I updated some Python code using

from rdkit.Chem.MolStandardize import Standardizer # rdkit <= '2023.09.3'

to

from rdkit.Chem.MolStandardize import rdMolStandardize # rdkit >= '2024.03.5'

Because after an rdkit fresh install, rdkit was updated and my former code
stopped working.

My old code was this:
---
standardizer = Standardizer()
def standardize(preserve_stereo, preserve_taut, mol):
     if preserve_stereo or preserve_taut:
         s_mol = standardizer.standardize(mol)
         # We don't need to get fragment parent, because the charge
         # parent is the largest fragment
         s_mol = standardizer.charge_parent(s_mol, skip_standardize=True)
         s_mol = standardizer.isotope_parent(s_mol, skip_standardize=True)
         if not preserve_stereo:
             s_mol = standardizer.stereo_parent(s_mol, skip_standardize=True)
         if not preserve_taut:
             s_mol = standardizer.tautomer_parent(s_mol, skip_standardize=True)
         return standardizer.standardize(s_mol)
     else:
         # standardizer.super_parent(mol): _NOT_ standardizer.standardize(mol)
         # which doesn't even unsalt the molecule...
         return standardizer.super_parent(mol)
---

And the new code is:
---
def standardize(preserve_stereo, preserve_taut, mol):
     if preserve_stereo or preserve_taut:
         # We don't need to get fragment parent, because the charge
         # parent is the largest fragment
         s_mol = rdMolStandardize.ChargeParent(mol, skipStandardize=False)
         s_mol = rdMolStandardize.IsotopeParent(s_mol, skipStandardize=True)
         if not preserve_stereo:
             s_mol = rdMolStandardize.StereoParent(s_mol, skipStandardize=False)
         if not preserve_taut:
             s_mol = rdMolStandardize.TautomerParent(s_mol, skipStandardize=False)
         return s_mol
     else:
         return rdMolStandardize.SuperParent(mol, skipStandardize=False)
---

Which I hope is isofunctional.

The old Standardizer module had a "standardize" method.

Is this method also present in rdMolStandardize?

Has it changed name (e.g. to rdMolStandardize.Cleanup)?

Regards,
Francois.

[Rdkit-discuss] Survey on data formats [responses welcome]

From: <dd...@wp...> - 2024-09-18 17:34:37

Hi all,

The following survey aims to gather empirical data to better understand what users of data formats expect when comparing them. It should take no more than 10 minutes:

https://forms.gle/K9AR6gbyjCNCk4FL6

Your response would be greatly appreciated!

Best,
Dominik

Re: [Rdkit-discuss] Impose conformation of molecule substructure?

From: Manish S. <ms...@sa...> - 2024-09-12 18:14:45

Attachments: image003.jpg

Hi Kurt,

You might find the following scripts helpful for enumerating compounds and
aligning them to a reference molecule:

o RDKitEnumerateCompoundLibrary.py
<http://www.mayachemtools.org/docs/scripts/html/RDKitEnumerateCompoundLibrary.html>

o RDKitPerformPositionalAnalogueScan.py
<http://www.mayachemtools.org/docs/scripts/html/RDKitPerformPositionalAnalogueScan.html>

o RDKitGenerateConstrainedConformers.py
<http://www.mayachemtools.org/docs/scripts/html/RDKitGenerateConstrainedConformers.html>

o RDKitPerformConstrainedMinimization.py
<http://www.mayachemtools.org/docs/scripts/html/RDKitPerformConstrainedMinimization.html>

Let me know if you have any further questions.

Thanks,

Manish


Re: [Rdkit-discuss] Impose conformation of molecule substructure?

From: Kurt T. <kur...@ar...> - 2024-09-12 17:26:29

Attachments: Outlook-Logo

Thanks Stephen!

That code pointed me to the key "coordMap" parameter for fixing atom coordinates.
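
A minimal sketch of how "coordMap" can be used from Python, under stated assumptions: a freshly embedded toluene conformer stands in for the crystal-structure ligand, the benzene ring plays the role of the constant part, and all variable names are illustrative. It is not a translation of the Vernalis Java code. Since coordMap works through distance constraints, each conformer is aligned back onto the reference afterwards.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Stand-in for the crystal-structure ligand (assumption: a real workflow
# would read the ligand coordinates from the crystal structure instead).
ref = Chem.AddHs(Chem.MolFromSmiles("Cc1ccccc1"))
AllChem.EmbedMolecule(ref, randomSeed=42)

# The constant part shared by the whole library, as a substructure query.
core_query = Chem.MolFromSmarts("c1ccccc1")

# One enumerated library member with a different substituent.
new_mol = Chem.AddHs(Chem.MolFromSmiles("CCOc1ccccc1"))

ref_match = ref.GetSubstructMatch(core_query)
new_match = new_mol.GetSubstructMatch(core_query)

# Map each core atom of the new molecule to the reference coordinates.
ref_conf = ref.GetConformer()
coord_map = {
    new_idx: ref_conf.GetAtomPosition(ref_idx)
    for ref_idx, new_idx in zip(ref_match, new_match)
}

conf_ids = AllChem.EmbedMultipleConfs(
    new_mol, numConfs=5, coordMap=coord_map, randomSeed=42
)

# Superimpose every conformer on the reference using the core atoms.
atom_map = list(zip(new_match, ref_match))
for cid in conf_ids:
    rmsd = AllChem.AlignMol(new_mol, ref, prbCid=cid, atomMap=atom_map)
    print(cid, round(rmsd, 3))
```

RDKit also provides AllChem.ConstrainedEmbed, which wraps a similar idea for a single conformer; a follow-up minimization with the core atoms held fixed may still be needed if the core has to match the reference coordinates exactly.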

Kurt

Re: [Rdkit-discuss] Impose conformation of molecule substructure?

From: Stephen R. <s.d...@go...> - 2024-09-12 16:40:53

Attachments: Outlook-Logo

Hi Kurt,

The Vernalis KNIME community contribution has a node "Templated Conformer
Generator (RDKit)" (see https://hub.knime.com/n/wK3RJiystQYq5M9w ) which
will do exactly this.  If you don't want to do it in KNIME, then you can
see the relevant bits of the Java source at:

https://github.com/vernalis/vernalis-knime-nodes/blob/d125b97ad2841133622150c168472168547c4ff3/com.vernalis.knime.chem.pmi/src/com/vernalis/knime/chem/pmi/nodes/confs/rdkitgenerate/RdkitConfgenNodeModel.java#L441-L465

and in particular at:

https://github.com/vernalis/vernalis-knime-nodes/blob/d125b97ad2841133622150c168472168547c4ff3/com.vernalis.knime.chem.pmi/src/com/vernalis/knime/chem/pmi/nodes/confs/rdkitgenerate/RdkitConfgenNodeModel.java#L688-L809

Steve




[Rdkit-discuss] Impose conformation of molecule substructure?

From: Kurt T. <kur...@ar...> - 2024-09-12 16:10:33

Attachments: Outlook-Logo Desc

Hi All -

I would like to enumerate a virtual library of a compound family we have a crystal structure of, where I want to model structures of multiple substituents at a single site. What I would like to do is enforce that the constant part of the molecule assume the conformation in the crystal structure and enumerate just conformers for the new substituent added. Does anyone have a  suggestion for how to achieve this in rdkit?

Thanks,
Kurt





 Dr Kurt Thorn

Chief Technology Officer

+1.609.423.1571 (US Office)

+1.415.298.3495 (US Mobile)

kur...@ar...<mailto:kur...@ar...>

www.arrepath.com<http://www.arrepath.com/>



ArrePath Inc.

303A College Road East

Princeton, NJ 08540

U.S.

Re: [Rdkit-discuss] Question about source code to do energy minimization with UFF or MMFF94 force field

From: Andrew D. <da...@da...> - 2024-09-11 12:47:44

Hi Srdjan,

> On Sep 11, 2024, at 11:12, Srdjan Pusara <srd...@ho...> wrote:
>   I would like to ask is it possible to find source code how these [UFF] interaction terms were implemented?

The RDKit source code is available at https://github.com/rdkit/rdkit/tree/master .

Use the green button labeled "<> Code" to get the source code either through the git version control tool, or as a zip file.

If you want to use the web interface, see https://github.com/rdkit/rdkit/tree/master/Code/ForceField/UFF
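
For orientation only, here is a sketch of the simplest of those terms, the harmonic bond stretch, E = 1/2 kb (r - r0)^2. The assumption that this is the form used should be checked against BondStretch.cpp in that directory. The function names are illustrative, and the kb and r0 values are placeholders rather than real UFF parameters; in practice they would be the (kb, r0) pair returned by GetUFFBondStretchParams.

```python
def harmonic_bond_energy(r, kb, r0):
    """Harmonic stretch energy: 0.5 * kb * (r - r0)**2."""
    return 0.5 * kb * (r - r0) ** 2


def harmonic_bond_gradient(r, kb, r0):
    """Derivative of the stretch energy with respect to the bond length."""
    return kb * (r - r0)


# Placeholder parameters (assumed units: kcal/mol/A^2 and Angstrom).
kb, r0 = 700.0, 1.51

for r in (1.41, 1.51, 1.61):
    energy = harmonic_bond_energy(r, kb, r0)
    gradient = harmonic_bond_gradient(r, kb, r0)
    print(f"r={r:.2f}  E={energy:.3f}  dE/dr={gradient:.3f}")
```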

Best regards,

				Andrew
				da...@da...

[Rdkit-discuss] Question about source code to do energy minimization with UFF or MMFF94 force field

From: Srdjan P. <srd...@ho...> - 2024-09-11 09:12:54

  Hello,

I have seen that RDKit can return force field parameters between groups of atoms (bond_params = rdForceFieldHelpers.GetUFFBondStretchParams(mol, 6, 1), angle_params = rdForceFieldHelpers.GetUFFAngleBendParams(mol, 0, 1, 2), etc.).

  I would like to ask whether it is possible to find the source code showing how these interaction terms were implemented. I understand that these equations can be implemented by reading the original paper, but it would be helpful to access the RDKit source code where these interaction terms are already implemented. In addition, I have noticed that the original UFF paper has some small errors or typos, so having already implemented source code would help.


  Thanks in advance for your help.

[Rdkit-discuss] Python RDkit job in the UK

From: Joe B. <Joe...@Sc...> - 2024-08-28 13:55:51

Hi all,

We are recruiting for a full-time developer with python, RDkit and chemistry experience to work on our Compliance Hub applications. These are used by many of the world's top pharmaceutical companies, CROs and specialist chemical suppliers to ensure compliance with complex and chemical regulations globally.

For more information and to apply please see https://blog.scitegrity.com/news/blog-post-1-0-5-1-0-1

It's a remote role, although you do need to be UK based.

Best regards

Joe Bradley
CEO, Scitegrity Limited


Re: [Rdkit-discuss] Atom aromatic, but not in a ring

From: Andrew D. <da...@da...> - 2024-08-27 13:57:15

On Aug 27, 2024, at 14:44, Ingvar Lagerstedt <in...@ne...> wrote:
> To me it would make sense if RDKit removed the aromatic flags for any atom that is no longer in a ring when deleting an aromatic atom/bond.
> Alternatively remove the aromatic flag on any non-ring atom when attempting to kekulize the structure rather than throwing an exception.

As Noel commented, the toolkit can't make that assumption. Different people will have different reasons for removing an atom. Some will want to remove multiple atoms, for example.

Here is one function to remove an atom. It converts the molecule to Kekulé form, updates the hydrogen counts on the neighboring atoms, and then removes the specified atom:

from rdkit import Chem

m = Chem.MolFromSmiles('c1ccccc1')
mw = Chem.RWMol(m)
Chem.Kekulize(mw, clearAromaticFlags=True)

atom_idx = 0
atom = mw.GetAtomWithIdx(atom_idx)
for bond in atom.GetBonds():
    int_bondtype = int(bond.GetBondType())
    assert int_bondtype in (1, 2, 3), "unexpected bond type!"
    other_atom = bond.GetOtherAtom(atom)
    other_atom.SetNumExplicitHs(
        other_atom.GetNumExplicitHs() + int_bondtype)

mw.RemoveAtom(atom_idx)
Chem.SanitizeMol(mw)
print(Chem.MolToSmiles(mw))

Bear in mind that there can be multiple possible Kekulé assignments, while Chem.Kekulize only picks one. In some cases (not this one, of course) you may need to apply the above removal method to all distinct assignments (for the given ring system) in order to get all the valid transformed molecules.

A more correct function should also take care when the int_bondtype == 1 and the other_atom has a chiral tag, and does not already have a hydrogen, because you may want to preserve the chiral indicator. Something like this may work (it's copied and pasted from another function, with a few tweaks to match the above naming scheme, but I haven't tested it).

            num_hs = other_atom.GetTotalNumHs()
            if (not num_hs) and other_atom.GetChiralTag():
                for bond_i, b in enumerate(other_atom.GetBonds()):
                    if b.GetIdx() == bond.GetIdx():
                        break
                else:
                    raise AssertionError("Could not find bond")
                want_invert = (bond_i % 2 == 0)
                if want_invert:
                    if other_atom.GetChiralTag() == 2:
                        other_atom.SetChiralTag(Chem.ChiralType.CHI_TETRAHEDRAL_CW)
                    else:
                        other_atom.SetChiralTag(Chem.ChiralType.CHI_TETRAHEDRAL_CCW)

Cheers,

				Andrew
				da...@da...

Re: [Rdkit-discuss] Atom aromatic, but not in a ring

From: Noel O'B. <bao...@gm...> - 2024-08-27 13:27:05

There are other more subtle changes that can affect the aromaticity, e.g.
changing a bond order, the charge, or the atomic number of an atom. IMO,
the user needs to take responsibility for knowing if aromaticity might be
invalidated, and perform the appropriate actions. The alternative is for
the toolkit to take the responsibility, trigger a check on every edit and
take a performance hit in the general case. Indeed, atom deletion could be
treated specially, but slippery slope and confusion here we come! :-)

Regards,
Noel


[Rdkit-discuss] Atom aromatic, but not in a ring

From: Ingvar L. <in...@ne...> - 2024-08-27 12:44:59

Hello,

When deleting an aromatic atom or bond, the ring information is removed, while any remaining atom in the broken aromatic ring is still labelled aromatic.  When attempting to sanitize such a molecule I get an exception: "rdkit.Chem.rdchem.AtomKekulizeException: non-ring atom 0 marked aromatic"

To recreate:
>>> from rdkit import Chem
>>> m = Chem.MolFromSmiles('c1ccccc1')
>>> mw = Chem.RWMol(m)
>>> mw.RemoveAtom(0)
>>> Chem.SanitizeMol(mw)
[10:40:50] non-ring atom 0 marked aromatic
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
rdkit.Chem.rdchem.AtomKekulizeException: non-ring atom 0 marked aromatic

The example is simplistic, but there are reactions where an aromatic system can be broken, such as the Zincke-Koenig reaction or the Djerassi-Rylander oxidation. The exception makes it harder to describe such reactions.

I currently check if an atom/bond is aromatic before deleting it, and if so remove all aromatic flags in the molecule.

To me it would make sense if RDKit removed the aromatic flags for any atom that is no longer in a ring when deleting an aromatic atom/bond.
Alternatively remove the aromatic flag on any non-ring atom when attempting to kekulize the structure rather than throwing an exception.

Compare with a Birch reduction, where the ring stays intact: there the kekulization and the following aromatization step rightly fail to find an aromatic ring, no exception is thrown, and the atoms are marked as non-aromatic.

Kind Regards,
Ingvar

109 messages has been excluded from this view by a project administrator.
