rdkit-discuss Mailing List for RDKit (Page 2)

Open-Source Cheminformatics and Machine Learning

Brought to you by: glandrum

rdkit-discuss — Mailing list for discussion, questions and answers.


[Rdkit-discuss] Python job at Scripps Research

From: Diogo M. <dio...@gm...> - 2024-08-22 19:16:02

Hello,

We are recruiting a programmer, primarily Python, to improve autodock
(molecular docking) and integrate it with other software, such as RDKit
and OpenMM.
The location is Scripps Research in La Jolla, California. Goals are:

- to support development in general,
- improve user-friendliness of command line and graphical interfaces,
- make autodock components more usable from Python

For more details and to apply, see:
https://recruiting2.ultipro.com/SCR1003TSRI/JobBoard/98759e7d-7ede-4c0b-ac7b-2c6293c7b522/OpportunityDetail?opportunityId=b92548d1-155c-4c8e-be0e-f59c5b2452e0

Best regards,
Diogo

[Rdkit-discuss] ANN: chemfp 4.2

From: Andrew D. <da...@da...> - 2024-08-05 08:35:38

Hi RDKit-ers,

I have released chemfp 4.2. The new "simarray" functionality computes the full comparison matrix as a NumPy array, e.g., for use in some clustering algorithms. It has built-in support for Tanimoto, Dice, cosine, and Hamming comparisons, plus an option to get the individual "a", "b", "c", and "d" components should you need a specialized metric. It processes 100M comparisons per second on my laptop, which means if you had 30 TB of free disk space you could generate the NxN comparisons for ChEMBL in about a day. (I'm curious if someone will do this!)
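For readers who have not worked with such a comparison matrix, the following is a minimal pure-Python sketch of what an NxN Tanimoto matrix contains. It does not use chemfp or its simarray API. Fingerprints are modeled as plain Python integers used as bitsets, and the function names are made up for illustration only.

```python
def tanimoto(fp1: int, fp2: int) -> float:
    """Tanimoto similarity of two binary fingerprints stored as Python ints."""
    shared = bin(fp1 & fp2).count("1")  # bits set in both fingerprints
    either = bin(fp1 | fp2).count("1")  # bits set in at least one fingerprint
    # Assumption for this sketch: two all-zero fingerprints score 0.0.
    return shared / either if either else 0.0


def tanimoto_matrix(fps):
    """Full N x N similarity matrix as a list of lists (illustrative helper)."""
    return [[tanimoto(a, b) for b in fps] for a in fps]


# Three toy 4-bit fingerprints.
fps = [0b1011, 0b0011, 0b1100]
matrix = tanimoto_matrix(fps)
```

A distance matrix for clustering would then be 1 minus each entry. A dedicated tool is still the better choice at scale, since this nested loop is quadratic in pure Python.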

I've also updated chemfp's RDKit-Fingerprint, RDKit-Morgan, RDKit-AtomPair, and RDKit-Torsion fingerprint types to use RDKit's fingerprint generator API, instead of the older function-based API. This includes support for count emulation. Some of the parameter names have changed to follow RDKit's newer convention, and the RDKit-Morgan fingerprints now default to r=3 (to match the RDKit default) rather than r=2.

Chemfp still supports the older function-based API, which is used if you specify the older version number explicitly.

For a full description of what's new in this release, see https://chemfp.com/docs/whats_new_in_42.html .

Chemfp may be the package you’ve been looking for if you work with binary cheminformatics fingerprints in Python. Chemfp is perhaps best known for its high-performance fingerprint similarity search. Its Taylor/Butina clustering, MaxMin diversity selection, and sphere exclusion (including directed sphere exclusion) are equally world-class. Or, if you simply need a 100K by 100K distance array to pass into scikit-learn, chemfp’s simarray can generate that in less than a minute.

The chemfp homepage is https://chemfp.com/ . To install a pre-compiled chemfp for Linux-based OSes:

python -m pip install chemfp -i https://chemfp.com/packages/

The default installation limits or disables a few chemfp features as described in the base license agreement at https://chemfp.com/BaseLicense.txt . To request a license key, which is free for academic use, see https://chemfp.com/license/ .

Best regards,

Andrew Dalke
da...@da...

Re: [Rdkit-discuss] Header files dropped from 2024.03.5 conda package of rdkit

From: Ingvar L. <in...@ne...> - 2024-07-30 14:59:03

Ah - found the rdkit-dev conda package. Missed a wildcard when I searched.

> _______________________________________________
> Rdkit-discuss mailing list
> Rdk...@li...
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

[Rdkit-discuss] Header files dropped from 2024.03.5 conda package of rdkit

From: Ingvar L. <in...@ne...> - 2024-07-30 14:21:32

Hello, 

In the conda package for RDKit 2024.03.5 (rdkit, including librdkit), there are no header files, e.g., RDKitBase.h. They were in the 2024.03.4 version. Have the headers moved to another conda package, or have they accidentally been left out? I have the linux-64 and osx-arm64 versions.

Kind Regards,
Ingvar

Ingvar Lagerstedt
Senior software engineer
NextMove Software Limited
Innovation Centre, 320 Cambridge Science Park, Cambridge, CB4 0WG

Re: [Rdkit-discuss] Chirality labels for fingerprints

From: Greg L. <gre...@gm...> - 2024-07-26 15:07:08

Hi Joao,

On Thu, Jul 25, 2024 at 8:03 PM J Sousa <jso...@gm...> wrote:

>
> In fingerprints calculated by RDKit with includeChirality=True, is the CIP
> label (R/S) the atom property directly used to generate the integer
> identifier of an atom circular neighborhood?
>
> Is the CIP label used for Morgan, RDKit, Atom pairs and Torsions
> fingerprints?
>

Yes, that is currently how the fingerprinting code handles chirality. This
is not a great way to do it, but it's what we have right now.[1]

-greg
[1] I haven't put a lot of time into this, but I haven't come up with
anything better yet.

[Rdkit-discuss] Chirality labels for fingerprints

From: J S. <jso...@gm...> - 2024-07-25 18:00:50

Hi,

In fingerprints calculated by RDKit with includeChirality=True, is the CIP
label (R/S) the atom property directly used to generate the integer
identifier of an atom circular neighborhood?

Is the CIP label used for Morgan, RDKit, Atom pairs and Torsions
fingerprints?

Thanks,
Joao Sousa

Re: [Rdkit-discuss] RDKit PostgreSQL extension: Unexpected behaviour of substruct()

From: Ernst-Georg S. <pg...@tu...> - 2024-07-02 15:56:50

On 27.06.2024 at 11:03, Wim Dehaen wrote:
> I would expect the problem here is kekulization. The SMARTS is pattern 
> matching using the kekule structure (i.e. double and single bonds, non 
> aromatic atoms) and is not sanitized whereas the SMILES after parsing 
> and sanitization has aromatic bonds and aromatic atoms. Try what happens 
> when you do a SMARTS match with the SMILES with aromatic atoms: 
> `[2H]c1cc([3H])cc(C2=N[C@](C)([37Cl])CC2)c1`

That was it indeed.

Thank you,

Ernst-Georg

Re: [Rdkit-discuss] RDKit PostgreSQL extension: Unexpected behaviour of substruct()

From: Noel O'B. <bao...@gm...> - 2024-06-27 09:28:07

"Every valid SMILES is also a valid SMARTS": I think this is one of John
May's lines, which I was never keen on as it makes people think that if you
treat a SMILES as a SMARTS that it will match the original SMILES. It
mostly will, but I think you have found the difference between the SMILES
and SMARTS treatment of "[2H]" - one means deuterium, the other means an
isotope of mass 2 with a single implicit hydrogen attached. It doesn't
match because the deuterium doesn't have another hydrogen attached. [I
think??]

Regards,
Noel


Re: [Rdkit-discuss] RDKit PostgreSQL extension: Unexpected behaviour of substruct()

From: Wim D. <wim...@gm...> - 2024-06-27 09:04:05

I would expect the problem here is kekulization. The SMARTS is pattern
matching using the kekule structure (i.e. double and single bonds, non
aromatic atoms) and is not sanitized whereas the SMILES after parsing and
sanitization has aromatic bonds and aromatic atoms. Try what happens when
you do a SMARTS match with the SMILES with aromatic atoms:
`[2H]c1cc([3H])cc(C2=N[C@](C)([37Cl])CC2)c1`

best wishes
wim


[Rdkit-discuss] RDKit PostgreSQL extension: Unexpected behaviour of substruct()

From: pgchem p. <pg...@tu...> - 2024-06-27 08:53:29

Hello all,
 
if every valid SMILES is also a valid SMARTS, why does:
 
select substruct('[2H]C1=CC([3H])=CC(=C1)C1=N[C@](C)([37Cl])CC1'::mol, '[2H]C1=CC([3H])=CC(=C1)C1=N[C@](C)([37Cl])CC1'::mol)

yield "True", but:
 
select substruct('[2H]C1=CC([3H])=CC(=C1)C1=N[C@](C)([37Cl])CC1'::mol, '[2H]C1=CC([3H])=CC(=C1)C1=N[C@](C)([37Cl])CC1'::qmol)
 
is "False"? The same is observed when using the @> operator.
 
RDKit 2024.03.3 built from source + PostgreSQL 16.3.
 
best regards
 
Ernst-Georg

[Rdkit-discuss] Chemical Biology Team Leader - EMBL-EBI (Cambridge, UK)

From: Eloy F. <elo...@gm...> - 2024-06-24 08:39:37

Dear RDKitters,

We are recruiting for a Chemical Biology Team Leader. This is an exciting
opportunity to lead the Chemical Biology resources, based at the Wellcome
Genome Campus in Hinxton near Cambridge, UK.

The chemical biology team
<https://www.ebi.ac.uk/about/teams/chemical-biology-services/> at EMBL-EBI
delivers world-leading databases and resources to the scientific community.
Our flagship resource, ChEMBL <https://www.ebi.ac.uk/chembl/>, is a
database of high-quality quantitative small-molecule bioactivity data
curated from the scientific literature and direct data depositions.
SureChEMBL <https://www.surechembl.org/> is a complementary patent resource
containing chemical structures and biology/drug discovery annotations
extracted daily from patents. UniChem <https://www.ebi.ac.uk/unichem/> links
chemical structures across databases. ChEBI <https://www.ebi.ac.uk/chebi/> is
a database and ontology of small molecules relevant to biology.

Closing Date: 19th July 2024

More details here <https://www.embl.org/jobs/position/EBI02255>

Kind regards,
Eloy

Re: [Rdkit-discuss] Compiling RDkit 2023 gives MD5 issues

From: James W. <jea...@gm...> - 2024-05-14 15:08:58

This resolved itself after a refresh, so I'm not sure whether I had a bad
download of the file. Anyway, I have the image now, so all's well.


Re: [Rdkit-discuss] Compiling RDkit 2023 gives MD5 issues

From: Greg L. <gre...@gm...> - 2024-05-14 14:23:28

Hi James,

If that's pulling the inchi zip from rdkit.org then the MD5 shouldn't have
changed.

The easiest thing is to just replace the MD5 in
$RDBASE/Code/cmake/Modules/FindInchi.cmake with what you're getting (after
making sure it is in fact the correct zip file of course).
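One way to check the downloaded archive before touching FindInchi.cmake is to compute its MD5 yourself and compare it with the values in the error message. The sketch below uses only Python's standard hashlib module; the helper name md5_of_file is made up for illustration, and the path in the commented usage is simply the one quoted in the error.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example usage (path and expected value taken from the error message):
# expected = "f2efa0c58cef32915686c04d7055b4e9"
# print(md5_of_file("/rdkit/External/INCHI-API/INCHI-1-SRC.zip") == expected)
```

If the digest differs between two fresh downloads of the same URL, a truncated or corrupted download is a more likely explanation than a changed upstream file.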

-greg



[Rdkit-discuss] Compiling RDkit 2023 gives MD5 issues

From: James W. <jea...@gm...> - 2024-05-14 11:01:51

I'm trying to compile RDKit 2023_03_3 into a Docker container, but the
CMake MD5 check fails for the Inchi library. Is there a way of disabling
this check, because my presumption is the library has changed going
forward, but for compatibility reasons, I want to keep this version as
close to stock as possible. I enclose the specific error below:

#12 1177.0 The md5 checksum for /rdkit/External/INCHI-API/INCHI-1-SRC.zip is
incorrect; expected: f2efa0c58cef32915686c04d7055b4e9, found:
4579f086463c76353a75ecc6193becb9

Re: [Rdkit-discuss] sampling of ring conformation for docking

From: He, A. <he...@bu...> - 2024-05-13 21:25:50

Hi Pavel,

Do you work with small rings (5-, 6-, or 7-membered) or large cyclic structures (like cyclic peptides)?
To distinguish different conformations of small rings, I feel that torsional angles or apex heights (geometric values that are alignment-free and depend on the internal coordinates of the molecule) might be more useful than RMSD. If you don’t have too many rings and the rings aren’t that big, you can run conformer generation and then sort the conformers into categories. To get started with a simple example in RDKit, I previously found this tutorial very helpful:
https://sunhwan.github.io/blog/2021/02/24/RDKit-ETKDG-Piperazine.html
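As a rough illustration of the alignment-free idea, here is a minimal sketch that computes the signed torsion angles around a ring from plain (x, y, z) tuples. It is not taken from the tutorial above; the function names are made up, and in practice the coordinates would come from your RDKit conformers (RDKit's rdMolTransforms module also offers GetDihedralDeg for conformers).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond.

    Assumes p1 and p2 are distinct points (non-zero bond length).
    """
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)  # normals of the two planes
    b1_len = math.sqrt(_dot(b1, b1))
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b1) / b1_len
    return math.degrees(math.atan2(y, x))


def ring_torsions(ring_coords):
    """Torsions around a ring, given its atom coordinates in ring order."""
    n = len(ring_coords)
    return [dihedral_deg(*(ring_coords[(i + k) % n] for k in range(4)))
            for i in range(n)]
```

The sign pattern of the resulting list (for example, alternating signs for a chair-like six-membered ring) can serve as a simple category label that does not depend on how the conformers are aligned.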

Docking in AutoDock Vina (https://autodock-vina.readthedocs.io/en/latest/) or AutoDock-GPU (https://github.com/ccsb-scripps/AutoDock-GPU) supports sampling of ring conformations on the fly. By default, attempts will be made during docking to sample alternate conformers of 7-membered and larger rings. Optionally, you could also turn on the sampling for 6-membered rings and smaller ones. Take a peek at this recent paper to learn about the method:
https://www.cambridge.org/core/journals/qrb-discovery/article/performance-evaluation-of-flexible-macrocycle-docking-in-autodock/D8417BC284AEE198EC6AF25C7E677249

The Meeko project (https://github.com/forlilab/Meeko?tab=readme-ov-file#python-tutorial) provides a seamless workflow in Python to export your RDKit molecules into AutoDock-ready formats (and the docking outcomes can be retrieved back to RDKit, too!). The multiple docking outcomes with AutoDock Vina can give you at least some idea of what conformations might fit. You could refine the poses with more advanced methods.

Hope this helps!


Best regards,
Amy H.


[Rdkit-discuss] sampling of ring conformation for docking

From: Pavel P. <pav...@uk...> - 2024-05-13 08:43:01

Hello,

   I use RDKit to embed initial conformations for docking. The issue is 
with saturated rings. I can use a single random conformer but its 
geometry may be unsuitable and the whole molecule will fail to dock. I 
can use several starting conformers for docking and to avoid docking of 
very similar conformers I can select a few diverse conformers based on 
RMSD between rings only. However, the issue occurs if a molecule has 
several such saturated rings. The current workaround is to compute RMSD 
between corresponding rings individually, then average RMSD values and 
select a diverse set of conformers. It may work to some extent.
   However, I'm curious whether a better solution is possible. Can we sample 
rings individually and embed a molecule using pre-generated conformers 
of some parts (rings)? I know about the restricted conformer enumeration 
function, but it will work if we supply only a single connected part as 
fixed. It should not work if we have two disconnected parts (rings) with 
3D coordinates, because we do not know their relative position to 
generate 3D coordinates for the rest of atoms in a molecule.
   Maybe someone will have some ideas/suggestions?

Kind regards,
Pavel

Re: [Rdkit-discuss] Request for Assistance with MACCS 166 Fingerprint Calculation for 3D QSAR Study

From: Ariadna L. P. <ari...@gm...> - 2024-05-02 08:18:31

Hello everyone,

Thank you for all your helpful suggestions.

I've taken careful note of them, and they have been extremely helpful in
guiding my work.
3D-QSAR is also new for me and your insights and expertise have been
incredibly valuable.

Thank you once again for your generous assistance.

Best Regards,

Ariadna Llop


Re: [Rdkit-discuss] Request for Assistance with MACCS 166 Fingerprint Calculation for 3D QSAR Study

From: Andrew D. <da...@da...> - 2024-04-30 21:10:38

Hi Ariadna,

  In general the MACCS keys are not that good for comparing similarity. They still exist for historical reasons. Back in the 1970s the company Molecular Design Limited developed a program called "Molecular Access System" (MACCS) for structure registration, substructure search, and the like.

Substructure search is slow, so MACCS includes a set of keys which would act as fast filters - if the query contained a key but the database entry did not, then the query could not match that entry.
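The screening logic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how MACCS was implemented: the three key definitions, the toy database, and the function name are all hypothetical, and each key set is stored as a Python integer used as a bitset.

```python
# Hypothetical structural keys, one bit each.
HAS_RING, HAS_NITROGEN, HAS_CARBONYL = 1, 2, 4


def passes_screen(query_keys: int, entry_keys: int) -> bool:
    """False means the entry cannot contain the query; True means 'maybe'."""
    # Any key set in the query but missing from the entry rules the entry out.
    return (query_keys & ~entry_keys) == 0


database = {
    "entry_a": HAS_RING | HAS_NITROGEN,
    "entry_b": HAS_RING | HAS_NITROGEN | HAS_CARBONYL,
    "entry_c": HAS_CARBONYL,
}

query = HAS_RING | HAS_CARBONYL
# Only the survivors need the slow, exact substructure search.
candidates = [name for name, keys in database.items()
              if passes_screen(query, keys)]
```

Here only entry_b survives the screen. Passing the screen does not prove a match, so the full substructure test is still required for the candidates.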

In the 1980s when fingerprint similarity search first became popular - this is before the term "fingerprint" was even coined - people used the MACCS keys because they were already computed and sitting there, on the computer system they were already using.

Over time people developed other types of fingerprints, and different ways to compare them, and a more complete understanding of how they are coupled to the types of system being studied.

For example, in "Comparing structural fingerprints using a literature-based similarity benchmark" by Sayle and O'Boyle, "Extended-connectivity fingerprints of diameter 4 and 6 are among the best performing fingerprints when ranking diverse structures by similarity, as is the topological torsion fingerprint. However, when ranking very close analogues, the atom pair fingerprint outperforms the others tested."

They found the MACCS fingerprints to be one of the worst performers, which you might expect now that you know the happenstance which made them popular.

Since you are doing 3D QSAR, you should familiarize yourself with the fingerprints used in that area. I have no experience with 3D QSAR and cannot provide advice on what is appropriate. 

The first paper I found using Google Scholar to search for "3d qsar fingerprints" is "Docking, Interaction Fingerprint, and Three-Dimensional Quantitative Structure–Activity Relationship (3D-QSAR) of Sigma1 Receptor Ligands, Analogs of the Neuroprotective Agent RC-33" at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6637851/ which uses Interaction fingerprints.

The second is "Novel TOPP descriptors in 3D-QSAR analysis of apoptosis inducing 4-aryl-4H-chromenes: Comparison versus other 2D- and 3D-descriptors" at https://www.sciencedirect.com/science/article/pii/S0968089607005834 which I mention because it summarizes 7 different descriptor-based approaches, and places the MACCS keys in last place, far below the second worst ("TOPP > GRIND > BCI 4096 = ECFP > FCFP > GRID-GOLPE ≫ DRAGON ⋙ MDL 166").

No doubt there are many others for you to read through and try out.


> # Generate fingerprint descriptor database
> fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2) for m in mols]

What I can suggest is you try my chemfp package, specifically the 4.2b1 I just released (bear in mind that it is beta!)

You can install it with:

   python -m pip install chemfp==4.2b1 -i https://chemfp.com/packages/

To generate Morgan fingerprints of radius 2, I suggest you compute them once and store them in a file, like this command-line example:

  rdkit2fps --morgan2 dataset.smi -o dataset.fps

(use "--maccs" to generate MACCS keys, "--pair" for atom pairs; and use "--help" to see what other options are available.)

To "Calculate pairwise Tanimoto similarity between fingerprints" as a distance, you can use another command-line tool to generate the matrix as a NumPy "npy" file, like this:

  chemfp simarray dataset.fps --as-distance -o dataset.npy

To load this in Python:

  import numpy as np
  dists = np.load("dataset.npy")

If you also need the identifiers:

  with open("dataset.npy", "rb") as f:
    dists = np.load(f)
    metadata = np.load(f)
    ids = np.load(f)

This should make it easier to iterate over the different clustering methods available, since you only generate the fingerprints and distance matrix once.

If you decide to use interaction fingerprints, or some other fingerprint type that is not in the RDKit, you can still generate the fingerprints in FPS format (a simple text format) and use chemfp to generate your matrix for you, either on the command-line or through its Python API.

> However, I'm not satisfied with the results and would like to experiment with MACCS Keys to see if they yield better clustering outcomes. Does anyone know how to cluster compounds using MACCS fingerprints? Any insights on the best approach to calculate similarities and cluster using these fingerprints would be highly appreciated.

In case I was not clear enough before, MACCS keys make poor fingerprints. There is no reason to expect they will yield better clustering outcomes, and multiple papers suggest they will give worse ones.

Best regards,

				Andrew
				da...@da...

[Rdkit-discuss] For bioactivity prediction

From: YUKTI D. <yuk...@st...> - 2024-04-24 19:14:59

Can anybody help me do bioactivity prediction for a batch of SMILES with RDKit?

Re: [Rdkit-discuss] Request for Assistance with MACCS 166 Fingerprint Calculation for 3D QSAR Study

From: Greg L. <gre...@gm...> - 2024-04-23 14:20:24

Hi,

Please do not duplicate questions/posts between the mailing list and github
discussions. That's spamming the community.

-greg


On Tue, Apr 23, 2024 at 4:10 PM Ariadna Llop Peiró <ari...@gm...>
wrote:

> Hello everyone,
>
> I'm currently working with a dataset of chemical compounds, aiming to
> cluster them into different series to create a 3D-QSAR model. Up to this
> point, I've been using Morgan Fingerprints to generate the descriptors and
> cluster the compounds based on their Tanimoto Similarity:
>
> ```
> # Generate fingerprint descriptor database
> fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2) for m in mols]
>
>
> # Calculate pairwise Tanimoto similarity between fingerprints
> similarity_matrix = []
> for i in range(len(fps)):
>     similarities = []
>     for j in range(len(fps)):
>         similarities.append(DataStructs.TanimotoSimilarity(fps[i], fps[j]))
>
>     similarity_matrix.append(similarities)
> ```
>
>
> With the similarity matrix, I applied hierarchical clustering based on a
> Tanimoto Similarity threshold to group similar compounds:
>
> ```
> # Cluster based on Tanimoto similarity
> dists = 1 - np.array(similarity_matrix)
> hc = hierarchy.linkage(squareform(dists), method='single')
>
> # Specify a distance threshold or number of clusters
> threshold = 0.6  # Adjust this value based on your dendrogram and similarity values
> clusters = hierarchy.fcluster(hc, threshold, criterion='distance')
> ```
>
> However, I'm not satisfied with the results and would like to experiment
> with MACCS Keys to see if they yield better clustering outcomes. Does
> anyone know how to cluster compounds using MACCS fingerprints? Any insights
> on the best approach to calculate similarities and cluster using these
> fingerprints would be highly appreciated.
>
> Thank you in advance for your suggestions!
>
> Ariadna Llop
> _______________________________________________
> Rdkit-discuss mailing list
> Rdk...@li...
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>

[Rdkit-discuss] Request for Assistance with MACCS 166 Fingerprint Calculation for 3D QSAR Study

From: Ariadna L. P. <ari...@gm...> - 2024-04-23 14:07:38

Hello everyone,

I'm currently working with a dataset of chemical compounds, aiming to
cluster them into different series to create a 3D-QSAR model. Up to this
point, I've been using Morgan Fingerprints to generate the descriptors and
cluster the compounds based on their Tanimoto Similarity:

```
# Generate fingerprint descriptor database
fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2) for m in mols]


# Calculate pairwise Tanimoto similarity between fingerprints
similarity_matrix = []
for i in range(len(fps)):
    similarities = []
    for j in range(len(fps)):
        similarities.append(DataStructs.TanimotoSimilarity(fps[i], fps[j]))

    similarity_matrix.append(similarities)
```


With the similarity matrix, I applied hierarchical clustering based on a
Tanimoto Similarity threshold to group similar compounds:

```
# Cluster based on Tanimoto similarity
dists = 1 - np.array(similarity_matrix)
hc = hierarchy.linkage(squareform(dists), method='single')

# Specify a distance threshold or number of clusters
threshold = 0.6  # Adjust this value based on your dendrogram and similarity values
clusters = hierarchy.fcluster(hc, threshold, criterion='distance')
```

However, I'm not satisfied with the results and would like to experiment
with MACCS Keys to see if they yield better clustering outcomes. Does
anyone know how to cluster compounds using MACCS fingerprints? Any insights
on the best approach to calculate similarities and cluster using these
fingerprints would be highly appreciated.

Thank you in advance for your suggestions!

Ariadna Llop

[Rdkit-discuss] specifying cis/trans in reactions

From: מיכל ר. <mic...@gm...> - 2024-03-27 10:28:22

Hi
I'm trying to define the following reaction:
'([A:1]\[A:2]=[A:3]\[A:4]=[A:5]/[A:6]=[A:7]\[A:8]=[A:9]\[A:10]) >>
([A:1]/[A:2]=[A:9]\[A:10].[A:4]1=[A:5][A:6]=[A:7][A:8]=[A:3]1)'

I want the reaction to take place for the cis case specifically as written
and not for the all-trans reactant. Using rdchiral, I manage to eliminate
the all-trans reactant, but the product is given in its all-trans form and
not in the cis form that the reaction demands (between atoms 1,2,9,10):
reactant: 'CCCC/[NH+]=C/C=C(C)\\C=C/C=C(C)/C=C/C1=C(C)CCCC1(C)C'
product: 'CCCC/[NH+]=C(C)/C=C/C1=C(C)CCCC1(C)C'

How can I resolve this issue?

Re: [Rdkit-discuss] Strange behaviour for GetSubstructMatches with dative bonds

From: Greg L. <gre...@gm...> - 2024-03-20 16:36:08

For what it's worth, this one works too:
m.GetSubstructMatches(Chem.MolFromSmarts('P1->[Zr+3]<-C1'))

It looks like a problem in the way ring closure bonds are being handled in
the SMARTS parser.
Jan: would you mind creating an issue for this in github?

-greg


On Wed, Mar 20, 2024 at 3:30 PM Jan Halborg Jensen <jhj...@ch...>
wrote:

> The following finds no matches:
>
> m = Chem.MolFromSmiles('C1P->[Zr+3]<-1')
> m.GetSubstructMatches(Chem.MolFromSmarts('C1P->[Zr+3]<-1'))
>
> But all these work:
>
> m.GetSubstructMatches(Chem.MolFromSmiles('C1P->[Zr+3]<-1'))
>
> m.GetSubstructMatches(Chem.MolFromSmarts('[*]->[Zr+3]'))
>
> m = Chem.MolFromSmiles('C1P-[Zr+3]-1')
> m.GetSubstructMatches(Chem.MolFromSmarts('C1P-[Zr+3]-1'))
>
>
> Is this a bug, or is there something I’m missing with regard to the first
> case?
>
> Best regards, Jan
> _______________________________________________
> Rdkit-discuss mailing list
> Rdk...@li...
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>

[Rdkit-discuss] Strange behaviour for GetSubstructMatches with dative bonds

From: Jan H. J. <jhj...@ch...> - 2024-03-20 14:28:16

The following finds no matches:

m = Chem.MolFromSmiles('C1P->[Zr+3]<-1')
m.GetSubstructMatches(Chem.MolFromSmarts('C1P->[Zr+3]<-1'))

But all these work:

m.GetSubstructMatches(Chem.MolFromSmiles('C1P->[Zr+3]<-1'))

m.GetSubstructMatches(Chem.MolFromSmarts('[*]->[Zr+3]'))

m = Chem.MolFromSmiles('C1P-[Zr+3]-1')
m.GetSubstructMatches(Chem.MolFromSmarts('C1P-[Zr+3]-1'))


Is this a bug, or is there something I’m missing with regard to the first case?

Best regards, Jan

Re: [Rdkit-discuss] Bug in ResonanceMolSupplier?

From: Paolo T. <pao...@gm...> - 2024-03-19 11:16:35

Dear Jan,

Definitely it is a bug.
I’ll try and fix it for the next release which is due in ~2 weeks.

Thanks for reporting, cheers
Paolo

> On 19 Mar 2024, at 11:20, Jan Halborg Jensen <jhj...@ch...> wrote:
> 
> Why does ResonanceMolSupplier only give me one resonance structure for O[NH+]=[C-]NC when O[NH+]=[CH]NC gives me two structures? Is that a bug?
> 
> Best regards, Jan
> 
> 
> _______________________________________________
> Rdkit-discuss mailing list
> Rdk...@li...
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

109 messages has been excluded from this view by a project administrator.
