From: Vincent S. <vfs...@ua...> - 2023-05-17 01:04:48
|
Hi RDKit Community, I am experimenting with explicit bonds in SMILES. From my understanding, when I create a mol object from a SMILES, the atom index order is preserved and corresponds to the order from left to right in the SMILES. I thought that this might also be the case for bond indices, but that does not appear to be correct (see example below). Is it possible to get a bond index in the order of the SMILES? Thanks, Vin smi = "CCc1cc[nH]c1CCC1CCC(CC1)c1cc[nH]c1" mol1 = Chem.MolFromSmiles(smi) smi_explicit = Chem.MolToSmiles(mol1, allBondsExplicit=True) mol2 = Chem.MolFromSmiles(smi_explicit) print(smi_explicit) C-C-c1:c:c:[nH]:c:1-C-C-C1-C-C-C(-c2:c:c:[nH]:c:2)-C-C-1 Here is a manual labeling of bond index from left to right and marking aromatic bond locations C-C-c1:c:c:[nH]:c :1-C-C-C1-C -C -C( -c2 :c :c :[nH]:c :2)-C -C -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 + + + + + + + + + + However, as you can see below, the actual bond index numbers using SMARTS matching for pattern *:* is as follows: smarts = '*:*' atom_idx_matches = mol2.GetSubstructMatches(Chem.MolFromSmarts(smarts)) # get bond idx matches # from https://www.rdkit.org/docs/Cookbook.html#returning-substructure-matches-as-smiles def get_bond_idx_matches(smarts, mol, match_atom_indices): query_mol = Chem.MolFromSmarts(smarts) bond_indices = [] for query_bond in query_mol.GetBonds(): atom_index1 = match_atom_indices[query_bond.GetBeginAtomIdx()] atom_index2 = match_atom_indices[query_bond.GetEndAtomIdx()] bond_indices.append(mol.GetBondBetweenAtoms( atom_index1, atom_index2).GetIdx()) return bond_indices bond_idx_matches = [] for idx_group in range(len(atom_idx_matches)): bond_idx_matches.append(get_bond_idx_matches(smarts,mol2,atom_idx_matches[idx_group])) print(sorted(bond_idx_matches)) [[2], [3], [4], [5], [13], [14], [15], [16], [19], [21]] |
From: Andrew D. <da...@da...> - 2023-05-17 09:35:14
|
On May 17, 2023, at 02:31, Vincent Scalfani <vfs...@ua...> wrote: > I thought that this might also be the case for bond indices, but that does not appear to be correct (see example below). Is it possible to get a bond index in the order of the SMILES? This may help you understand why that's a difficult question. What does the bond index mean in something like C12.OC23.C3.C1 ? Does the bond for closure 1 come first in the bond list, because that's where it start, or is it last, because that's where it ends? It looks like you think it should be the closure position. Here's your SMILES labelled by atom index: ┌ 1 1 1 1 1 1 1 1 1 1 atoms│ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 └ | | | | | | | | | | | | | | | | | | | | SMILES[ C-C-c1:c:c:[nH]:c:1-C-C-C1-C-C-C(-c2:c:c:[nH]:c:2)-C-C-1 I used the program at the end of this email to print the information in bond list order: In bondlist order i Bnd# a1 ~ a2 frag 0 0 0 - 1 C-C 1 1 1 - 2 C-c 2 2 2 : 3 c:c 3 3 3 : 4 c:c 4 4 4 : 5 c:[nH] 5 5 5 : 6 [nH]:c 6 6 6 - 7 c-C 7 7 7 - 8 C-C 8 8 8 - 9 C-C 9 9 9 - 10 C-C 10 10 10 - 11 C-C 11 11 11 - 12 C-C 12 12 12 - 13 C-c 13 13 13 : 14 c:c 14 14 14 : 15 c:c 15 15 15 : 16 c:[nH] 16 16 16 : 17 [nH]:c 17 17 12 - 18 C-C 18 18 18 - 19 C-C 19 19 6 : 2 c:c 20 20 19 - 9 C-C 21 21 17 : 13 c:c If you step through them you'll see that the closure atoms (2-6, 9-19, and 13-17) are added to the bond list at the end, after processing the atoms which make up the spanning tree. It appears the closure bond have the begin and end atom indices with the largest first, which makes it possible to tell that a given bond is a closure bond. In principle then it should be possible to reorder the bonds to get the order you want. This proved trickier than I could manage in the time I have. Perhaps the better question is, why do you need the bond indices in a specific order? Cheers, Andrew da...@da... from rdkit import Chem bond_symbols = { Chem.BondType.SINGLE: "-", Chem.BondType.DOUBLE: "=", Chem.BondType.TRIPLE: "#", Chem.BondType.AROMATIC: ":", } smi = "CCc1cc[nH]c1CCC1CCC(CC1)c1cc[nH]c1" #smi = "[C@@](F)(Cl)(Br)O" mol1 = Chem.MolFromSmiles(smi) smi_explicit = Chem.MolToSmiles(mol1, allBondsExplicit=True) mol2 = Chem.MolFromSmiles(smi_explicit) def show(bonds): print(" i Bnd# a1 ~ a2 frag") for i, b in enumerate(bonds): a1, a2 = b.GetBeginAtomIdx(), b.GetEndAtomIdx() symbol = bond_symbols[b.GetBondType()] s = Chem.MolFragmentToSmiles(mol2, atomsToUse=[a1, a2], rootedAtAtom=a1, allBondsExplicit=True) print(f"{i:2d} {b.GetIdx():2d} {a1:2d} {symbol} {a2:2d} {s.center(8)}") print(smi_explicit) print("In bondlist order") show(mol2.GetBonds()) |
From: Francois B. <ml...@li...> - 2023-05-18 08:21:30
|
Dear list, I asked this question in rdkit's github discussions: https://github.com/rdkit/rdkit/discussions/6377 But, apparently that's not more responsive than the ML, so here is my question: --- I have a ligand, I would like to score its current conformer using rdkit's UFF implementation. Later, I would like to change some rotatable bond (single bond out of ring) and update the conformer's energy bu just re-evaluating the part of the energy that is supposed to have changed (i.e. the dihedral component). Bond lengths, bond angles and partial charges being constant. I suspect it should be faster than rescoring the conformer from scratch. How to do this with rdkit? --- Thanks a lot, Francois. |
From: Paolo T. <pao...@gm...> - 2023-05-18 08:37:26
|
Hi Francois, I have replied on GitHub ~10’ ago. p. > On 18 May 2023, at 10:23, Francois Berenger <ml...@li...> wrote: > > Dear list, > > I asked this question in rdkit's github discussions: > > https://github.com/rdkit/rdkit/discussions/6377 > > But, apparently that's not more responsive than the ML, so here is my question: > --- > I have a ligand, I would like to score its current conformer using rdkit's UFF implementation. > > Later, I would like to change some rotatable bond (single bond out of ring) and update > the conformer's energy bu just re-evaluating the part of the energy that is supposed to have > changed (i.e. the dihedral component). > Bond lengths, bond angles and partial charges being constant. > > I suspect it should be faster than rescoring the conformer from scratch. > > How to do this with rdkit? > --- > > Thanks a lot, > Francois. > > > _______________________________________________ > Rdkit-discuss mailing list > Rdk...@li... > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss |
From: Geoffrey H. <geo...@gm...> - 2023-05-18 12:04:32
|
But it’s not so simple. If you rotate a dihedral, all the non-bonded VdW interactions also change. Calculating a force field energy at one particular geometry is extremely fast, and the current version allows you to run multiple energies in parallel. Generate a bunch of geometries at once and then score those in parallel. Trying to “update” isn’t worth it, because there are always more non-bonded interactions (n^2) than bonds and angles. -Geoff On May 18, 2023 at 4:24 AM -0400, Francois Berenger <ml...@li...>, wrote: > Dear list, > > I asked this question in rdkit's github discussions: > > https://github.com/rdkit/rdkit/discussions/6377 > > But, apparently that's not more responsive than the ML, so here is my > question: > --- > I have a ligand, I would like to score its current conformer using > rdkit's UFF implementation. > > Later, I would like to change some rotatable bond (single bond out of > ring) and update > the conformer's energy bu just re-evaluating the part of the energy that > is supposed to have > changed (i.e. the dihedral component). > Bond lengths, bond angles and partial charges being constant. > > I suspect it should be faster than rescoring the conformer from scratch. > > How to do this with rdkit? > --- > > Thanks a lot, > Francois. > > > _______________________________________________ > Rdkit-discuss mailing list > Rdk...@li... > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss |