Thread: [Rdkit-discuss] Can a bond index be associated with order in explicit SMILES?

Hi RDKit Community,

I am experimenting with explicit bonds in SMILES. From my understanding, when I create a Mol object from a SMILES string, the atom indices are assigned in the order in which the atoms appear in the SMILES, from left to right.

I thought this might also be the case for bond indices, but that does not appear to be correct (see the example below). Is it possible to get bond indices that follow the order in which the bonds are written in the SMILES?

Thanks, Vin

from rdkit import Chem

smi = "CCc1cc[nH]c1CCC1CCC(CC1)c1cc[nH]c1"
mol1 = Chem.MolFromSmiles(smi)
# write the canonical SMILES with every bond symbol spelled out
smi_explicit = Chem.MolToSmiles(mol1, allBondsExplicit=True)
mol2 = Chem.MolFromSmiles(smi_explicit)
print(smi_explicit)

C-C-c1:c:c:[nH]:c:1-C-C-C1-C-C-C(-c2:c:c:[nH]:c:2)-C-C-1

Here is a manual labeling of the bond indices from left to right, with the aromatic bond locations marked by +:

C-C-c1:c:c:[nH]:c :1-C-C-C1-C -C -C( -c2 :c :c :[nH]:c :2)-C -C -1
 0 1  2 3 4    5  6 7 8 9  10 11 12   13 14 15 16   17 18 19 20 21
      + + +    +  +                       +  +  +   +   +
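The manual labeling above can be automated. Below is a minimal pure-Python sketch (the helper name smiles_bond_order is hypothetical, not an RDKit function) that walks a SMILES string and lists each bond, in the order it is written, as the pair of atom indices it joins. It assumes atoms are numbered left to right as described above, and places a ring-closure bond at its closing label, as in the manual labeling. It only handles the organic subset, bracket atoms, branches, ring-closure labels (digits and %nn) and explicit bond symbols; other characters are silently skipped.

```python
import re

# Tokens, tried in this order at each position: bracket atoms, two-letter
# organic-subset atoms, one-letter organic-subset atoms (aliphatic or aromatic)
# and '*', bond symbols, branch parentheses, ring-closure labels (%nn or a digit).
_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFIbcnops*]|[-=#:$/\\.]|[()]|%\d{2}|\d")
_BOND_SYMBOLS = set("-=#:$/\\.")


def smiles_bond_order(smiles):
    """Hypothetical helper: list every bond as (begin_atom, end_atom, symbol)
    in the left-to-right order in which it is written in `smiles`.

    Atoms are numbered left to right. A ring-closure bond is placed at its
    closing label. `symbol` is '' when the bond is implicit.
    """
    bonds = []
    branch_stack = []   # atom to return to when ')' closes a branch
    open_rings = {}     # ring label -> (opening atom, bond symbol written there)
    prev_atom = None    # atom the next chain bond starts from
    pending_bond = ""   # explicit bond symbol waiting for its atom or label
    n_atoms = 0

    for tok in _TOKEN.findall(smiles):
        if tok == "(":
            branch_stack.append(prev_atom)
        elif tok == ")":
            prev_atom = branch_stack.pop()
        elif tok in _BOND_SYMBOLS:
            pending_bond = tok
        elif tok.startswith("%") or tok.isdigit():
            if tok in open_rings:
                # Second occurrence of the label: the ring bond is written here.
                start, opening_bond = open_rings.pop(tok)
                bonds.append((start, prev_atom, pending_bond or opening_bond))
            else:
                # First occurrence; the label can be reused once it is closed.
                open_rings[tok] = (prev_atom, pending_bond)
            pending_bond = ""
        else:  # an atom
            if prev_atom is not None and pending_bond != ".":
                bonds.append((prev_atom, n_atoms, pending_bond))
            prev_atom = n_atoms
            n_atoms += 1
            pending_bond = ""
    return bonds


if __name__ == "__main__":
    smi_explicit = "C-C-c1:c:c:[nH]:c:1-C-C-C1-C-C-C(-c2:c:c:[nH]:c:2)-C-C-1"
    for position, (begin, end, symbol) in enumerate(smiles_bond_order(smi_explicit)):
        print(position, begin, end, symbol)
```

For the explicit SMILES above, the bonds written with ':' come out at positions 2-6 and 14-18, matching the manual labeling. Each (begin, end) pair can then be turned into an RDKit bond index with mol2.GetBondBetweenAtoms(begin, end).GetIdx(), the same call get_bond_idx_matches uses, which gives a mapping from written position to bond index.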

However, as you can see below, matching the SMARTS pattern *:* (two atoms joined by an aromatic bond) gives these actual bond indices:

smarts = '*:*'  # any two atoms joined by an aromatic bond
atom_idx_matches = mol2.GetSubstructMatches(Chem.MolFromSmarts(smarts))

# get bond idx matches
# from https://www.rdkit.org/docs/Cookbook.html#returning-substructure-matches-as-smiles

def get_bond_idx_matches(smarts, mol, match_atom_indices):
    """Return the indices of the bonds in mol that match the query bonds of smarts."""
    query_mol = Chem.MolFromSmarts(smarts)
    bond_indices = []
    for query_bond in query_mol.GetBonds():
        # map the query bond's end atoms onto the matched atoms in mol
        atom_index1 = match_atom_indices[query_bond.GetBeginAtomIdx()]
        atom_index2 = match_atom_indices[query_bond.GetEndAtomIdx()]
        bond_indices.append(mol.GetBondBetweenAtoms(
             atom_index1, atom_index2).GetIdx())
    return bond_indices

# one list of bond indices per substructure match
bond_idx_matches = [get_bond_idx_matches(smarts, mol2, match)
                    for match in atom_idx_matches]
print(sorted(bond_idx_matches))

  [[2], [3], [4], [5], [13], [14], [15], [16], [19], [21]]
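Two smaller checks that may help, sketched under the assumption that RDKit is installed and using only standard Bond methods: reading GetIsAromatic() directly should give the same aromatic bond indices as the '*:*' match without a SMARTS query, and listing each bond with GetBeginAtomIdx()/GetEndAtomIdx() shows which atoms every index joins, so each index can be compared with the manual labeling.

```python
from rdkit import Chem

smi_explicit = "C-C-c1:c:c:[nH]:c:1-C-C-C1-C-C-C(-c2:c:c:[nH]:c:2)-C-C-1"
mol2 = Chem.MolFromSmiles(smi_explicit)

# Aromatic bonds read straight from the bond flags instead of a SMARTS match.
aromatic_bond_indices = [bond.GetIdx() for bond in mol2.GetBonds()
                         if bond.GetIsAromatic()]
print(aromatic_bond_indices)

# Every bond index with the two atom indices it joins and its aromaticity.
bond_table = [(bond.GetIdx(), bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(),
               bond.GetIsAromatic())
              for bond in mol2.GetBonds()]
for row in bond_table:
    print(row)
```

The output above is consistent with the three ring-closure bonds (written at c:1, c:2 and the final -1) being numbered after all the other bonds, which keep their written order; the table is a direct way to check that inference for a given RDKit version.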
