[Rdkit-discuss] SMARTS/SMIRKS Canonicalisation in RdKit
From: <amo...@dc...> - 2018-12-06 17:33:32
Hi All,
I am trying to standardize some SMIRKS patterns. Currently, after writing out the SMIRKS pattern, I split it into individual molecules (SMARTS), parse each one into RDKit as a molecule, and write it back out as SMARTS. I then join the SMARTS back together to obtain the 'standardized' SMIRKS pattern.
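A minimal sketch of that split/round-trip/join procedure, using RDKit's `Chem.MolFromSmarts` and `Chem.MolToSmarts`. The function name `standardize_smirks` is hypothetical, and the sketch assumes the SMIRKS has no agents (nothing between single `>` signs) and no component-level grouping with parentheses, so that splitting on `>>` and `.` is safe.

```python
from rdkit import Chem


def standardize_smirks(smirks):
    """Round-trip every molecule of a 'reactants>>products' SMIRKS through RDKit.

    Hypothetical helper. Assumes no agents and no component-level
    parentheses, so plain splitting on '>>' and '.' is safe.
    """
    sides = []
    for side in smirks.split(">>"):
        fragments = []
        for smarts in side.split("."):
            # Parse the fragment as a query molecule (no sanitization is done).
            query = Chem.MolFromSmarts(smarts)
            if query is None:
                raise ValueError(f"could not parse SMARTS: {smarts!r}")
            # Write it back out; RDKit uses '&' for the AND operators here.
            fragments.append(Chem.MolToSmarts(query))
        sides.append(".".join(fragments))
    return ">>".join(sides)
```

RDKit can also round-trip the whole pattern in one call with `AllChem.ReactionFromSmarts` followed by `AllChem.ReactionToSmarts`; either way the atom order of the input is what gets written back.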
In the examples below, the parts that differ between the two SMIRKS patterns, both before and after processing with RDKit, are in the reactant ring: the hydrogen-bearing aromatic nitrogen ([nH], written [n&H1&+0] by RDKit) carries atom map 1 in SMIRKS_A but atom map 3 in SMIRKS_B. I am aware that RDKit has also changed the semicolon (";") to the ampersand ("&"), but I have not counted that as a difference.
Is there a way to standardize (canonicalize) SMARTS patterns in RDKit?
I know you can canonicalize SMILES by turning on the canonical flag (e.g. in MolToSmiles), but I'm not aware of such a feature for SMARTS patterns.
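A small illustration of that contrast, written as a sketch against the RDKit versions I'm aware of: `Chem.MolToSmiles` canonicalizes by default (`canonical=True`), while `Chem.MolToSmarts` has no such option and writes atoms back in input order. The two-atom pattern is just a toy example.

```python
from rdkit import Chem

# The same two-atom pattern written in two different atom orders.
orders = ["CO", "OC"]

# SMILES: MolToSmiles canonicalizes by default (canonical=True),
# so both input orders collapse to a single output string.
smiles_out = {Chem.MolToSmiles(Chem.MolFromSmiles(s)) for s in orders}

# SMARTS: MolToSmarts keeps the input atom order,
# so the two round-tripped patterns remain different strings.
smarts_out = {Chem.MolToSmarts(Chem.MolFromSmarts(s)) for s in orders}

print(smiles_out)  # one string
print(smarts_out)  # two strings
```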
Example SMIRKS:
SMIRKS_A:
[c;H0;+0:6]-[c;H0;+0:5]1:[n;H0;+0:4]:[nH;+0:1]:[n;H0;+0:2]:[n;H0;+0:3]:1>>[N-;H0:1]=[N+;H0:2]=[N-;H0:3].[N;H0;+0:4]#[C;H0;+0:5]-[c;H0;+0:6]
SMIRKS_B:
[c;H0;+0:6]-[c;H0;+0:5]1:[n;H0;+0:4]:[n;H0;+0:1]:[n;H0;+0:2]:[nH;+0:3]:1>>[N-;H0:1]=[N+;H0:2]=[N-;H0:3].[N;H0;+0:4]#[C;H0;+0:5]-[c;H0;+0:6]
SMIRKS_A post RdKit:
[c&H0&+0:6]-[c&H0&+0:5]1:[n&H0&+0:4]:[n&H1&+0:1]:[n&H0&+0:2]:[n&H0&+0:3]:1>>[N&-&H0:1]=[N&+&H0:2]=[N&-&H0:3].[N&H0&+0:4]#[C&H0&+0:5]-[c&H0&+0:6]
SMIRKS_B post RdKit:
[c&H0&+0:6]-[c&H0&+0:5]1:[n&H0&+0:4]:[n&H0&+0:1]:[n&H0&+0:2]:[n&H1&+0:3]:1>>[N&-&H0:1]=[N&+&H0:2]=[N&-&H0:3].[N&H0&+0:4]#[C&H0&+0:5]-[c&H0&+0:6]
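To pinpoint differences like these without relying on colour, a minimal pure-Python sketch can compare two SMIRKS atom by atom, keyed on reaction side and atom-map number. The helpers `mapped_atoms` and `diff_mapped_atoms` are hypothetical (not part of RDKit), and the sketch assumes every atom of interest is a bracket atom whose map number is its last field, and that the SMIRKS has exactly one `>>`.

```python
import re

# Matches a bracket atom ending in an atom-map number, e.g. "[n&H1&+0:1]",
# capturing the query text ("n&H1&+0") and the map number ("1").
MAPPED_ATOM = re.compile(r"\[([^\]]*?):(\d+)\]")


def mapped_atoms(smirks):
    """Return {(side, map_number): atom_query} for a 'reactants>>products' SMIRKS."""
    reactants, products = smirks.split(">>")
    atoms = {}
    for side, text in (("reactants", reactants), ("products", products)):
        for query, map_num in MAPPED_ATOM.findall(text):
            atoms[(side, int(map_num))] = query
    return atoms


def diff_mapped_atoms(smirks_a, smirks_b):
    """Return {(side, map_number): (query_a, query_b)} for atoms that differ."""
    a, b = mapped_atoms(smirks_a), mapped_atoms(smirks_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

Applied to the two post-RDKit strings above, it reports only reactant atoms 1 and 3.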
Thanks for the help,
-Amol