[Rdkit-discuss] Polymers, S-Groups, and molblock-parsing (oh my!)
From: James D. <J.D...@ve...> - 2011-10-19 12:20:20
Dear All,

I just wanted to raise an observation about the behaviour of the molblock parser. I was running some SMARTS-based substructure queries in KNIME and happened to be looking for aromatic N-oxides. The query was just "nO", which should maybe be the answer as well! :)

Anyway, I was searching DrugBank (via the SDF, http://www.drugbank.ca/system/downloads/current/structures/small_molecule.sdf.zip) and found that Heparin was a hit for my query, which I thought was a bit odd as it has no aromatic nitrogens. It seems, however, that the match is due to the * atoms in the molblock (see below) that represent the polymer repeat points: these give *-O, which matches n-O.

As I understand it, the rest of the information about the polymer is stored as S-Group data, and I presume that RDKit is not currently interpreting this(?). So I guess the simple question is: should polymers, etc. be handled by the parser (if not fully, then perhaps partially, e.g. by deleting the * atoms if S-Group data are found)?

Kind regards James Mrv0541 09201117322D 111114 0 0 1 0 999 V2000 12.8725 -11.1521 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0 13.5903 -11.5667 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0 12.8725 -10.3272 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0 11.8517 -11.7493 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 14.2992 -11.1521 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0 13.5903 -12.3914 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 13.5903 -9.9172 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 12.1547 -9.9172 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 10.8307 -12.3335 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0 14.2992 -10.3272 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0 14.9729 -11.9185 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 11.4415 -10.3272 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 10.8307 -13.1582 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0 10.1175 -11.9232 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 15.3200 -9.7433 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 16.1934 -11.9139 0.0000 S 0 0 0 0 0 0 0 0 0 0 0 0 10.1175 -13.5728 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0 11.7684 -14.1445 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 9.3996 -12.3335 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0 16.3409 -9.1504 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0 16.1889 -12.7387 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 16.1889 -11.0892 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 17.0225 -11.9139 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 9.3996 -13.1582 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0 10.1175 -14.3975 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 8.6864 -11.9232 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 16.3409 -8.3257 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0 17.0586 -9.5650 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0 8.6819 -13.5683 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 7.9730 -12.3335 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 8.6864 -11.0985 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 17.0586 -7.9154 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 15.6276 -7.9154 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 17.7720 -9.1504 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0 17.0586 -10.3942 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 6.9121 -13.5594 0.0000 * 0 0 0 0 0 0 0 0 0 0 0 0 17.7720 -8.3257 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0 14.9099 -8.3257 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 15.6276 -7.0907 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 
19.3208 -10.0487 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 18.7974 -7.7326 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 19.8138 -7.1442 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0 19.8138 -6.3194 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0 20.5314 -7.5589 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0 20.5314 -5.9093 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 19.1005 -5.9093 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 21.2449 -7.1442 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0 20.5314 -8.3880 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 21.2449 -6.3194 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 18.2713 -5.9004 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 22.5298 -7.7348 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 22.7828 -5.4368 0.0000 * 0 0 0 0 0 0 0 0 0 0 0 0 17.4465 -5.8959 0.0000 S 0 0 0 0 0 0 0 0 0 0 0 0 22.5342 -8.5639 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 17.4421 -6.7207 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 17.4421 -5.0712 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 16.6217 -5.8915 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 23.2475 -8.9741 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 21.8165 -8.9741 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 11.7684 -15.0239 0.0000 S 0 0 0 0 0 0 0 0 0 0 0 0 11.7684 -15.9034 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 12.6642 -15.0236 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 10.9056 -15.0242 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 13.5903 -13.2709 0.0000 S 0 0 0 0 0 0 0 0 0 0 0 0 13.5903 -14.1504 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 14.4672 -13.2709 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 12.7084 -13.2708 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 19.3208 -10.9282 0.0000 S 0 0 0 0 0 0 0 0 0 0 0 0 19.3208 -11.8076 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 20.1836 -10.9285 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 18.4251 -10.9278 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 12.1289 -10.7947 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 14.2749 -12.0271 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 12.8121 -9.5045 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 15.1176 -11.0481 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 12.5405 -9.1879 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 11.7669 -9.1890 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 11.6433 -12.4763 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 14.5818 -9.5522 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 14.6342 -12.6707 
0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 10.7277 -9.9135 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 10.6535 -13.9639 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 9.4332 -14.0336 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 8.5870 -12.4756 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 15.5285 -9.0070 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 16.9011 -13.1551 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 8.5872 -13.0149 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 9.4030 -14.8100 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 16.2828 -7.5027 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 17.7429 -10.0258 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 7.2592 -11.9198 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 18.5843 -9.0064 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 17.7730 -10.8067 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 18.5846 -8.4685 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 14.1972 -7.9102 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 19.0013 -7.0009 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 19.7556 -5.4965 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 21.2157 -8.0197 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 19.4052 -5.1427 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 18.9692 -5.0949 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 22.0572 -7.0002 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 19.8169 -8.8005 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 22.0574 -6.4622 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 23.2231 -7.2877 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 18.1544 -7.1370 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 22.8363 -9.6893 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 23.9627 -9.3854 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 23.6588 -8.2589 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 10.4933 -15.7388 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 12.2960 -12.5563 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 18.0123 -11.6421 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 1 2 1 0 0 0 0 1 3 1 0 0 0 0 1 4 1 6 0 0 0 2 5 1 0 0 0 0 2 6 1 1 0 0 0 3 7 1 0 0 0 0 3 8 1 1 0 0 0 9 4 1 1 0 0 0 5 10 1 0 0 0 0 5 11 1 6 0 0 0 8 12 1 0 0 0 0 9 13 1 0 0 0 0 9 14 1 0 0 0 0 10 15 1 6 0 0 0 11 16 1 0 0 0 0 13 17 1 0 0 0 0 13 18 1 6 0 0 0 14 19 1 0 0 0 0 20 15 1 6 0 0 0 16 21 1 0 0 0 0 16 22 2 0 0 0 0 16 23 2 0 0 0 0 17 24 1 0 0 0 0 17 25 1 1 0 0 0 19 26 1 6 0 0 0 20 27 1 0 0 0 0 20 28 1 0 0 0 0 24 29 1 6 0 0 0 26 30 1 0 0 0 
0 26 31 2 0 0 0 0 27 32 1 0 0 0 0 27 33 1 1 0 0 0 28 34 1 0 0 0 0 28 35 1 1 0 0 0 29 36 1 0 0 0 0 32 37 1 0 0 0 0 33 38 1 0 0 0 0 33 39 2 0 0 0 0 34 40 1 6 0 0 0 37 41 1 1 0 0 0 42 41 1 6 0 0 0 42 43 1 0 0 0 0 42 44 1 0 0 0 0 43 45 1 0 0 0 0 43 46 1 1 0 0 0 44 47 1 0 0 0 0 44 48 1 1 0 0 0 45 49 1 0 0 0 0 46 50 1 0 0 0 0 47 51 1 6 0 0 0 49 52 1 0 0 0 0 50 53 1 0 0 0 0 51 54 1 0 0 0 0 53 55 1 0 0 0 0 53 56 2 0 0 0 0 53 57 2 0 0 0 0 54 58 1 0 0 0 0 54 59 2 0 0 0 0 7 10 1 0 0 0 0 19 24 1 0 0 0 0 34 37 1 0 0 0 0 47 49 1 0 0 0 0 18 60 1 0 0 0 0 60 61 2 0 0 0 0 60 62 2 0 0 0 0 60 63 1 0 0 0 0 6 64 1 0 0 0 0 64 65 2 0 0 0 0 64 66 2 0 0 0 0 64 67 1 0 0 0 0 40 68 1 0 0 0 0 68 69 2 0 0 0 0 68 70 2 0 0 0 0 68 71 1 0 0 0 0 1 72 1 1 0 0 0 2 73 1 6 0 0 0 3 74 1 6 0 0 0 5 75 1 1 0 0 0 8 76 1 0 0 0 0 8 77 1 0 0 0 0 9 78 1 6 0 0 0 10 79 1 1 0 0 0 11 80 1 0 0 0 0 12 81 1 0 0 0 0 13 82 1 1 0 0 0 17 83 1 6 0 0 0 19 84 1 1 0 0 0 20 85 1 1 0 0 0 21 86 1 0 0 0 0 24 87 1 1 0 0 0 25 88 1 0 0 0 0 27 89 1 6 0 0 0 28 90 1 6 0 0 0 30 91 1 0 0 0 0 34 92 1 1 0 0 0 35 93 1 0 0 0 0 37 94 1 6 0 0 0 38 95 1 0 0 0 0 42 96 1 1 0 0 0 43 97 1 6 0 0 0 44 98 1 6 0 0 0 46 99 1 0 0 0 0 46100 1 0 0 0 0 47101 1 1 0 0 0 48102 1 0 0 0 0 49103 1 0 0 0 0 51104 1 0 0 0 0 55105 1 0 0 0 0 58106 1 0 0 0 0 58107 1 0 0 0 0 58108 1 0 0 0 0 63109 1 0 0 0 0 67110 1 0 0 0 0 71111 1 0 0 0 0 M STY 1 1 SRU M SCN 1 1 HT M SAL 1 15 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 M SAL 1 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 M SAL 1 15 31 32 33 34 35 37 38 39 40 41 42 43 44 45 46 M SAL 1 15 47 48 49 50 51 53 54 55 56 57 58 59 60 61 62 M SAL 1 15 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 M SAL 1 15 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 M SAL 1 15 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 M SAL 1 4 108 109 110 111 M SDI 1 4 6.8467 -4.6587 6.8467 -16.3159 M SDI 1 4 24.3752 -16.3159 24.3752 -4.6587 M SBL 1 2 35 51 M SMT 1 n M END > <DRUGBANK_ID> DB01109 > <DRUG_GROUPS> approved; investigational > <GENERIC_NAME> 
Heparin

> <SYNONYMS>
Alpha-Heparin; Heparin sodium; Heparin sodium preservative Free; Heparin sodium salt; Heparin sulfate; Heparinate; Heparinic acid; Sodium heparin; heparin

> <BRANDS>
Ariven; Arteven; Calcilean; Calciparine; Certoparin; Depo-Heparin; Eparina [DCIT]; Hed-Heparin; Hepalean; Heparin Cy 216; Heparin Leo; Heparin Lock Flush; Hepathrom; Leparan; Lipo-Hepin; Liquaemin; Liquaemin Sodium; Liquemin; Multiparin; Novoheparin; Pabyrin; Parvoparin; Pularin; Thromboliquine; Vetren

> <CHEMICAL_FORMULA>
C26H42N2O37S5

> <MOLECULAR_WEIGHT>
1134.928

> <EXACT_MASS>
1134.006993818

> <IUPAC_NAME>
3-[(5-{[6-carboxy-4,5-dihydroxy-3-(sulfooxy)oxan-2-yl]oxy}-6-(hydroxymethyl)-3-(sulfoamino)-4-(sulfooxy)oxan-2-yl)oxy]-6-({5-acetamido-4,6-dihydroxy-2-[(sulfooxy)methyl]oxan-3-yl}oxy)-4-hydroxy-5-(sulfooxy)oxane-2-carboxylic acid

> <INCHI_IDENTIFIER>
InChI=1S/C26H42N2O37S5/c1-4(30)27-7-9(31)13(6(56-23(7)39)3-55-67(43,44)45)58-26-19(65-70(52,53)54)12(34)16(20(62-26)22(37)38)60-24-8(28-66(40,41)42)15(63-68(46,47)48)14(5(2-29)57-24)59-25-18(64-69(49,50)51)11(33)10(32)17(61-25)21(35)36/h5-20,23-26,28-29,31-34,39H,2-3H2,1H3,(H,27,30)(H,35,36)(H,37,38)(H,40,41,42)(H,43,44,45)(H,46,47,48)(H,49,50,51)(H,52,53,54)

> <INCHI_KEY>
InChIKey=HTTJABKRGRZYRN-UHFFFAOYSA-N

> <SMILES>
CC(=O)NC1C(O)OC(COS(O)(=O)=O)C(OC2OC(C(OC3OC(CO)C(OC4OC(C(O)C(O)C4OS(O)(=O)=O)C(O)=O)C(OS(O)(=O)=O)C3NS(O)(=O)=O)C(O)C2OS(O)(=O)=O)C(O)=O)C1O

> <JCHEM_ACCEPTOR_COUNT>
33.0

> <JCHEM_DONOR_COUNT>
15.0

> <JCHEM_ACIDIC_PKA>
-2.37

> <ALOGPS_LOGP>
-1.68

> <JCHEM_LOGP>
-8.35

> <ALOGPS_LOGS>
-2.02

> <JCHEM_POLARIZABILITY>
93.37

> <JCHEM_POLAR_SURFACE_AREA>
610.49

> <JCHEM_REFRACTIVITY>
195.91

> <JCHEM_ROTATABLE_BOND_COUNT>
20

> <ALOGPS_SOLUBILITY>
1.08e+01 g/l

$$$$
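The partial handling suggested in the message (drop the * atoms only when S-Group data are present) can be illustrated with a small pre-filter that runs on the molblock text before any substructure search. This is only a rough sketch in plain Python, not RDKit's parser or any RDKit API: the function name `polymer_dummy_atoms` and the toy molblock `EXAMPLE` are hypothetical, and it assumes a V2000 block whose atom lines have whitespace-separated fields.

```python
from typing import List


def polymer_dummy_atoms(molblock: str) -> List[int]:
    """Return 1-based indices of '*' atoms, but only if the block declares an S-group."""
    lines = molblock.splitlines()
    # The V2000 counts line is the 4th line; the atom count is its first 3 columns
    # (fixed width, so fused counts such as "111114" still parse correctly).
    n_atoms = int(lines[3][0:3])
    atom_lines = lines[4:4 + n_atoms]
    # "M  STY" property lines declare S-groups (e.g. SRU, a structural repeat unit).
    has_sgroup = any(line.split()[:2] == ["M", "STY"] for line in lines)
    if not has_sgroup:
        return []
    # Assumption: x, y, z and the symbol are whitespace-separated, so the
    # atom symbol is the 4th field of each atom line.
    return [i for i, line in enumerate(atom_lines, start=1) if line.split()[3] == "*"]


# Hypothetical toy repeat unit *-O-C-* with one SRU S-group.
EXAMPLE = "\n".join([
    "toy repeat unit",
    "  hypothetical",
    "",
    "  4  3  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 *   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    3.0000    0.0000    0.0000 *   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "  2  3  1  0  0  0  0",
    "  3  4  1  0  0  0  0",
    "M  STY  1   1 SRU",
    "M  SAL   1  2   2   3",
    "M  SBL   1  2   1   3",
    "M  END",
])

if __name__ == "__main__":
    print(polymer_dummy_atoms(EXAMPLE))
```

Atoms flagged this way could then be removed, or hits involving them ignored, before running a query such as "nO". Whether that is the right behaviour for the parser itself is exactly the open question raised in the message.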