rdkit-discuss Mailing List for RDKit (Page 9)
Open-Source Cheminformatics and Machine Learning
Brought to you by:
glandrum
From: Francois B. <ml...@li...> - 2023-05-10 01:42:27
Hello,

Maybe you can use this:

    Chem.MolToSmiles(mol, allHsExplicit=True)

This places each heavy atom between '[' and ']' and gives you the number of hydrogens for each. It gets easier to work with SMILES strings after this (you no longer need a full-blown SMILES parser).

Regards,
F.

On 09/05/2023 14:55, Haijun Feng wrote:

> Can anyone help me figure out how to get each atom with H from the
> smiles as above. Thanks so much!
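To illustrate the point about not needing a full SMILES parser, here is a minimal sketch that pulls the symbol and hydrogen count out of each bracket atom with a regular expression. The pattern, the helper name `split_bracket_atoms`, and the example string are assumptions made for illustration: the string is hand-written in the all-bracketed style, not verified RDKit output, and the pattern ignores isotopes and chirality marks.

```python
import re

# Simplified bracket atom: symbol, optional H count, optional charge,
# optional atom-map number. Isotopes and chirality marks are NOT handled.
BRACKET_ATOM = re.compile(
    r"\[(?P<symbol>[A-Z][a-z]?|se|as|[bcnops])"
    r"(?:H(?P<hcount>\d*))?"
    r"(?P<charge>[+-]+\d*)?"
    r"(?::(?P<map>\d+))?\]"
)


def split_bracket_atoms(smiles):
    """Return (symbol, hydrogen_count) for each bracket atom, in string order."""
    atoms = []
    for match in BRACKET_ATOM.finditer(smiles):
        h = match.group("hcount")
        if h is None:      # no 'H' in the bracket
            n_h = 0
        elif h == "":      # bare 'H' means exactly one hydrogen
            n_h = 1
        else:              # 'H2', 'H3', ...
            n_h = int(h)
        atoms.append((match.group("symbol"), n_h))
    return atoms


# Hand-written example in the all-bracketed style (an assumption, not RDKit output).
example = "[cH]1[cH][cH][cH][cH][c]1[C]([NH2])=[O]"
for i, (symbol, n_h) in enumerate(split_bracket_atoms(example)):
    print(i, symbol, n_h)
```

The atoms come back in the order they appear in the string, which is not necessarily the atom order of the original molecule.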
From: Andrew D. <da...@da...> - 2023-05-09 12:55:26
On May 9, 2023, at 07:55, Haijun Feng <hai...@gm...> wrote:

> Can anyone help me figure out how to get each atom with H from the smiles as above. Thanks so much!

Try using Chem.MolFragmentToSmiles to get the SMILES for each atom, with all hydrogens explicit, then strip off the leading and trailing []s.

    from rdkit import Chem
    mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
    for i, atom in enumerate(mol.GetAtoms()):
        atom_smi = Chem.MolFragmentToSmiles(mol, allHsExplicit=True, atomsToUse=[atom.GetIdx()])
        print(i, atom_smi.strip("[]"))

This prints

    0 cH
    1 cH
    2 cH
    3 cH
    4 cH
    5 c
    6 C
    7 NH2
    8 O

Your code showed you using

    atom.SetProp('molAtomMapNumber',str(i))

In the following, I'll set that property *after* getting the atom SMILES, so the map is not included as part of the output:

    from rdkit import Chem
    mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
    for i, atom in enumerate(mol.GetAtoms()):
        atom_smi = Chem.MolFragmentToSmiles(mol, allHsExplicit=True, atomsToUse=[atom.GetIdx()])
        print(i, atom_smi.strip("[]"))
        atom.SetIntProp("molAtomMapNumber", i)
    print(Chem.MolToSmiles(mol))

which gives the output

    0 cH
    1 cH
    2 cH
    3 cH
    4 cH
    5 c
    6 C
    7 NH2
    8 O
    [cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1[C:6]([NH2:7])=[O:8]

> the output is: [cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1C:6=[O:8]

For what it's worth, I get the slightly different:

    [cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1[C:6]([NH2:7])=[O:8]

You should be aware that the input order and the output SMILES order might be different.

Because of the simpler structure of your preferred output SMILES format, you can alternatively extract the atom terms from the output string by looking for the substrings inside of the []s, as in the following:

    >>> import re
    >>> re.compile(r'\[[^]]+\]').findall("[cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1[C:6]([NH2:7])=[O:8]")
    ['[cH:0]', '[cH:1]', '[cH:2]', '[cH:3]', '[cH:4]', '[c:5]', '[C:6]', '[NH2:7]', '[O:8]']

This list will exactly match the output SMILES atom order.

Cheers,

Andrew
da...@da...
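Since the output SMILES order may differ from the input order, one option is to key each atom term on its atom-map number rather than on its position in the string. The sketch below does that with the standard library only; the helper name `atoms_by_map_number` and the hand-reordered string `reordered` are assumptions made for illustration, and the pattern only handles bracket atoms that carry a map number.

```python
import re

# A bracket atom that carries an atom-map number, e.g. "[NH2:7]".
ATOM_WITH_MAP = re.compile(r"\[([^\]:]+):(\d+)\]")


def atoms_by_map_number(smiles):
    """Map each atom-map number to the text in front of the ':' in its bracket."""
    return {int(num): label for label, num in ATOM_WITH_MAP.findall(smiles)}


# SMILES quoted in the thread, plus a hand-reordered spelling of the same atoms
# (the reordering is an illustrative assumption, not RDKit output).
thread_smiles = "[cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1[C:6]([NH2:7])=[O:8]"
reordered = "[NH2:7][C:6](=[O:8])[c:5]1[cH:0][cH:1][cH:2][cH:3][cH:4]1"

labels = atoms_by_map_number(reordered)
for idx in sorted(labels):
    print(idx, labels[idx])
```

Both strings give the same dictionary, so the per-atom labels can be listed in the original index order whichever way the SMILES was written out.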
From: Wim D. <wim...@gm...> - 2023-05-09 09:43:01
Hi,

I think if you simply need H and the H count appended, by far the easiest way is to append them to the symbol string. See the code block below:

    from rdkit import Chem

    def get_symbol_with_Hs(a):
        symbol=a.GetSymbol()
        charge=a.GetFormalCharge()
        hcount=a.GetTotalNumHs()
        if hcount > 0:
            symbol+="H"
        if hcount > 1:
            symbol+=str(hcount)
        if charge==1:
            symbol+="+"
        if charge==-1:
            symbol+="-"
        if charge > 1:
            symbol+=f"(+{charge})"
        if charge < -1:
            # charge is already negative, so this gives e.g. "(-2)"
            symbol+=f"({charge})"
        return symbol

    mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
    for i, atom in enumerate(mol.GetAtoms()):
        atom.SetProp('molAtomMapNumber',str(i))
        print(i,get_symbol_with_Hs(atom))

-----

Another way I would recommend is using SMILES with explicit (i.e. bracketed) hydrogens instead. For your use case I would imagine this as follows:

    from rdkit import Chem
    mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
    mol=Chem.AddHs(mol)
    rwmol=Chem.RWMol(mol)
    # collect the heavy-atom/heavy-atom bonds first, then remove them
    heavy_bonds=[]
    for b in rwmol.GetBonds():
        ba=b.GetBeginAtom()
        ea=b.GetEndAtom()
        if ba.GetAtomicNum()!=1 and ea.GetAtomicNum()!=1:
            heavy_bonds.append((ba.GetIdx(),ea.GetIdx()))
    for begin_idx,end_idx in heavy_bonds:
        rwmol.RemoveBond(begin_idx,end_idx)
    # each heavy atom is now a fragment together with its hydrogens
    frags=Chem.GetMolFrags(rwmol, asMols=True,sanitizeFrags=False)
    for i,f in enumerate(frags):
        print(i,Chem.MolToSmiles(f))

This would output

    0 [H]c
    1 [H]c
    2 [H]c
    3 [H]c
    4 [H]c
    5 c
    6 C
    7 [H]N[H]
    8 O

I hope that helps.

best wishes
wim

On Tue, May 9, 2023 at 7:58 AM Haijun Feng <hai...@gm...> wrote:

> Can anyone help me figure out how to get each atom with H from the smiles
> as above. Thanks so much!
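The label-building logic from the first suggestion can also be separated from RDKit, which makes the charge cases easy to check on their own. This is only a sketch: the function name `atom_label` and its plain-value arguments are hypothetical, and in practice the values would come from `GetSymbol()`, `GetTotalNumHs()` and `GetFormalCharge()` as in the message above.

```python
def atom_label(symbol, hcount, charge=0):
    """Build a label such as 'NH2', 'NH4+' or 'O(-2)' from plain values."""
    label = symbol
    if hcount > 0:
        label += "H"
    if hcount > 1:
        label += str(hcount)
    if charge == 1:
        label += "+"
    elif charge == -1:
        label += "-"
    elif charge != 0:
        # '+d' always writes the sign, so 3 -> '+3' and -2 -> '-2'
        label += f"({charge:+d})"
    return label


for args in [("C", 1), ("N", 2), ("N", 4, 1), ("O", 0, -1), ("O", 0, -2), ("Fe", 0, 3)]:
    print(args, atom_label(*args))
```

Note that `GetSymbol()` returns the element symbol in its usual capitalization, so aromatic carbons come out as `CH` rather than `cH` with this approach.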
From: Santiago F. <san...@me...> - 2023-05-09 07:03:31
Thank you for your answer. I will try the new version.

Regards
Santiago

________________________________
From: David Cosgrove <dav...@gm...>
Sent: Thursday, 4 May 2023 11:29
To: Santiago Fraga <san...@me...>
Cc: Wim Dehaen <wim...@gm...>; RDKit Discuss <rdk...@li...>
Subject: Re: [Rdkit-discuss] Molfile from smiles

> As part of the work on improving the way RDKit handles organometallics that is in the latest release, there is MolOps::cleanUpOrganometallics, which attempts to do the bond transformations in a similar way to that gist.
> [...]
From: Haijun F. <hai...@gm...> - 2023-05-09 05:55:55
<https://stackoverflow.com/posts/76197437/timeline>

Hi All,

I am trying to add atom numbers to a SMILES string, as below:

    from rdkit import Chem
    mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
    for i, atom in enumerate(mol.GetAtoms()):
        atom.SetProp('molAtomMapNumber',str(i))
    smi=Chem.MolToSmiles(mol)
    print(smi)

the output is: [cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1C:6=[O:8]

Then I want to split the SMILES into atoms. I did it like this:

    from rdkit import Chem
    mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
    for i, atom in enumerate(mol.GetAtoms()):
        atom.SetProp('molAtomMapNumber',str(i))
        print(i,atom.GetSymbol())

the output is:

    0 C
    1 C
    2 C
    3 C
    4 C
    5 C
    6 C
    7 N
    8 O

But what I do want is something like this (with fragments instead of atoms):

    0 cH
    1 CH
    ...
    7 NH2
    8 O

Can anyone help me figure out how to get each atom with its H count from the SMILES, as above? Thanks so much!

best,

Hal
From: David C. <dav...@gm...> - 2023-05-04 09:29:51
As part of the work on improving the way RDKit handles organometallics that is in the latest release, there is MolOps::cleanUpOrganometallics, which attempts to do the bond transformations in a similar way to that gist. The intention was that this would be part of the default sanitization, but late in the day it was discovered that it didn't work well with compounds with 2 metal atoms and bridging chlorine atoms, such as 'F[Pd]1(Cl)Cl->[Pd](Cl)(Cl)<-Cl1'. It's my intention to fix that at some point in the near future, but in the meantime if you're working in C++ it is available for use with caveats. Worth a try in this case, perhaps.

Dave

On Thu, May 4, 2023 at 9:51 AM Santiago Fraga <san...@me...> wrote:

> Good morning Wim
>
> Yes, I know that the original smiles has problems with the dative bonds.
> I am trying to load the molecule and then fix those bonds using this solution:
> https://gist.github.com/greglandrum/6cd7aadcdedb1ebcafa9537e8a47e3a4
>
> And then generate a new molfile. I will try to apply your code to
> see if I can improve the molecule depiction.
>
> Regards
> Santiago

--
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk
From: Santiago F. <san...@me...> - 2023-05-04 08:48:44
Good morning Wim

Yes, I know that the original SMILES has problems with the dative bonds. I am trying to load the molecule and then fix those bonds using this solution:
https://gist.github.com/greglandrum/6cd7aadcdedb1ebcafa9537e8a47e3a4

And then generate a new molfile. I will try to apply your code to see if I can improve the molecule depiction.

Regards
Santiago

SANTIAGO FRAGA
Software Developer
MESTRELAB RESEARCH S.L.

________________________________
From: Wim Dehaen <wim...@gm...>
Sent: Tuesday, 2 May 2023 21:37
To: Santiago Fraga <san...@me...>
Cc: Ling Chan <lin...@gm...>; RDKit Discuss <rdk...@li...>
Subject: Re: [Rdkit-discuss] Molfile from smiles

> Hi all,
> unfortunately I can't offer a "fix" but I can offer these minor comments:
> [...]
From: Gustavo S. <gus...@gm...> - 2023-05-03 19:26:23
Hi Guys,

I'm sorry it took me this long to try it... But I could finally get to it, and it works well now.

Thanks for your help!
--
Gustavo Seabra.

On Tue, Apr 11, 2023 at 3:19 AM Jan Halborg Jensen <jhj...@ch...> wrote:

> Hi Gustavo
>
>     from rdkit import Chem
>     from rdkit.Chem import rdDetermineBonds
>
>     raw_mol = Chem.MolFromXYZFile('acetate.xyz')
>     mol = Chem.Mol(raw_mol)
>     rdDetermineBonds.DetermineBonds(mol,charge=-1)
>
> Best regards, Jan
>
> On 7 Apr 2023, at 22.57, Gustavo Seabra <gus...@gm...> wrote:
>
> Hi everyone,
>
> I'm having difficulties using RDKit to read molecules from an XYZ file,
> and I would really appreciate some help.
>
> The problem is that whenever I read a molecule from an XYZ file, I get
> just a disconnected clump of atoms, not a molecule. For example, the
> following code:
>
>     import rdkit
>     from rdkit import Chem
>     from rdkit.Chem import AllChem, Draw, rdmolfiles
>     mol = Chem.MolFromSmiles('COC1=C(O)C[C@@](O)(CO)CC1=O')
>     mol = Chem.AddHs(mol)
>     mol
>
> <image.png>
>
>     AllChem.EmbedMolecule(mol)
>     Chem.MolToXYZFile(mol, "rdkit_mol.xyz")
>     mol2 = Chem.MolFromXYZFile('rdkit_mol.xyz')
>     mol2
>
> <image.png>
>
> Is there a bug in the XYZ code, or am I missing something?
>
> Thanks!
> --
> Gustavo Seabra.
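The reason an XYZ file comes back as disconnected atoms is that the format stores only element symbols and coordinates, so connectivity has to be inferred afterwards. The sketch below shows the basic idea with a plain distance cutoff on covalent radii. It is a deliberately simplified illustration and not RDKit's `DetermineBonds` algorithm, which also assigns bond orders and formal charges; the helper names, the 1.3 tolerance factor, and the small radius table are assumptions.

```python
import math

# Approximate single-bond covalent radii in angstroms (small illustrative table).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def parse_xyz(text):
    """Parse XYZ text: atom count, comment line, then 'symbol x y z' rows."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    atoms = []
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, (float(x), float(y), float(z))))
    return atoms


def guess_bonds(atoms, tolerance=1.3):
    """Return index pairs whose distance is within tolerance * (r_i + r_j)."""
    bonds = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            (sym_i, pos_i), (sym_j, pos_j) = atoms[i], atoms[j]
            cutoff = tolerance * (COVALENT_RADII[sym_i] + COVALENT_RADII[sym_j])
            if math.dist(pos_i, pos_j) <= cutoff:
                bonds.append((i, j))
    return bonds


water_xyz = """3
water
O 0.0000 0.0000 0.1173
H 0.0000 0.7572 -0.4692
H 0.0000 -0.7572 -0.4692
"""

atoms = parse_xyz(water_xyz)
print(guess_bonds(atoms))  # the two O-H pairs are bonded; the H-H pair is too far apart
```

A cutoff like this only recovers which atoms are connected. Working out bond orders and charges from that, as needed for the acetate example in the thread, takes the extra information passed in via the `charge` argument.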
From: Wim D. <wim...@gm...> - 2023-05-02 19:38:08
|
Hi all,

unfortunately I can't offer a "fix", but I can offer these minor comments:

- It seems the SMILES has a parsing error. You can make use of RDKit's SMILES extension for dative bonds ("->") and replace the SMILES with the one below, which will parse and give (what I assume is) the intended structure:
"C1=CC2=[N](C=C1)->[Ir]134(C5=CC=CC=C25)C2=CC=CC=C2C2=[N]->1C=CC=C2.C1=CC(C2=CC=CC=C32)=[N]->4C=C1"
- More fundamentally, I think this molecule is hard to render because, as a hexacoordinate iridium complex, it is inherently three-dimensional and therefore tougher to sketch. You can see on Wikipedia that Ir(ppy)3 looks a bit funny even when sketched manually: https://upload.wikimedia.org/wikipedia/commons/c/c8/Ir%28ppy%29_Schematic.png
- In general, organometallic species have various limitations when it comes to their handling by cheminformatics packages. For this reason, some care is needed with species like this to make sure you won't have issues down the line. For an overview of some RDKit-related ones, see this presentation by Prof. Jan Jensen: https://raw.githubusercontent.com/rdkit/UGM_2020/master/Presentations/JanJensen.pdf

Finally, if I embed the molecule and then display its 2D projection, it actually looks pretty good (despite a warning that UFF doesn't recognize iridium). See below:
[image: image.png]
This was generated using the following code block (in Python, not C++, sorry for that):

from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("C1=CC2=[N](C=C1)->[Ir]134(C5=CC=CC=C25)C2=CC=CC=C2C2=[N]->1C=CC=C2.C1=CC(C2=CC=CC=C32)=[N]->4C=C1", sanitize=True)
mol = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol, randomSeed=0xf00d)
mol = Chem.RemoveHs(mol)
display(mol)  # renders the molecule in a Jupyter notebook

best wishes
wim
|
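The dative-bond suggestion above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a reasonably recent RDKit build: it parses the "->" SMILES from the message with default sanitization, counts the dative bonds, and writes a molblock using the default 2D depictor. The variable names are hypothetical.

```python
from rdkit import Chem
from rdkit.Chem import rdDepictor

# SMILES from the message above, with the three N->Ir bonds written as dative bonds
smiles = ("C1=CC2=[N](C=C1)->[Ir]134(C5=CC=CC=C25)C2=CC=CC=C2C2=[N]->1C=CC=C2"
          ".C1=CC(C2=CC=CC=C32)=[N]->4C=C1")

mol = Chem.MolFromSmiles(smiles)  # sanitize=True is the default
if mol is None:
    raise ValueError("SMILES did not parse")

# Count the bonds that were read as dative
n_dative = sum(1 for b in mol.GetBonds()
               if b.GetBondType() == Chem.BondType.DATIVE)

# Generate 2D coordinates and write the molblock
rdDepictor.Compute2DCoords(mol)
molblock = Chem.MolToMolBlock(mol)

print("dative bonds:", n_dative)
print(molblock)
```

If the default layout is unsatisfactory, calling rdDepictor.SetPreferCoordGen(True) before Compute2DCoords switches to the CoordGen-based layout mentioned elsewhere in this thread; whether it looks better for this complex is not something the sketch can promise.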
From: Santiago F. <san...@me...> - 2023-05-02 14:59:44
|
Thanks for your answer, Ling Chan.
But I am already using that option with the C++ API.

Regards
Santiago

SANTIAGO FRAGA
Software Developer
san...@me...
MESTRELAB RESEARCH S.L.
PHONE +34881976775
FAX +34981941079
Feliciano Barrera, 9B-Bajo 15706
Santiago de Compostela (SPAIN)
<http://www.mestrelab.com>
|
From: Ling C. <lin...@gm...> - 2023-05-02 02:16:03
|
Hello Santiago,

In case you are still looking for an answer, somewhere in my notes I wrote the following.

To get a better depiction of complicated topology, do this before rendering:

from rdkit.Chem import rdDepictor
rdDepictor.SetPreferCoordGen(True)

Sometimes it helps. Good luck.

Ling
|
From: Thomas <odi...@gm...> - 2023-05-01 10:51:46
|
Thank you Wim for your clarification: so the library is not inferring the valence, it's "choosing" one.

I think I've found the solution to my issue: I should probably use rdkit.MolFromSmarts()

mol = chem.rdkit.MolFromSmarts('CCS=O')
chem.rdkit.MolToSmiles(mol)
'CCS=O'

Sometimes just explaining your problem to others helps in finding the solution.

Thomas
|
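A short sketch of the SMARTS route described above, using the standard `from rdkit import Chem` import rather than the poster's own module alias. The target molecule (ethyl methyl sulfoxide) is a hypothetical example chosen only to show that the pattern matches without any hydrogen count being attached to the sulfur.

```python
from rdkit import Chem

# Parse the pattern as a query: no valence model is applied, so no H is added to S
patt = Chem.MolFromSmarts("CCS=O")
smarts = Chem.MolToSmarts(patt)

# Hypothetical target: ethyl methyl sulfoxide
target = Chem.MolFromSmiles("CCS(C)=O")
hit = target.HasSubstructMatch(patt)

print(smarts)
print(hit)
```

Because the query is never sanitized as a molecule, the question of whether S carries one hydrogen or none does not arise for the pattern itself.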
From: Wim D. <wim...@gm...> - 2023-04-29 18:45:28
|
The reason for this is that it prevents ambiguities due to nonstandard, higher valences. With such valences it is not possible to infer the implicit hydrogen count, so it must be specified explicitly. For S and P the standard valence would be 2 and 3 respectively, just like for O and N. But S has nonstandard valences available: 4 and 6, as in sulfoxides and sulfones. P can commonly have a valence of 5, as in phosphoranes.

Your provided SMILES has a valence of at least 3 on S, exceeding the standard valence of 2. This creates an ambiguity, where the SMILES parser has to decide whether the S has a valence of 4 or 6. Likewise, a round trip through RDKit converts the SMILES "FP(F)(F)F" into "F[PH](F)(F)F"; this means the notation is consistent with F[PH2](F)F and distinguishable from FP(F)F. In general, when higher valence states are not possible RDKit will throw a valence error, but there are some more examples available. For example, "CIC" will become C[IH]C.

best wishes
wim
|
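The round trips mentioned above can be reproduced with a small loop. This is a minimal sketch; "CSC" (dimethyl sulfide) is an added control case, not from the thread, where S keeps its standard valence of 2 and no explicit hydrogen should appear.

```python
from rdkit import Chem

# Examples from the thread plus one control ("CSC") with standard-valence sulfur
examples = ["CCS=O", "FP(F)(F)F", "CIC", "CSC"]

roundtrip = {}
for smi in examples:
    mol = Chem.MolFromSmiles(smi)  # default sanitization assigns implicit Hs
    roundtrip[smi] = Chem.MolToSmiles(mol)
    print(smi, "->", roundtrip[smi])
```

The bracketed hydrogen is written only where the atom sits at a higher-than-default valence, so that re-reading the output yields the same hydrogen count.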
From: Thomas <odi...@gm...> - 2023-04-29 10:17:47
|
I am not a chemist, so this may be a silly question, but I am interested in the logic behind it, also because other libraries (like OpenBabel) behave differently.

Why does RDKit sometimes write hydrogens explicitly?

mol = rdkit.MolFromSmiles('CCS=O', sanitize=False)
rdkit.MolToSmiles(mol)
'CC[SH]=O'

The input SMILES is intended as a pattern, not a molecule. I make a mol out of it only to get the canonical SMILES, which will then be used as SMARTS.
Logically, I don't understand how the number of H attached to the S can be "guessed" by the library, and yet cannot be left implicit.

Furthermore, I have seen this behaviour only with S and P. I was wondering whether it is confined to those elements or can happen with any element.
Thank you
|
From: Greg L. <gre...@gm...> - 2023-04-28 15:21:48
|
Hi Susan,

The RDKit does not currently support SCSR.

-greg
|
From: Susan L. <sus...@gm...> - 2023-04-28 13:04:52
|
Hi all, I am trying to read in some Self-Contained Sequence Representation (SCSR) structures https://doi.org/10.1021/ci2001988 But I am encountering some issues. I just wanted to clarify, does RDKit support this representation? Many thanks! Susan |
From: Francois B. <ml...@li...> - 2023-04-26 08:48:42
|
Dear rdkiters,

Is it possible to list all the UFF torsion parameters around acyclic single bonds (rotatable bonds) for a given molecule?

From what I found in the rdkit doc, it is (only?) possible to extract the Vjk value for four consecutive atoms indexed i j k l. But Vjk is just one parameter (the torsion barrier in kcal/mol): for each torsion angle, UFF also defines the multiplicity of the barrier (n_jk, an integer) and phi0 (the angle in degrees at which the barrier is 0), if I understand correctly.

I am carefully reading the DREIDING and UFF papers, but I am not (yet?) sure I will be able to get that right. So, since rdkit has a UFF implementation, I wonder if it would not be safer to just have rdkit list all those torsion parameters for the molecule at hand.

If rdkit cannot do that, I might later post a tentative solution so that another pair of eyes can tell me whether I got it right.

Regards,
F.
|
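As a starting point for the question above, the barrier value can be collected for every i-j-k-l torsion around the rotatable bonds. This is a rough sketch under stated assumptions: the rotatable-bond SMARTS is a deliberately simple illustrative definition, n-butane is a hypothetical test molecule, and only the V value returned by rdForceFieldHelpers.GetUFFTorsionParams is retrieved; the multiplicity and phi0 asked about are not returned by this call.

```python
from rdkit import Chem
from rdkit.Chem import rdForceFieldHelpers

# Hypothetical test molecule: n-butane with explicit hydrogens (UFF needs them)
mol = Chem.AddHs(Chem.MolFromSmiles("CCCC"))

# Illustrative definition: acyclic single bond between two non-terminal atoms
rot_bond = Chem.MolFromSmarts("[!D1]-&!@[!D1]")

torsions = {}
for j, k in mol.GetSubstructMatches(rot_bond):
    for nbr_i in mol.GetAtomWithIdx(j).GetNeighbors():
        i = nbr_i.GetIdx()
        if i == k:
            continue
        for nbr_l in mol.GetAtomWithIdx(k).GetNeighbors():
            l = nbr_l.GetIdx()
            if l == j:
                continue
            # Returns the barrier V as a float, or None if no parameters are found
            v = rdForceFieldHelpers.GetUFFTorsionParams(mol, i, j, k, l)
            if v is not None:
                torsions[(i, j, k, l)] = v

for atoms, v in sorted(torsions.items()):
    print(atoms, v)
```

A stricter rotatable-bond definition (for example excluding amides or triple-bond neighbours) can be swapped in by changing the SMARTS.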
From: Lauren R. <lau...@me...> - 2023-04-25 09:26:48
|
Hi RDKit community,

I'm using RDKit (version 2020.09.4) to read and manipulate CXSMILES with enhanced stereochemistry group definitions. I'd like to read in a CXSMILES with "OR" stereo groups and change the "OR" to "absolute" within the graph mol, before outputting the new CXSMILES.

I have found the methods RWMol.SetStereoGroups and rdchem.CreateStereoGroup, which I assume should provide the functionality I need; however, I'm struggling to use them. Are any of the following options in the example code possible? And if so, please can someone advise how to implement them?

Example code:

from rdkit import Chem

cxsmiles = 'Cl[C@@H](F)(Br) |o1:1|'
mol = Chem.MolFromSmiles(cxsmiles)
editable_mol = Chem.RWMol(mol)
orig_stereo_grps = editable_mol.GetStereoGroups()

# option 1 - can I change the stereo group type from STEREO_OR to STEREO_ABSOLUTE by editing the StereoGroup object itself?
for grp in orig_stereo_grps:
    print([atom.GetIdx() for atom in grp.GetAtoms()])
    print(grp.GetGroupType())

# option 2 - can I construct a StereoGroup_vect object with the STEREO_ABSOLUTE type and feed this into the SetStereoGroups command?
stereo_grps = Chem.StereoGroup_vect()
editable_mol.SetStereoGroups(list(stereo_grps))

# option 3 - can I use Chem.CreateStereoGroup to change the stereo group type?
Chem.CreateStereoGroup(Chem.StereoGroupType.STEREO_ABSOLUTE, editable_mol, [1])

print(Chem.MolToCXSmiles(editable_mol))  # no change seen in output

Thanks in advance for any help,
Lauren

Dr Lauren Reid
Computational Chemist / Developer
MedChemica Ltd

Medchemica Ltd is a company registered in England and Wales with company number 8162245
|
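One way the two calls named above might be combined is to rebuild each group with the desired type and then replace the whole set on the RWMol. This is a hedged sketch rather than a confirmed recipe: it assumes Chem.CreateStereoGroup(type, mol, atom_indices) and RWMol.SetStereoGroups(list) behave as their names suggest in the installed RDKit version, which may differ from 2020.09.4.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("Cl[C@@H](F)Br |o1:1|")
rw = Chem.RWMol(mol)

# Rebuild every stereo group, turning OR groups into ABSOLUTE ones
new_groups = []
for grp in rw.GetStereoGroups():
    atom_ids = [atom.GetIdx() for atom in grp.GetAtoms()]
    group_type = grp.GetGroupType()
    if group_type == Chem.StereoGroupType.STEREO_OR:
        group_type = Chem.StereoGroupType.STEREO_ABSOLUTE
    new_groups.append(Chem.CreateStereoGroup(group_type, rw, atom_ids))

# Replace the groups stored on the molecule with the rebuilt list
rw.SetStereoGroups(new_groups)

group_types = [g.GetGroupType() for g in rw.GetStereoGroups()]
out_cx = Chem.MolToCXSmiles(rw)
print(group_types)
print(out_cx)
```

In the original option 3, the group returned by CreateStereoGroup is discarded, which would explain why the output did not change; here the new groups are handed back through SetStereoGroups.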
From: Santiago F. <san...@me...> - 2023-04-21 09:15:04
|
Good morning

I am trying to generate a molfile from SMILES, using the RDKit C++ implementation.
But in some cases the resulting molfile is like the one in the attached image.

My code is something like this:

#include <string>
#include <GraphMol/SmilesParse/SmilesParse.h>
#include <GraphMol/Depictor/RDDepictor.h>
#include <GraphMol/FileParsers/FileParsers.h>
using std::string;

string molecule = "C1=CC2=[N](C=C1)[Ir]134(C5=CC=CC=C25)C2=CC=CC=C2C2=[N]1C=CC=C2.C1=CC(C2=CC=CC=C32)=[N]4C=C1";
RDKit::ROMol* mol = RDKit::SmilesToMol(molecule, 0, false, nullptr);  // sanitize = false
mol->updatePropertyCache(false);
RDDepict::preferCoordGen = true;
RDDepict::compute2DCoords(*mol);
string molfile = RDKit::MolToMolBlock(*mol, true, -1, false, true);

How could I fix the molfile?

Regards
Santiago
|
From: <303...@qq...> - 2023-04-13 03:19:32
|
Hello Greg,

If this is not the first time you have seen this email, I'm sorry; please ignore it. Due to network issues I have tried sending it several times. I found that when I attached files the email stayed stuck in delivery, so I have moved most of the content into the body of the email.

From the TCMSP database (https://old.tcmsp-e.com/tcmsp.php) I downloaded the Mol2 files of the ingredients and used OpenBabel to batch-convert them to .smiles files. Then I used Python to read the molecules and RDKit to standardize them. The error message is shown in Fig1.

After RDKit reported the error, I was able to open the file "MOL000001.mol2" using BIOVIA Discovery Studio 2020, as shown in Fig2. Its information is as follows:
>ID:MOL000001
>Name:anthocyanidin
>CAS:84082-34-8
In the TCMSP database, I can obtain the structural formula of the corresponding molecule, as shown in Fig3.

The code that produced this error is in Fig4:

import pandas as pd
import numpy as np
import rdkit.Chem as Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

TCMSP_ingredients = pd.read_csv('./Network Pharmacology/TCMSP-Spider-main/data/sample_data/ingredients_data.csv', encoding='gb18030')
TCMSP_ingredients

def Standardize(ingredient_id):
    mol = Chem.MolFromMol2File('./Network Pharmacology/TCMSP_MOL/{}.mol2'.format(ingredient_id))
    # removeHs, disconnect metal atoms, normalize the molecule, reionize the molecule
    clean_mol = rdMolStandardize.Cleanup(mol)
    # if many fragments, get the "parent" (the actual mol we are interested in)
    parent_clean_mol = rdMolStandardize.FragmentParent(clean_mol)
    # try to neutralize molecule
    uncharger = rdMolStandardize.Uncharger()  # annoying, but necessary as no convenience method exists
    uncharged_parent_clean_mol = uncharger.uncharge(parent_clean_mol)
    # note that no attempt is made at reionization at this step
    # nor at ionization at some pH (rdkit has no pKa calculator)
    # the main aim is to represent all molecules from different sources
    # in a (single) standard way, for use in ML, catalogue, etc.
    te = rdMolStandardize.TautomerEnumerator()  # idem
    taut_uncharged_parent_clean_mol = te.Canonicalize(uncharged_parent_clean_mol)
    return Chem.MolToSmiles(taut_uncharged_parent_clean_mol)

def read_smiles(path, ingredient_id):
    with open(path + '{}.smiles'.format(ingredient_id)) as file:
        smiles = ''
        for line in file:
            line = line.replace('\n', '')
            line_list = line.split(' ')
            smiles = line_list[0]
            return smiles  # there is only one line in the smiles file

def smiles_Standardize(smiles):
    mol = Chem.MolFromSmiles(smiles)
    smiles = Chem.MolToSmiles(mol, isomericSmiles=True, canonical=True)
    return smiles

ingredient_ids = TCMSP_ingredients.iloc[:, 0].tolist()
SMILES = []
i = -1
path = './Network Pharmacology/TCMSP_MOL/'
for ingredient_id in ingredient_ids:
    i += 1
    print(ingredient_id)
    print(i)
    # smiles = Standardize(ingredient_id)
    smiles = read_smiles(path, ingredient_id)
    smiles = smiles_Standardize(smiles)
    SMILES.append(smiles)

In the final code block, I commented out the statement 'smiles=Standardize(ingredient_id)' and used 'smiles=read_smiles(path,ingredient_id)' followed by 'smiles=smiles_Standardize(smiles)'.

Then I changed the input format for the same data and ran RDKit again until it stopped at "MOL000107". The error message is shown in Fig5. The code is again the one in Fig4, except that in the final code block I commented out 'smiles=read_smiles(path,ingredient_id)' and 'smiles=smiles_Standardize(smiles)' and used 'smiles=Standardize(ingredient_id)' instead.

I can obtain the information for MOL000107 from TCMSP, as shown in Fig6. Its information is as follows:
>ID:MOL000107
>Name:quercertin,3-o-beta-d-glucopyranoside
>CAS:482-35-9

I also tried wrapping the call in try/except so that failing structures are skipped; RDKit then reported errors more than 800 times for the more than 14000 ingredients indexed in TCMSP.

I can also obtain the standardized SMILES of MOL000001 from RDKit:
[O-]c1cc2c(O)cc(O)cc2[o+]c1-c1c[c][c][c]c1
The structural formula of MOL000107 is shown in Fig7.

This seems strange; there appears to be some inconsistency between the two RDKit code paths.

I also used the standardization process described above on the chemical structures in the ChEMBL database (https://www.ebi.ac.uk/chembl/). I downloaded the database from its official website and standardized the compounds in the "compound_structures" table one by one. After processing hundreds of thousands of molecules, an error occurred, as shown in Fig8. I'm sorry that I didn't keep the error message, and running it again would take a lot of time. But I remember the error type from its earlier appearances: either 'C++ signature', 'NoneType has no attribute GetAtoms', or 'Molecule is None'.

-Best,
Wang Jialuo (王家洛)
Address: Shenyang Pharmaceutical University, 85 Hongliu Rd., Benxi City, Liaoning Province, 117004, P.R. China
|
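The error types recalled above are consistent with a None molecule being passed on to the next step, since RDKit's readers return None when parsing or sanitization fails. That diagnosis is an assumption, not something confirmed in the thread. The sketch below shows one defensive way to write the same pipeline; the function name standardize_smiles is hypothetical, and the steps mirror the ones in the posted code.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def standardize_smiles(smiles):
    """Return a standardized SMILES, or None if any step fails."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # parse or sanitization failure: RDKit returns None here
        return None
    try:
        mol = rdMolStandardize.Cleanup(mol)
        mol = rdMolStandardize.FragmentParent(mol)
        mol = rdMolStandardize.Uncharger().uncharge(mol)
        mol = rdMolStandardize.TautomerEnumerator().Canonicalize(mol)
    except Exception:
        # A structure that fails a standardization step is skipped
        return None
    return Chem.MolToSmiles(mol)


results = {smi: standardize_smiles(smi)
           for smi in ["CC(=O)[O-].[Na+]", "this is not a SMILES"]}
print(results)
```

Collecting the identifiers that come back as None gives a list of structures to inspect by hand, which is usually more informative than a bare try/except around the whole loop.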
From: Santiago F. <san...@me...> - 2023-04-11 11:25:13
|
Dear Paolo

It is a requirement of some users to maintain the D, T labels in the molfiles, but I will check if it could be skipped.

Regards
Santiago

________________________________
From: Paolo Tosco <pao...@gm...>
Sent: Tuesday, 11 April 2023 12:17
To: Santiago Fraga <san...@me...>
Cc: rdk...@li... <rdk...@li...>
Subject: Re: [Rdkit-discuss] Deuterium/Tritium labels in Molfile

Dear Santiago,

Using D and T symbols for deuterium and tritium in MDL molfiles is outside the file format specification.
Nonetheless, RDKit correctly parses those non-standard D and T symbols when reading an MDL molfile that contains them, as you can verify yourself through a simple test and also by looking at the source code:

https://github.com/rdkit/rdkit/blob/36c4ec9e2ba4f5edba39f452cb7458230d9d99bc/Code/GraphMol/FileParsers/MolFileParser.cpp#L1506
https://github.com/rdkit/rdkit/blob/36c4ec9e2ba4f5edba39f452cb7458230d9d99bc/Code/GraphMol/FileParsers/MolFileParser.cpp#L2179

However, when writing the molfile, RDKit will write it according to specifications, i.e. using the H symbol and adding an "M ISO" entry. Any MDL molfile parser should be able to correctly parse such a file. ChemDraw will even automatically label the atoms as D and T, while MarvinJS will add the "2" and "3" superscript prefixes.

To me, it seems a bit overkill to add a flag to preserve non-standard features in MDL molfile writing. Why would you be interested in doing that?

Cheers,
p.
|
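The reading and writing behaviour described in the reply above can be checked with a small round trip. This is a minimal sketch: the non-standard block is produced by hand-editing a block RDKit itself wrote for "[2H]C", which is a hypothetical stand-in for a file coming from another tool.

```python
from rdkit import Chem

# Start from a block RDKit writes itself: "H" symbol plus an "M  ISO" line
std_block = Chem.MolToMolBlock(Chem.MolFromSmiles("[2H]C"))

# Mimic a non-standard writer: label the atom "D" and drop the "M  ISO" line
lines = []
for line in std_block.split("\n"):
    if line.startswith("M  ISO"):
        continue
    lines.append(line.replace(" H  ", " D  "))
d_block = "\n".join(lines)

# Read the non-standard block back; keep the isotopic hydrogen as an atom
mol = Chem.MolFromMolBlock(d_block, removeHs=False)
isotopes = [(atom.GetSymbol(), atom.GetIsotope()) for atom in mol.GetAtoms()]

# Writing again should go back to the standard "H" + "M  ISO" form
rewritten = Chem.MolToMolBlock(mol)
print(isotopes)
print(rewritten)
```

If the D/T labels must appear in the output file, a post-processing step on the molblock text is one option, since the writer itself follows the specification.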
From: Santiago F. <san...@me...> - 2023-04-11 11:06:34
|
Many thanks again, Wim

I was just moving in that direction, modifying directly the resulting molfile.

Regards
Santiago

________________________________
From: Wim Dehaen <wim...@gm...>
Sent: Tuesday, 11 April 2023 11:25
To: Santiago Fraga <san...@me...>
Cc: rdk...@li... <rdk...@li...>
Subject: Re: [Rdkit-discuss] Deuterium/Tritium labels in Molfile

Sorry for not reading your question properly. I am personally not aware of a way to export molfiles in this way in rdkit, but I might just be unaware. I think the easiest solution would probably be changing the molblock string post hoc by reading the M ISO line, for example like this in Python:

```
from rdkit import Chem

mol = Chem.MolFromSmiles("c1c2cccc1[3H].[2H]2")

def MolToMolfileDT(mol, path):
    mb = Chem.MolToMolBlock(mol).split("\n")
    # the "M  ISO" line, if present, sits just before "M  END" and the trailing empty string
    iso = [x for x in mb[-3].split(" ") if len(x) > 0]
    if len(iso) > 2 and iso[1] == "ISO":  # check if there is isotope info
        for i in range(int(iso[2])):
            idx = int(iso[3 + 2 * i]) + 3  # 1-based atom index -> line index (3 header lines + counts line)
            isotope = int(iso[4 + 2 * i])
            if isotope in (2, 3):  # only D and T
                # replace only H, so that other isotopically labelled elements are left alone
                mb[idx] = mb[idx].replace("H", {2: "D", 3: "T"}[isotope])
    with open(path, "w") as molfile:
        molfile.write("\n".join(mb))

MolToMolfileDT(mol, "mol.mol")
```

this returns:

     RDKit          2D

  8  8  0  0  0  0  0  0  0  0999 V2000
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7500   -1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7500   -1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7500    1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7500    1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    2.5981    0.0000 T   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000   -2.5981    0.0000 D   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0
  2  3  1  0
  3  4  2  0
  4  5  1  0
  5  6  2  0
  6  7  1  0
  6  1  1  0
  8  2  1  0
M  ISO  2   7   3   8   2
M  END

best wishes
wim

On Tue, Apr 11, 2023 at 9:14 AM Santiago Fraga <san...@me...> wrote:

Many thanks for your examples, Wim.
But I was checking the option to save the labels D and T in the molfile for the hydrogen isotopes, as other tools can do.

Regards
Santiago

________________________________
From: Wim Dehaen <wim...@gm...>
Sent: Monday, 10 April 2023 18:07
To: Santiago Fraga <san...@me...>
Cc: rdk...@li... <rdk...@li...>
Subject: Re: [Rdkit-discuss] Deuterium/Tritium labels in Molfile

rdkit outputs a molfile with correct isotope labels for me using just:

mol = Chem.MolFromSmiles("[3H]c1ccccc1[2H]")
Chem.MolToMolFile(mol, "test.mol")

or labelling the atoms post hoc:

mol = Chem.MolFromSmiles("c1ccccc1")
mol = Chem.AddHs(mol)
mol.GetAtomWithIdx(6).SetIsotope(3)
mol.GetAtomWithIdx(7).SetIsotope(2)
mol = Chem.RemoveHs(mol)
Chem.MolToMolFile(mol, "test2.mol")

I hope this helps
best wishes
wim

On Mon, Apr 10, 2023 at 4:43 PM Santiago Fraga <san...@me...> wrote:

Good afternoon!

I am a relatively new user of RDKit, and mainly the C++ API.
I am trying to save in a molfile the labels D and T for the hydrogen isotopes, like in the following molfile:


  MJ230401

  8  8  0  0  0  0  0  0  0  0999 V2000
   -0.3572    0.4125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0716    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0716   -0.8250    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3572   -1.2375    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.3572   -0.8250    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.3572    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3572    1.2375    0.0000 T   0  0  0  0  0  0  0  0  0  0  0  0
    1.0717    0.4125    0.0000 D   0  0  0  0  0  0  0  0  0  0  0  0
  3  4  2  0  0  0  0
  4  5  1  0  0  0  0
  5  6  2  0  0  0  0
  6  1  1  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  6  8  1  0  0  0  0
  1  7  1  0  0  0  0
M  END

I am trying to set the labels directly on the hydrogen atoms:

atom->setProp<string>("atomLabel", "D");

or

atom->setProp<string>("_displayLabel", "D");

But when the molfile is generated the labels are not transferred.
It also seems that when reading a molfile that includes the labels, they are discarded.

Many thanks in advance
Santiago Fraga
|
From: Paolo T. <pao...@gm...> - 2023-04-11 10:18:01
|
Dear Santiago,

Using D and T symbols for deuterium and tritium in MDL molfiles is outside the file format specification.
Nonetheless, RDKit correctly parses those non-standard D and T symbols when reading an MDL molfile that contains them, as you can verify yourself through a simple test and also by looking at the source code:

https://github.com/rdkit/rdkit/blob/36c4ec9e2ba4f5edba39f452cb7458230d9d99bc/Code/GraphMol/FileParsers/MolFileParser.cpp#L1506
https://github.com/rdkit/rdkit/blob/36c4ec9e2ba4f5edba39f452cb7458230d9d99bc/Code/GraphMol/FileParsers/MolFileParser.cpp#L2179

However, when writing the molfile, RDKit will write it according to the specification, i.e. using the H symbol and adding an "M  ISO" entry. Any MDL molfile parser should be able to correctly parse such a file. ChemDraw will even automatically label the atoms as D and T, while MarvinJS will add the "2" and "3" superscript prefixes.

To me, it seems a bit overkill to add a flag to preserve non-standard features in MDL molfile writing. Why would you be interested in doing that?

Cheers,
p.

On 11 Apr 2023, at 11:50, Santiago Fraga <san...@me...> wrote:

Many thanks for your examples, Wim.
But I was checking the option to save the labels D and T in the molfile for the hydrogen isotopes, as other tools can do.

Regards
Santiago

_______________________________________________
Rdkit-discuss mailing list
Rdk...@li...
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss |
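If a downstream tool really does need D/T symbols, one possible workaround is to post-process the molblock text after RDKit has written it. The following is only a minimal plain-Python sketch under stated assumptions, not an RDKit feature: the function name `label_hydrogen_isotopes` and the demo molblock are hypothetical, it handles V2000 only, and it assumes isotopes are recorded on "M  ISO" lines as described above. It leaves the "M  ISO" lines in place; whether the target tool wants them removed would need checking.

```python
def label_hydrogen_isotopes(molblock: str) -> str:
    """Relabel isotopic H atoms in a V2000 molblock as D (mass 2) or T (mass 3)."""
    labels = {2: "D", 3: "T"}
    lines = molblock.split("\n")
    n_atoms = int(lines[3][0:3])  # counts line: first three columns hold the atom count

    # Collect absolute masses from "M  ISO" property lines (atom indices are 1-based).
    isotopes = {}
    for line in lines:
        if line.startswith("M  ISO"):
            fields = line[6:].split()
            pairs = fields[1:1 + 2 * int(fields[0])]
            for idx, mass in zip(pairs[0::2], pairs[1::2]):
                isotopes[int(idx)] = int(mass)

    # The atom block starts on the fifth line; the symbol occupies columns 32-34.
    for idx, mass in isotopes.items():
        row = 3 + idx
        if idx <= n_atoms and mass in labels and lines[row][31:34].strip() == "H":
            lines[row] = lines[row][:31] + f"{labels[mass]:<3}" + lines[row][34:]
    return "\n".join(lines)


def _atom(x, y, z, symbol):
    # Hypothetical helper that formats one fixed-width V2000 atom line for the demo.
    return f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3} 0  0  0  0  0  0  0  0  0  0  0  0"


# Hand-written demo molblock (H-H with atom 2 marked as mass 2), not RDKit output.
demo = "\n".join([
    "", "  hypothetical", "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    _atom(0.0, 0.0, 0.0, "H"),
    _atom(0.74, 0.0, 0.0, "H"),
    "  1  2  1  0",
    "M  ISO  1   2   2",
    "M  END",
])
relabelled = label_hydrogen_isotopes(demo)
print(relabelled)
```

In practice the input string would come from whatever molblock the writer produced; the resulting file is then non-standard in the same way as the MarvinJS example in the original question.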
From: Jan H. J. <jhj...@ch...> - 2023-04-11 09:53:50
|
Hi Gustavo

from rdkit import Chem
from rdkit.Chem import rdDetermineBonds

raw_mol = Chem.MolFromXYZFile('acetate.xyz')
mol = Chem.Mol(raw_mol)
rdDetermineBonds.DetermineBonds(mol,charge=-1)

Best regards, Jan

On 7 Apr 2023, at 22.57, Gustavo Seabra <gus...@gm...> wrote:

Hi everyone,

I'm having difficulties using RDKit to read molecules from an XYZ file, and I would really appreciate some help. The problem is that whenever I read a molecule from an XYZ file, I get just a disconnected clump of atoms, not a molecule. For example, the following code:

import rdkit
from rdkit import Chem
from rdkit.Chem import Draw, rdmolfiles
mol = Chem.MolFromSmiles('COC1=C(O)C[C@@](O)(CO)CC1=O')
mol = Chem.AddHs(mol)
mol
<image.png>
Chem.AllChem.EmbedMolecule(mol)
Chem.MolToXYZFile(mol, "rdkit_mol.xyz")
mol2 = Chem.MolFromXYZFile('rdkit_mol.xyz')
mol2
<image.png>

Is there a bug in the XYZ code, or am I missing something?

Thanks!
--
Gustavo Seabra. |
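Some background on why the atoms come back disconnected: an XYZ file stores only element symbols and coordinates, so any bonds have to be perceived afterwards, which is what the DetermineBonds call above is for. The sketch below is a deliberately simplified distance-based heuristic in plain Python, meant only to illustrate the idea. It is not RDKit's algorithm; the function names, the radii table and the 1.3 tolerance are illustrative assumptions, and it assigns neither bond orders nor charges.

```python
import math

# Approximate single-bond covalent radii in angstrom (illustrative subset).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def parse_xyz(block):
    """Return a list of (symbol, x, y, z) tuples from an XYZ-format string."""
    lines = block.strip().splitlines()
    n_atoms = int(lines[0])  # first line: atom count; second line: free-text comment
    atoms = []
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    return atoms


def guess_connectivity(atoms, tolerance=1.3):
    """Treat two atoms as bonded if they are closer than tolerance * (r_i + r_j)."""
    bonds = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            sym_i, *pos_i = atoms[i]
            sym_j, *pos_j = atoms[j]
            cutoff = tolerance * (COVALENT_RADII[sym_i] + COVALENT_RADII[sym_j])
            if math.dist(pos_i, pos_j) <= cutoff:
                bonds.append((i, j))
    return bonds


# Hand-written water geometry used purely as a demo input.
water_xyz = """3
water
O 0.000 0.000 0.117
H 0.000 0.757 -0.469
H 0.000 -0.757 -0.469
"""
water_atoms = parse_xyz(water_xyz)
water_bonds = guess_connectivity(water_atoms)
print(water_bonds)
```

For the water demo this reports the two O-H contacts and no H-H contact. Real bond perception also has to settle bond orders and formal charges, which is why the DetermineBonds call above takes the overall charge as an argument.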
From: Santiago F. <san...@me...> - 2023-04-11 09:48:24
|
Many thanks for your examples, Wim.
But I was checking the option to save the labels D and T in the molfile for the hydrogen isotopes, as other tools can do.

Regards
Santiago

SANTIAGO FRAGA
Software Developer
MESTRELAB RESEARCH S.L.

________________________________
From: Wim Dehaen <wim...@gm...>
Sent: Monday, 10 April 2023 18:07
To: Santiago Fraga <san...@me...>
Cc: rdk...@li... <rdk...@li...>
Subject: Re: [Rdkit-discuss] Deuterium/Tritium labels in Molfile

rdkit outputs a molfile with correct isotope labels for me using just:

mol=Chem.MolFromSmiles("[3H]c1ccccc1[2H]")
Chem.MolToMolFile(mol,"test.mol")

or labelling the atoms post hoc:

mol=Chem.MolFromSmiles("c1ccccc1")
mol=Chem.AddHs(mol)
mol.GetAtomWithIdx(6).SetIsotope(3)
mol.GetAtomWithIdx(7).SetIsotope(2)
mol=Chem.RemoveHs(mol)
Chem.MolToMolFile(mol,"test2.mol")

I hope this helps

best wishes
wim

On Mon, Apr 10, 2023 at 4:43 PM Santiago Fraga <san...@me...> wrote:

Good afternoon!

I am a relatively new user of RDKit, and mainly the C++ API.

I am trying to save in a molfile the labels D and T for the hydrogen isotopes.
Like in the following molfile:

  MJ230401

  8  8  0  0  0  0  0  0  0  0999 V2000
   -0.3572    0.4125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0716    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0716   -0.8250    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3572   -1.2375    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.3572   -0.8250    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.3572    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3572    1.2375    0.0000 T   0  0  0  0  0  0  0  0  0  0  0  0
    1.0717    0.4125    0.0000 D   0  0  0  0  0  0  0  0  0  0  0  0
  3  4  2  0  0  0  0
  4  5  1  0  0  0  0
  5  6  2  0  0  0  0
  6  1  1  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  6  8  1  0  0  0  0
  1  7  1  0  0  0  0
M  END

I am trying to set the labels directly on the hydrogen atoms:

atom->setProp<string>("atomLabel", "D");

or

atom->setProp<string>("_displayLabel", "D");

But when the molfile is generated the labels are not transferred.
It also seems that when reading a molfile that includes the labels, they are discarded.

Many thanks in advance
Santiago Fraga |