Re: [Rdkit-discuss] Branched peptide MolFromHELM problem
From: Greg L. <gre...@gm...> - 2020-03-26 10:23:50
Since Roger's the expert here and isn't on the list, I forwarded the email to him. Just to have it on the list as well, here's his answer:

On Thu, Mar 26, 2020 at 10:27 AM Roger Sayle <ro...@ne...> wrote:

> Hi Greg (and Graham),
>
> The short answer is that RDKit currently only supports a “bioinformatics” subset of HELM, i.e. the protein and nucleic acid sequences that can usefully be read to and from FASTA sequence files, and easily converted to or from PDB files. Actually it allows a bit more than this, including support for disulphide bridges, D-amino acids, and some common non-standard amino acids (like phosphotyrosine and phosphoserine in kinases), but again things that can internally be handled/represented using PDB residue codes.
>
> HELM allows folks to do some pretty wacky stuff, in your example, connecting an amino group to a lysine sidechain to create a hydrazine, and connecting an acetyl group to the C-terminus of a peptide to form some weird kind of oxalate, i.e. a peptide that ends -C(=O)C(=O)C. Alas, these cases and dendritic and cross-linked peptides aren’t in the current RDKit supported subset of HELM.
>
> I suspect that the HELM reader in RDKit will continue to improve over time, but this is mostly as a convenience, as there already exists a completely free, open source and definitive reference implementation for converting HELM to SMILES, and that’s to use the Pistoia Alliance’s Java toolkit (perhaps even via Jython). Hence, the same editors that folks use to enter HELM also have export as SMILES or export as Mol options. Alas any significant effort to support all the corner cases and monomer definitions of HELM would mostly be “reinventing the wheel” of porting this Pistoia Alliance code from Java to C++, and then require the maintenance burden of keeping it up to date as the HELM specification continues to evolve.
>
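A practical consequence of the supported subset described above: as the traceback in the original question shows, MolFromHELM hands back None for input it cannot read, and the failure only surfaces later in MolToSequence. A minimal sketch of checking the result first follows. The helper names `require_mol` and `helm_to_smiles` are made up for illustration and are not part of RDKit; only `Chem.MolFromHELM` and `Chem.MolToSmiles` are RDKit calls.

```python
def require_mol(mol, source):
    """Return mol unchanged, or raise if the parser handed back None."""
    if mol is None:
        raise ValueError(
            f"RDKit could not parse {source!r}; "
            "it may be outside the supported HELM subset"
        )
    return mol


def helm_to_smiles(helm):
    """Parse a HELM string with RDKit and return its SMILES."""
    # Imported here so require_mol() stays usable without RDKit installed.
    from rdkit import Chem

    # MolFromHELM returns None on failure, so check before converting.
    mol = require_mol(Chem.MolFromHELM(helm), helm)
    return Chem.MolToSmiles(mol)
```

With this in place, a HELM string that RDKit cannot read stops with a ValueError naming the offending input, rather than a Boost.Python.ArgumentError one step later.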
> Finally, for sticking arbitrary functional groups (SMILES strings) onto the N- and C-termini of peptides, Greg may have suggestions for ways of manipulating RDKit mols without the need to go via HELM, which has the burden of defining custom monomers. Instead, a SMIRKS or reaction SMARTS could be used, easily enumerating libraries of alternately terminated peptides.
>
> Graham: If you select your HELM depiction, then right click in the “Structure View” tab of the Pistoia HELM webeditor, you’ll be given the option “Copy Molfile”, which should hopefully be sufficient for you to work around this issue and get your modified peptides into RDKit.
>
> Greg: Hopefully, this response makes sense and the relevant bits can be pasted to the rdkit mailing list. I suspect there’s an interesting blog post (or RDKit meeting talk) on using pyjnius or similar (perhaps even just os.system("java …")) to call the Pistoia Java libraries from python, and then process the resulting SMILES in RDKit. I’ve also heard rumours that something called KNIME is a convenient way to link RDKit, python and Java.
>
> I hope this helps. Please let me know if you disagree and would prefer branched peptides to be supported directly in RDKit’s HELM readers and writers. Sorry for any inconvenience.
>
> Best regards,
>
> Roger
>
> --
> Roger Sayle, PhD.
> CEO and founder
> NextMove Software Limited
> Registered in England No. 07588305
> Registered Office: Innovation Centre, 320 Cambridge Science Park, Cambridge, CB4 0WG

On Wed, Mar 25, 2020 at 3:43 PM Graham Simpson <gr...@da...> wrote:

> Hi all,
>
> Please be gentle with me! To be honest I am an absolute amateur at python and RDKit. I'm a trained peptide chemist and trying to convert some peptide sequences with modifications into SMILES codes.
> I've decided, maybe wrongly, that the best route is via HELM codes to MOL to SMILES using RDKit in Python, mainly since the peptides are quite modified at the C-/N-term and branched.
>
> I'm a little embarrassed by my code but I have posted it here!
>
> https://github.com/grahamsimpson/peptides/blob/master/single_peptide_to_HELM.py
>
> The issue I'm having is with the RDKit MolFromHELM. I've been using the HELM Webeditor (http://webeditor.openhelm.org/hwe/examples/App.htm) to validate HELM codes, but MolFromHELM keeps throwing up an error (resulting in None), and subsequently MolToSequence or MolToSMILES don't work.
>
> Traceback (most recent call last):
>   File "test_single_HELM.py", line 34, in <module>
>     Sequence = Chem.MolToSequence(mol)
> Boost.Python.ArgumentError: Python argument types in
>     rdkit.Chem.rdmolfiles.MolToSequence(NoneType)
> did not match C++ signature:
>     MolToSequence(RDKit::ROMol mol)
>
> The peptides I want are in the form
> - linear: Ac-ICECREAMAAICECREAMDD-NH2
> - branched: (Ac-ICECREAMAA)(Ac-ICECREAMDD)K-NH2
> and I'm looking to get the SMILES codes.
>
> PEPTIDE1{[ac]}|PEPTIDE2{[am]}|PEPTIDE3{[ac].I.C.E.C.R.E.A.M.A.A.K.D.D.M.A.E.R.C.E.C.I}$PEPTIDE1,PEPTIDE3,1:R2-22:R2|PEPTIDE2,PEPTIDE3,1:R1-12:R3$$$V2.0
>
> or drawn another way
>
> PEPTIDE1{[ac].I.C.E.C.R.E.A.M.A.A}|PEPTIDE2{[ac].I.C.E.C.R.E.A.M.D.D.K.[am]}$PEPTIDE2,PEPTIDE1,12:R3-11:R2$$$V2.0
>
> [image: image.png]
>
> I would be very grateful for any help you could give me. I'm sure there are more efficient ways of doing this with dictionaries or other approaches. This has been bugging me for a while (but has been a great learning experience!).
>
> Please let me know if there are other places I could post to find the answer to this.
>
> Thanks very much in advance,
>
> Best wishes,
>
> Graham
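
For the linear case in the question, the HELM string can be assembled directly from the one-letter sequence. What follows is a minimal sketch, assuming the same [ac] and [am] cap monomers and the V2.0 suffix that appear in the HELM strings quoted above. `linear_helm` is a hypothetical helper name, not part of RDKit or the Pistoia toolkit, and the sketch only builds the text; it does not check whether any particular reader accepts the caps.

```python
# One-letter codes of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def linear_helm(sequence, n_cap="ac", c_cap="am"):
    """Build a single-chain HELM string from a one-letter peptide sequence.

    n_cap and c_cap are monomer IDs placed at the start and end of the
    chain; pass None to leave a terminus uncapped.
    """
    unknown = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if not sequence or unknown:
        raise ValueError(f"empty sequence or unsupported residue codes: {unknown}")

    monomers = list(sequence)
    if n_cap:
        monomers.insert(0, f"[{n_cap}]")
    if c_cap:
        monomers.append(f"[{c_cap}]")

    # One simple polymer, then empty connection, group and annotation sections.
    return "PEPTIDE1{" + ".".join(monomers) + "}$$$$V2.0"


print(linear_helm("ICECREAMAAICECREAMDD"))
```

The branched peptide cannot be written this way, since it needs a second chain plus a connection entry like the ones in the strings above, and that is the part RDKit did not support at the time of this thread.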