Thread: [Rdkit-discuss] rejoining pairs of fragments after fragmenting a molecule
From: Ling C. <lin...@gm...> - 2021-03-31 19:56:14
Dear Colleagues,

I am trying to do something that I think is quite simple, but I have not found a simple way to do it. I may be missing something. I am sure I can figure it out eventually, but I wonder whether there is a good way.

I fragmented a molecule according to some rules, using FragmentOnBonds, and obtained a list of primary fragments.

I wish to recombine pairs (and triplets, but nothing bigger) of these primary fragments, but only if the resulting fragment is part of the original molecule. I.e. I want to undo some of the cuts. (FragmentOnSomeBonds does not help, since you cannot ensure that the resulting fragments consist only of pairs of primary fragments.)

What is the best way to do this? The following is what I am trying.

I see that you can mark the original cut points using the dummyLabels argument of FragmentOnBonds. So I converted the primary fragments to SMILES, looked for the two sides of the original cut point, and replaced the two dummy labels with [2H] and [3H]. I then tried to rejoin the fragments using the reaction string "[*:1][2H].[*:2][3H]>>[*:1][*:2]". Unfortunately the ReactionFromSmarts function does not accept this string. So I'll have to use a SMARTS search to look for [2H] and [3H], then create an editable molecule from the two primary fragments, look for the neighbours of [2H] and [3H], add a bond, delete the atoms [2H] and [3H], and then sanitize.

Thank you for your ideas.

Ling
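A minimal sketch of the manual rejoining route described above, without going through a reaction string. It assumes every cut was a single bond and that both dummy atoms of a given cut were given the same isotope label through dummyLabels. The helper name rejoin_on_label and the example molecule are hypothetical, not taken from the thread.

```python
from rdkit import Chem


def rejoin_on_label(frag_a, frag_b, label):
    """Join two fragments at the dummy atoms that carry isotope `label`.

    Hypothetical helper: assumes the cut bond was a single bond and that
    each fragment carries one dummy atom with this label.
    """
    combo = Chem.RWMol(Chem.CombineMols(frag_a, frag_b))
    # Collect (dummy index, neighbour index) as plain ints before editing.
    ends = [
        (atom.GetIdx(), atom.GetNeighbors()[0].GetIdx())
        for atom in combo.GetAtoms()
        if atom.GetAtomicNum() == 0 and atom.GetIsotope() == label
    ]
    if len(ends) != 2:
        raise ValueError(f"expected 2 dummies with label {label}, found {len(ends)}")
    combo.AddBond(ends[0][1], ends[1][1], Chem.BondType.SINGLE)
    # Remove the higher index first so the lower index stays valid.
    for dummy_idx in sorted((d for d, _ in ends), reverse=True):
        combo.RemoveAtom(dummy_idx)
    mol = combo.GetMol()
    Chem.SanitizeMol(mol)
    return mol


mol = Chem.MolFromSmiles("CCOCCN")
# Cut bonds 1 and 3; both sides of cut k get the label k + 1.
cut = Chem.FragmentOnBonds(mol, [1, 3], dummyLabels=[(1, 1), (2, 2)])
frags = Chem.GetMolFrags(cut, asMols=True)
joined = rejoin_on_label(frags[0], frags[1], 1)
print(Chem.MolToSmiles(joined))
```

The second cut point keeps its dummy atom (label 2) in the joined piece, so a further fragment could be attached the same way.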
From: Francois B. <ml...@li...> - 2021-04-01 00:27:14
On 01/04/2021 04:55, Ling Chan wrote:
> I fragmented a molecule with some rules, using FragmentOnBonds. I did
> get a list of primary fragments.

I have an ad hoc fragmenting scheme and fragment assembly implemented here:
https://github.com/UnixJunkie/molenc/blob/master/bin/molenc_smisur.py

Sorry, but this is non-trivial code. Look for the function bind_molecules, which connects two fragments.

The RDKit Python documentation might have some simpler examples, using well-known/published fragmenting schemes (BRICS or Recap):
http://www.rdkit.org/docs/GettingStartedInPython.html

Regards,
F.
From: Mark M. <ma...@cr...> - 2021-04-01 08:58:50
Hi Ling,

Having done something similar (but not in RDKit), I would suggest a different algorithm. I think that fragmenting the molecule first and then stitching the bits together is always going to be very complicated. Instead, just fragment the molecule in the ways that you want:

- Find the set B of all breakable bonds according to your rules. I'm assuming here that B contains only acyclic bonds.
- To get all of the pairwise pieces, for each element b of B break all bonds in B _except_ b. Keep the fragment containing b, and clean up.
- To get all of the triplets, for each pair (b1, b2) of distinct bonds in B, break all bonds in B except b1 and b2. Keep the fragment containing b1 only if it also contains b2.

Regards,
Mark

--
Mark Mackey
Chief Scientific Officer
Cresset
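The enumeration Mark outlines can be sketched on a plain bond graph, independent of any toolkit. This is only an illustration under stated assumptions: atoms are integers, bonds are index pairs, all breakable bonds are acyclic, and the function names and the 8-atom chain are hypothetical.

```python
from itertools import combinations


def connected_atoms(start, adjacency):
    """Return the set of atoms reachable from `start` (depth-first search)."""
    seen = {start}
    stack = [start]
    while stack:
        atom = stack.pop()
        for neighbour in adjacency[atom]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return frozenset(seen)


def recombined_pieces(bonds, breakable, n_kept):
    """Enumerate pieces obtained by leaving `n_kept` breakable bonds intact.

    bonds: list of (atom_a, atom_b) pairs; breakable: indices into `bonds`.
    n_kept=1 gives pairs of primary fragments, n_kept=2 gives triplets.
    """
    pieces = set()
    for kept in combinations(sorted(breakable), n_kept):
        broken = set(breakable) - set(kept)
        adjacency = {}
        for i, (a, b) in enumerate(bonds):
            adjacency.setdefault(a, set())
            adjacency.setdefault(b, set())
            if i not in broken:
                adjacency[a].add(b)
                adjacency[b].add(a)
        piece = connected_atoms(bonds[kept[0]][0], adjacency)
        # Keep the piece only if every kept bond lies inside it.
        if all(bonds[k][0] in piece for k in kept):
            pieces.add(piece)
    return pieces


# Hypothetical 8-atom chain 0-1-2-3-4-5-6-7 with breakable bonds 1, 3 and 5.
bonds = [(i, i + 1) for i in range(7)]
breakable = [1, 3, 5]
pairs = recombined_pieces(bonds, breakable, 1)
triplets = recombined_pieces(bonds, breakable, 2)
print(sorted(sorted(p) for p in pairs))
print(sorted(sorted(t) for t in triplets))
```

In this chain, keeping bonds 1 and 5 while breaking bond 3 yields no triplet, because the two kept bonds end up in different pieces. With RDKit, the bond list could presumably be built from GetBeginAtomIdx() and GetEndAtomIdx() of each bond in mol.GetBonds().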
From: Ling C. <lin...@gm...> - 2021-04-01 23:11:33
Thank you Mark for your suggestion. It sounds good and I gave it a try. However, this leads to another question that may sound dumb.

I have the atom indices of a fragment. For example, the fragment comes from atoms [3,4,5,9,10,11,14] of the original molecule. How can I extract this fragment from the molecule? I tried

(1) using EditableMol and deleting atoms one by one with RemoveAtom. But this does not work, since the atom numbering changes after each deletion.
(2) going through FragmentOnBonds. But the output of FragmentOnBonds has the atom indices reshuffled, so I cannot directly use my index list to fish out my fragment.

I did manage to achieve what I want by going through Chem.MolFragmentToSmiles and then converting the SMILES back to a Mol. But is there a neater way? Basically, is there a Chem.MolFragmentToMol function?

Thank you again.

Ling
From: Ling C. <lin...@gm...> - 2021-04-02 00:09:08
> I did manage to achieve what I want by going through
> Chem.MolFragmentToSmiles and then convert the Smiles back to a Mol.
> But is there a neater way?

Oops, I wrote too soon. Actually I did not achieve what I want. The conversion of the SMILES from Chem.MolFragmentToSmiles back to a Mol sometimes crashes because of a sanitization problem.
From: Chuang, K. <Kan...@uc...> - 2021-04-02 00:42:58
Hi Ling,

I think I've run into something similar before. Have you tried using FragmentOnBonds followed by Chem.GetMolFrags? GetMolFrags lets you toggle a few things (e.g. the asMols and sanitizeFrags arguments, which default to False and True respectively) that provide some workarounds for sanitization problems.

Best,
Kangway
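A minimal sketch of the FragmentOnBonds plus GetMolFrags combination, using the two toggles mentioned above. The example molecule and the choice of breakable bonds are hypothetical. The comment on dummy-atom indices reflects my understanding of how FragmentOnBonds adds dummies, and should be checked against the installed version.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("CCOCCN")
n_atoms = mol.GetNumAtoms()

# Hypothetical rule set: bonds 1 and 3 are breakable; leave bond 1 intact.
cut = Chem.FragmentOnBonds(mol, [3], addDummies=True)

# asMols=False (the default) returns tuples of atom indices. The dummy atoms
# are assumed to be appended after the original atoms, so indices below
# n_atoms still refer to atoms of the original molecule.
index_frags = Chem.GetMolFrags(cut, asMols=False)
original_atoms = [tuple(i for i in frag if i < n_atoms) for frag in index_frags]
print(original_atoms)

# asMols=True with sanitizeFrags=False returns Mol objects without sanitizing
# them, so a problematic fragment does not raise at this point.
mol_frags = Chem.GetMolFrags(cut, asMols=True, sanitizeFrags=False)
for frag in mol_frags:
    # catchErrors=True reports the failed step instead of raising.
    failed = Chem.SanitizeMol(frag, catchErrors=True)
    print(Chem.MolToSmiles(frag), failed == Chem.SanitizeFlags.SANITIZE_NONE)
```

Working with the index tuples keeps a direct link back to the original atom numbering, while the unsanitized Mol objects allow each fragment to be checked individually.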
From: Ling C. <lin...@gm...> - 2021-04-02 00:30:35
Yes, Kangway, that was what I first tried, as mentioned in the first post. I did not have any problem obtaining the primary fragments (applying all cuts). I just have not yet figured out how to obtain the secondary fragments, either by recombining the primary fragments or by fragmenting the initial molecule (without applying all cuts).
From: Pavel P. <pav...@uk...> - 2021-04-02 05:22:06
|
Hi Ling,

This can be a workaround if RDKit does not have a built-in function to extract a submolecule by atom ids: assign a property label to the atoms of the fragment, then loop over the atoms in an EditableMol and remove those that do not have the property assigned.

Kind regards,
Pavel.

On 02/04/2021 02:30, Ling Chan wrote:
> Yes, Kangway, that was what I first tried, as mentioned in the first
> post. I did not have any problem obtaining the primary fragments
> (applying all cuts). I just have not yet figured out how to obtain
> the secondary fragments, either by recombining the primary fragments
> or by fragmenting the initial molecule (by not applying all cuts).
>
> Chuang, Kangway <Kan...@uc...> wrote on Thu, Apr 1, 2021 at 5:20 PM:
>
>> Hi Ling,
>>
>> I think I've run into something similar before. Have you tried
>> using FragmentOnBonds followed by Chem.GetMolFrags? GetMolFrags
>> lets you toggle a few things (e.g. (bool)asMols=False,
>> (bool)sanitizeFrags=True) to provide some workarounds with
>> sanitization.
>>
>> Best,
>> Kangway
>>
>> From: Ling Chan <lin...@gm...>
>> Sent: Thursday, April 1, 2021 5:08 PM
>> To: Mark Mackey <ma...@cr...>
>> Cc: RDKit <rdk...@li...>
>> Subject: Re: [Rdkit-discuss] rejoining pairs of fragments after
>> fragmenting a molecule
>>
>>> I did manage to achieve what I want by going through
>>> Chem.MolFragmentToSmiles and then convert the Smiles back to a
>>> Mol. But is there a neater way?
>>
>> Oops, I wrote too soon. Actually I did not achieve what I want.
>> Converting the SMILES from Chem.MolFragmentToSmiles back to a Mol
>> sometimes crashes because of a sanitization problem.
>>
>> Ling Chan <lin...@gm...> wrote on Thu, Apr 1, 2021 at 4:11 PM:
>>
>>> Thank you Mark for your suggestion. It sounds good and I gave it
>>> a try. However, this leads to another question that may sound
>>> dumb.
>>>
>>> I have the atom indices of a fragment. For example, the fragment
>>> comes from atoms [3,4,5,9,10,11,14] of the original molecule.
>>> How can I extract this fragment from the molecule? I tried
>>>
>>> (1) using EditableMol and deleting atoms one by one using
>>> RemoveAtom. But this does not work since the atom numbering
>>> changes after each deletion.
>>> (2) going through FragmentOnBonds. But the output of
>>> FragmentOnBonds has the atom indices reshuffled so I cannot
>>> directly use my index list to fish out my fragment.
>>>
>>> I did manage to achieve what I want by going through
>>> Chem.MolFragmentToSmiles and then convert the Smiles back to a
>>> Mol. But is there a neater way? Basically, is there a
>>> Chem.MolFragmentToMol function?
>>>
>>> Thank you again.
>>>
>>> Ling
>>>
>>> Mark Mackey <ma...@cr...> wrote on Thu, Apr 1, 2021 at 1:58 AM:
>>>
>>>> Hi Ling,
>>>>
>>>> Having done something similar (but not in RDKit), I would
>>>> suggest a different algorithm. I think that fragmenting the
>>>> molecule first and then stitching the bits together is always
>>>> going to be very complicated. Instead, just fragment the
>>>> molecule in the ways that you want:
>>>>
>>>> - Find the set B of all breakable bonds according to your
>>>>   rules. I’m assuming here that B contains only acyclic bonds.
>>>>
>>>> - To get all of the pairwise pieces, for each element b of B
>>>>   break all bonds in B _except_ b. Keep the fragment containing
>>>>   b, and clean up.
>>>>
>>>> - To get all of the triplets, for each tuple (b1, b2) in B,
>>>>   break all bonds in B except b1 and b2. Keep the fragment
>>>>   containing b1 only if it also contains b2.
>>>>
>>>> Regards,
>>>>
>>>> Mark
>>>>
>>>> --
>>>> Mark Mackey
>>>> Chief Scientific Officer
>>>> Cresset
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdk...@li...
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss |
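The procedure Mark describes in the quoted message can be sketched roughly as follows. This is only an illustration, not code from anyone in the thread: the helper names piece_keeping and pairs_and_triplets are made up here, and the sketch assumes that every bond in the breakable list is acyclic. It relies on FragmentOnBonds leaving the original atoms at their original indices (the dummies are appended after them) and on the fragsMolAtomMapping argument of GetMolFrags to report which atoms ended up in which fragment.

```python
from itertools import combinations

from rdkit import Chem


def piece_keeping(mol, breakable, kept):
    """Cut every bond in `breakable` except those in `kept`.

    Returns (fragment_mol, original_atom_indices) for the fragment that
    contains all kept bonds, or None if no single fragment contains them.
    Hypothetical helper, assuming all breakable bonds are acyclic.
    """
    cuts = [b for b in breakable if b not in kept]
    cut_mol = Chem.FragmentOnBonds(mol, cuts) if cuts else Chem.Mol(mol)

    # mapping[i] lists the atom indices (in cut_mol) that make up frags[i].
    mapping = []
    frags = Chem.GetMolFrags(cut_mol, asMols=True, sanitizeFrags=False,
                             fragsMolAtomMapping=mapping)

    # Atoms at either end of the bonds we decided not to cut.
    needed = set()
    for b in kept:
        bond = mol.GetBondWithIdx(b)
        needed.update((bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()))

    n_atoms = mol.GetNumAtoms()
    for frag, atom_ids in zip(frags, mapping):
        if needed.issubset(atom_ids):
            # Indices >= n_atoms are the dummy atoms added at the cut points.
            return frag, sorted(i for i in atom_ids if i < n_atoms)
    return None


def pairs_and_triplets(mol, breakable):
    """Keep one bond (pairs) or two bonds (triplets) and cut all the others."""
    results = {}
    for size in (1, 2):
        for kept in combinations(breakable, size):
            piece = piece_keeping(mol, breakable, kept)
            if piece is not None:
                results[kept] = piece
    return results
```

Fragments are returned unsanitized (sanitizeFrags=False) so that the caller can decide when and how to sanitize. Because the returned index lists refer to the original molecule, they can be compared directly against the atom lists of the primary fragments.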
From: Andrew D. <da...@da...> - 2021-04-02 08:51:35
|
On Mar 31, 2021, at 21:55, Ling Chan <lin...@gm...> wrote:
> I am trying to do something that I think is quite simple, but I have not figured out a simple way. Don't know if I am missing something. I am sure that ultimately I can figure it out, but I wonder if there is a good way.

If you can work in SMILES space rather than molecule space, then try:

  http://dalkescientific.com/smiles_weld.py

It's derived from a technique I developed for the mmpdb package. I called it 'welding' the SMILES strings. What I do is convert the wildcards into closures, then let RDKit merge the closures. (There are a few tricky parts, like support for double-bond stereochemistry.)

Here's an example, where I use a dictionary to tell the program that [1*] should be bonded to [2*].

  >>> from rdkit import Chem
  >>> smi = "N#Cc1ccncc1"
  >>> mol = Chem.MolFromSmiles(smi)
  >>> frag_mol = Chem.FragmentOnBonds(mol, [1])
  >>> frag_smi = Chem.MolToSmiles(frag_mol)
  >>> frag_smi
  '[1*]c1ccncc1.[2*]C#N'
  >>> import smiles_weld
  >>> smiles_weld.convert_wildcards_to_closures(frag_smi, {1: 1, 2: 1})
  'c%991ccncc1.C%99#N'
  >>> Chem.CanonSmiles('c%991ccncc1.C%99#N')
  'N#Cc1ccncc1'

If you use matching dummy labels then you can omit the conversion table:

  >>> frag_mol = Chem.FragmentOnBonds(mol, [1], dummyLabels=((4,4),))
  >>> frag_smi = Chem.MolToSmiles(frag_mol)
  >>> frag_smi
  '[4*]C#N.[4*]c1ccncc1'
  >>> smiles_weld.convert_wildcards_to_closures(frag_smi)
  'C%99#N.c%991ccncc1'
  >>> Chem.CanonSmiles('C%99#N.c%991ccncc1')
  'N#Cc1ccncc1'

Note: while the mmpdb code is well-tested, I modified it this morning to handle what I think you want, and I haven't fully tested the new code.

The program assumes the SMILES is a canonical SMILES generated by RDKit, and that the wildcard labels don't have a charge, hydrogen count, or other attribute.

Cheers,

Andrew
da...@da... |
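To make the "wildcards into closures" idea concrete, here is a deliberately tiny sketch. It is not Andrew's smiles_weld.py: the function name weld_simple is hypothetical, and it only handles the easiest case, where each fragment starts with a labeled wildcard that is single-bonded (no bond symbol) to an unbracketed atom. Anything else, including the stereochemistry cases that smiles_weld.py supports, is rejected with an error rather than guessed at.

```python
import re

# A leading "[<label>*]" followed directly by one unbracketed atom symbol.
_LEADING_WILDCARD = re.compile(r"^\[(\d+)\*\](Cl|Br|[BCNOPSFI]|[bcnops])")


def weld_simple(frag_smiles):
    """Replace matching leading wildcards with a shared ring-closure number.

    Toy subset only: every dot-separated fragment must start with exactly
    one labeled wildcard, and every label must occur on exactly two fragments.
    """
    closures = {}  # wildcard label -> ring-closure number (99, 98, ...)
    seen = {}      # wildcard label -> number of fragments carrying it
    welded = []
    for part in frag_smiles.split("."):
        match = _LEADING_WILDCARD.match(part)
        rest = part[match.end():] if match else ""
        if match is None or "*" in rest:
            raise ValueError(f"unsupported fragment: {part!r}")
        label = int(match.group(1))
        seen[label] = seen.get(label, 0) + 1
        closure = closures.setdefault(label, 99 - len(closures))
        # Drop the wildcard and attach the closure to the atom it was bonded to.
        welded.append(f"{match.group(2)}%{closure}{rest}")
    if any(count != 2 for count in seen.values()):
        raise ValueError("every label must appear on exactly two fragments")
    return ".".join(welded)


print(weld_simple("[4*]C#N.[4*]c1ccncc1"))  # C%99#N.c%991ccncc1
```

The resulting string can then be passed to Chem.CanonSmiles, as in the session above, so that RDKit merges the two closure ends into a bond.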
From: Ling C. <lin...@gm...> - 2021-04-02 14:23:41
|
Thank you Francois, I took a look at your code and borrowed parts of it to rejoin two molecules. It seems like my problem is solved. I eventually arrived at something like example 4 in

https://www.programcreek.com/python/example/123334/rdkit.Chem.CombineMols

(which I discovered a bit late).

Still, I am not sure if the code is safe. In particular, I wonder if the following conditions are always valid.

1. Chem.CombineMols simply concatenates the atomic indices from the input molecules.
2. The Chem.EditableMol constructor preserves atom ordering from the input.
3. RemoveAtom in EditableMol causes all indices above the deleted one to decrease by one, i.e. atom ordering is preserved.

Thank you!

Ling |
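A rejoining step along the lines Ling describes might look like the sketch below. It is an illustration only, not the code from molenc or from the programcreek example: join_on_label is a made-up name, it uses RWMol rather than EditableMol, and it assumes that each fragment carries exactly one dummy atom with the given isotope label and that stereochemistry does not need to be preserved. It looks up the dummies after combining instead of computing offsets, and it removes the higher-indexed dummy first so that the other dummy's index is still valid.

```python
from rdkit import Chem


def join_on_label(frag_a, frag_b, label):
    """Bond the neighbors of the two dummies labeled `label`, then drop the dummies.

    Hypothetical helper; ignores stereochemistry.
    """
    combo = Chem.RWMol(Chem.CombineMols(frag_a, frag_b))

    dummies = [atom.GetIdx() for atom in combo.GetAtoms()
               if atom.GetAtomicNum() == 0 and atom.GetIsotope() == label]
    if len(dummies) != 2:
        raise ValueError(f"expected two dummies with label {label}, found {len(dummies)}")

    anchors = []
    bond_type = None
    for d in dummies:
        neighbors = combo.GetAtomWithIdx(d).GetNeighbors()
        if len(neighbors) != 1:
            raise ValueError("a dummy atom should have exactly one neighbor")
        anchor = neighbors[0].GetIdx()
        anchors.append(anchor)
        # Reuse the bond order that the cut left on the dummy bond.
        bond_type = combo.GetBondBetweenAtoms(d, anchor).GetBondType()

    combo.AddBond(anchors[0], anchors[1], bond_type)

    # Highest index first, so the remaining dummy index stays valid.
    for d in sorted(dummies, reverse=True):
        combo.RemoveAtom(d)

    joined = combo.GetMol()
    Chem.SanitizeMol(joined)
    return joined
```

For example, cutting N#Cc1ccncc1 with FragmentOnBonds(mol, [1], dummyLabels=((4, 4),)), splitting the result with GetMolFrags(asMols=True), and passing the two pieces to join_on_label with label 4 should give back a molecule with the same canonical SMILES as the input.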
From: Andrew D. <da...@da...> - 2021-04-02 15:03:51
|
Hi Ling,

> On Apr 2, 2021, at 16:23, Ling Chan <lin...@gm...> wrote:
>
> Thank you Francois, I took a look at your code and borrowed parts of it to rejoin two molecules. It seems like my problem is solved. I eventually arrived at something like example 4 in
> https://www.programcreek.com/python/example/123334/rdkit.Chem.CombineMols
> (which I discovered a bit late).
>
> Still, I am not sure if the code is safe. In particular, I wonder if the following conditions are always valid.
> • Chem.CombineMols simply concatenates the atomic indices from the input molecules.
> • The Chem.EditableMol constructor preserves atom ordering from the input.
> • RemoveAtom in EditableMol results in all indices above the deleted to decrease by one, i.e. atom ordering is preserved.

I've found that it's very hard to work with molecular graphs and preserve stereochemistry.

Consider F/C=C/Cl breaking on the first bond, and the code I pointed you to.

FragmentOnBonds() using '9' as the labels gives:

  [9*]/C=C/Cl.[9*]F

My "smiles_weld" code converts that to:

  C\%99=C/Cl.F%99

which can be re-canonicalized to the original:

  F/C=C/Cl

Or, with F[C@H](Cl)Br, again breaking on the first bond.

FragmentOnBonds() gives

  [9*]F.[9*][C@H](Cl)Br

smiles_weld converts that to

  F%99.[C@@H]%99(Cl)Br

which is re-canonicalized as

  F[C@H](Cl)Br

Handling this correctly in the molecule API requires paying careful attention to the bond direction, and bond attachment order around the atom, which changes with RemoveAtom() calls. I didn't see stereochemistry support in Francois's "bind_molecules()" nor in the connect_mols() at

  https://github.com/molecularsets/moses/blob/master/moses/baselines/combinatorial.py

(one of the examples from the programcreek.com link you gave).

If you don't need to support or preserve stereochemistry, then of course there's no problem.

Cheers,

Andrew
da...@da... |
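Whichever rejoining method is used, one cheap way to catch lost or flipped stereochemistry is to compare canonical isomeric SMILES before and after the round trip. The helper below is only a sketch of such a check (the name same_structure is made up here); it uses Chem.CanonSmiles, which keeps chirality and double-bond geometry by default.

```python
from rdkit import Chem


def same_structure(smiles_a, smiles_b):
    """True if both SMILES canonicalize to the same isomeric SMILES."""
    return Chem.CanonSmiles(smiles_a) == Chem.CanonSmiles(smiles_b)


# The chiral example from the message above: original vs. welded form.
original = "F[C@H](Cl)Br"
welded = "F%99.[C@@H]%99(Cl)Br"
print(same_structure(original, welded))

# The mirror image is a different structure and should not compare equal.
print(same_structure(original, "F[C@@H](Cl)Br"))
```

Running the same comparison on the output of a molecule-space join (for example one built with CombineMols and RemoveAtom) is a quick way to see whether that code path preserves the stereochemistry of a given input.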
From: Francois B. <ml...@li...> - 2021-04-05 03:08:37
|
On 03/04/2021 00:03, Andrew Dalke wrote:
> Handling this correctly in the molecule API requires paying careful
> attention to the bond direction, and bond attachment order around the
> atom, which changes with RemoveAtom() calls. I didn't see
> stereochemistry support in Francois's "bind_molecules()" nor in the
> connect_mols() at
> https://github.com/molecularsets/moses/blob/master/moses/baselines/combinatorial.py
> (one of the examples from the programcreek.com link you gave).

After discussing with an organic chemist, we decided that fragmenting molecules (on the computer) at bonds which are involved in stereochemistry (a stereo bond, or a bond linked to a stereo center) is not desirable. I.e. if the molecules being fragmented have stereochemistry assigned, we don't cut the bonds around it.

> If you don't need to support or preserve stereochemistry, then of
> course there's no problem.
>
> Cheers,
>
> Andrew
> da...@da... |
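One possible reading of that rule, expressed as a filter on candidate bonds, is sketched below. This is an interpretation for illustration, not the code used in molenc, and the name bonds_safe_to_cut is made up: a candidate is dropped if it is itself a stereo bond, if either end atom carries a chiral tag, or if either end atom also takes part in a stereo double bond.

```python
from rdkit import Chem


def bonds_safe_to_cut(mol, candidate_bonds):
    """Return the candidate bond indices that do not touch assigned stereochemistry."""
    safe = []
    for idx in candidate_bonds:
        bond = mol.GetBondWithIdx(idx)
        ends = (bond.GetBeginAtom(), bond.GetEndAtom())

        # The bond itself is a stereo (E/Z) bond.
        if bond.GetStereo() != Chem.BondStereo.STEREONONE:
            continue
        # The bond is attached to an atom with assigned tetrahedral chirality.
        if any(atom.GetChiralTag() != Chem.ChiralType.CHI_UNSPECIFIED for atom in ends):
            continue
        # The bond is a substituent on a stereo double bond.
        if any(nbr_bond.GetStereo() != Chem.BondStereo.STEREONONE
               for atom in ends for nbr_bond in atom.GetBonds()):
            continue
        safe.append(idx)
    return safe
```

With F[C@H](Cl)CCO, for instance, the three bonds on the stereocenter are filtered out and only the C-C and C-O bonds further along the chain remain as cut candidates.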