Hello,
I am using smina to minimize ligand-protein complexes, and I have noticed that in the output PDBQT file containing the binding site, smina has replaced the residue names with UNL.
This poses two problems for me:
1.- On the one hand, it is difficult to merge the minimized residues back into the original PDB file.
2.- On the other, some scoring functions expect to find residue names.
I have tried to convert the PDBQT files back to PDB using OpenBabel, but it treats the residues as heteroatoms belonging to a single unknown residue.
Would it be possible to modify the smina code in order to preserve residue information?
Best regards,
Miro
Can you provide a test case?
David Koes
Assistant Professor
Computational and Systems Biology
University of Pittsburgh
On 03/28/2016 08:00 AM, Miro wrote:
This would be the command line:
smina --receptor proteins/${protein}.pdbqt --ligand docking/${ligand}/${result}.pdbqt --flexdist_ligand docking/${ligand}/${result}.pdbqt --out docking/${ligand}/${result}_min.pdbqt --out_flex docking/${ligand}/${result}_protmin.pdbqt --log docking/${ligand}/${result}_min.log --flexdist 6 --minimize
Attached you will find the input and output files (output files contain the string "min").
As you can see, in both output files (ligand and protein flexible residues) all residue names have been changed to "UNL".
I do not know if things would be different if I provided a split protein pdbqt file with flexible and rigid residues. However, letting smina handle this automatically is a lot more convenient when one is dealing with large numbers of proteins and ligands, as is my case.
Thank you and kind regards,
Miro Moman
RCSI Molecular Medicine
Dublin, Ireland
The issue here is that OpenBabel's PDBQT writing code eliminates residue
names (because the flex part only includes side chains, not whole
residues). We use this code in two places - once at the beginning in
defining the flexible residues and once at the end to write out a pdbqt
file, if that is what is requested.
I've committed a workaround for the first case, so we don't lose residue
information right off the bat. I've also updated smina.static. If you
output as a pdbqt you will still get UNL residues, but pdb (no qt)
output will retain the residue names.
Hope this helps,
David Koes
Assistant Professor
Computational and Systems Biology
University of Pittsburgh
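To check quickly whether an output file still carries real residue names or only UNL, one can tally the residue-name field of each atom record. The following is a minimal sketch, assuming the file follows the standard fixed-column PDB layout (residue name in columns 18-20), which PDBQT shares for these columns. The helper names `pdb_atom` and `residue_name_counts` and the example atoms are hypothetical, made up for illustration.

```python
from collections import Counter


def pdb_atom(name, resname, chain, resseq, x, y, z, element):
    """Format one ATOM record in fixed PDB columns (hypothetical helper)."""
    return (f"ATOM  {1:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2}")


def residue_name_counts(lines):
    """Count atoms per residue name (columns 18-20) over ATOM/HETATM records."""
    counts = Counter()
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            counts[line[17:20].strip()] += 1
    return counts


# Made-up records: one atom with its residue name kept, one renamed to UNL.
example = [
    pdb_atom(" CB", "SER", "A", 123, 11.0, 12.0, 13.0, "C"),
    pdb_atom(" OG", "UNL", " ", 1, 14.0, 15.0, 16.0, "O"),
]
counts = residue_name_counts(example)
print(dict(counts))
```

For a real file, pass `open(path)` as `lines`; a result containing only `UNL` would indicate that residue names were lost on output.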
On 03/29/2016 04:33 AM, Miro wrote:
Thanks a million! I will download the updated code and give it a try ASAP.
It works! Thank you. The residue name of the ligand is still modified (which could be an issue if it were a peptide, but that is not the case here); however, the PDB output for the flexible residues preserves the original residue names.
OK, the problem now is that it removes the protein atom types and the residue numbers. As a consequence, most programs do not understand the protein structure correctly and it is very difficult to merge the flexible residues back into the original structure.
If at least the residue number could be kept, it would already be useful.
OK, I have written a bash script to merge smina's minimised flexible residues back into the original pdbqt file to allow for rescoring.
This is the relevant part of the script (it relies on the fact that the order of the residues, and of the atoms within each residue except for the amide atoms, does not change between the input and output files):
smina --receptor 3HC5_A.pdbqt --ligand docking/${ligand}/${pose}.pdbqt --flexdist_ligand docking/${ligand}/${pose}.pdbqt --out docking/${ligand}/${pose}_min.pdbqt --out_flex docking/${ligand}/${pose}_3HC5_min.pdb --log docking/${ligand}/${pose}_smina.log --flexdist 5 --minimize
# Remerge the smina minimised residues into the original pdbqt file for rescoring
grep "^Flexible residues:" docking/${ligand}/${pose}_smina.log | sed "s/Flexible residues: //g" | tr ' ' '\n' | cut -d":" -f2 | while read resnumber; do
grep "^ATOM" 3HC5_A.pdbqt | grep -E "^.{23}${resnumber}" | sed '4d' | sed '1d' | sed -E "/^.{13}H.*$/d" > docking/${ligand}/split_${resnumber}.pdbqt
done
cat docking/${ligand}/split_* > docking/${ligand}/flexible_residues.pdbqt
rm docking/${ligand}/split_*
grep "^ATOM" docking/${ligand}/${pose}_3HC5_min.pdb | sed -E "/^.{13}H.*$/d" > docking/${ligand}/flexible_residues.pdb
paste <(cut -c 1-27 docking/${ligand}/flexible_residues.pdbqt) <(cut -c 28-54 docking/${ligand}/flexible_residues.pdb) <(cut -c 55-79 docking/${ligand}/flexible_residues.pdbqt) --delimiters '' > docking/${ligand}/flexible_residues_merged.pdbqt
mv docking/${ligand}/flexible_residues_merged.pdbqt docking/${ligand}/flexible_residues.pdbqt
rm docking/${ligand}/flexible_residues.pdb
sed -i -E 's/^.{26}/&:/' docking/${ligand}/flexible_residues.pdbqt
sed -E -e 's/^.{26}/&:/' 3HC5_A.pdbqt > docking/${ligand}/3HC5_A.tmp
awk -F":" 'NR==FNR{a[$1]=$0;next;}a[$1]{$0=a[$1]}1' docking/${ligand}/flexible_residues.pdbqt docking/${ligand}/3HC5_A.tmp > docking/${ligand}/3HC5_A_flex.pdbqt
sed -i 's/://g' docking/${ligand}/3HC5_A_flex.pdbqt
rm docking/${ligand}/flexible_residues.pdbqt docking/${ligand}/3HC5_A.tmp
babel docking/${ligand}/3HC5_A_flex.pdbqt docking/${ligand}/3HC5_A_flex.pdb -d -p7.4
rm docking/${ligand}/3HC5_A_flex.pdbqt
~/MGLTools-1.5.6/MGLToolsPckgs/AutoDockTools/Utilities24/prepare_receptor4.py -r docking/${ligand}/3HC5_A_flex.pdb -o docking/${ligand}/${pose}_3HC5_min.pdbqt -U nphs
rm docking/${ligand}/3HC5_A_flex.pdb
Last edit: Miro 2016-04-04
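The core idea of the script above, copying the minimised coordinates onto the original atom records by relying on unchanged atom order, can also be sketched in Python. This is only an illustration under stated assumptions, not a replacement for the script: it assumes fixed-column PDB/PDBQT records (coordinates in columns 31-54, residue number in columns 23-26), that hydrogens carry `H` in column 14, and that the minimised file lists the heavy atoms in the same order as the original. The function `splice_coordinates`, its `skip_names` parameter (original atoms that have no counterpart in the flexible output), and the example atoms are all hypothetical.

```python
def pdb_atom(name, resname, chain, resseq, x, y, z, element):
    """Format one ATOM record in fixed PDB columns (hypothetical helper)."""
    return (f"ATOM  {1:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2}")


def splice_coordinates(original, minimized, flex_resnums, skip_names=frozenset()):
    """Overwrite x/y/z of the flexible residues' heavy atoms, matching by order."""
    def is_heavy(line):
        # Same filter as the shell script: drop records with 'H' in column 14.
        return line.startswith("ATOM") and line[13] != "H"

    new_xyz = [line[30:54] for line in minimized if is_heavy(line)]
    merged, used = [], 0
    for line in original:
        replace = (is_heavy(line)
                   and int(line[22:26]) in flex_resnums
                   and line[12:16].strip() not in skip_names)
        if replace:
            if used == len(new_xyz):
                raise ValueError("minimized file has too few heavy atoms")
            # Keep names, residue info and trailing columns; swap coordinates only.
            line = line[:30] + new_xyz[used] + line[54:]
            used += 1
        merged.append(line)
    if used != len(new_xyz):
        raise ValueError("minimized file has unused heavy atoms")
    return merged


# Made-up example: residue 123 is flexible, residue 124 is rigid.
original = [
    pdb_atom(" N", "SER", "A", 123, 0, 0, 0, "N"),
    pdb_atom(" CA", "SER", "A", 123, 1, 0, 0, "C"),
    pdb_atom(" CB", "SER", "A", 123, 2, 0, 0, "C"),
    pdb_atom(" OG", "SER", "A", 123, 3, 0, 0, "O"),
    pdb_atom(" N", "GLY", "A", 124, 4, 0, 0, "N"),
]
minimized = [
    pdb_atom(" CA", "SER", "A", 123, 1.5, 0.5, 0, "C"),
    pdb_atom(" CB", "SER", "A", 123, 2.5, 0.5, 0, "C"),
    pdb_atom(" OG", "SER", "A", 123, 3.5, 0.5, 0, "O"),
    pdb_atom(" HG", "SER", "A", 123, 9, 9, 9, "H"),
]
merged = splice_coordinates(original, minimized, {123}, skip_names={"N"})
print("\n".join(merged))
```

Matching purely by order is fragile: the explicit count checks are there so that a mismatch between the two files fails loudly instead of silently shifting coordinates onto the wrong atoms.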
Hi,
Has anyone come up with a generic script that allows for the merging of flexible residues with the original PDB? The beauty of Smina relative to Autodock is that one does not have to prepare separate flexible and rigid protein segments. This simplification would seem to be lost if, in order to re-merge the two parts, this is exactly what one has to do. I cannot figure out a straightforward solution, given that several atoms of the original and the flexible residues will overlap with the same coordinates.
Any smart algorithms out there?
Hi Paulette,
Apologies for the delayed response - I've been a bit swamped and am slowly making my way through my email queue. I had thought we had implemented this feature at some point, but apparently not. Unfortunately, the task is made a bit difficult by the fact that the atom names get lost during docking, but I put together a script that is hopefully not too fragile (but does require all files to be PDB format):
https://github.com/gnina/gnina/blob/master/scripts/makeflex.py
Dear David,
Thank you very much! I am looking forward to trying the script tomorrow.
Thanks David. I can confirm that your script worked effortlessly and perfectly. For other users, here are the packages I had to install for the script to run; your setup may already have them.
pip install -U ProDy
pip install biopython
python -m pip install --user numpy scipy matplotlib ipython jupyter pandas sympy nose
Hi David,
I also just tried out the makeflex.py script, but found that in the final protein structure all oxygens and nitrogens, and some carbon atoms, from the flex output are missing, and that I am left with loads of H atoms that are not attached to any other atoms.
Can you try using PDBs instead of PDBQTs? Do you have an example you can provide?
I am actually using PDBs. I have attached an example of the original
protein and the flex output.
On 06/03/2019 15:30, David Koes wrote:
I pushed a bug fix. Not sure how I didn't catch this before. Give it another try.
81e5ce27064db63dae8df734d8406d60122ffed5
Thank you David, that is working. No heavy atoms are missing anymore.
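A simple way to catch the kind of problem reported above (heavy atoms missing after the merge) is to compare heavy-atom counts per residue between the original and the merged structure. The sketch below is not part of makeflex.py. It assumes fixed-column PDB records with the element symbol in columns 77-78, falling back to the first letter of the atom name when that field is blank; all function names and the example atoms are hypothetical.

```python
from collections import Counter


def pdb_atom(name, resname, chain, resseq, x, y, z, element):
    """Format one ATOM record in fixed PDB columns (hypothetical helper)."""
    return (f"ATOM  {1:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2}")


def heavy_atoms_per_residue(lines):
    """Count non-hydrogen atoms keyed by (chain, residue number)."""
    counts = Counter()
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        element = line[76:78].strip().upper()
        if not element:
            # Fallback: first letter of the atom name, ignoring leading digits.
            element = line[12:16].strip().lstrip("0123456789")[:1].upper()
        if element != "H":
            counts[(line[21], int(line[22:26]))] += 1
    return counts


def residues_missing_atoms(original, merged):
    """Map residue -> (heavy atoms before, after) where the merge lost atoms."""
    before = heavy_atoms_per_residue(original)
    after = heavy_atoms_per_residue(merged)
    return {res: (n, after.get(res, 0))
            for res, n in before.items() if after.get(res, 0) < n}


# Made-up example: the merged structure lost OG of SER 123 but kept a hydrogen.
original = [
    pdb_atom(" CB", "SER", "A", 123, 2, 0, 0, "C"),
    pdb_atom(" OG", "SER", "A", 123, 3, 0, 0, "O"),
    pdb_atom(" N", "GLY", "A", 124, 4, 0, 0, "N"),
]
merged = [
    pdb_atom(" CB", "SER", "A", 123, 2.5, 0.5, 0, "C"),
    pdb_atom(" HG", "SER", "A", 123, 3.9, 0.9, 0, "H"),
    pdb_atom(" N", "GLY", "A", 124, 4, 0, 0, "N"),
]
missing = residues_missing_atoms(original, merged)
print(missing)
```

An empty result means no residue has fewer heavy atoms after the merge than before; it does not verify that the coordinates themselves are sensible.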