
#770 SERINE residue incorrectly converted when last residue

2.3.x
open
nobody
5
2012-10-23
2011-12-02
No

In a peptide sequence, the SERINE residue is recognized and converted correctly (from a SMILES string to any 3D structure) at every position EXCEPT the last one.
In other words, when its part of the SMILES string is NC@@(CO)C=O or NC@@(CO)C(=O), one of the carboxylate oxygens is missing. This corresponds to a SERINE residue bonded to another residue through its carboxyl carbon.
Conversely, when SERINE is the C-terminal residue of the peptide, i.e. its SMILES string is ...NC@@(CO)C(=O)O, OpenBabel does not recognize the SERINE residue. It writes it to the PDB file as an unknown residue ("UNK") with its atoms as "HETATM" records, which prevents them from being typed (alpha carbon: CA, etc.). This gives the following for the SERINE residue:

HETATM 60 N UNK A 9 4.954 -4.052 8.717 1.00 0.00 N
HETATM 61 C UNK A 9 4.694 -4.357 10.134 1.00 0.00 C
HETATM 62 C UNK A 9 3.665 -3.415 10.762 1.00 0.00 C
HETATM 63 O UNK A 9 4.268 -2.161 11.068 1.00 0.00 O
HETATM 64 C UNK A 9 4.178 -5.783 10.233 1.00 0.00 C
HETATM 65 O UNK A 9 3.663 -6.429 9.325 1.00 0.00 O
HETATM 66 O UNK A 9 4.330 -6.325 11.454 1.00 0.00 O
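
A script that depends on alpha carbons can detect this symptom before going further. Below is a minimal Python sketch, assuming standard fixed-column PDB ATOM/HETATM records. The excerpt above has lost its column spacing, so it would need re-spacing first. The sketch groups atoms by residue and flags any residue written as UNK or lacking an atom named CA. The function name `scan_residues` is hypothetical and is not part of OpenBabel.

```python
from collections import OrderedDict


def scan_residues(pdb_text):
    """Group ATOM/HETATM records by residue and flag suspicious ones.

    Assumes fixed-column PDB records (1-based columns): record name 1-6,
    atom name 13-16, residue name 18-20, chain ID 22, residue number 23-26.
    Returns one dict per residue, in file order.
    """
    residues = OrderedDict()
    for line in pdb_text.splitlines():
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip HEADER, CONECT, END, etc.
        atom_name = line[12:16].strip()
        res_name = line[17:20].strip()
        chain = line[21:22]
        res_seq = int(line[22:26])
        info = residues.setdefault(
            (chain, res_seq),
            {"chain": chain, "resseq": res_seq, "resname": res_name,
             "hetatm": False, "atoms": []},
        )
        info["atoms"].append(atom_name)
        if record == "HETATM":
            info["hetatm"] = True
    report = []
    for info in residues.values():
        info["has_ca"] = "CA" in info["atoms"]
        # Suspicious: residue not recognized, or no atom labelled as alpha carbon.
        info["suspicious"] = info["resname"] == "UNK" or not info["has_ca"]
        report.append(info)
    return report
```

Run on "0019_QGNVTSIHS_orig.pdb", this should flag residue 9 (the C-terminal SERINE written as UNK), which a pipeline could then route to a special case.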

IMPORTANT:
The error is specific to the SERINE residue. Any other residue is correctly recognized at any position in the peptide.

I have attached files that provide a full example:
"0019_QGNVTSIHS_orig.smi" is a file containing a correct SMILES string (the last SERINE has a complete carboxylate group).
"0019_QGNVTSIHS_orig.pdb" is the file produced by OpenBabel from the conversion of the previous one, using the options "-d" and "--gen3d".
Look at the SERINE at the end of the peptide: you will see the lines above.

"0019_QGNVTSIHS.smi" is a file where I manually tweaked the SMILES string, removing one of the oxygens of the carboxylate group of the last SERINE. This is NOT the correct SMILES string: an incomplete carboxylate group is biologically incoherent.
"0019_QGNVTSIHS.pdb" is the file produced by OpenBabel from the conversion of the previous one, using the same options as above. Now it works... but the peptide will look (and be) incorrect in any visualization tool, because of the missing carboxylate oxygen.

I need a new release with this bug fixed as soon as possible to continue my PhD project. I have a script that looks for the alpha carbon of each residue, and until this bug is fixed (which should be quick once someone looks into it), I cannot go on with my project.

Thank you for all the work you have put into this nice piece of software, which is otherwise perfect for me!
All the best

Discussion

  • Kévin RUE-ALBRECHT

    The files with "orig" in their names show a correct SMILES string being handled incorrectly, while the others show a tweaked case that works.

     
  • Noel O'Boyle

    Noel O'Boyle - 2011-12-05

    Does it work if you use the ionised form, i.e. C(=O)[O-] instead of C(=O)O?

     
  • Kévin RUE-ALBRECHT

    Doesn't help.
    The only difference is the following:

    C(=O)O produces:
    HETATM 69 O UNK A 9 -0.608 16.510 12.318 1.00 0.00 O

    While C(=O)[O-] produces:
    HETATM 69 O UNK A 9 9.503 2.414 7.149 1.00 0.00 O1-

    As you can see, the residue remains unknown, instead of SER.

     
  • Noel O'Boyle

    Noel O'Boyle - 2011-12-12

    If you convert to pdb first (without -d and without --gen3d), and then to a 3D PDB, it seems to work for me. Can you confirm?
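The two-step workaround can be scripted. Below is a minimal Python sketch, assuming the `obabel` executable from OpenBabel is on the PATH. The function names `two_step_commands` and `run_two_step` and the intermediate file name are hypothetical. Step 1 converts SMILES to PDB without `-d` or `--gen3d`. Step 2 regenerates 3D coordinates with `--gen3d`. The comment above does not say whether `-d` belongs in step 2, so it is only an option here. Note that a later comment in this thread reports docking problems with files produced this way.

```python
import subprocess


def two_step_commands(smi_path, out_pdb, intermediate="step1.pdb",
                      delete_hydrogens=False):
    """Build the two obabel command lines for the workaround.

    Step 1: SMILES -> PDB, without -d and without --gen3d.
    Step 2: that PDB -> 3D PDB with --gen3d (optionally also -d).
    """
    step1 = ["obabel", smi_path, "-O", intermediate]
    step2 = ["obabel", intermediate, "-O", out_pdb, "--gen3d"]
    if delete_hydrogens:
        step2.append("-d")  # assumption: -d is wanted in the final step
    return [step1, step2]


def run_two_step(smi_path, out_pdb, **kwargs):
    """Run both commands in order; raises CalledProcessError on failure."""
    for cmd in two_step_commands(smi_path, out_pdb, **kwargs):
        subprocess.run(cmd, check=True)
```

Building the command lists separately from running them keeps the conversion steps easy to inspect and log. An argument list (rather than a shell string) also avoids quoting problems with file names.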

     
  • Kévin RUE-ALBRECHT

    I confirm that it works this way :)

    However, do you plan to fix the bug itself in the near future? In other words, when and how will I know that fixed, stable code is available?

    In the meantime, I don't mind adding a special case to my pipeline, but it would make life much simpler to treat those peptides like any other...

    Thanks again for this little "hack" that avoids the problem :)

     
  • Kévin RUE-ALBRECHT

    Thanks :)

     
  • Kévin RUE-ALBRECHT

    Hi again Noel,

    I have to revise my last remark: your "hack" for avoiding the SER problem has an issue, though it is not easy to say why.

    To reproduce the bug, follow your advice: take a peptide ending with a SERINE, get its SMILES string, convert it to a PDB without -d or --gen3d, then convert that to a 3D PDB.
    As you said, the file "looks" fine, and so does the structure.

    However, something must be wrong somewhere (angles or distances, I would guess), because when I prepare PDB files generated this way for docking (prepare_ligand4.py, from the MGLTools kit), the output PDBQT structure is missing bonds, mainly backbone ones. The weirdest thing is that aromatic rings are not generated flat (REALLY not flat, in some cases). This bug is 100% reproducible with OpenBabel 2.3.1, which I am using. All the peptides generated through the two-step PDB route have the same issue, while those generated directly from the SMILES string with the usual -d --gen3d have so far given correct PDBQT files with the MGLTools kit.
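Such files could be screened for the two symptoms described above before docking. Below is a minimal pure-Python sketch. The function names are hypothetical and are not part of OpenBabel or MGLTools. `max_ring_torsion` takes ring atom coordinates in ring order and returns the largest absolute torsion angle around the ring. For a flat aromatic ring, every such torsion should be close to 0 degrees. `bond_length_outliers` flags atom pairs whose distance falls outside a tolerance band around an expected length, for example roughly 1.33 Å for a peptide C-N bond. The tolerance is an assumption to be tuned. That MGLTools drops bonds because of odd distances is itself only a guess, as in the comment above.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    k0, k2 = _dot(b0, b1), _dot(b2, b1)
    v = (b0[0] - k0 * b1[0], b0[1] - k0 * b1[1], b0[2] - k0 * b1[2])
    w = (b2[0] - k2 * b1[0], b2[1] - k2 * b1[1], b2[2] - k2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def max_ring_torsion(ring):
    """Largest |torsion| over consecutive atom quadruples of a closed ring."""
    n = len(ring)
    return max(abs(dihedral(ring[i], ring[(i + 1) % n],
                            ring[(i + 2) % n], ring[(i + 3) % n]))
               for i in range(n))


def distance(a, b):
    d = _sub(a, b)
    return math.sqrt(_dot(d, d))


def bond_length_outliers(coords, pairs, expected, tolerance=0.2):
    """Return (i, j, d) for pairs whose length differs from expected by > tolerance (Å)."""
    out = []
    for (i, j), ref in zip(pairs, expected):
        d = distance(coords[i], coords[j])
        if abs(d - ref) > tolerance:
            out.append((i, j, d))
    return out
```

A threshold of a few degrees on `max_ring_torsion` would separate flat rings from the "REALLY not flat" cases. The pairs for `bond_length_outliers` could come from a parsed PDB file.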

    Interestingly, I found that OpenBabel 2.3.0 occasionally generated PDB files behaving the same way, even when they were generated in a single -d --gen3d step. Perhaps a fix was made to the one-step (SMILES -> 3D PDB) generation that was not applied to the (PDB without coordinates -> 3D PDB) conversion.

    Anyway, this two-step conversion idea was worth trying while waiting for the bug fix, so thanks. I'll wait for the fix now. I hope Geoff's work on that part is going well!
    Cheers