From: Kelvin L. <kl...@ac...> - 2014-03-31 09:36:59
|
Hello, I am using PyMOL 0.99rc6. I am wondering if there is a means to obtain a FASTA sequence from a loaded pdb file that maintains the gaps due to missing portions of the structure? I found one program that will strip the sequence from pdb files, but it simply reads out the amino acids that are present linearly. Gaps in the sequence are not maintained. I can't imagine why that would be useful. I have a large number of pdb files to deal with and would like to avoid having to align each pdb sequence to the gene just to recover the gaps. Thanks for your time, -- Kelvin Luther Physiologische Institut Universität Zürich |
From: Edward A. B. <Be...@up...> - 2014-03-31 14:45:04
|
Due to the possibility of insertion codes and non-sequential residue numbering, I believe there is no way to find the gaps without aligning the residues in the ATOM records with the sequence in SEQRES. I don't know of a program that does this. The structure validation server at RCSB ADIT2 makes this alignment for the depositor to look at, but it would not be easy to include in a script. If all you want is an "accurate FASTA sequence" for the protein, there are programs that convert the SEQRES records to a string of one-letter codes. In principle, the SEQRES records hold the sequence of what is present in the crystal, regardless of whether it is visualized or not. However, there cannot be conflicts between the SEQRES and the ATOM records, so if the structure contains unknown ('UNK') residues, they have to be UNK in SEQRES as well, even if the sequence is known. And if a string of UNK residues is disconnected on both ends, i.e. it sits in the middle of a gap of missing residues, then it is pretty arbitrary which residues in SEQRES get replaced with UNK. |
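The SEQRES conversion mentioned above can be sketched in a few lines of plain Python. This is only an illustration, not one of the existing programs: the function name `seqres_sequences` is made up here, the column positions follow the fixed-width PDB format (chain ID in column 12, residue names from column 20), and anything outside the 20 standard amino acids, including UNK, is written as 'X'.

```python
# Sketch only: turn SEQRES records into one-letter sequences, one per chain.
# Assumes fixed-width PDB format; non-standard residues (e.g. UNK) become 'X'.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_sequences(pdb_text):
    """Return {chain_id: one_letter_sequence} built from the SEQRES records."""
    chains = {}
    for line in pdb_text.splitlines():
        # Skip everything that is not a SEQRES record with residue names.
        if not line.startswith("SEQRES") or len(line) < 20:
            continue
        chain_id = line[11]              # column 12
        names = line[19:70].split()      # columns 20-70, up to 13 residues
        codes = "".join(THREE_TO_ONE.get(name, "X") for name in names)
        chains[chain_id] = chains.get(chain_id, "") + codes
    return chains
```

Note that this gives the full deposited sequence without any gap markers; it does not tell you which residues are missing from the ATOM records.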
From: Thomas H. <tho...@sc...> - 2014-03-31 14:51:35
|
Hi Kelvin, the "psico" module provides a "fasta" command which maintains the gaps. However, psico will not work with PyMOL 0.99, which is quite old and uses an ancient Python version (psico requires Python 2.6 and the PyMOL 1.2 API). Psico installation instructions and download link: http://pymolwiki.org/index.php/Psico Cheers, Thomas -- Thomas Holder PyMOL Developer Schrödinger, Inc. |
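For readers who cannot install psico, the idea of a gap-preserving sequence can be illustrated without PyMOL. The sketch below is not psico's implementation; `gapped_sequence` is a hypothetical helper that takes one (residue number, residue name) pair per residue, in chain order, and pads jumps in the numbering with '-'. It relies on the residue numbering being meaningful, which, as noted earlier in the thread, is not guaranteed when insertion codes or non-sequential numbering are present.

```python
# Sketch only: build a gapped one-letter sequence from residue numbers.
# Assumes one (number, name) pair per residue, in chain order.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def gapped_sequence(residues, gap="-"):
    """Return the sequence with one gap character per skipped residue number."""
    parts = []
    previous = None
    for number, name in residues:
        if previous is not None:
            missing = number - previous - 1
            if missing > 0:
                # A jump in numbering is treated as missing residues.
                parts.append(gap * missing)
        # Residues outside the 20 standard amino acids are written as 'X'.
        parts.append(THREE_TO_ONE.get(name, "X"))
        previous = number
    return "".join(parts)
```

In a reasonably recent PyMOL, the input pairs could probably be collected with something like `cmd.iterate("name CA", "pairs.append((resv, resn))", space={"pairs": pairs})`; treat that call as an untested assumption, and note that it will not help on 0.99 if the `space` argument is unavailable there.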
From: Thomas H. <tho...@sc...> - 2014-04-14 17:08:19
Attachments:
fasta.py
|
Hi Kelvin, that "fasta" command right now only dumps the sequence to the log window; it doesn't return the sequence or save it to a file. This is probably because for my use case it was always sufficient to copy and paste the sequence from the log window. However, it's not too difficult to improve the script and hook it up to PyMOL's "save" command. See the attachment: after running that script (which is independent of psico), you can do "save FileName.fasta". Cheers, Thomas |
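The attached fasta.py is not reproduced in this archive. As a rough idea of what writing a sequence to a file instead of the log window involves, here is a minimal sketch in plain Python. The names `format_fasta` and `write_fasta` and the default line width of 70 are assumptions made for this example; they are not taken from the attachment and do not use PyMOL's "save" machinery.

```python
# Sketch only: format a (possibly gapped) sequence as a FASTA record and
# write it to a file. Not the attached script; names are hypothetical.
def format_fasta(name, sequence, width=70):
    """Return a FASTA record with the sequence wrapped at `width` characters."""
    lines = [">" + name]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


def write_fasta(path, records, width=70):
    """Write an iterable of (name, sequence) pairs to `path`."""
    with open(path, "w") as handle:
        for name, sequence in records:
            handle.write(format_fasta(name, sequence, width))
```

For a large number of PDB files, a loop that calls `write_fasta` once per structure would avoid the copy-and-paste step described above.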