From: Eric M. <em...@mi...> - 2020-07-31 16:57:23
Dear Bob,
I often find it useful to extract the one-letter amino acid sequence
from the ATOM records in a PDB file with Jmol, using "print {*.ca}.group1".
This lists one residue per line. I have figured out how to convert this
listing to a FASTA-format paragraph, but for some users this may be an
obstacle. So (1) I suggest an option to print the residues in FASTA
format, like this:
>filename.pdb (or filename.cif)
ABCDEFG.....
Perhaps "print {*.ca}.group1fasta" would be the command?
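Until such an option exists, the conversion can be done outside Jmol. The
following is only a minimal sketch in Python, assuming the one-per-line
output of the print command has been saved as text. The function name
to_fasta and the 60-character line width are illustrative choices, not
anything Jmol provides.

```python
def to_fasta(listing, header, width=60):
    """Turn a one-residue-per-line listing into a FASTA record.

    `listing` is assumed to hold the text printed by Jmol, one
    one-letter code per line; `header` becomes the description line.
    """
    # Join the per-line codes into one string, skipping blank lines.
    sequence = "".join(line.strip() for line in listing.splitlines())
    # Break the sequence into fixed-width lines for the FASTA paragraph.
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + header] + chunks)


if __name__ == "__main__":
    print(to_fasta("A\nB\nC\nD\nE\nF\nG\n", "filename.pdb"))
```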
(2) Maybe this would be too much fuss to implement, but it would also be
extremely useful to generate a sequence alignment between the SEQRES
records (the full experimental sequence) and the ATOM records, that is,
the residues that have coordinates. In such an alignment, there would be
gaps where residues are listed in SEQRES but lack ATOM coordinates
(typically due to disorder in the electron density map). In our example,
suppose residues CDE lack coordinates. Then the output would be
>filename.pdb SEQRES
ABCDEFG...
>filename.pdb ATOM
AB---FG...
(FASTA requires a description line starting ">" in column 1, above the
sequence paragraph.) Such an alignment can be copied and pasted directly
into e.g. http://MSAReveal.Org for flexible display options.
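The gapped output described above can be sketched in Python as follows.
This is only an illustration under a strong assumption: that we already
know which 1-based SEQRES positions have coordinates. Working that out
from real ATOM records (residue numbering, insertion codes) is the hard
part and is not attempted here. The function name gapped_alignment and
its parameters are hypothetical.

```python
def gapped_alignment(seqres, observed_positions, name):
    """Build a two-record FASTA alignment of SEQRES versus ATOM.

    `seqres` is the full one-letter sequence; `observed_positions`
    is an assumed, precomputed set of 1-based SEQRES positions that
    have coordinates. All other positions become '-' gaps.
    """
    atom_line = "".join(
        residue if position in observed_positions else "-"
        for position, residue in enumerate(seqres, start=1)
    )
    return "\n".join([
        ">" + name + " SEQRES", seqres,
        ">" + name + " ATOM", atom_line,
    ])


if __name__ == "__main__":
    # Residues C, D, E (positions 3-5) lack coordinates.
    print(gapped_alignment("ABCDEFG", {1, 2, 6, 7}, "filename.pdb"))
```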
As an alternative to generating the alignment from the SEQRES and ATOM
records, for mmCIF files this alignment is already included in the file
under *_pdbx_poly_seq_scheme* and would merely need to be converted from
columnar 3-letter codes to paragraph 1-letter FASTA format. It would be
OK to require CIF files for such output, since CIF files are available
for all published PDB entries.
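A rough sketch of that conversion in Python is given below. It assumes
the _pdbx_poly_seq_scheme rows have already been parsed into (mon_id,
auth_mon_id) pairs, and that residues without coordinates carry "?" in
auth_mon_id, which is how they commonly appear in PDB-distributed mmCIF
files but should be checked against the file at hand. The function name
scheme_to_fasta is hypothetical, and non-standard residues are simply
written as X.

```python
# One-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def scheme_to_fasta(rows, name):
    """Convert (mon_id, auth_mon_id) pairs into a SEQRES/ATOM alignment.

    Assumption: auth_mon_id is '?' (or '.') when the residue has no
    coordinates. Unknown three-letter codes are written as 'X'.
    """
    seqres = "".join(THREE_TO_ONE.get(mon, "X") for mon, _ in rows)
    atom = "".join(
        "-" if auth in ("?", ".") else THREE_TO_ONE.get(auth, "X")
        for _, auth in rows
    )
    return "\n".join([
        ">" + name + " SEQRES", seqres,
        ">" + name + " ATOM", atom,
    ])


if __name__ == "__main__":
    rows = [("MET", "?"), ("ALA", "ALA"), ("GLY", "GLY"), ("SER", "?")]
    print(scheme_to_fasta(rows, "filename.cif"))
```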
Thanks for considering these suggestions, -Eric