[Jmol-users] FASTA sequences? Missing residues?

Dear Bob,

I often find it useful to extract the one-letter amino acid sequence 
from the ATOM records in a PDB file with Jmol, using "print {*.ca}.group1".

This lists one residue per line. I have figured out how to convert this 
listing to a FASTA format paragraph, but for some users this step may be 
an obstacle. So (1) I suggest an option to print the residues in FASTA 
format, like this:

 >filename.pdb (or filename.cif)
ABCDEFG.....

Perhaps "print {*.ca}.group1fasta" would be the command?
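
In the meantime, the conversion can be done outside Jmol. Here is a minimal Python sketch, assuming the one-residue-per-line output has been copied into a string; the function name to_fasta and the 60-character line width are my own choices for illustration, not anything Jmol provides.

```python
def to_fasta(listing, title, width=60):
    """Join one-residue-per-line output into a single FASTA record."""
    # Keep only non-empty lines; each is assumed to hold one one-letter code.
    residues = [line.strip() for line in listing.splitlines() if line.strip()]
    sequence = "".join(residues)
    # Wrap the sequence paragraph at `width` characters per line (an arbitrary choice).
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    # The description line must start with '>' in column 1.
    return ">" + title + "\n" + "\n".join(wrapped) + "\n"


if __name__ == "__main__":
    jmol_output = "A\nB\nC\nD\nE\nF\nG\n"  # hypothetical pasted listing
    print(to_fasta(jmol_output, "filename.pdb"), end="")
```

Pasting the listing printed by Jmol in place of jmol_output should give a description line followed by the sequence paragraph.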

(2) Maybe this would be too much fuss to implement, but it would also be 
extremely useful to generate a sequence alignment between the SEQRES 
records (the full experimental sequence) and the ATOM records, that is, 
the residues that have coordinates. In such an alignment, there would be 
gaps where residues are listed in SEQRES but lack ATOM coordinates 
(typically due to disorder in the electron density map). In our example, 
suppose residues CDE lack coordinates. Then the output would be

 >filename.pdb SEQRES
ABCDEFG...
 >filename.pdb ATOM
AB---FG...

(FASTA requires a description line starting ">" in column 1, above the 
sequence paragraph.) Such an alignment can be copied and pasted directly 
into e.g. http://MSAReveal.Org for flexible display options.
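
To make the intended output concrete, here is a minimal Python sketch of the gapped pair. It assumes the caller already knows which 1-based SEQRES positions have coordinates; working out that mapping from the file is the hard part and is not shown. The function name gapped_alignment is hypothetical.

```python
from typing import Iterable


def gapped_alignment(seqres: str, observed: Iterable[int], name: str) -> str:
    """Return a two-record FASTA alignment of SEQRES versus ATOM residues.

    `observed` holds the 1-based SEQRES positions that have coordinates
    (an assumption: this mapping must be supplied by the caller).
    """
    seen = set(observed)
    # Copy the residue where coordinates exist, otherwise write a gap.
    atom_row = "".join(
        res if pos in seen else "-"
        for pos, res in enumerate(seqres, start=1)
    )
    return (
        ">" + name + " SEQRES\n" + seqres + "\n"
        ">" + name + " ATOM\n" + atom_row + "\n"
    )


if __name__ == "__main__":
    # The example from this message: residues C, D, E (positions 3-5) lack coordinates.
    print(gapped_alignment("ABCDEFG", [1, 2, 6, 7], "filename.pdb"), end="")
```

Both records have the same length, so the result can be pasted as-is into an alignment viewer.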

As an alternative to generating the alignment from the SEQRES and ATOM 
records, for mmCIF files, this alignment is already included in the file 
under *_pdbx_poly_seq_scheme* and would merely need to be converted from 
columnar 3-letter codes to paragraph 1-letter FASTA format. It would be 
OK to require CIF files for such output, since CIF files are available 
for all published PDB entries.
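
As a rough illustration of that conversion, here is a Python sketch under several assumptions: the category is written as a loop_ with whitespace-separated, unquoted values; the file holds a single chain; and a residue without coordinates has "?" (or ".") in the pdb_mon_id column. The function name and the use of "X" for non-standard residues are my own choices. A real implementation should use a proper CIF parser.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def poly_seq_scheme_to_fasta(cif_text, name):
    """Build a SEQRES/ATOM FASTA pair from a _pdbx_poly_seq_scheme loop."""
    prefix = "_pdbx_poly_seq_scheme."
    tags, rows = [], []
    in_loop = False
    for raw in cif_text.splitlines():
        line = raw.strip()
        if line.startswith(prefix):
            # Column header line: remember the item name after the prefix.
            tags.append(line[len(prefix):])
            in_loop = True
        elif in_loop:
            # A blank line, comment, or new category ends the data rows.
            if not line or line.startswith(("#", "_", "loop_", "data_")):
                break
            rows.append(line.split())
    mon = tags.index("mon_id")          # residue in the full sequence
    pdb_mon = tags.index("pdb_mon_id")  # assumed "?" when coordinates are missing
    # Non-standard residues are written as "X" (a simplification).
    letters = [THREE_TO_ONE.get(row[mon], "X") for row in rows]
    seqres = "".join(letters)
    atom = "".join(
        "-" if row[pdb_mon] in ("?", ".") else letter
        for row, letter in zip(rows, letters)
    )
    return (
        ">" + name + " SEQRES\n" + seqres + "\n"
        ">" + name + " ATOM\n" + atom + "\n"
    )
```

For a multi-chain entry the rows would first need to be grouped by chain (for example by the asym_id column) before building one pair of records per chain.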

Thanks for considering these suggestions, -Eric
