HOPPscore v1.0
June 2005
This program may be used freely for academic purposes. For
commercial licensing information please contact the author at
gesims@lbl.gov.
HOPPscore is an outgrowth of research conducted in the Sung-Hou
Kim lab by Gregory E. Sims and In-Geol Choi. The appropriate reference
to cite should be:
Gregory E. Sims, In-Geol Choi, and Sung-Hou Kim (2005).
Protein Conformational Space in Higher Order Phi-psi Maps.
Proc. Nat. Acad. Sci. 102, 618-621.
HOPPscore (Higher Order Phi-Psi pair score) assesses the quality of
experimentally or theoretically determined structures against a reference
database of high resolution structures. In this capacity it is similar
to PROCHECK; however, HOPPscore compares not only single phi-psi pairs
of angles within the tested structure to a high resolution reference, but
also higher order pairs. These higher order pairs are fragments of
structures represented by more than one phi-psi pair. The current
implementation scores fragments of 1 through 5 phi-psi pairs.
The utilities supplied in this package are:
hoppscore - Scores supplied pdb files for structural quality
hoppscoremkdb - Make a customized reference database. This
utility should only be necessary in specialized
situations. Default reference databases are
included.
hppTorsions - Calculates torsion angles from pdb coordinates.
To obtain help, see the manual file:
man hoppscore
USAGE:
~~~~~~~~~~~~~~~
HOPPscore is a Perl script which can be executed in either of two ways:
hoppscore [OPTION] ... [FILE] ...
perl hoppscore [OPTION] ... [FILE] ...
-r NUM, --res=NUM
Specify the resolution level (in Angstroms) of the reference dataset,
where NUM is one of the following choices: 1.0, 1.2, 1.5, 1.7.
The default value is 1.7 Angstroms.
-g NUM, --grid=NUM
Specify the grid size into which the reference databases are binned.
NUM specifies a degree value: 2,4,6,8,10,12,14,16,18,20.
The default value is 12 degrees.
-b, --brief
Eliminate 'by fragment' scores and print out just the
average score for each fragment length.
-s NUM, --scale=NUM
Set the standard deviation coefficient. Sets the FAVORED/ALLOWED
boundary. Default value is 0.5.
-f NUM, --favored=NUM
Set Favored score bonus. Default is 2.0.
-a NUM, --allowed=NUM
Set Allowed score bonus. Default is 1.0.
-u NUM, --unfavored=NUM
Set Unfavored score bonus. Default is 0.5.
-d NUM, --disfavored=NUM
Set Disallowed score bonus. Default is a penalty of -4.0.
-p PATH, --dbpath=PATH
Path to reference database dir
-t PATH, --torsionpath=PATH
Path to the torsion angle utility (hppTorsions)
Example:
hoppscore -r 1.7 -g 12 sample.pdb
Output is sent to STDOUT, so you can redirect it to a file if you wish:
hoppscore -r 1.7 -g 12 sample.pdb > sample.pdb.out
Feel free to try this out for yourself with the included sample.pdb.
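If you want to drive hoppscore from another program, the command line
above can be assembled and checked before it is run. The following is
a minimal Python sketch, not part of this package: build_command and
run_hoppscore are hypothetical helper names, and the sketch assumes
hoppscore is on your PATH. Only the -r, -g and -b options described
above are used.

```python
import subprocess

# Allowed values, copied from the option descriptions in this README.
RESOLUTIONS = ("1.0", "1.2", "1.5", "1.7")
GRIDS = (2, 4, 6, 8, 10, 12, 14, 16, 18, 20)


def build_command(pdb_file, res="1.7", grid=12, brief=False):
    """Return the argument list for one hoppscore run (hypothetical helper)."""
    if res not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {res}")
    if grid not in GRIDS:
        raise ValueError(f"unsupported grid size: {grid}")
    cmd = ["hoppscore", "-r", res, "-g", str(grid)]
    if brief:
        cmd.append("-b")
    cmd.append(pdb_file)
    return cmd


def run_hoppscore(pdb_file, out_file, **options):
    """Run hoppscore and write its STDOUT to out_file.

    Assumes the hoppscore script is executable and on PATH.
    """
    with open(out_file, "w") as handle:
        subprocess.run(build_command(pdb_file, **options),
                       stdout=handle, check=True)
```

Calling run_hoppscore("sample.pdb", "sample.pdb.out") would then be
equivalent to the redirect example above.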
Structural Reference Databases
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In the process of scoring PDB structures, HOPPscore refers to
data within hash-table files which specify the number of
structures found in that database which have a given length
and specific torsion angle configuration.
Several reference databases have been included, with the naming
convention:
hash.x.y.1 ... hash.x.y.5
where x is the resolution level: 10 (1.0 Angstrom), 12 (1.2 A),
15 (1.5 A), 17 (1.7 A) and y is the degree grid size: 2,4,6,8,
10,12,14,16,18.
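To check which files a given setting needs, the naming convention can
be spelled out in a few lines. This is an illustrative Python sketch
only; reference_files is a hypothetical name, and it assumes the
resolution code is simply the resolution with the decimal point
removed, as in the list above.

```python
def reference_files(resolution, grid):
    """List the five hash-table file names for one resolution and grid size.

    resolution is a string such as "1.7"; grid is the bin width in degrees.
    """
    code = resolution.replace(".", "")  # "1.7" -> "17"
    # One file per fragment length, 1 through 5 phi-psi pairs.
    return [f"hash.{code}.{grid}.{length}" for length in range(1, 6)]
```

For example, reference_files("1.7", 12) returns the names hash.17.12.1
through hash.17.12.5.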
Only the data for the 12 degree grid size has been included in
this distribution. Other database grid sizes can be obtained at
http://compgen.lbl.gov/~gsims/hoppscore.
Place these additional files in your ~/hopscore/refdata directory.
The pre-built reference databases were created by selecting X-Ray crystal
structures from the PDBSELECT database with the appropriate resolution
levels. PDBSELECT is a non-redundant database with less than 20%
sequence homology. Each resolution level is a subset of the next
highest. In other words, the 1.2 and 1.0 sets have structures ranging
in resolution from 0-1.2 Angstroms and 0-1.0 Angstroms respectively.
Reference databases will be updated whenever there are significant
additions to the PDBSELECT database. The latest updates can be found at:
http://compgen.lbl.gov
Should you desire, the utility makeDataBase.pl has been included to
create your own reference data. This might be useful if you wished
to compare the score properties of a test structure to a specific
class of pdb structures which you have collected.
Give your self-made database the same naming convention described
above (hash.x.y.1 through hash.x.y.5) and overwrite the included
prebuilt databases.
Also, each of the resolution levels is built by placing reference
structures into 12 degree bins by phi-psi angles. If a finer or coarser
grained bin structure is desired then run pdbselect with the following
arguments:
-g x - Grid size in degrees, where x is a value from the set
(2,4,6,8,10,12,14,16,18,20).
-r x - Resolution level for the reference database: 1.0, 1.2, 1.5,
1.7.
NOTE: The finer the grid used, the less tolerant HOPPscore will
be of phi-psi angle values which deviate slightly from the database.
Example Output
~~~~~~~~~~~~~~~~~~~~
Program output is sent to STDOUT and looks like this:
!F - Favored, A - Allowed, U - Unfavored, D - Disallowed
!F > 612.7, 183.3 < A > 612.7, 0 < U > 183.3, D=0
!Avg: 183.290662650602, Sigma: 858.791452289234
!Input File: test.pdb
!Reference Database Resolution Limit: 17
!phi-psi pairs: 1
! SS AA FILENAME RES F/Avg Favored?
!-----------------------------------------------
C D test.pdb 2 10.19 F 2
S A test.pdb 3 34.83 F 2
G P test.pdb 4 99.17 F 2
G F test.pdb 5 5.93 F 2
G E test.pdb 6 1.01 A 1
.
.
.
!-----------------------------------------------
Sum: 111
Score: 1.23
Column 1 specifies the secondary structure as determined by
DSSP. C is assigned to pair values without defined DSSP
secondary structure.
Column 2 is the amino acid sequence.
Column 3 is the input file name. This is helpful if you've
concatenated several PDB files and test them all at once.
Column 4 is the residue number from the PDB file.
Column 5 is the frequency divided by the average occupancy.
Column 6 is the favorability class (F, A, U or D).
Column 7 is the score.
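Because the output is plain whitespace-separated text, it is easy to
read back into another program. The Python sketch below is one possible
way to do so and is not part of this package. FragmentRow and
parse_rows are hypothetical names, and the sketch assumes that every
data row has exactly seven fields and that column 4 is a plain integer
(no insertion codes).

```python
from typing import NamedTuple


class FragmentRow(NamedTuple):
    ss: str         # column 1: DSSP secondary structure
    aa: str         # column 2: amino acid sequence
    filename: str   # column 3: input file name
    residue: int    # column 4: residue number
    ratio: float    # column 5: frequency / average occupancy
    status: str     # column 6: F, A, U or D
    score: float    # column 7: score


def parse_rows(text):
    """Return the per-fragment rows found in hoppscore output text."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip "!" header lines, the Sum/Score summary and anything else
        # that does not have the seven data columns.
        if line.lstrip().startswith("!") or len(fields) != 7:
            continue
        ss, aa, name, res, ratio, status, score = fields
        rows.append(FragmentRow(ss, aa, name, int(res),
                                float(ratio), status, float(score)))
    return rows
```

The rows can then be filtered, for example to list every fragment whose
status is D.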
Scoring Convention
~~~~~~~~~~~~~~~~~~~
HOPPscore places fragments from the PDB structure into one of
4 classes: F-Favored, A-Allowed, U-Unfavored, D-Disallowed.
Class designation is determined by the similarity of the phi-psi
angles in the fragment to the reference structures. Fragments
are placed into 12*unitLength degree wide bins, where unitLength
is the number of phi-psi pairs representing the structure in the
fragment. If there are many reference structures in that bin then
that particular conformation is highly favored. The number of
structures in the bin is the 'occupancy' and is reported as the
frequency divided by the average occupancy.
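The idea of binning a fragment by its torsion angles can be sketched
as follows. This Python example is an illustration under stated
assumptions and does not reproduce the exact binning used by
HOPPscore: it assumes angles lie between -180 and 180 degrees, that
bins start at -180, and that a fragment is identified by the bin
indices of all of its phi-psi pairs. bin_index and fragment_key are
hypothetical names.

```python
def bin_index(angle, grid=12):
    """Map an angle in degrees to a bin number (assumed to start at -180)."""
    wrapped = (angle + 180.0) % 360.0
    return int(wrapped // grid)


def fragment_key(pairs, grid=12):
    """Build a lookup key from a list of (phi, psi) pairs.

    Fragments whose angles all fall into the same bins get the same key,
    so they would be counted in the same occupancy cell.
    """
    return tuple((bin_index(phi, grid), bin_index(psi, grid))
                 for phi, psi in pairs)
```

With a 12 degree grid the pairs (-65, -41) and (-61, -44) receive the
same key, while with a 2 degree grid they do not. This shows why a
finer grid is less tolerant of small angle deviations.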
Structures that are favored (F) have an occupancy which is greater
than avg+sigmaF*sigma (where avg is the average occupancy and sigma
is the standard deviation). The default value for sigmaF is 0.5, and
it can be changed with the -s option.
Occupancy value STATUS SCORE
-----------------------------------------------------------------
Occupancy > avg+sigmaF*sigma FAVORED +2
avg+sigmaF*sigma >= Occupancy > Avg ALLOWED +1
avg >= Occupancy > 0 UNFAVORED +0.5
Occupancy = 0 DISALLOWED -4
The scores for each fragment are added together and averaged to
determine an overall score for the PDB file.
Phi-psi pair lengths 1 through 5 are scored in that order and
sent to output.
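The scoring table can be written out as a short function. The Python
sketch below follows the table and the option defaults listed in this
README (coefficient 0.5 from -s, bonuses +2, +1, +0.5 and -4). It is an
illustration, not the code used by hoppscore, and classify and
overall_score are hypothetical names.

```python
def classify(occupancy, avg, sigma, coeff=0.5,
             favored=2.0, allowed=1.0, unfavored=0.5, disallowed=-4.0):
    """Return (status, score) for one fragment, following the table above."""
    if occupancy > avg + coeff * sigma:
        return "F", favored
    if occupancy > avg:
        return "A", allowed
    if occupancy > 0:
        return "U", unfavored
    return "D", disallowed


def overall_score(occupancies, avg, sigma, **bonuses):
    """Average the per-fragment scores for one fragment length."""
    scores = [classify(occ, avg, sigma, **bonuses)[1] for occ in occupancies]
    return sum(scores) / len(scores)
```

With the Avg and Sigma values from the example output (183.29 and
858.79), the Favored boundary is 183.29 + 0.5*858.79, which is about
612.7 and matches the boundary printed in the output header.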