AutoMap v1.1
============

CONTENTS

1. Requirements
2. Additional requirements
3. Installation
4. Preparing input for AutoMap
5. Running AutoMap
6. Optimizing cumulative sum cutoffs for AutoMap
7. Publications and author contacts

1. Requirements
---------------
AutoMap requires a Perl installation to run. Most modern Linux distributions
will include this. AutoMap has been tested using openSUSE 10.x and 11.x, but
should run on most modern Linux distributions with a Perl installation. AutoMap
has also been tested on Windows XP and 7. On Windows, Cygwin is required, and
Perl and dos2unix must be installed within Cygwin. It is also recommended to
install compilers (C++ and Fortran), tcsh and the X Window System (via the
xinit package); while not directly required by AutoMap, these packages will
greatly enhance Cygwin functionality.

AutoMap has not been tested on Mac systems, but should work as long as Perl is
available.

2. Additional requirements
--------------------------
LigPlot and the Silico toolkit are required to run AutoMap. Maestro and PyMOL
are optional requirements, but recommended for visualization.

LigPlot is no longer distributed, but has been replaced by LigPlot+, available
at no cost to academic users from here:

http://www.ebi.ac.uk/thornton-srv/software/LigPlus/

It is recommended to obtain the Het Group Dictionary from the PDB for use with 
LigPlot:

http://rcsb-deposit.rutgers.edu/het_dictionary.txt

The Silico toolkit can be obtained from the following link:

http://silico.sourceforge.net/

Maestro is available at no cost to academic users, and can be obtained from
the following link:

http://www.schrodinger.com/downloadcenter/10/

The Open Source version of the PyMOL Molecular Graphics System is available
here:

http://sourceforge.net/projects/pymol/

To process AutoDock output, the Accelrys DS Visualizer is required. A licence 
for Discovery Studio is not required. DS Visualizer is available at no cost to
academic users, and can be obtained from the following link:

http://accelrys.com/products/discovery-studio/visualization-download.php

Follow the instructions issued with these respective packages to install them
on your system.

AutoMap has been evaluated with LigPlot v4.x and with v5.x available with
LigPlot+. Please see the notes in the Installation section for details of use 
with the LigPlot+ version of LigPlot. Silico v1.01 has been used with AutoMap. 
Maestro and PyMOL are recommended for visualization and do not affect the
running of AutoMap.

Examples of the use of some of the AutoMap steps, along with input/output for
these steps are available in a separate download at our SourceForge website
(examples.zip).

3. Installation
---------------
Copy the scripts to the folder containing the ligand ensemble and receptor 
files for analysis. The LigPlot executables and associated parameter files 
must also be copied to this folder. If you have obtained the Het Group 
Dictionary, this too needs to go in the folder with the ligand, receptor and 
LigPlot executables.

If using the LigPlot+ version of LigPlot, executables for a number of platforms
are available within the /lib folder of the LigPlot+ package. Copy the
executables relevant to your platform to the folder containing the ligand
ensemble and receptor files for analysis. You will also need to copy the 
default LigPlot parameter file (ligplot.prm) from the /lib/params folder to this
folder.

4. Preparing input for AutoMap
------------------------------
AutoMap has been evaluated using docking output from Glide, GOLD, DOCK and 
AutoDock. It may also be used with ensembles obtained from other molecular 
docking programs and molecular dynamics simulations, but has not been 
thoroughly evaluated in all of these applications. The use of AutoDock output
requires a modified procedure from that of the other programs.

Prior to docking a multi-residue ligand (e.g. carbohydrate or peptide), it is 
STRONGLY RECOMMENDED that the residues are clearly specified and defined in the 
input ligand. For multi-residue ligands that contain several of the same 
residues in a row, it is highly recommended that these are specified with
different residue names (e.g., trimannose might be specified as MAN-MAN-MAN, but
it is better to specify it as MNA-MNB-MNC, or in some other way that clearly
distinguishes the three residues). Failure to do this will result in both
inaccurate mapping of van der Waals interactions and the failure of AutoMap to
correctly process AutoDock results.

Do not use underscores in ligand names as these will affect proper operation of 
some of the scripts.

The protein and ligand ensemble MUST be prepared as separate files.

To prepare the protein file for AutoMap, use the following procedure:

1. Export the protein coordinates as a PDB file. Note that you MUST use the
   protein coordinates that were used in your docking simulation; if the
   coordinates have changed in any way, AutoMap may not produce the expected
   results.
2. Open the resulting PDB file in a text editor.
3. Change any "HETATM" lines that must be kept (e.g., metal ions or other
   co-factors) to "ATOM  ", then remove all lines that do not begin with
   "ATOM". It is ABSOLUTELY IMPERATIVE that ALL "TER" and "ENDMDL"/"END"
   lines are removed at this point.
4. Before saving the modified PDB file, ensure that a "TER" line is present at
   the end of the file. The "TER" line should also contain the atom number one
   greater than the final "ATOM" line. For example, if the final "ATOM" line
   is numbered 6535, then the added/altered "TER" line should read:

   TER    6536

   It is important that this "TER" line is only present at the end of the file
   as this is used by AutoMap to assist in renumbering the ligand atoms for
   interaction analysis by LigPlot.
5. Once these changes are made, save the resulting file.

Note that residue numbering for any protein chain should not exceed 999.
Furthermore, the protein MUST have chain identifiers.
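
The editing steps above can also be scripted. The following is a minimal
sketch, assuming a POSIX shell with awk; prep_protein is a hypothetical helper
name and is not part of AutoMap. It keeps every "ATOM" and "HETATM" line (so
delete unwanted HETATM records, such as waters, beforehand), rewrites "HETATM"
as "ATOM  ", drops everything else, and appends a single "TER" line numbered
one greater than the final atom. It does not check chain identifiers or
residue numbering.

```shell
# prep_protein FILE  (hypothetical helper, not part of AutoMap)
# Writes the cleaned protein to standard output.
prep_protein() {
    awk '
        /^ATOM  / || /^HETATM/ {
            line = $0
            # Relabel co-factor records so that every kept line starts with ATOM
            if (substr(line, 1, 6) == "HETATM") line = "ATOM  " substr(line, 7)
            print line
            # Atom serial number occupies columns 7-11 of a PDB record
            last = substr(line, 7, 5) + 0
            seen = 1
        }
        # Single TER line at the end, numbered one past the final atom
        END { if (seen) printf "TER   %5d\n", last + 1 }
    ' "$1"
}

# Usage: prep_protein exported.pdb > protein.pdb
```

The output is written to a new file so that the exported PDB file is left
untouched for comparison.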

To prepare the ligand ensemble for AutoMap, the following procedure, which
uses Maestro and the Silico toolkit, is highly recommended. This procedure
reflects that used in our lab to prepare the ligand ensemble. The first three
steps of this procedure may be replaced to avoid the use of Maestro; however,
Steps 4 and 5 are required. The file conversion utilities in Silico create 
several headings within the PDB file of the ligand ensemble which are used by
AutoMap.

1. Import the ligand ensemble(s) to be analyzed into Maestro.
2. Select all of the relevant entries and export them into one uncompressed
   Maestro file. AutoMap can process different ligands within an ensemble;
   however, each ensemble of a specific ligand must be saved in its own
   file. Maps will be generated individually for each ligand, and combined 
   across the entire ligand ensemble. Thus, the ligands should be suitably 
   related to one another if the ensemble map is to be of any use (e.g. by 
   containing a common epitope, or being of a particular structural/functional 
   class).
3. Open a terminal in the folder where the ligands were saved, and execute the
   following command (requires Silico):

   read_write_mol2 <ligands>.mae

   where <ligands> is the name of the ligand file. This step will create a
   file called <ligands>_new.mol2
4. Once Step 3 is complete, execute the following command (requires Silico):

   read_write_pdb <ligands>_new.mol2

   This step will create a file called <ligands>_new_new.pdb
5. Rename <ligands>_new_new.pdb to <ligand>_lib.pdb, where <ligand> is the
   name of the ligand.

The ligand ensemble should not have any chain identifiers after conversion
using the Silico tools, even if it did have them prior to conversion. It is
important that the ligand has no chain identifiers for use with AutoMap.
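
Steps 3 to 5 of the ligand procedure above can be chained in a small shell
function. This is a sketch only: prep_ligand_lib is a hypothetical name, while
read_write_mol2 and read_write_pdb are the Silico commands named above and are
assumed to be on the PATH and to produce the file names described in Steps 3
and 4.

```shell
# prep_ligand_lib LIGANDS LIGAND  (hypothetical helper, not part of AutoMap)
# LIGANDS is the Maestro file name without ".mae"; LIGAND is the ligand name.
prep_ligand_lib() {
    base=$1
    name=$2
    # Each step runs only if the previous one succeeded
    read_write_mol2 "${base}.mae" &&
    read_write_pdb "${base}_new.mol2" &&
    mv "${base}_new_new.pdb" "${name}_lib.pdb"
}

# Usage: prep_ligand_lib ligands trimannose
```

Remember that the ligand name must not contain underscores.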

If using GOLD output, the data in the COMPND field should be simplified before 
proceeding. For example, typically, the COMPND field might look like this for 
GOLD poses:

1SL5|1sl5_lig|mol2|1|dock1

This string contains information about the protein and ligand that is not 
required for AutoMap. Furthermore, the pipe symbol will cause problems with 
AutoMap. In this example, this would be fixed by opening the converted pose 
library in a text editor, and replacing all instances of 
"1SL5|1sl5_lig|mol2|1|dock" with a simpler string, such as "ligand".

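The replacement described for GOLD output can also be made with sed rather
than a text editor. This is a minimal sketch that uses the example string
above; simplify_compnd is a hypothetical name, and the pattern must be changed
to match the COMPND text in your own files. In the basic regular expressions
used by sed, the pipe symbol is matched literally and needs no escaping.

```shell
# simplify_compnd FILE  (hypothetical helper, not part of AutoMap)
# Writes FILE to standard output with the GOLD-style COMPND text replaced.
simplify_compnd() {
    sed 's/1SL5|1sl5_lig|mol2|1|dock/ligand/g' "$1"
}

# Usage: simplify_compnd gold_lib.pdb > ligand_lib.pdb
```
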
To prepare ligand ensembles obtained using AutoDock, use the following 
procedure. The procedure is quite complicated compared to the procedure to 
convert docking output from other programs; this is due to the way in which 
the atoms are reordered by AutoDock.

1. Run the getallad4clus.pl script in the folder containing the AutoDock DLG 
   output. This will extract the clustered poses from the DLG file.
2. Type the following command:

   ls -ltr cluster*.pdb > ls.lst

3. Open Accelrys DS Visualizer/Discovery Studio.
4. Execute the allad4conv.pl script in DS Visualizer/Discovery Studio.
5. Import the MOL2 files generated by DS Visualizer/Discovery Studio into 
   Maestro.
6. Select all of the relevant entries and export them into one uncompressed 
   Maestro file (poselib.mae).
7. Open a terminal in the folder where the ligands were saved, and execute the 
   following commands (requires Silico):

   rm -rf cluster*.mol2
   read_write_mol2 poselib.mae
   read_write_pdb poselib_new.mol2
   mv poselib_new_new.pdb poselib.pdb
   rm -rf poselib.mae
   rm -rf poselib_new.mol2

   These commands remove the MOL2 files generated by DS Visualizer/Discovery 
   Studio, convert the ensemble saved by Maestro into PDB format, and remove 
   the intermediary Maestro and MOL2 files containing the ensemble.
8. Create a text file (allequiv.lst) which contains equivalence tables for all 
   of the ligands present in the ensemble. The equivalence table should look 
   as set out below:

   >ligandname
   IDS     1,IDS     5
   SGN     2,SGN     4
   IDS     3,IDS     3
   >...
   ...

   Each line starting with a greater-than sign should contain the name of the 
   ligand. This must be identical to the name of the ligand used in the COMPND 
   section of the PDB file, which should be identical across the pose ensemble 
   for that ligand. The subsequent lines indicate the residues that are 
   "equivalent" in the structure. The first part of the line indicates
   what the residue should be specified as, while the second part indicates what
   the residue currently appears as. Thus, in the above example, all instances
   of IDS5 are converted to IDS1. Note that all existing instances of IDS1 are
   retained as IDS1; only atoms that are named according to anything in the 
   second part of the equivalence table are changed. The third line in the 
   example is included so that instances where IDS3 appears are retained.

   Once the table is complete, save the file. If running on Windows, this file 
   must be in UNIX format before proceeding. To convert the file to UNIX format, 
   execute the following command (requires dos2unix installed in Cygwin):

   dos2unix allequiv.lst

   This will ensure that the file uses UNIX line breaks, and not Windows line 
   breaks.
9. Run the batchad4fix.pl script in the folder containing the converted 
   poselib.pdb. This will execute the fixad4lib.pl script to generate the fixed 
   libraries containing ensembles of poses for each ligand.
10. Run the setupmlp.pl script. This renames the files generated by the 
    batchad4fix.pl script to the _lib format expected by mlp.pl.
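
Before running batchad4fix.pl, it can help to print the renaming that an
equivalence table describes and check it by eye. The following sketch only
reads allequiv.lst in the format shown in Step 8; show_equiv is a hypothetical
helper, it does not modify any files, and it is not how fixad4lib.pl itself is
implemented. It assumes the lines in the file are not indented.

```shell
# show_equiv FILE  (hypothetical helper, not part of AutoMap)
# Prints one "ligand: current -> target" line per equivalence entry.
show_equiv() {
    awk -F',' '
        # Header line: remember the ligand name that follows ">"
        /^>/ { ligand = substr($0, 2); next }
        # Entry line: first field is the target, second is the current name
        NF == 2 {
            split($1, want, " ")
            split($2, have, " ")
            printf "%s: %s%s -> %s%s\n", ligand, have[1], have[2], want[1], want[2]
        }
    ' "$1"
}

# Usage: show_equiv allequiv.lst
```

For the example table in Step 8, the first entry is reported as IDS5 being
renamed to IDS1.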

5. Running AutoMap
------------------
Prior to running AutoMap, ensure that the AutoMap scripts and LigPlot
installation are present in the folder where you wish to run AutoMap, along
with the prepared coordinates of the ligand ensemble and protein.

The standard way to run AutoMap is to perform interaction analysis of a given
protein-ligand ensemble, followed by automated site mapping based on the
interaction output. To achieve this, use the following procedure:

1. Open a terminal and execute the following command:

   ./mlp.pl ./<protein>.pdb

   This will perform interaction analysis using LigPlot for each pose in the
   ligand ensemble and collect the results for subsequent analysis.
2. Once complete, execute the following command:

   ./sitemap.pl <hb-cutoff> <vdw-cutoff>

   where <hb-cutoff> and <vdw-cutoff> are the relevant cumulative sum cutoffs
   for hydrogen bonding and van der Waals interactions to be used in selecting
   protein residues of importance to ligand recognition.
3. The output of site mapping is stored in a file called site.map.

mlp.pl collects the ligplot.sum, ligplot.hhb and ligplot.nnb files into one
file for each type of table (allsum.csv, allhhb.csv and allnnb.csv). If you
wish to also collect these files separately for each complex, as well as the 
visual output of LigPlot, create the following folders in the directory where
you run AutoMap prior to executing mlp.pl:

pics -> to collect the visual output of LigPlot for each complex
sum  -> to collect the ligplot.sum file individually for each complex
hhb  -> to collect the ligplot.hhb file individually for each complex
nnb  -> to collect the ligplot.nnb file individually for each complex

Since this can add several hundred megabytes to the output, this information
is not stored by default.
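
If the per-complex output is wanted, the four folders can be created in one
step from the directory where AutoMap will be run. This is a minimal sketch,
assuming a POSIX shell such as the one provided by Linux or Cygwin:

```shell
# Create the optional collection folders before running mlp.pl.
# -p avoids an error if a folder already exists.
mkdir -p pics sum hhb nnb
```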

sitemap.pl can generate PDB files with the B-factor column modified to
represent the percentage contribution to interactions made by each residue.
Residues that account for at least 5% of interactions will be visibly colored
in these files. To create and render such maps, the following procedure is
recommended:

1. Execute the following command:

   ./sitemap.pl <hb-cutoff> <vdw-cutoff> ./<protein>.pdb

   This will create the files <protein>_hbd.pdb and <protein>_vdw.pdb, which
   can be used to render the hydrogen bonding and van der Waals site maps
   respectively.
2. Open these newly created PDB files in PyMOL.
3. Execute the following command in the PyMOL terminal:

   cmd.spectrum("b","blue_white_red",selection="<selection-name>",minimum=0,
       maximum=73)

   where <selection-name> is the name of the relevant entry in PyMOL. Note
   that, with the exception of the <selection-name> parameter, the same
   command in PyMOL can be used to render both the hydrogen bonding and van
   der Waals site maps.
4. Render the colored proteins as surfaces.

In programs other than PyMOL, it should be possible to render the maps in the
appropriate colors by selecting the color palette for B/temperature factor.

To execute sitemap.pl on mlp.pl output from multiple ligands, execute the 
following command:

./multisitemap.pl <hb-cutoff> <vdw-cutoff>

This command takes the *allsum.csv files generated by mlp.pl and executes 
sitemap.pl on each one, as well as the original allsum.csv file, which contains 
interaction data for the entire ligand ensemble.

multisitemap.pl can also generate renderable PDB files based on site mapping 
output of multiple ligands:

./multisitemap.pl <hb-cutoff> <vdw-cutoff> <protein>

6. Optimizing cumulative sum cutoffs for AutoMap
------------------------------------------------
Prior to optimization, ensure all of the relevant files are present in the
folder where you wish to run AutoMap.

1. For each validation system you wish to use, generate a pose ensemble using 
   the cognate ligand, and convert the files into the required formats for 
   AutoMap as described earlier.

2. Execute the following command for each validation system:

   ./mlp.pl ./<protein>.pdb

   Rename the resulting allsum.csv file to <protein>_VAL.csv BEFORE running
   mlp.pl for subsequent validation systems.
3. For each validation system, a list of the contacting residues in the
   crystallographic complex of the validation system must be supplied, and
   must be named <protein>.lst. No distinction between residues involved in
   hydrogen bonding or van der Waals contacts is made. The .lst files should
   contain as many entries as there are interactions. Each entry must be
   formatted as below:

   RESnnnAC

   where RES is the residue ID of the contacting residue, nnn is the residue
   number, A is a letter which may appear after the residue number (e.g., in
   Kabat numbering of antibodies, it is common to have residues numbered 100A,
   100B, etc.), and C is the chain identifier. Note that the numbers MUST
   appear in the appropriate columns. Some examples are listed below.

   ASP  5 A <- aspartate which is #5 in chain A
   GLN 27 L <- glutamine which is #27 in chain L
   GLY102 H <- glycine which is #102 in chain H
   TRP100AH <- tryptophan which is #100A in chain H

   The simplest way to create the <protein>.lst file is to run LigPlot on
   the crystallographic complex, then generate the <protein>.lst file manually
   based on the output given in ligplot.sum.
4. Once mlp.pl has been run for each validation system, the allsum.csv
   files renamed appropriately, and the .lst files created, execute the
   following command:

   ./optcutoff.pl <resolution>

   where <resolution> is the interval, in percent, at which to perform
   mapping. If this is not specified, a default resolution of 10% is used.
   Note that larger resolution values may result in an inaccurate cutoff being
   chosen, while smaller values will result in the optimization taking
   significantly longer for a limited increase in accuracy. Resolution values
   should be selected such that 100 divided by the resolution is an integer
   (e.g., 10, 5 or 2.5).
5. Once the procedure completes, the optimal cutoffs will be displayed
   on-screen. These cutoffs should be used in subsequent runs of sitemap.pl
   for protein-ligand systems related to the systems used for validation.
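
Two details of the procedure above are easy to get wrong by hand: the column
layout of the <protein>.lst entries and the choice of a resolution that
divides 100 evenly. The sketch below covers both, assuming a POSIX shell with
awk. fmt_lst_entry and valid_resolution are hypothetical helper names and are
not part of AutoMap.

```shell
# fmt_lst_entry RES NUMBER INSERTION CHAIN  (hypothetical helper)
# Prints one entry in the fixed RESnnnAC layout: 3-character residue name,
# number right-aligned in 3 columns, optional insertion letter (a blank if
# empty), then the chain identifier.
fmt_lst_entry() {
    printf '%-3s%3s%1s%1s\n' "$1" "$2" "${3:- }" "$4"
}

# valid_resolution VALUE  (hypothetical helper)
# Succeeds when VALUE is positive and 100 / VALUE is a whole number.
valid_resolution() {
    awk -v r="$1" 'BEGIN {
        if (r + 0 <= 0) exit 1
        q = 100 / r
        if (q == int(q)) exit 0
        exit 1
    }'
}

fmt_lst_entry ASP 5 "" A     # prints "ASP  5 A"
fmt_lst_entry TRP 100 A H    # prints "TRP100AH"
valid_resolution 2.5 && echo "2.5 is a usable resolution"
```

Redirecting several fmt_lst_entry calls into <protein>.lst gives a file with
one correctly aligned entry per line.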

7. Publications and author contacts
-----------------------------------
If you use AutoMap in your publication, please cite the following publication:

1. Agostino et al., Mol Immunol, 2009, 47, 233-246.

For queries relating to AutoMap, please contact:

Dr Mark Agostino     mark.agostino@curtin.edu.au
Dr Elizabeth Yuriev  Elizabeth.Yuriev@monash.edu

or via our SourceForge website:

http://ligmap.sourceforge.net/
Source: README, updated 2013-12-02