Name                 Modified     Size
ParallelVSR.tar.gz   2012-03-24   15.0 kB
README.txt           2012-03-12   11.7 kB
Totals: 2 items, 26.7 kB
===============================README=====================================
ParallelVSR is a pipeline (available at http://sourceforge.net/projects/parallelvsr/) for parallel virtual screening on the R platform. It uses an integrated approach that combines ligand based virtual screening (LBVS) and structure based virtual screening (SBVS). It consists of three programs: APGenVSR, SSVSR and DockVSR. APGenVSR and SSVSR perform LBVS using ChemmineR, a cheminformatics package for R, whereas DockVSR performs SBVS using AutoDock Vina. The pipeline is freely available at http://sourceforge.net/projects/parallelvsr/files/.

=============================REQUIREMENTS=================================
(1)Architecture: Shared Memory Architecture or Distributed Memory Architecture or Hybrid Distributed-Shared Memory Architecture

(2)OS: Linux, Mac (not tested on Windows)

(3)Install Open MPI: available from http://www.open-mpi.org/. Make sure to configure it for the architecture of your system. (Other MPI implementations may also work but have not been tested so far. We have tested the pipeline with Open MPI 1.4.3.)

(4)Install R: available from http://www.r-project.org/. (We have tested it on R-2.13.0 and R-2.13.1)

(5)Install the following R packages:
				(a)ChemmineR: available from http://www.bioconductor.org/packages/release/bioc/html/ChemmineR.html (We have tested releases 2.4.0 and 2.4.3.)
				(b)rcdk: available from http://cran.r-project.org/web/packages/rcdk (We have tested releases 3.0.5 and 3.1.3.)
				(c)Rmpi: available from http://cran.r-project.org/web/packages/Rmpi. After installing Rmpi, copy the "Rprofile" file from the installed Rmpi library to the user's home directory as ".Rprofile". (We have tested release 0.5.9.)

(6)Install MGLTools from http://mgltools.scripps.edu/downloads (we have used release 1.5.4). On Mac systems, change the first line of /Library/MGLTools/$Version/MGLToolsPckgs/AutoDockTools/Utilities24/prepare_ligand4.py from "#! /usr/bin/env python" to "#! /usr/bin/env /Library/MGLTools/$Version/bin/pythonsh". Please note: changing files at the root level requires administrative privileges.

(7)Install AutoDock Vina from http://vina.scripps.edu/download.html (We have used version 1_1_1_linux86 and 1_1_2_linux86).

(8)Install OpenBabel from http://openbabel.org/wiki/Get_Open_Babel (we have used release 2.3.0). The command "obabel" must be accessible from the console (i.e. on your PATH).
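Before running the pipeline, it can help to confirm that the command-line tools listed above are reachable. The sketch below is a minimal check, assuming the executables are named `mpirun`, `R`, `obabel` and `vina` (adjust the names if your installation differs, e.g. a renamed Vina binary); it does not check the R packages or MGLTools.

```shell
#!/bin/sh
# Print the name of every command given as an argument that is NOT found on PATH.
# Prints nothing when all of them are available.
missing_commands() {
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
        fi
    done
}

# ASSUMED executable names for the tools listed under REQUIREMENTS.
REQUIRED_TOOLS="mpirun R obabel vina"
```

Run `missing_commands $REQUIRED_TOOLS` (unquoted, so the list splits into words); empty output means every tool was found. For item (5)(c), one way to copy Rmpi's profile, assuming `Rscript` is on PATH, is `cp "$(Rscript -e 'cat(system.file("Rprofile", package = "Rmpi"))')" ~/.Rprofile`; if the file is not found, `system.file()` returns an empty string and `cp` fails rather than copying anything.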

======================Download and Installation=======================
Download:
ParallelVSR pipeline comes as a compressed file "ParallelVSR.tar.gz" downloadable from http://sourceforge.net/projects/parallelvsr/files/.

Installation:
There is NO SETUP file for ParallelVSR. Just uncompress the file "ParallelVSR.tar.gz", which creates a folder "ParallelVSR". This folder contains nine sub-directories along with a README and a sample configuration file for AutoDock Vina.
The sub-directories are:
1.DATASET:- to store chemical library files in SDF format.
2.AP-DATABASE:- to store Atom Pair Descriptors of chemical library molecules.
3.SDFSET-DATABASE:- to store SDFSet class instances (molecular information of library molecules).
4.PROGRAMS:- stores three programs - APGenVSR.R, SSVSR.R and DockVSR.R
5.PARAMETERS:- stores three parameter files - APGenParam.txt, SSParam.txt and DockParam.txt
6.QUERY:- to store the query molecule(s) in SDF format.
7.RECEPTOR-PDBQT:- to store receptor pdbqt(s).
8.SIMILARITY-SEARCH-DIRECTORY:- to store Similarity Search results.
9.DOCKING-DIRECTORY:- to store Docking results.
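Extracting the archive and confirming the nine sub-directories can be scripted. A minimal sketch, assuming the archive sits in the current directory and extracts to ./ParallelVSR as described above; the directory names are taken from the list above.

```shell
#!/bin/sh
# Sub-directories that this README lists inside the extracted "ParallelVSR" folder.
PVSR_SUBDIRS="DATASET AP-DATABASE SDFSET-DATABASE PROGRAMS PARAMETERS QUERY RECEPTOR-PDBQT SIMILARITY-SEARCH-DIRECTORY DOCKING-DIRECTORY"

# Print each expected sub-directory that is missing under the given folder.
check_layout() {
    base=$1
    for d in $PVSR_SUBDIRS; do
        [ -d "$base/$d" ] || printf '%s\n' "$d"
    done
}
```

Usage: `tar -xzf ParallelVSR.tar.gz && check_layout ParallelVSR`. No output means all nine sub-directories are present.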

=========================HOW TO RUN===================================
Our approach to virtual screening in R is divided into three steps:

 (I) Atom pair descriptor generation using the program "APGenVSR": this step creates a database of 2D atom pair descriptors of the chemical library molecules using the ChemmineR package.

 (II) Similarity search and molecular descriptor calculation for the structurally similar molecules using the program "SSVSR": this step searches the chemical library for molecules structurally similar to the query molecule, based on their 2D atom pair descriptors (using the ChemmineR package), and then calculates molecular properties of those similar molecules using the rcdk package.

 (III) Structure based virtual screening of the previously filtered molecules using the program "DockVSR", which uses AutoDock Vina as the docking tool.
  
    =======STEP (I): ATOM PAIR DESCRIPTOR GENERATION========================
    Step 1. The user must have a library of molecules in SDF file format. The files may be compressed (e.g. .sdf.gz). PubChem contains millions of molecules, so a good starting point is to download molecules from ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound/CURRENT-Full/SDF/. Place the library files in the directory "ParallelVSR/DATASET/".
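To fetch a specific part of PubChem, the file name can be computed from a compound ID. This is a sketch under an ASSUMPTION about PubChem's naming scheme (chunks of 500,000 CIDs named `Compound_<first>_<last>.sdf.gz` with 9-digit zero padding); check the directory listing before relying on it.

```shell
#!/bin/sh
# Base URL given in this README for PubChem's full compound SDF files.
PUBCHEM_SDF_BASE="ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound/CURRENT-Full/SDF"

# Print the URL of the PubChem SDF chunk that should contain compound ID $1.
# ASSUMPTION: 500,000 CIDs per file, named Compound_<first CID>_<last CID>.sdf.gz.
pubchem_chunk_url() {
    cid=$1
    first=$(( (cid - 1) / 500000 * 500000 + 1 ))
    last=$(( first + 499999 ))
    printf '%s/Compound_%09d_%09d.sdf.gz\n' "$PUBCHEM_SDF_BASE" "$first" "$last"
}
```

Usage: `cd ParallelVSR/DATASET && wget "$(pubchem_chunk_url 1)"` (or `curl -O` on Mac). The compressed file can stay as .sdf.gz, since compressed library files are accepted.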

    Step 2. The R script for 2D atom pair descriptor generation, "APGenVSR.R", is kept in "ParallelVSR/PROGRAMS/", and its parameter file "APGenParam.txt" in "ParallelVSR/PARAMETERS/". There is NO NEED to make any CHANGES to the parameter file. Now it is time to run the job. Go to the "ParallelVSR/PROGRAMS" directory and run the job with one of the following methods:

	On Shared Memory Architecture =>
		Run the following command on console: 
		mpirun -np 5 R --no-save -q<"APGenVSR.R" # This creates 1 master R process and 4 slave R processes. Change the number of processes according to the number of cores available.

	On Distributed/Hybrid Distributed-Shared Memory Architecture =>
		Run the following command on console:
		mpirun -np 5 -hostfile localhost.hosts R --no-save -q<"APGenVSR.R" # This creates 1 master and 4 slave R processes. The file localhost.hosts lists the nodes and the number of CPU cores (slots) each node makes available to the program.

		#===An example of hostfile===
		node1	slots=12
		node2	slots=6
		node3	slots=10
		#============================	
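Rather than hard-coding `-np 5`, the process count can be derived from the machine or from the hostfile. A minimal sketch, ASSUMING one R process per available core or slot (one master plus N-1 slaves); this README does not prescribe that ratio. Hosts listed without an explicit `slots=` entry are not counted by this helper.

```shell
#!/bin/sh
# Number of online CPU cores on this machine (getconf works on Linux and Mac OS X).
count_cores() {
    getconf _NPROCESSORS_ONLN
}

# Sum the "slots=N" entries of an Open MPI hostfile, skipping comment lines.
hostfile_slots() {
    awk '
        /^[ \t]*#/ { next }               # skip comment lines such as "#===..."
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^slots=[0-9]+$/) {
                    split($i, kv, "=")
                    total += kv[2]
                }
        }
        END { print total + 0 }
    ' "$1"
}
```

For the example hostfile above, `hostfile_slots localhost.hosts` prints 28, so the distributed run could be started with `mpirun -np "$(hostfile_slots localhost.hosts)" -hostfile localhost.hosts R --no-save -q<"APGenVSR.R"`; on a single shared-memory machine, `-np "$(count_cores)"` plays the same role.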

    Successful completion of the above steps generates 2D atom pair descriptors of the library molecules and stores them in binary form ("*_ap.rda" files) in "ParallelVSR/AP-DATABASE/". Instances of the SDFset class ("*_sdfset.rda" files), which hold the molecular information, are stored in "ParallelVSR/SDFSET-DATABASE/", and log files are written to "ParallelVSR/Temp_APGen/".
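A quick way to check the output of STEP (I) before moving on is to count the two kinds of .rda files. This sketch ASSUMES that the descriptor and SDFset files for one library file share a prefix (e.g. "X_ap.rda" and "X_sdfset.rda"); the README gives only the suffixes.

```shell
#!/bin/sh
# Count the arguments that name existing files (an unmatched glob stays literal).
count_existing() {
    n=0
    for f in "$@"; do
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

# Report *_ap.rda and *_sdfset.rda counts under a ParallelVSR folder and list
# descriptor files without an SDFset file of the same (ASSUMED) prefix.
check_apgen_output() {
    ap_dir="$1/AP-DATABASE"
    sdf_dir="$1/SDFSET-DATABASE"
    echo "ap=$(count_existing "$ap_dir"/*_ap.rda) sdfset=$(count_existing "$sdf_dir"/*_sdfset.rda)"
    for f in "$ap_dir"/*_ap.rda; do
        [ -e "$f" ] || continue
        prefix=$(basename "$f" _ap.rda)
        [ -e "$sdf_dir/${prefix}_sdfset.rda" ] || echo "unpaired: $prefix"
    done
}
```

Usage: `check_apgen_output ParallelVSR`. If the counts are zero, the log files in "ParallelVSR/Temp_APGen/" are the place to look.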
		
     ======STEP (II): SIMILARITY SEARCH & MOLECULAR DESCRIPTOR CALCULATION======
     Step 1. The user must already have the 2D atom pair descriptors of the chemical library molecules stored as "*_ap.rda" files and the SDFset class instances stored as "*_sdfset.rda" files [as obtained from STEP (I)].

     Step 2. The Query file(s) should be in SDF file format. Each file must contain only one molecule and should be kept at "ParallelVSR/QUERY/".
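Because each query file must hold exactly one molecule, it can be worth checking the files before the run. In SDF, every record ends with a line "$$$$", so counting those lines counts molecules; this sketch ASSUMES the query files use that terminator (a bare MOL block without "$$$$" would be reported as 0 molecules).

```shell
#!/bin/sh
# Count the molecules in an SDF file: each record ends with a line "$$$$".
count_sdf_records() {
    grep -c '^\$\$\$\$' "$1"
}

# Print every *.sdf file in the given directory that does not hold exactly one molecule.
check_query_dir() {
    for f in "$1"/*.sdf; do
        [ -e "$f" ] || continue
        n=$(count_sdf_records "$f")
        [ "$n" -eq 1 ] || printf '%s: %s molecules\n' "$f" "$n"
    done
}
```

Usage: `check_query_dir ParallelVSR/QUERY`. No output means every query file contains a single molecule.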

     Step 3. The parameter file for similarity search, "SSParam.txt", is placed in "ParallelVSR/PARAMETERS/". Assign correct values to the variables in the parameter file. Only ONE VARIABLE NEEDS TO BE CHANGED: "CutOff". The other variables can be left at their default values. "CutOff" is the threshold on the Tanimoto coefficient used to measure similarity: lowering it returns a larger and more diverse set of molecules, while raising it returns only highly similar molecules.
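For reference, the standard Tanimoto coefficient between two molecules A and B, each represented by a set of descriptors (here, atom pairs), is given below. The numbers in the second line are hypothetical and only show how "CutOff" acts as a threshold; how ChemmineR handles repeated atom pairs is not covered here.

```latex
T(A,B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}, \qquad 0 \le T(A,B) \le 1

% Hypothetical example: |A| = 10, |B| = 8, |A \cap B| = 6
T(A,B) = \frac{6}{10 + 8 - 6} = \frac{6}{12} = 0.5
```

With these numbers, B would be excluded by a CutOff of 0.6 but would pass a CutOff of 0.4.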

     Step 4. The program to perform the similarity search is "SSVSR.R". Go to the "ParallelVSR/PROGRAMS" directory and run the similarity search with one of the following methods:

     	On Shared Memory Architecture =>
     		Run the following command on console:
     		mpirun -np 5 R --no-save -q<"SSVSR.R" 

     	On Distributed/Hybrid Distributed-Shared Memory Architecture =>
     		Run the following command on console:
     		mpirun -np 5 -hostfile localhost.hosts R --no-save -q<"SSVSR.R"

     	Successful completion of the above steps generates two sub-directories, "Matched_Query_SDF" and "Temp_Folder_For_Query", along with three files, e.g. "Matched_query.sdf", "Matched_query.xls" and "Log_SimilaritySearch.txt", all under the default directory "ParallelVSR/SIMILARITY-SEARCH-DIRECTORY/". The "Matched_Query_SDF" sub-directory contains the SDF files of the similar molecules, while the .xls file contains details of the query and the resulting similar molecules, such as the similarity score, SMILES, molecular weight, and the numbers of hydrogen bond acceptors and donors. The properties other than the similarity score can be used to filter the molecules further (manually).

     ======STEP (III): PROTEIN-LIGAND DOCKING of the SELECTED MOLECULES======
     Step 1. The user must have the structurally similar molecules [as obtained from STEP (II)], along with the query molecule, in SDF format. Keep these files in "ParallelVSR/DOCKING-DIRECTORY/LIGAND-SDFs/".
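Staging the ligands can be scripted. This sketch ASSUMES the similar molecules are the *.sdf files directly inside "SIMILARITY-SEARCH-DIRECTORY/Matched_Query_SDF/" and the query is in "QUERY/"; if you filtered the hits manually, copy only the molecules you kept instead.

```shell
#!/bin/sh
# Copy the similar molecules from STEP (II) and the query molecule(s) into
# DOCKING-DIRECTORY/LIGAND-SDFs/ under the given ParallelVSR folder.
# ASSUMPTION: source locations as described in the lead-in above this block.
stage_ligands() {
    base=$1
    dest="$base/DOCKING-DIRECTORY/LIGAND-SDFs"
    mkdir -p "$dest"
    for f in "$base/SIMILARITY-SEARCH-DIRECTORY/Matched_Query_SDF"/*.sdf "$base/QUERY"/*.sdf; do
        [ -e "$f" ] || continue    # skip unmatched globs
        cp "$f" "$dest/"
    done
}
```

Usage: `stage_ligands ParallelVSR`.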

     Step 2. Create .pdbqt file(s) for the receptor protein using the standard procedures available in AutoDockTools (ADT). For flexible docking, name the rigid part of the receptor "*RIGID.pdbqt" and the flexible part "*FLEX.pdbqt". Place the file(s) in "ParallelVSR/RECEPTOR-PDBQT/".
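A quick check of the receptor naming can catch a missing flexible part before docking. This sketch ASSUMES that a rigid file "<name>RIGID.pdbqt" pairs with "<name>FLEX.pdbqt" (same prefix); the README only says the files "should be like" these patterns.

```shell
#!/bin/sh
# For each *RIGID.pdbqt in a directory, report whether a matching *FLEX.pdbqt exists.
# ASSUMPTION: rigid and flexible parts share the same <name> prefix.
check_receptors() {
    dir=$1
    for rigid in "$dir"/*RIGID.pdbqt; do
        [ -e "$rigid" ] || continue
        name=$(basename "$rigid" RIGID.pdbqt)
        if [ -e "$dir/${name}FLEX.pdbqt" ]; then
            echo "flexible pair: ${name}RIGID.pdbqt + ${name}FLEX.pdbqt"
        else
            echo "missing flexible part for: ${name}RIGID.pdbqt"
        fi
    done
}
```

Usage: `check_receptors ParallelVSR/RECEPTOR-PDBQT`.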

     Step 3. Assign appropriate values in "vinaconf.txt" - the configuration file for running AutoDock Vina.
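A minimal sketch of such a configuration, written from the shell. The keys are standard AutoDock Vina options, but ALL VALUES ARE PLACEHOLDERS: the search box must be centred on and sized for your receptor's binding site. Whether DockVSR expects `receptor`/`ligand` entries in this file is not stated in this README, so they are left out.

```shell
#!/bin/sh
# Write a minimal AutoDock Vina configuration to the path given as $1.
# Search-box centre and size are in Angstrom; the values below are PLACEHOLDERS.
# exhaustiveness = 8 and num_modes = 9 match Vina's defaults.
write_vinaconf() {
    cat > "$1" <<'EOF'
center_x = 0.0
center_y = 0.0
center_z = 0.0
size_x = 20.0
size_y = 20.0
size_z = 20.0
exhaustiveness = 8
num_modes = 9
EOF
}
```

Usage: `write_vinaconf vinaconf.txt`, then edit the centre and size values. Compare the result with the sample configuration file shipped in the "ParallelVSR" folder.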

     Step 4. The parameter file for docking, "DockParam.txt", is placed in "ParallelVSR/PARAMETERS/". Assign correct values to the variables in the parameter file. Four variables, "MGLToolsHOME", "AutoDockVinaHOME", "NoOfTimes" and "MachineType", NEED TO BE PROPERLY ASSIGNED. If "MachineType" is set to "2", also set "cpu" to the number of processors AutoDock Vina should use during docking. The other variables can be left at their default values. On Mac systems, set MGLToolsHOME="/Library/MGLTools/$Version" and AutoDockVinaHOME="/usr/local", the default locations where MGLTools and AutoDock Vina are installed.
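Candidate values for the two HOME variables can be derived from where the executables live. This sketch ASSUMES each HOME variable is the prefix whose bin/ directory holds the tool, which matches the Mac defaults above (/usr/local/bin/vina and /Library/MGLTools/$Version/bin/pythonsh); it only works for tools that are on your PATH.

```shell
#!/bin/sh
# Print the installation prefix of a command found on PATH, i.e. the parent of
# the directory that contains it (e.g. /usr/local for /usr/local/bin/vina).
# Returns 1 if the command is not found.
install_prefix() {
    path=$(command -v "$1") || return 1
    dirname "$(dirname "$path")"
}
```

Usage: `install_prefix vina` suggests a value for AutoDockVinaHOME and, if pythonsh is on your PATH, `install_prefix pythonsh` suggests one for MGLToolsHOME. When "MachineType" is "2", `getconf _NPROCESSORS_ONLN` reports the number of online cores as a starting point for "cpu".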

     Step 5. The program to perform docking is "DockVSR.R". Go to the "ParallelVSR/PROGRAMS" directory and run the docking with one of the following methods:

	On Shared Memory Architecture =>
		Run the following command on console:
		R --no-save -q<"DockVSR.R"

	On Distributed/Hybrid Distributed-Shared Memory Architecture =>
		Run the following command on console:
		mpirun -np 5 -hostfile localhost.hosts R --no-save -q<"DockVSR.R"

	Successful completion of the above steps produces "DockResult.txt", a list sorted by the binding free energy of the protein-ligand complexes. An individual directory, named after the molecule's ID, is created for every molecule that undergoes docking. "DockResult.txt" is the final result of the ParallelVSR pipeline.



========================USEFUL SITES============================
(1)OpenMPI:- How to install Open MPI? : http://www.open-mpi.org/faq/?category=building.
(2)AutoDock Vina:- Molecular docking and virtual screening program. Go through the manual to use AutoDock Vina for docking study. Manual is available at http://vina.scripps.edu/manual.html.
(3)MGLTools: How to install on Linux machine? : http://mgltools.scripps.edu/downloads/instructions/linux.
(4)MGLTools: How to install on Mac OS X machine? : http://mgltools.scripps.edu/downloads/instructions/mac.
(5)Rmpi package:- How can I implement Rmpi to accelerate my own job? :  A very well described tutorial is available at http://math.acadiau.ca/ACMMaC/Rmpi.

=========Facing PROBLEM????   Please contact=====================
vermasrikant@gmail.com or srikant.verma@igib.in
Source: README.txt, updated 2012-03-12