ERmod Wiki

Approximate solvation free energy calculator

Brought to you by: drmaruyama, e1kojima, kenkasa, nbykmtb, ssakuraba

QuickStartGuide-AMBER

Authors:

Quick Start Guide for AMBER

AMBER format is supported as of ERmod 0.3.0.

Building ERmod plugin

Since not all computer systems have the NetCDF, ERmod does not contain a plugin file for the NetCDF file format, which is used in AMBER.
Consult VMD plugin build to see how to compile the netcdf plugin.

MD Preparation

To use AMBER for ERmod, user must compile AMBER with NetCDF support. NetCDF support may not be enabled depending on user's environment! If user does not understand what NetCDF is, stand back and read the AMBER manual.

Set environment to let AMBER know where AMBER is. If user's shell is bash or zsh,

export AMBERHOME=(path to AMBER)
export PATH=$AMBERHOME/exe:$PATH

or if the shell is csh or tcsh,

setenv AMBERHOME (path to AMBER)
setenv PATH $AMBERHOME/exe:$PATH

(path to AMBER) should be replaced with an appropriate path to the AMBER MD package.
If user cannot execute the AMBER program, consult administrator of his cluster/machine.

Preparing topology

This quick-start guide aims to calculate the solvation free energy of ethanol.
Unpack the ERmod data archive (https://sourceforge.net/projects/ermod/files/data_example/).
In the directory, there are several input files used in this guide.

This guide uses the topology of ethanol existing in the archive, and does not show how to generate a new one.
Please consult e.g. the Antechamber tutorial (http://ambermd.org/antechamber/example.html) for further information.

There is a topology input file etohsol.leap.
Feed it into tleap to generate the input topology:

tleap -f etohsol.leap

After execution of tleap, there will be files named etohsol.top and etohsol.crd which correspond to ethanol-water solution system, water.top and water.crd which correspond to pure water solvent system, and etohiso.top and etohiso.crd which correspond to isolated ethanol molecule in vacuum.
The contents of etohsol.leap read as:

1   # For Ambertools <= 15
2   #source leaprc.ff99SB
3   # For Ambertools >= 16
4   source leaprc.protein.ff19SB
5   # For Ambertools <= 19 next line may not be necessary
6   source leaprc.water.tip3p
7   loadAmberPrep etoh.prep
8   mol = loadPDB etoh.pdb
9   check mol
10  saveAmberParm mol etohiso.top etohiso.crd
11  
12  # To get exactly 1000 molecules we can adjust the length and remove some unnecessary molecules
13  # (Actually you do not have to do this, we just want to show how...)
14  solvateBox mol TIP3PBOX 13.41
15  remove mol mol.1002
16  saveAmberParm mol etohsol.top etohsol.crd
17  
18  # We must have EXACTLY the same number of solvent molecules for both solution system and reference (pure solvent) system.
19  # Thus we remove the solute to ensure they have the same number of solvents
20  remove mol mol.1
21  saveAmberParm mol water.top water.crd
22  quit

Line 4 corresponds to the force field to be used; it is AMBER ff19SB in the above case.
Line 7 corresponds to loading the AMBER parameter file for ethanol. File etoh.prep is included in the data archive downloaded.
Line 10 prepares the input files for the MD of isolated ethanol.
Line 14 creates a system of solvated ethanol. This example uses solvateBox command to create a cubic box of about 13.4 angstrom in size. The number of water molecules recommended to be added is about 1000, if the solute is small and its charge is absent or equal to 1 or -1.
In the following line, a water molecule is removed just to set the number of solvent water molecules exactly to 1000 (this is not necessary but just for demonstration purpose who want to reproduce)
Line 16 prepares the input files for the MD of ethanol-water solution system.
In Lines 20 and 21, the solute molecule (ethanol) is removed, and pure water box is prepared.

Equilibration and production run

To obtain the solvation free energy, user runs the solution system of interest, the pure solvent system, and the isolated solute in vacuum.
The simulation of isolated solute is not necessary if the solute is treated as rigid.

For the solution and pure solvent systems, the system is first minimized with min.mdin and is then equilibrated in the NVT ensemble with nvt.mdin and further with npt.mdin in the NPT ensemble.
Similarly, the isolated solute is minimized and equilibrated with solute_min.mdin and solute_eq.mdin.
After the equilibration, get the output trajectory as the nc file.
All bond lengths may be constrained for speeding up calculation.
For detailed instruction consult the AMBER manual.

solution system

sander -i min.mdin -o solution_min.mdout -p etohsol.top -c etohsol.crd -r solution_min.crd -x solution_min.nc -inf solution_min.mdinfo -O 
pmemd -i nvt.mdin -o solution_nvt.mdout -p etohsol.top -c solution_min.crd -r solution_nvt.crd -x solution_nvt.nc -inf solution_nvt.mdinfo -O
pmemd -i npt.mdin -o solution_npt.mdout -p etohsol.top -c solution_nvt.crd -r solution_npt.crd -x solution_npt.nc -inf solution_npt.mdinfo -O
pmemd -i solution_run.mdin -o solution_run.mdout -p etohsol.top -c solution_npt.crd -r solution_run.crd -x solution_run.nc -inf solution_run.mdinfo -O

solvent system

sander -i min.mdin -o solvent_min.mdout -p water.top -c water.crd -r solvent_min.crd -x solvent_min.nc -inf solvent_min.mdinfo -O 
pmemd -i nvt.mdin -o solvent_nvt.mdout -p water.top -c solvent_min.crd -r solvent_nvt.crd -x solvent_nvt.nc -inf solvent_nvt.mdinfo -O
pmemd -i npt.mdin -o solvent_npt.mdout -p water.top -c solvent_nvt.crd -r solvent_npt.crd -x solvent_npt.nc -inf solvent_npt.mdinfo -O
pmemd -i solvent_run.mdin -o solvent_run.mdout -p water.top -c solvent_npt.crd -r solvent_run.crd -x solvent_run.nc -inf solvent_run.mdinfo -O

isolated solute system

sander -i solute_min.mdin -o solute_min.mdout -p etohiso.top -c etohiso.crd -r solute_min.crd -x solute_min.nc -inf solute_min.mdinfo -O 
sander -i solute_eq.mdin -o solute_eq.mdout -p etohiso.top -c solute_min.crd -r solute_eq.crd -x solute_eq.nc -inf solute_eq.mdinfo -O
sander -i solute_run.mdin -o solute_run.mdout -p etohiso.top -c solute_eq.crd -r solute_run.crd -x solute_run.nc -inf solute_run.mdinfo -O

In the input file, "ioutfm = 1" is specified.
This forces AMBER to write the "NetCDF" version of output trajectory.
The file extension of the trajectory needs to be specified as ".nc", which is necessary to tell erdst that the file is in the NetCDF binary format.

Example configuration files are available on https://sourceforge.net/projects/ermod/files/data_example/.
MD production runs are performed in NPT condition, over 100 ps for the solution system to take 10,000 snapshots with 10 fs interval, and over 50 ps for the pure solvent to take 500 snapshots with 100 fs interval. An MD of isolated solute is done over 50 ns in addition, to take 500,000 snapshots with 100 fs interval.

In the following example, the trajectory files for solution system, pure solvent system, isolated solute system are named solution_run.nc, solvent_run.nc, and solute_run.nc, respectively. The log files are named similarly with an extension of "mdout".

Generate input configuration for solvation free energy calculation

Input configuration file for erdst and slvfe will be generated via various helper scripts. First, execute the following command to generate system information:

(ERmod directory)/share/ermod/tools/AMBER/gen_structure --top etohsol.top

where (ERmod directory) is the directory where the ERmod programs are installed; the directory was specified with the option of --prefix of ./configure described in General build guide.
The option --top can be simplified to -t.
Upon execution, user is asked to state which molecule is the solute.
In the example case, the ethanol molecule is the solute; type type "1" or “ETO” (do not include quotation marks) to proceed.
After the execution, two directories are created, namely soln and refs.
Inside these directories, soln/MDinfo, soln/MolPrm1, soln/SltInfo, refs/MDinfo, refs/MolPrm1, and refs/SltInfo files are generated.
See Parameter files for erdst for detailed description of SltInfo, MolPrm1, and so on.

soln/MDinfo should read as

FRAME 2
1 1000
9 3

The second column of the first line indicates that there are 2 types of molecular species.
The second line states that the system contains 1 ethanol molecule and 1000 water molecules.
The last line states that 9 atoms are present within a single EtOH molecule and that 3 atoms are present within a single water molecule.

In the reference system (refs directory,) MDinfo file is also generated.
But the number of molecular species in refs/MDinfo is 1, since MDinfo in this case carries the information of the pure solvent system.

Both in soln and refs directories, MolPrm1 and SltInfo should be found.
MolPrm1 is the parameter file for solvent molecule, and SltInfo is the parameter file for solute. Both files contain the Lennard-Jones parameters and charge information, which are necessary in the calculation. See Parameter files for erdst for detailed description of MDInfo, SltInfo, MolPrm1, and so on.
Finally, there is an LJTable file in each directory, which specifies the interaction energy table in AMBER.

Now the parameters for the solution system are prepared.
Move to soln directory and run the helper script:

cd soln
(ERmod directory)/share/ermod/tools/AMBER/gen_input --traj ../solution_run.nc --log ../solution_run.mdout

The trajectory (.nc == netcdf file) and log files are fed to the script.
The options --traj and --log can be simplified to -x and -l, respectively.
After running the script, a symbolic link file, namely HISTORY, is to be created.
A file named parameters_er is also to be created; this file controls how the erdst program runs.
MDinfo is further to be updated; “FRAME” on the first line is replaced by the number of snapshots in trajectory.

Start the solute-solvent interaction calculation in the solution system with:

(ERmod directory)/bin/erdst

If the program runs successfully, there should be files named engsln.xx and aveuv.tt, where xx is a two-digit integer, besides several other outputs.
The former is the histogram of the solute-solvent pair interaction energy, while the latter is the averaged sum of the solute-solvent interaction energy.
The two-digit numbers indicate which fraction of the trajectory the file corresponds to.
For example, the trajectory is divided into 10 blocks.
engsln.01 is then the histogram obtained from the first one of the 10 blocks, and engsln.02 is from the second one of the 10 blocks.
Correspondingly, the output distribution functions are obtained from each of the 10 blocks.
The division into blocks is done to examine the convergence of the solvation free energy, as described in the end of Quick Start Guide ---- Running slvfe: Getting Final output

Then move on to refs directory to start calculating the reference part.

cd ../refs
(ERmod directory)/share/ermod/tools/AMBER/gen_input --traj ../solvent_run.nc --log ../solvent_run.mdout --flexible ../solute_run.nc
(ERmod directory)/bin/erdst

The first two arguments are to be taken similarly to the case of the solution (of course, the nc and log files are changed to those of pure solvent).
The third argument, --flexible, specifies that the test-particle insertion calculation is performed with flexible structures of the solute.
The option --flexible can be simplified to -s.
In addition to a symbolic link named HISTORY, a symbolic link file named SltConf is generated; HISTORY refers to the trajectory of the solvent, solvent_run.nc, and SltConf is linked to the trajectory file of the isolated solute, solute_run.nc.

After running erdst, user obtains the energy distributions for the insertion operation.
engref.xx is the histogram of the solute-solvent pair interaction energy at insertions in reference system, and corref.xx is the correlation matrix.
User may notice that there are 5 outputs of energy distribution functions.
This is because the reference trajectory is divided into 5 sections,
and the output distribution functions are obtained from each of the 5 blocks.

When two erdst runs are finished in the soln and refs directories, the solvation free energy can be calculated from the distribution functions obtained.
Go to Quick Start Guide ---- Running slvfe: Getting Final Output.

The trajectory is divided into a set of blocks both for soln and refs.
The default value of the number of divisions is 10 for the solution system and is 5 for the reference.
The number of divisions can be modified with the numdiv parameter in the parameters_er file when the erdst program is run.
See Parameter files for erdst for the description of the parameters which can be specified in parameters_er.

If the solute is rigid (its intramolecular structure is fixed), the procedure is simplified.
The scheme for the solution part is the same as above.
Note, of course, that the solution MD is supposed to be done using a rigid solute.
Further, it is not necessary to perform a simulation of isolated solute when the solute is rigid.
The use of rigid solute simplifies the operation of the reference part.
In the refs directory, run the helper script:

cd ../refs
(ERmod directory)/share/ermod/tools/AMBER/gen_input --traj ../solvent_run.nc --log ../solvent_run.mdout --rigid ../etoh.pdb
(ERmod directory)/bin/erdst

The difference from the flexible solute case is the third argument.
The third argument, --rigid, specifies that the test-particle insertion calculation is performed with fixed structure of the solute.
The option --rigid can be simplified to -r.
The structure of the solute is taken from the SltInfo file when the solute is rigid.
See Parameter files for erdst for the format of the SltInfo file in the case of rigid solute.
When the refs calculation is done, the calculation of solvation free energy of rigid solute can be done with the same procedure as that in the flexible case.

Wiki: FixedLigandBind
Wiki: QuickStartGuide
Wiki: build-Guide
Wiki: build-VMDPlugin
Wiki: parameters-erdst