ERmod Wiki

Approximate solvation free energy calculator

Brought to you by: drmaruyama, e1kojima, kenkasa, nbykmtb, ssakuraba

ProteinSolvation

Free energy calculation of protein solvation with explicit solvent

Since the protein is a large solute, its solvation free energy calculation requires special treatment. Below is the list of cares required to calculate the solvation free energy calculation.

Recommended to read:
Free-energy analysis of hydration effect on protein with explicit solvent: Equilibrium fluctuation of cytochrome c, J. Chem. Phys., 134, 041105 (2011).

System setup and water box size

The box size of the simulation system needs to be more than twice of the protein. Within the periodic boundary condition, the box size of twice the solute size is the minimum required to make the energetics right.

The counterions are often introduced to neutralize the system in MD. They are usually dilute and sweep the configuration space quite slowly. In order to reduce the computation time, it is recommended that counterions not be placed in the system. The system can be apparently non-neutral, however, the effect of the apparent non-neutrality is handled by the self-energy correction scheme in the Ewald and PME methods. This correction scheme is implemented in slvfe as default when the periodic boundary condition is employed. To turn off the correction, set "slfslt='not'" in parameters_fe; see Parameter files for slvfe for detailed description.

Parameter setup

An advantage of the energy-representation method is to allow the molecular flexibility of the solute (and solvent) to be incorporated without any special treatment. When the protein is the solute, however, this advantage may not be effective since the protein structure is too different between solution and vacuum and the structure in vacuum is not informative. In the free-energy calculation of protein, we recommend to fix the structure. In a protein simulation, furthermore, the system size is comparable to the protein size. In this case, the self-energy term of protein depends strongly on the protein orientation when the Ewald or PME is used. The self-energy is the interaction energy of the solute with its own images and the neutralizing background. The orientation dependence of the protein self-energy is a problem of protein MD itself, and is a finite-size effect. In the free-energy calculation, the focus is the solute (protein)-solvent interaction effect. We thus recommend that the finite-size effect be removed in the MD and free-energy analysis. To do so, it is advised to fix the protein orientation in MD and erdst and use an option of insorient = 1 or inscfg = 1 (deprecated) in parameters_er.

insorient : orientation for the solute
0 (default) : random orientation
1 : no orientation change from the value in the file to be read
The file for the solute configuration is SltInfo when slttype = 2 and is SltConf when slttype = 3

inscfg : position and orientation for the inserted solute
0 : only the intramolecular configuration is from the file (default)
1 : orientation is fixed from the file with random position
2 : position and orientation are also fixed from the file
inscfg can be still used in ver 10 and converted to insorient (and insposition if necessary)

It is also advised to reduce the number of particle insertions in the refs calculation. The number of insertions per a single solvent configuration is given by the maxins parameter in parameters_er and its default value is 1000. When the solute is a protein, its size is usually not very small compared to the system size and a large maxins is not useful to improve the statistics. Separated regions of the solvent system provide independent dataset upon insertion, and many datasets are obtained with random insertion. When the solute is large, though, this machanism for improving the statistics is less effective. A typical setting is maxins = 200 when a protein is the solute.

The default mesh size defined in the &hist section of parameters_er can be modified. The bin sizes are set to be small around zero energy since there are a lot of solvent molecules that are separated from the solute. When the solute is a protein, its size is comparable to the system size, and the bin sizes can be set differently. A typical set reads

&hist
      eclbin=5.0e-2, ecfbin=2.5e-2, ec0bin=1.0e-3, finfac=10.0e0,
      ecdmin=-50.000000, ecfmns=-1.00e0, ecdcen=0.0e0, eccore=30.0e0,
      ecdmax=1.0e11, pecore=300
/

When the slvfe calculation is done, in addition, the warning condition for mesh error can be relaxed. The mesherr parameter within parameters_fe identifies the threshold value to give a warning message of mesh error; its default value is 0.1 kcal/mol. When the solute is a protein, it is typically enough to set mesherr = 1.0 in parameters_fe.

See Parameter files for erdst for the description of the parameters in parameters_er and Parameter files for slvfe for those in parameters_fe

Self-energy range: `selfcal`, `peread`, and `EcdInfo`

The following description is concerned with the energy meshes for the protein self-energy; these meshes are used for constructing the distribution function slfeng.XX for the self-energy. As of ver 0.3.alpha5, slfeng.XX is not computed unless a newly introduced key selfcal is set to 1, while its default value is 0. The following note thus needs be taken into account only when selfcal = 1 and the distribution function slfeng.XX for the protein self-energy is to be constructed.

The self-energy of protein can be quite large. In this case, it is necessary to set the energy meshes separately for the protein self and the protein-solvent (water) interactions. By conducting a short run of ermod of the solution system, user will get the uvrange.tt output. See the next paragraph for the format of uvrange.tt. By looking at this output, user can prepare a separate set of meshes for the protein self and the protein-solvent (water) interactions by setting "peread=1" in parameters_er file and preparing an EcdInfo file. When "peread=1" is set and the EcdInfo file is used, user also has to set "peread='yes'" in parameters_fe file when running slvfe. See the end of Parameter files for erdst for the format of the EcdInfo file.

The format of uvrange.tt reads

 species     minimum        maximum
    0     -176.08186       -176.07186
    1      -30.66799         12.66799
    2      -13.02511          5.66799

The species 0 corresponds to the self-enegy of the solute (protein). Shown are the minimum and maximum of the solute self-energy. The species 1 is the 1st solvent species, and the minimum and maximum of the pair energy of the solute and the 1st solvent species are shown. Similarly for the species 2 and so on.

In the short erdst run of the solution system to obtain uvrange.tt, it is advised to set "engdiv=1" in parameters_er. To run a short erdst, user may simply change the first entry of the first line of MDinfo, i.e., the number of snapshots treated in the single job. It is further advised to set ecdmin very small, for example to -200. When user still has an error message of

energy of -212 for 0-th species

ecdmin is made further small and user runs the short run of erdst again. Do not forget that when the short run of erdst is finished and an appropriate value is found for the ecdmin parameter, the parameters in the MDinfo and parameters_er files need to be changed back to the original ones (except for ecdmin).

If "peread=1" is not set in parameters_er and an EcdInfo file is not prepared, ecdmin in parameters_er needs be smaller than about -176. With this setting, the computed energy distribution functions are simply 0 between about -176 and -30. This is a waste of data storage. It is thus recommended to prepare separate energy meshes for protein self and protein-solvent. For protein self, it is necessary to prepare meshes only around -176 in the above example of uvrange.tt

EcdInfo can be something like

species  eclbin ecfbin ec0bin finfac  ecmin ecfmns eccen eccore ecmax pecore 
    0     1.0e-3 1.0e-3 1.0e-3 1.0e0   -179.0 -177.5 -176.5 -175.0

The energy is meshed for species 0 (self-energy of protein) according to EcdInfo. For the species not listed in EcdInfo, the energy is meshed with the &hist section of parameters_er.

In summary, to set up the energy meshes for protein self energy, perform a short erdst run, get the self-energy values, and prepare an EcdInfo with a margin of 1 or 2 to the computed values of self energy. Since the self energy is almost constant, it is all right to set eclbin = ecfbin = ec0bin (= 1.0e-3 in the above case). In the above example, ecdmin = -179.0 and eccore = 175.0. ecfmns and eccen can be simply taken to be the numbers partitioning the ecdmin and eccore values at equal intervals.

Wiki: Home
Wiki: MigrationGuide0203
Wiki: parameters-erdst
Wiki: parameters-slvfe