[R-gregmisc-users] SF.net SVN: r-gregmisc: [948] trunk/PathwayModeling/thesispaper
From: <r_b...@us...> - 2006-03-25 14:41:22
Revision: 948
Author:   r_burrows
Date:     2006-03-25 06:41:09 -0800 (Sat, 25 Mar 2006)
ViewCVS:  http://svn.sourceforge.net/r-gregmisc/?rev=948&view=rev

Log Message:
-----------
minor edits

Modified Paths:
--------------
    trunk/PathwayModeling/thesispaper/discussion.Snw
    trunk/PathwayModeling/thesispaper/introduction.Snw
    trunk/PathwayModeling/thesispaper/methods.Snw
    trunk/PathwayModeling/thesispaper/results.Snw

Modified: trunk/PathwayModeling/thesispaper/discussion.Snw
===================================================================
--- trunk/PathwayModeling/thesispaper/discussion.Snw	2006-03-20 17:31:46 UTC (rev 947)
+++ trunk/PathwayModeling/thesispaper/discussion.Snw	2006-03-25 14:41:09 UTC (rev 948)
@@ -9,11 +9,10 @@
 these models can be difficult because the pathways are described by
 sets of reaction rate equations with overlapping sets of parameters.
 We have found that the MCMC algorithms handled this situation well, and produced
-reasonable joint probability densities for the model parameters. In
-addition to point estimates for the parameters, this output can be used
+reasonable joint probability densities for the model parameters.
+This output can be used
 for the estimation of confidence intervals for the parameters and the
-detection of correlations and multimodality. This additional information
-cannot be obtained with any other method in routine use.
+detection of correlations and multimodality.
 Thus MCMC compares favorably with maximum likelyhood methods that produce point estimates of the parameters and nonlinear regression methods that find approximations of the parameters and their variances.
 The models used here have the form of the Hill function,
 $\frac{x^n}{\theta^n+x^n}$, with an exponent of 1. This form was chosen
@@ -36,7 +35,7 @@
 Normal Kernel Coupler, is more computationally intensive but it can
 sample from complex densities more efficiently than the other two
 algorithms.
- All four algorithms produced very similar results but the execution times
+ All three algorithms produced very similar results but the execution times
 varied considerably depending on the size of the model and the
 algorithm. The all-component Metropolis and NKC algorithms converged
 fairly easily but the 1-component algorithm was quite slow. This is
 undoubtably due to
@@ -53,7 +52,7 @@
 also demonstrate the usefulness of MCMC simulation. The derived
 probability densities can be used to find good estimates for the
 parameters as well as reveal the correlations between the parameters.
-The result is a well-fitted model with realistic confidence intervals.
+The result is a well-fitted model with realistic confidence intervals (Figure~\ref{fitted}).
 It was of interest to see if the calculated parameter values could be
 used to infer values for the underlying rate constants by fitting a
 simple

Modified: trunk/PathwayModeling/thesispaper/introduction.Snw
===================================================================
--- trunk/PathwayModeling/thesispaper/introduction.Snw	2006-03-20 17:31:46 UTC (rev 947)
+++ trunk/PathwayModeling/thesispaper/introduction.Snw	2006-03-25 14:41:09 UTC (rev 948)
@@ -5,20 +5,21 @@
 \end{verbatim}
 We know a great deal about the individual components of living
-organisms but much less about how the properties of these components
-affect the systems of which they are a part. Our ignorance in this
+organisms but much less about the properties of the systems
+ of which they are a part. Our ignorance in this
 regard prevents us from developing a deeper theoretical understanding
 of living systems and from developing more effective methods for
 altering the system behavior in beneficial ways.
 In order to gain some insight into the properties of biological
-systems, we would like to use quantitative models of sequences of
-reactions.
+systems, we would like to use quantitative models that describe the processes at the molecular level.
 Specifically, we would like to develop methods that can be used to
-deduce the kinetic parameters of the enzymes in a pathway. With this
+deduce the kinetic parameters of enzyme-catalyzed reactions. With this
 information, it should be possible to identify the most effective
 control points in a pathway and modify the pathway behavior in
-predictable ways. For example, there are thought to be four mechanisms
+predictable ways.
+
+An example of how a model of a biochemical pathway could be used is in the elucidation of a mechanism for the pathology seen with diabetes. There are thought to be four mechanisms
 that explain hyperglycemic toxicity in diabetes
 \cite{Oates02,Brownlee01}. One mechanism is illustrated in
 Figure~\ref{glycolysis}.
@@ -37,14 +38,13 @@
 these conditions.
 The inhibition of this enzyme then increases the
 concentration of glyceraldehyde-3-phosphate (G3P) and dihydroxyacetone phosphate
-(DHAP) which breaks down to form methylglyoxal. To test this
+(DHAP) which breaks down to form methylglyoxal, an active aldehyde which readily reacts with and inactivates proteins. To test this
 hypothesis we need to be able to answer questions such as: Is the
 observed inhibition of this enzyme by 50\% sufficient to explain the
 observed increase in methylglyoxal concentration? If not, then what
 other mechanisms might be operative? What is the most effective way to
 modulate these reactions?
-We address these questions by developing analytical methods that will allow us to characterize
-the intact systems at the molecular level.
+We address these questions by developing analytical methods that will allow us to characterize these types of biochemical systems at the molecular level.

 We will use a Bayesian modeling approach with
 fitting via Markov chain Monte Carlo (MCMC) simulation
@@ -83,13 +83,6 @@
 behavior of the pathway as a single system. The MCMC methods
 described here are very useful for analyzing this type of data.
-The MCMC results are compared to
-parameter estimates found by least squares fitting of the models to
-the data. The modes of the MCMC parameter distributions are
-essentially identical to the least squares parameter values, and this
-is taken as evidence that the MCMC simulation is producing valid
-estimates.
-
 MCMC simulation was found to be a very useful method for fitting
 systems of equations to data. It is capable of fitting several
 equations simultaneously, and does not require an excessive number of

Modified: trunk/PathwayModeling/thesispaper/methods.Snw
===================================================================
--- trunk/PathwayModeling/thesispaper/methods.Snw	2006-03-20 17:31:46 UTC (rev 947)
+++ trunk/PathwayModeling/thesispaper/methods.Snw	2006-03-25 14:41:09 UTC (rev 948)
@@ -11,7 +11,7 @@
 \cite{Gillespie77}. In this paper we present the results for a
 5-reaction sequence that is sampled at 12, 16, or 25 time points with
 3 replicates at each time point.
-<<data, echo=F, eval=T>>=
+<<data, echo=F, eval=F>>=
 # gillespie.out - output from the Gillespie simulation
 source('R/getPaperData.R')
 source('R/getPaperRates.R')
@@ -58,8 +57,7 @@
 For the 5-reaction sequence of reactions the perturbation of $R1$ at
 $time = 20$ results in the time courses plotted in
 Figure~\ref{pulse}.
-
-<<echo=F,eval=T>>=
+<<echo=F,eval=F>>=
 source("R/pulse.R")
 rawdata <- read.table("data/rawdata.dat",header=T)
 attach(rawdata)
@@ -78,7 +77,6 @@
 \end{figure}
 For each time point there are 5 values for the reactant
 concentrations and 5 values for the estimated reaction rates.
-
 The data at each time point was generated by sampling 3 values for
 each reactant from a normal distribution centered on the Gillespie
 output and with a standard deviation of 50. Three sets of time
@@ -222,16 +220,9 @@
 The performance of a MCMC simulation \cite{Metropolis53,Hastings70}
 depends heavily on the method used to find candidate points to add to
 the Markov chain
-  [$q(\cdot |\cdot)$]. A
-  standard proposal distribution for the continuous problem is a
-  multivariate normal distribution. However,
-  the size of the space to be sampled grows exponentially with the
-  number of dimensions and if the joint density is multimodal,
-  finding all the modes can be a problem. For this reason proposal
-  distributions have been developed that result in more efficient
-  sampling than a simple one-component-at-a-time sampler.
-
-  Three algorithms are implemented in the Hydra
+  [$q(\cdot |\cdot)$].
+  For this reason different methods have been developed to increase the efficiency of sampling from the sample space. We have used
+  three algorithms from the Hydra
 library \cite{Hydra}: component-wise Metropolis, all-components
 Metropolis, and Normal Kernel Coupler \cite{Warnes00}.

Modified: trunk/PathwayModeling/thesispaper/results.Snw
===================================================================
--- trunk/PathwayModeling/thesispaper/results.Snw	2006-03-20 17:31:46 UTC (rev 947)
+++ trunk/PathwayModeling/thesispaper/results.Snw	2006-03-25 14:41:09 UTC (rev 948)
@@ -12,15 +12,15 @@
 The marginal distributions for the parameters were mound-shaped
 (Figure~\ref{densities}).
 They generally were skewed to the right and had thicker
-tails than normal distributions. /textit{The modes of the distributions
+tails than normal distributions. The modes of the distributions
 turned out to be poor estimates of the maximum likelihood values for
 the parameters. The maximum likelihood values were all larger than
 the modes, and the difference increased as the position of the
 reaction in the sequence increased. Correlated pairs of parameters
 (e.g., $d_1$ --$d_2$ and $d_3$ --$d_4$, Figure~\ref{scatter}) were
 similar relative distances from their
-modes. The significance of all this is not clear at this point.}
+modes. The significance of all this is not clear at this point.
-<<fig4,echo=F,eval=T>>=
+<<fig4,echo=F,eval=F>>=
 source("R/paperSSQ.R")
 source("R/getModes.R")
 source("R/do.optim.R")
@@ -62,8 +61,7 @@
 \end{figure}
 The effect of the number of data points on the parameter
 distributions can be seen in Figure~\ref{converged}.
-
-<<fig5,echo=F,eval=T>>=
+<<fig5,echo=F,eval=F>>=
 source("R/plotDensity.R")
 source("R/plotConverged.R")
 output12 <- read.table("data/output12.AllComp.thinned")
@@ -86,15 +85,14 @@
 \label{converged}
 \end{figure}
 There is some narrowing of the distributions as the number of points
-increases from 16 to 25 but it is not pronounced. \textit{Overall, the
+increases from 16 to 25 but it is not pronounced. Overall, the
 reduction in width varied from 18-fold for $d_9$ and 9-fold for $d_1$ to
-1.5-fold for $d_4$}.
-
+1.5-fold for $d_4$.
 Figure~\ref{scatter} is an example of a bivariate scatterplot of the
 distributions. There is correlation between some pairs of parameters,
 e.g., $d_1$ -- $d_2$, but no evidence of multi-modality.

-<<fig6, echo=F,eval=T>>=
+<<fig6, echo=F,eval=F>>=
 source("R/paperPairs.R")
 mcmcML <-scan("figures/tempDir/mcmcML.dat")
 output16 <- read.table("data/output16.AllComp.thinned")
@@ -127,7 +125,7 @@
 is shown in Figure~\ref{fits}. Quantitative measures of the fits for
 all the algorithms are given in Table~\ref{MSq}.

-<<fig7, echo=F, eval=T>>=
+<<fig7, echo=F, eval=F>>=
 source("R/paperSSQ.R")
 source("R/getModes.R")
 source("R/do.optim.R")
@@ -147,12 +145,12 @@
 @
 \begin{figure}
 \centering
-  \subfloat[v2]{\includegraphics[scale=0.35]{figures/tempDir/V1fitted}}
-  \subfloat[v3]{\includegraphics[scale=0.35]{figures/tempDir/V2fitted}}\\
-  \subfloat[v4]{\includegraphics[scale=0.35]{figures/tempDir/V3fitted}}
-  \subfloat[v5]{\includegraphics[scale=0.35]{figures/tempDir/V4fitted}}
+  \subfloat[v2]{\includegraphics[scale=0.33]{figures/tempDir/V1fitted}}
+  \subfloat[v3]{\includegraphics[scale=0.33]{figures/tempDir/V2fitted}}\\
+  \subfloat[v4]{\includegraphics[scale=0.33]{figures/tempDir/V3fitted}}
+  \subfloat[v5]{\includegraphics[scale=0.33]{figures/tempDir/V4fitted}}
 \caption{Curves fit to the 16-point data with the all-components
-  algorithm}
+  algorithm. The green curves are drawn with the maximum likelihood estimates for the parameters found with the L-BFGS-B algorithm \cite{Byrd95} as implemented in the \textit{R optim()} function. }
 \label{fits}
 \end{figure}
@@ -176,7 +174,7 @@
 Rates of convergence are illustrated in Figure~\ref{SSQvsIter}. The
 mean sums of squared residuals (SSQ) are plotted vs. the number of
-likelihood evaluations for the three algorithms. \textit{The relatively
+likelihood evaluations for the three algorithms. The relatively
 long convergence time for the 1-component Metropolis algorithm is a
 result of the high correlations between some of the parameters
 (Figure~\ref{scatter}).
 With the 1-component Metropolis algorithm, one parameter is
@@ -184,9 +182,9 @@
 searches the parameter space by moving parallel to the axes and thus
 it can sample a density with a diagonal axis of symmetry very slowly.
 This is not a problem with the all-component Metropolis and NKC
-algorithms since they update all the parameters at each iteration}.
+algorithms since they update all the parameters at each iteration.

-<<fig8, echo=F,eval=T>>=
+<<fig8, echo=F,eval=F>>=
 source("R/getMSDvsEvals.R")
 source("R/plotMSD16.R")
 source('R/paperSSQ.R')

This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
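[Editor's note: the diff above discusses sampling the joint parameter density of Hill-form rate laws with Metropolis-type algorithms. As a rough illustration of the all-components variant, here is a sketch in Python rather than the paper's R; the model, data, step size, and chain length are invented for illustration, and this is not the Hydra library's implementation.]

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data from a Hill-type rate law with exponent 1,
# v = Vmax * x / (theta + x), plus Gaussian noise.  The values of
# Vmax, theta, and the noise level are illustrative, not the paper's.
true_vmax, true_theta, sigma = 10.0, 3.0, 0.5
x = np.linspace(0.5, 20.0, 30)
v_obs = true_vmax * x / (true_theta + x) + rng.normal(0.0, sigma, x.size)

def log_post(params):
    """Log-posterior: Gaussian likelihood, flat prior on positive params."""
    vmax, theta = params
    if vmax <= 0 or theta <= 0:
        return -np.inf
    resid = v_obs - vmax * x / (theta + x)
    return -0.5 * np.sum(resid**2) / sigma**2

def metropolis_all_components(start, n_iter=20000, step=0.25):
    """Random-walk Metropolis that perturbs all components at once."""
    chain = np.empty((n_iter, len(start)))
    current, lp = np.asarray(start, float), log_post(start)
    accepted = 0
    for i in range(n_iter):
        cand = current + rng.normal(0.0, step, current.size)
        lp_cand = log_post(cand)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < lp_cand - lp:
            current, lp = cand, lp_cand
            accepted += 1
        chain[i] = current
    return chain, accepted / n_iter

chain, acc = metropolis_all_components([5.0, 5.0])
burned = chain[5000:]          # discard burn-in before summarizing
```

A component-wise variant would instead perturb one coordinate per iteration; as the discussion in the diff notes, such a sampler moves parallel to the axes and mixes slowly when parameters are strongly correlated.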
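[Editor's note: the methods section draws its synthetic data from a Gillespie stochastic simulation \cite{Gillespie77}. A minimal sketch of that algorithm on a toy two-step sequence, again in Python; the rate constants, molecule counts, and 2-reaction chain are invented for illustration, whereas the paper uses a 5-reaction system.]

```python
import numpy as np

rng = np.random.default_rng(7)

# Gillespie's direct method for S1 -> S2 -> S3 with mass-action
# propensities.  All numbers here are illustrative.
k = np.array([0.5, 0.3])          # rate constants for the two steps
state = np.array([100, 0, 0])     # molecule counts of S1, S2, S3
stoich = np.array([[-1, 1, 0],    # S1 -> S2
                   [0, -1, 1]])   # S2 -> S3
t, t_end = 0.0, 50.0
times, traj = [0.0], [state.copy()]
while t < t_end:
    a = k * state[:2]             # propensities: k1*S1, k2*S2
    a0 = a.sum()
    if a0 == 0:                   # all reactant molecules consumed
        break
    t += rng.exponential(1.0 / a0)   # waiting time to the next event
    r = rng.choice(2, p=a / a0)      # which reaction fires
    state = state + stoich[r]
    times.append(t)
    traj.append(state.copy())
traj = np.array(traj)
```

Sampling such trajectories at a few time points, with added measurement noise, yields data of the kind the paper's models are fitted to.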