## Diff of /cdk-taverna-2-paper/bmc_article.tex[r15595] .. [r15596]

--- a/cdk-taverna-2-paper/bmc_article.tex
+++ b/cdk-taverna-2-paper/bmc_article.tex
@@ -270,7 +270,7 @@
% TODO Andreas: Please fill in, including a comparison with the CDK-Taverna 1.0 worker
%A CDK-Taverna 2.0 worker implements
\subsection*{CDK-Taverna 2.0 worker implementation}
- Here are our focus was to make the CDK-Taverna 2.0 plugin  easily expandable. Our implementation allows for the creation of workers by just inheriting from a single abstract class \texttt{org.openscience.cdk.applications.taverna.AbstractCDKActivity}, is which is the analogue to the CDKLocalWorker interface of the CDK-Taverna version 1.0. This class provides all the  necessary data to the underlying worker registration mechanism, freeing the developer from handling these tasks. Methods which need to be overwritten in order to implement a worker are:
+ Our focus here was to make the CDK-Taverna 2.0 plugin easily extensible. Our implementation allows workers to be created by simply inheriting from a single abstract class, \texttt{org.openscience.cdk.applications.taverna.AbstractCDKActivity}, which is the analogue of the CDKLocalWorker interface of CDK-Taverna version 1.0. This class provides all the necessary data to the underlying worker registration mechanism, freeing the developer from handling these tasks. The methods that need to be overridden in order to implement a worker are:
\begin{itemize}
@@ -335,7 +335,9 @@
\item \texttt{M5P regression trees}
\end{itemize}

-For attribute analysis are two workers available \texttt{GA Attribute Selection} \texttt{Leave-one-out Attribute Selection}. The former worker uses an genetic algorithm implementation to find an optimal set of attribute curations. Therefor each individual has a different subset of attributes enabled. The evaluation is performed either with the complete dataset nor by using a n-fold cross-validation. The used fitness function is $s(RMSE)=\left(\frac{1}{RMSE}\right)^{2}$ where RMSE is the root mean squared error of the evaluated subset. A mutation is made up of switching attributes on or off and cross-over is performed by interchanging the states of regions of attributes. The second worker uses a Leaf-one-out' strategy for evaluating the peerformance of each attribute. Therefor attributes are consecutively leaved out. The worst performing attribute is discarded. This step is repeated until only a single attribute survives. The result should give an impression on how important the different attributes are. Figure \ref{fig:LeaveOneOutResults} shows the results from a Leaf-one-out' analysis and figure \ref{fig:LeaveOneOutWorkflow} the related workflow.
+For attribute analysis, two workers are available: \texttt{GA Attribute Selection} and \texttt{Leave-one-out Attribute Selection}. The former worker uses a genetic algorithm to find an optimal subset of attributes. To this end, each individual has a different subset of attributes enabled. The evaluation is performed either on the complete dataset or by n-fold cross-validation. The fitness function is $s(RMSE)=\left(\frac{1}{RMSE}\right)^{2}$, where RMSE is the root mean squared error of the evaluated subset. A mutation switches attributes on or off, and cross-over is performed by interchanging the states of regions of attributes.
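
The genetic operators described for \texttt{GA Attribute Selection} can be illustrated with a minimal Python sketch. It is not the plugin's implementation: the function names, the boolean-list encoding of an individual, the per-attribute mutation rate and the choice of a single contiguous cross-over region are all assumptions made for illustration. Only the fitness formula is taken from the text.

```python
import random

def fitness(rmse):
    """Fitness s(RMSE) = (1 / RMSE)^2, as stated in the text."""
    return (1.0 / rmse) ** 2

def mutate(mask, rate, rng):
    """Switch each attribute on or off with probability `rate` (assumed scheme)."""
    return [(not bit) if rng.random() < rate else bit for bit in mask]

def crossover(a, b, rng):
    """Interchange the attribute states of one contiguous region (assumed scheme)."""
    start = rng.randrange(len(a))
    end = rng.randrange(start, len(a)) + 1  # region [start, end) is never empty
    child_a = a[:start] + b[start:end] + a[end:]
    child_b = b[:start] + a[start:end] + b[end:]
    return child_a, child_b
```

Squaring the inverse error means that halving the RMSE quadruples the fitness, so small improvements are rewarded more strongly than with a plain $1/RMSE$ score.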
+
+The second worker uses a `leave-one-out' strategy to evaluate the performance of each attribute. To this end, attributes are left out one at a time. The worst performing attribute is discarded. This step is repeated until only a single attribute remains. The result should give an impression of how important the different attributes are for the underlying machine learning problem. Figure \ref{fig:LeaveOneOutResults} shows the results of a leave-one-out analysis and Figure \ref{fig:LeaveOneOutWorkflow} the related workflow.
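
The elimination loop described for the second worker can be sketched as follows. This is a hedged illustration, not the worker's code: `leave_one_out_ranking` and the `evaluate` callback are hypothetical names, and the sketch assumes that the "worst performing" attribute is the one whose omission yields the lowest model error.

```python
def leave_one_out_ranking(attributes, evaluate):
    """Return attributes in the order they are discarded, survivor last.

    `evaluate(subset)` is a hypothetical callback returning a model error
    (for example the RMSE) for a model built on `subset` only.
    """
    remaining = list(attributes)
    discarded = []
    while len(remaining) > 1:
        # Error of the model when each attribute in turn is left out.
        errors = {a: evaluate([x for x in remaining if x != a]) for a in remaining}
        # Assumption: the attribute that is missed least is the worst performer.
        worst = min(remaining, key=lambda a: errors[a])
        remaining.remove(worst)
        discarded.append(worst)
    return discarded + remaining
```

Under this assumption, attributes that are discarded late are the ones the model depends on most, which is the kind of importance ranking the text describes.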

The plugin provides the \texttt{Split Dataset Into Train-/Testset} worker for splitting the data into a training set and a test set. There are three different algorithms available:
\begin{itemize}
@@ -343,7 +345,9 @@
\item \texttt{Cluster Representatives}
\item \texttt{Single Global Max}
\end{itemize}
-As the name implies the first algorithm splits the data randomly. The second algorithm first clusters the data whereas the cluster number is set to the number of training set datapoints. Afterwards one datapoint from each cluster is randomly inserted in the training set. The rest is transferred into the test set. The last algorithm starts like the former one. Then the worst predictiv datapoint in the test set is evaluated and exchanged with the corresponding datapoint in the trainings set. This step is repeated for a predefined number of iterations. Figure \ref{fig:RegressionSplitWorkflow} shows a workflow using the described worker. Also the \texttt{Weka Regression} worker is utilised which generates the machine learning models. The worker delivers many configuration possibilities which are shown in figure \ref{fig:RegressionConfUI} and figure \ref{fig:RegressionVisualisation} shows the capabilities for result visualisation of the plugin.
+As the name implies, the first algorithm splits the data randomly. The second algorithm first clusters the data, with the number of clusters set to the number of training set datapoints. Afterwards, one datapoint from each cluster is randomly inserted into the training set. The rest is transferred into the test set. The last algorithm starts like the former one. Then the worst predicted datapoint in the test set is determined and exchanged with the corresponding datapoint in the training set. For regression tasks the evaluation function is the root mean square error of the dataset; for classification the evaluation function is $s(\underline{d})=1-\max(\underline{d})$, where $\underline{d}$ is the likelihood distribution of the assigned classes. The evaluation step is repeated for a predefined number of iterations.
+
+ Figure \ref{fig:RegressionSplitWorkflow} shows a workflow using the described worker. The \texttt{Weka Regression} worker is used to generate the machine learning models, and the \texttt{Evaluate Regression Results as PDF} worker is used for the subsequent evaluation and visualisation of the produced sets. The \texttt{Weka Regression} worker offers many configuration options, which are shown in Figure \ref{fig:RegressionConfUI}, and Figure \ref{fig:RegressionVisualisation} shows the result visualisation capabilities of the plugin.
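
The classification evaluation function $s(\underline{d})=1-\max(\underline{d})$ used by the \texttt{Single Global Max} split can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch assumes that a higher score marks a worse predicted datapoint; how the exchange partner in the training set is chosen is not specified in the text and is therefore left out.

```python
def classification_score(d):
    """s(d) = 1 - max(d) for a class likelihood distribution d."""
    return 1.0 - max(d)

def worst_predicted(distributions):
    """Index of the test datapoint with the highest score.

    Assumption: the least confident prediction counts as the worst one.
    """
    return max(range(len(distributions)),
               key=lambda i: classification_score(distributions[i]))
```

A datapoint assigned to one class with likelihood 0.9 scores 0.1, whereas an evenly split two-class prediction scores 0.5 and would be selected for exchange first.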

% TODO Andreas: Please fill in.
Last but not least, parallel computing on multicore processors is supported by multi-threaded workers where possible. Many operations scale nearly linearly with the number of cores. The machine learning components and the QSAR calculation workers in particular benefit from parallel computation. Table \ref{tab:ThreadedWorker} gives an overview of all workers that are capable of using multiple threads for their calculations.