Thread: [Cdk-commits] SF.net SVN: cdk:[12147] cdk-taverna-paper/trunk/cdk-taverna/bmc_article.tex


Revision: 12147
          http://cdk.svn.sourceforge.net/cdk/?rev=12147&view=rev
Author:   steinbeck
Date:     2008-09-03 16:12:08 +0000 (Wed, 03 Sep 2008)

Log Message:
-----------
Comments from meeting with Achim, Thomas and Chris

Modified Paths:
--------------
    cdk-taverna-paper/trunk/cdk-taverna/bmc_article.tex

Modified: cdk-taverna-paper/trunk/cdk-taverna/bmc_article.tex
===================================================================

--- cdk-taverna-paper/trunk/cdk-taverna/bmc_article.tex	2008-09-03 15:37:30 UTC (rev 12146)
+++ cdk-taverna-paper/trunk/cdk-taverna/bmc_article.tex	2008-09-03 16:12:08 UTC (rev 12147)
@@ -230,6 +230,23 @@
 %% Background %%
 %%
 \section*{Background}
+Large chemistry databases are becoming available in the public domain (cite PubChem, ChEMBL, ZINC) 
+and call for easily customisable tools to process them.
+
+Open Drug Discovery and Open Notebook Science
+
+Areas calling for chemoinformatics workflow support include
+\begin{itemize}
+  \item Chemical data filtering, transformation and migration workflows
+  \subitem testtesttesttest
+  \item Chemical information retrieval workflows (structures, reactions, object-relational data, etc.)
+  \item Data analysis workflows (statistics, clustering, soft computing/computational intelligence, QSAR/QSPR/pharmacophore-oriented workflows)
+\end{itemize}
+
+
+Why open, and why are existing tools not open?
+
+
 The workflow paradigm allows scientists to flexibly create generic
 workflows from different kinds of data sources, filters and algorithms, which fits the
 ever-changing demands of current research. Two open-source tools were used to
@@ -246,6 +263,8 @@
 of different functionality. KNIME \cite{KNIMEWeb} is an open-source modular data
 exploration platform that is licensed under the Aladdin Free Public License and is developed by the group of Michael Berthold at the University of Konstanz, Germany. KNIME is based on the open-source Eclipse
 platform. 
+
+A number of different workers have been implemented for these scenarios.
 %ToDo Cite Wendy
 
 \section*{Implementation}
@@ -293,43 +312,36 @@
 from the given repository location. After Taverna has been restarted, the newly
 installed plug-in and its features are available.
 
+\subsection*{Iteration over large datasets}
 
+XXX Move section from below XXXX
+
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %% Results and Discussion %%
 %%
 \section*{Results}
 
-\subsection*{Potential CDK-Taverna Workflows}
-The chemoinformatics extension of Taverna leads to different potential workflow
-scenarios like:
-\begin{itemize}
-  \item Data filtering and transformation workflows
-  \item Data migration workflows
-  \item Chemical information retrieval related workflows (structures, reactions, object relational data etc.)
-  \item QSAR/QSPR/pharmacophore oriented workflows
-  \item Data analysis workflows (statistics, clustering, soft computing/computational intelligence etc.)
-\end{itemize}
-For all these different scenarios a number of different worker are implemented.
+\subsection*{Current Status}
 
-\subsection*{Current Worker allocation}
+XXX Table of implemented Workers, group by subject XXX
 The CDK-Taverna plug-in provides approximately 140 different workers. These
 workers are grouped into file I/O (15), database I/O (7), QSAR descriptors (76:
 42 molecular, 6 bond, 1 atom pair, 27 atomic), data analysis (17), SMILES tools (2) and
 miscellaneous (ca. 30) such as substructure filtering, aromaticity detection, atom
 typing and reaction enumeration.
 
-\subsection*{Database Back-End}
+Some of these components are illustrated below as part of example workflows.
+
+\subsection*{Database I/O}
 The CDK-Taverna project decided to use the PostgreSQL \cite{PostgreSQLWeb}
 database with the open-source pgchem::tigress \cite{PGChemWeb} extension. This
 combination allows molecules to be stored and queried in the database using an implementation of the GiST index
 of the PostgreSQL database.
 %TODO: Cite PostgreSQL and pgchem::tigress and add the version numbers
 
-\subsection*{Example Workflows}
-The first simple example workflow, a substructure search workflow, already shows
-a lot of the potential of such  workflow systems. 
 
-\subsubsection*{Substructure Workflow}
+\subsection*{Substructure Workflow}
+XXX split into CDK based and database based XXX
 The substructure workflow performs a topological substructure search for a given
 molecular substructure in a given list of molecules (see
 figure~\ref{fig:substructureworkflow.ps}). The inputs of this workflow are
@@ -349,7 +361,7 @@
   \label{fig:substructureworkflow.ps}
 \end{figure*}
 
-\subsubsection*{QSAR Calculation Workflow}
+\subsubsection*{Descriptor Calculation Workflow}
 This more complex example (see figure~\ref{fig:QSARWorkflow}) loads its molecules from a PostgreSQL database. 
 After the molecules have been loaded from the database, their atom types are
 perceived. In the following steps of the workflow each molecule gets
@@ -382,6 +394,9 @@
 
 
 \subsubsection*{Iterative QSAR Workflow}
+
+XXX move up to technical part XXX
+
 The iterative QSAR workflow is essentially a workaround that allows the user to
 handle many thousands of molecules. The Taverna architecture does not support
 constructs such as for or while loops, which is why this detour is so important for
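
The substructure workflow in the diff above performs a topological substructure search of a query structure against a list of molecules, and the database back-end answers molecule queries through a GiST index. The following is only a minimal Python sketch of the general idea under loudly stated assumptions: `Mol`, `fingerprint`, `is_substructure` and `substructure_filter` are hypothetical names, not CDK or pgchem::tigress API; molecules are toy element/bond graphs without bond orders, aromaticity or hydrogens; and the folded 64-bit feature fingerprint used as a pre-screen is a generic screening technique, not necessarily what the pgchem::tigress GiST index stores.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Mol:
    """Hypothetical toy molecule (not the CDK data model): heavy-atom element
    symbols plus undirected bonds as atom-index pairs; bond orders, aromaticity
    and hydrogens are ignored."""
    name: str
    atoms: tuple
    bonds: tuple

    def neighbours(self):
        adj = {i: set() for i in range(len(self.atoms))}
        for a, b in self.bonds:
            adj[a].add(b)
            adj[b].add(a)
        return adj


def fingerprint(mol, nbits=64):
    """Fold element and bonded-element-pair features into an nbits-wide bit mask."""
    features = set(mol.atoms)
    features |= {"-".join(sorted((mol.atoms[a], mol.atoms[b]))) for a, b in mol.bonds}
    bits = 0
    for feature in features:
        bits |= 1 << (zlib.crc32(feature.encode()) % nbits)
    return bits


def is_substructure(query, target):
    """Backtracking search for an injective mapping of query atoms onto target
    atoms that preserves element symbols and every query bond."""
    q_adj, t_adj = query.neighbours(), target.neighbours()
    mapping = {}

    def extend(qi):
        if qi == len(query.atoms):
            return True
        for ti in range(len(target.atoms)):
            if ti in mapping.values() or target.atoms[ti] != query.atoms[qi]:
                continue
            # every already-mapped neighbour of qi must be bonded to ti
            if all(mapping[qn] in t_adj[ti] for qn in q_adj[qi] if qn in mapping):
                mapping[qi] = ti
                if extend(qi + 1):
                    return True
                del mapping[qi]
        return False

    return extend(0)


def substructure_filter(query, molecules):
    """Cheap fingerprint screen first; exact graph match only for survivors."""
    q_fp = fingerprint(query)
    hits = []
    for mol in molecules:
        if (fingerprint(mol) & q_fp) != q_fp:
            continue  # some query feature is absent, so no match is possible
        if is_substructure(query, mol):
            hits.append(mol.name)
    return hits


RING6 = tuple((i, (i + 1) % 6) for i in range(6))
ethanol = Mol("ethanol", ("C", "C", "O"), ((0, 1), (1, 2)))
benzene = Mol("benzene", ("C",) * 6, RING6)
phenol = Mol("phenol", ("C",) * 6 + ("O",), RING6 + ((0, 6),))
query = Mol("C-O", ("C", "O"), ((0, 1),))

print(substructure_filter(query, [ethanol, benzene, phenol]))  # ['ethanol', 'phenol']
```

The screen can only produce false positives (hash collisions), never false negatives, because every feature of a true substructure also occurs in the target; the exact backtracking match therefore decides the final hit list.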

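The descriptor calculation workflow described in the diff loads molecules from the database, perceives atom types and then calculates descriptors for each molecule. A minimal Python sketch of that three-step pipeline, assuming a hypothetical in-memory `ROWS` list in place of the PostgreSQL query and a toy atom-typing scheme (element plus heavy-atom degree) instead of the CDK atom types; all function names are illustrative only, and the four descriptors are simple examples rather than the QSAR descriptor workers of the plug-in.

```python
from collections import Counter

# Hypothetical stand-in for rows fetched from the PostgreSQL database:
# (name, heavy-atom element symbols, bonds as atom-index pairs).
RING6 = tuple((i, (i + 1) % 6) for i in range(6))
ROWS = [
    ("ethanol", ("C", "C", "O"), ((0, 1), (1, 2))),
    ("benzene", ("C",) * 6, RING6),
    ("pyridine", ("N",) + ("C",) * 5, RING6),
]


def perceive_atom_types(atoms, bonds):
    """Toy atom typing: element plus heavy-atom degree, e.g. 'C.2'.
    This is a hypothetical scheme, not the CDK atom type list."""
    degree = Counter()
    for a, b in bonds:
        degree[a] += 1
        degree[b] += 1
    return [f"{el}.{degree[i]}" for i, el in enumerate(atoms)]


def count_components(n_atoms, bonds):
    """Number of connected components, via a small union-find."""
    parent = list(range(n_atoms))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in bonds:
        parent[find(a)] = find(b)
    return len({find(i) for i in range(n_atoms)})


def descriptors(atoms, bonds, atom_types):
    """A few simple descriptors computed for one molecule."""
    return {
        "heavy_atoms": len(atoms),
        "heteroatoms": sum(el not in ("C", "H") for el in atoms),
        # cyclomatic number = number of independent rings
        "rings": len(bonds) - len(atoms) + count_components(len(atoms), bonds),
        "distinct_atom_types": len(set(atom_types)),
    }


def descriptor_workflow(rows):
    """Load -> perceive atom types -> calculate descriptors, one entry per molecule."""
    table = {}
    for name, atoms, bonds in rows:
        types = perceive_atom_types(atoms, bonds)
        table[name] = descriptors(atoms, bonds, types)
    return table


for name, row in descriptor_workflow(ROWS).items():
    print(name, row)
```

Keeping atom typing as a separate step before descriptor calculation mirrors the order given in the workflow description: descriptors that depend on atom types only see molecules that have already been typed.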

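The iterative QSAR workflow (and the new "Iteration over large datasets" subsection) works around Taverna's lack of for/while loops in order to handle many thousands of molecules. The actual CDK-Taverna mechanism is not shown in this diff; the following Python sketch only illustrates the general technique of processing a large data set in fixed-size batches, assuming a hypothetical paged `fetch_batch` function in place of the database query and a placeholder `process` step.

```python
from typing import Callable, Iterator, List, Sequence


def fetch_batch(source: Sequence[str], offset: int, limit: int) -> List[str]:
    """Hypothetical stand-in for a paged database query (for example an SQL
    SELECT with LIMIT/OFFSET); here it simply slices an in-memory sequence."""
    return list(source[offset:offset + limit])


def iterate_in_batches(fetch: Callable[[int, int], List[str]],
                       batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches until fetch returns an empty batch, so only
    one batch needs to be held in memory at a time."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    offset = 0
    while True:
        batch = fetch(offset, batch_size)
        if not batch:
            return
        yield batch
        offset += len(batch)


def process(batch: List[str]) -> int:
    """Placeholder for the per-batch sub-workflow: counts entries that contain
    the character 'O' (a crude stand-in for a real filter or descriptor step)."""
    return sum("O" in smiles for smiles in batch)


dataset = ["CCO", "CCN", "c1ccccc1", "CC(=O)O"] * 2500  # 10,000 SMILES strings
n_batches, total = 0, 0
for batch in iterate_in_batches(lambda off, lim: fetch_batch(dataset, off, lim), 1000):
    total += process(batch)
    n_batches += 1
print(n_batches, total)  # 10 5000
```

Memory use depends on the batch size rather than on the size of the data set. If the real fetch step pages through a database with LIMIT/OFFSET, the query needs a stable ORDER BY so that consecutive batches neither overlap nor skip rows.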




