
\title{A Benchmark Study of the CDK Fingerprints}

\begin{abstract}
\paragraph*{Background:} Text for this section of the abstract.
\paragraph*{Results:} Text for this section of the abstract \ldots
\paragraph*{Conclusions:} Text for this section of the abstract \ldots
\end{abstract}

\section*{Background}
Binary fingerprints are bit string representations of molecular structures and come in a variety of types. In the most common type, each bit of the fingerprint corresponds to a specific substructural feature (say an aromatic ring or an aldehyde group). Other forms of fingerprints include hashed fingerprints and atom environment fingerprints. While these representations were initially designed for similarity searching in databases, they have become an important component of virtual screening pipelines. This is possible because of the ``similarity principle''\cite{Martin:2002ab}, which underlies much of virtual screening in drug discovery and states that similar molecules will have similar activities. While there have been many counter-examples\cite{Maggiora:2006aa}, this approach has been fruitful in a number of cases.

A number of fingerprint implementations are available from commercial vendors and a few from academic groups. The Chemistry Development Kit (CDK) is an Open Source Java library\cite{Steinbeck:2003bh,Steinbeck:2006aa} for cheminformatics and provides several fingerprint implementations. More specifically, it provides two structural key type fingerprints and two hashed fingerprints. While the library has been used in a number of projects, there has been no formal testing of how well the CDK fingerprints perform in a virtual screening scenario. It should be noted that the two structural key fingerprints are implementations of well-studied schemes (MACCS\cite{Durant:2002aa} and EState keys) and their performance is well known. However, the two hashed fingerprints, while based on the well-known Daylight specification, have never been formally benchmarked. The goal of this study is to compare the performance of the CDK hashed fingerprints to other well-known fingerprint types.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Results and Discussion %%
%%
\section*{Results and Discussion}
\subsection*{Enrichment curves}
\subsection*{Information content}
\subsection*{Effect of fingerprint size}
\subsection*{Influence of ring systems}

%%%%%%%%%%%%%%%%%%%%%%
\section*{Conclusions}
Text for this section \ldots

%%%%%%%%%%%%%%%%%%
\section*{Methods}
\subsection*{Fingerprints}
Summary of how the CDK fingerprints are calculated.

In addition to the path-based fingerprints described above, we also considered structural key type fingerprints. In these types, each bit position corresponds explicitly to a substructural feature. In this study we employed the MACCS 166 bit keys\cite{Durant:2002aa} (implemented in the CDK) and the BCI 1052 bit keys\cite{Barnard:1997aa}.

Finally, we also considered atom environment fingerprints, specifically the extended connectivity fingerprints (ECFP) as implemented in Pipeline Pilot (SciTegic, Inc.). These types of fingerprints characterize each atom in terms of the environment around it, usually considering a neighbourhood with a diameter of up to 6 or 8 bonds around the atom in question. The ECFPs characterize the atoms using features such as hydrogen bonding donor capability, lipophilicity and so on. In this study we considered the ECFP-6 type, which considers a neighbourhood with a diameter of 6 bonds, that is, atoms up to 3 bonds away from a central atom.

\subsection*{Measures of effectiveness}
Use of enrichment curves, enrichment factors. Note that they are not the best of measures\cite{Bender:2005aa,Truchon:2007aa,Nicholls:2008aa,Clark:2008aa}. Use of ROC curves and AUC.

\subsection*{Time efficiency}

\subsection*{Benchmark Datasets}
A number of datasets have been employed for benchmarking fingerprint methods, including ZINC\cite{Irwin:2005aa} and the MDL Drug Data Report (MDDR). For the purposes of this study we employed the 17 virtual screening benchmark datasets described by Rohrer and Baumann\cite{Rohrer:2008ab}, collectively termed the Maximum Unbiased Validation (MUV) datasets. These datasets are derived from PubChem bioassays, with each dataset corresponding to a specific bioassay. Examples of the targets considered by these datasets include FXIa inhibitors, FXIIa inhibitors, SF1 and HIV RT-RNase inhibitors. More broadly, the datasets cover several target classes including proteases, GPCRs, kinases and nuclear receptors. These datasets were constructed specifically to avoid a problem encountered with other datasets, namely, that many of them give 2D methods an unfair advantage over 3D methods. More specifically, the actives in each of the datasets exhibit a wide variety of scaffold classes, thus avoiding the problems of analog bias\cite{Good:2008aa} and artificial enrichment\cite{Verdonk:2004aa}.
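
For reference, the two measures named under Measures of effectiveness can be written out explicitly. What follows is a sketch based on the commonly used definitions; the symbols $N$, $A$, $D$, $n_\chi$, $a_\chi$, $x_i$, $y_j$ and $s(\cdot)$ are our own notation, and the numbers in the example are purely illustrative and are not results from this study.

For a ranked list of $N$ compounds containing $A$ actives, let $n_\chi = \lceil \chi N \rceil$ be the number of top-ranked compounds examined and $a_\chi$ the number of actives among them. The enrichment factor at fraction $\chi$ is then
\begin{equation}
\mathrm{EF}(\chi) = \frac{a_\chi / n_\chi}{A / N}.
\end{equation}
As a hypothetical example, take $N = 1000$, $A = 10$ and $\chi = 0.01$, so that $n_\chi = 10$. If $a_\chi = 3$, then $\mathrm{EF}(0.01) = (3/10)/(10/1000) = 30$, against a maximum attainable value of $(10/10)/(10/1000) = 100$. Since this maximum depends on both $\chi$ and the ratio $A/N$, enrichment factors are difficult to compare across datasets with different proportions of actives.

The area under the ROC curve can be expressed as the probability that a randomly chosen active is scored above a randomly chosen inactive. With $D = N - A$ inactives, actives $x_1, \ldots, x_A$, inactives $y_1, \ldots, y_D$ and a similarity score $s(\cdot)$,
\begin{equation}
\mathrm{AUC} = \frac{1}{A D} \sum_{i=1}^{A} \sum_{j=1}^{D} \left( \mathbf{1}[s(x_i) > s(y_j)] + \frac{1}{2}\, \mathbf{1}[s(x_i) = s(y_j)] \right),
\end{equation}
where $\mathbf{1}[\cdot]$ equals 1 when its argument is true and 0 otherwise. A random ranking gives an expected value of $0.5$ and a perfect ranking gives $1$.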

\section*{Authors' contributions}

\section*{Acknowledgements}

\section*{Figures}
\subsection*{Figure 1 - Sample figure title}
A short description of the figure content should go here.
\subsection*{Figure 2 - Sample figure title}
Figure legend text.

\subsection*{Table 2 - Sample table title}
Large tables are attached as separate files but should still be described here.

\section*{Additional Files}
\subsection*{Additional file 1 --- Sample additional file title}
Additional file descriptions text (including details of how to view the file, if it is in a non-standard format or the file extension). This might refer to a multi-page table or a figure.
\subsection*{Additional file 2 --- Sample additional file title}
Additional file descriptions text.