Grid Deconvolution README
# ------------------------------------------------------------------------------
# Overview
# ------------------------------------------------------------------------------
This is a software library to run the JCVI barcode deconvolution pipeline using
Sun Grid Engine, or optionally without the use of a grid.
This software deconvolves a FASTA, FASTQ or SFF file based on the barcode
sequences, as determined by running fuzznuc to find the best hits of read to
barcode sequences. The results are a report of trim points, and a list of
barcode fasta files with each entry representing a read sequence that has its
unambiguous best hit to the bar code. The bar code sequences are trimmed by
default, unless otherwise specified.
This software is open-source and available free of charge subject to the GNU
General Public License, version 2.
# ------------------------------------------------------------------------------
# Getting Started
# ------------------------------------------------------------------------------
# 1. Checkout Grid and FileIO libraries from deconvolver svn repository
svn co https://deconvolver.svn.sourceforge.net/svnroot/deconvolver deconvolver
# 2. Run the test script and verify test complete successfully
deconvolver/Grid/trunk/t/grid_deconvolve.t
# 3. Read the help menu for the deconvolution command-line API
Grid/trunk/bin/grid-deconvolve.pl --help
# ------------------------------------------------------------------------------
# Standard mode for custom DNA barcode protocols
# ------------------------------------------------------------------------------
The standard mode for grid deconvolution will search for the barcode pattern in
both strands on the entire sequence of the input fragment data sets at an
optionally provided number of allowable mismatches to the pattern.
# ------------------------------------------------------------------------------
# sfffile_mode
# ------------------------------------------------------------------------------
Grid/bin/grid-deconvolve.pl --sfffile_mode <OPTIONS>
The release of the deconvolution code includes an "sfffile_mode" option that
enables the deconvolution process to work similarly to the sfffile deconvolver
for any input dataset (.sff, .fasta, .fastq). This option strictly searches for
the key sequence adjacent to the barcode in the 5’-to-3’ direction.
This option was added to provide a single software solution to handle both
standard Roche MID barcoded data, and to handle custom barcode design protocols.
Running in sfffile_mode does the following
1) Automatically sets and/or validates the key sequence
a. For .sff files, it sets the key sequence using sffinfo, or validates a
user-provided key sequence.
b. For non-sff files, it requires that the user provide the key sequence.
2) Searches for barcode pattern only on positive strand on input sequence files.
Note that there may be slight differences between the results of running sfffile
versus grid-deconvolve.pl in sfffile_mode. The key differences are the sfffile
tolerates gaps and does not count them against the allowable mismatch threshold,
and sfffile looks for a sequence alignment only at the very beginning of the
read sequence. grid-deconvolve.pl uses fuzznuc from Emboss tools to search the
entire string for an allowable match.
There is a test case included in the repository to validate sfffile_mode
Grid/t/grid_deconvolve_sfffile_mode.t
# ------------------------------------------------------------------------------
# Author
# ------------------------------------------------------------------------------
Nelson Axelrod
J. Craig Venter Institute
http://www.jcvi.org
naxelrod@jcvi.org