Grid Deconvolution README

# ------------------------------------------------------------------------------
# Overview
# ------------------------------------------------------------------------------

This is a software library to run the JCVI barcode deconvolution pipeline
using Sun Grid Engine, or optionally without the use of a grid.

This software deconvolves a FASTA, FASTQ or SFF file based on the barcode
sequences, as determined by running fuzznuc to find the best hits of reads to
barcode sequences. The results are a report of trim points and a set of
per-barcode FASTA files, in which each entry is a read sequence whose
unambiguous best hit is that barcode. The barcode sequences are trimmed by
default, unless otherwise specified.

This software is open-source and available free of charge subject to the GNU
General Public License, version 2.

# ------------------------------------------------------------------------------
# Getting Started
# ------------------------------------------------------------------------------

# 1. Check out the Grid and FileIO libraries from the deconvolver svn repository
svn co https://deconvolver.svn.sourceforge.net/svnroot/deconvolver deconvolver

# 2. Run the test script and verify that the tests complete successfully
deconvolver/Grid/trunk/t/grid_deconvolve.t

# 3. Read the help menu for the deconvolution command-line API
Grid/trunk/bin/grid-deconvolve.pl --help

# ------------------------------------------------------------------------------
# Standard mode for custom DNA barcode protocols
# ------------------------------------------------------------------------------

The standard mode for grid deconvolution searches for the barcode pattern on
both strands, across the entire sequence of each read in the input fragment
data sets. An optional, user-provided number of mismatches to the pattern is
allowed.
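
The standard-mode search described above can be illustrated with a small
Python sketch. This is not the pipeline's code: grid-deconvolve.pl delegates
matching to fuzznuc, and every name below (find_hits, assign_barcode, the
barcode names) is hypothetical. The rule used for an "unambiguous best hit"
(a single barcode with the fewest mismatches) is an assumption made for
illustration, and gaps are not modelled.

```python
# Illustrative sketch only; the real pipeline uses fuzznuc for matching.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_hits(read, barcode, max_mismatches=0):
    """Scan the whole read on both strands for the barcode.

    Returns (strand, start, mismatches) tuples. start is 0-based on the
    read as given; a "-" hit means the reverse complement of the barcode
    matched the read at that position.
    """
    hits = []
    for strand, pattern in (("+", barcode), ("-", reverse_complement(barcode))):
        for start in range(len(read) - len(pattern) + 1):
            window = read[start:start + len(pattern)]
            mm = count_mismatches(window, pattern)
            if mm <= max_mismatches:
                hits.append((strand, start, mm))
    return hits


def assign_barcode(read, barcodes, max_mismatches=0):
    """Return the name of the single best-matching barcode, or None.

    Assumption: "unambiguous" means exactly one barcode attains the
    lowest mismatch count; ties and reads with no hit yield None.
    """
    best = {}
    for name, seq in barcodes.items():
        hits = find_hits(read, seq, max_mismatches)
        if hits:
            best[name] = min(h[2] for h in hits)
    if not best:
        return None
    lowest = min(best.values())
    winners = [name for name, mm in best.items() if mm == lowest]
    return winners[0] if len(winners) == 1 else None
```

For example, with the hypothetical barcodes {"BC1": "ACGTAC", "BC2": "TTGGCC"},
a read containing only ACGTAC is assigned to BC1, while a read containing both
barcodes exactly is left unassigned because the best hit is ambiguous.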

# ------------------------------------------------------------------------------
# sfffile_mode
# ------------------------------------------------------------------------------

Grid/bin/grid-deconvolve.pl --sfffile_mode <OPTIONS>

The release of the deconvolution code includes an "sfffile_mode" option that
enables the deconvolution process to work similarly to the sfffile deconvolver
for any input dataset (.sff, .fasta, .fastq). This option strictly searches
for the key sequence adjacent to the barcode in the 5’-to-3’ direction. It was
added to provide a single software solution that handles both standard Roche
MID barcoded data and custom barcode design protocols.

Running in sfffile_mode does the following:

1) Automatically sets and/or validates the key sequence.
   a. For .sff files, it sets the key sequence using sffinfo, or validates a
      user-provided key sequence.
   b. For non-sff files, it requires that the user provide the key sequence.
2) Searches for the barcode pattern only on the positive strand of the input
   sequences.

Note that there may be slight differences between the results of running
sfffile and those of running grid-deconvolve.pl in sfffile_mode. The key
differences are that sfffile tolerates gaps and does not count them against
the allowable mismatch threshold, and that sfffile looks for a sequence
alignment only at the very beginning of the read sequence. grid-deconvolve.pl
uses fuzznuc from the EMBOSS tools to search the entire read for an allowable
match.

There is a test case included in the repository to validate sfffile_mode:

Grid/t/grid_deconvolve_sfffile_mode.t

# ------------------------------------------------------------------------------
# Author
# ------------------------------------------------------------------------------

Nelson Axelrod
J. Craig Venter Institute
http://www.jcvi.org
naxelrod@jcvi.org

Source: README.txt, updated 2010-08-11