bin.zip 2011-05-27 15.9 kB 11 weekly downloads
src.zip 2011-05-27 10.5 kB 11 weekly downloads
README.txt 2011-03-09 4.4 kB 11 weekly downloads
Short Read Error Correction (SHREC)
pSHREC v.2.1

Contact: schroder@csse.unimelb.edu.au

---------
Overview:
---------

SHREC is a Java program that corrects sequencing errors in short-read
high-throughput data, such as those generated by the Illumina Genome
Analyzer.

----------
Reference:
----------

J.Schroeder, H.Schroeder, R.Sinha, S.J.Puglisi, B.Schmidt:
"SHREC: A short-read error correction method", Bioinformatics 2009

-------------------
System Requirements:
-------------------

SHREC has been tested on systems running Linux on an x86_64
architecture. Compiling the program requires a recent version of the
Java Development Kit (JDK). We recommend allowing the JVM to allocate
as much memory as possible (for example -Xmx3072m), because SHREC's
memory consumption is very high.

-------------
Installation:
-------------

One of the following:

Unpack src.zip and compile SHREC using

    javac *.java

Or: unpack bin.zip (precompiled binaries).

------
Usage:
------

    java -Xmx<#>g Shrec [options] <input reads> <corrected reads output> <discarded reads output>

(where <#> is the memory in gigabytes that Shrec is allowed to use)

Options:

-i #n: Positive integer specifying how many times the error correction
       is run on the set of reads. More iterations allow the algorithm
       to correct more than one error in a read. A value higher than 5
       doesn't make much sense for most problem settings.
       Default value is 3.

-l #f #t: Specify the levels to check in the suffix trie. #f is the
       level on which the nodes are compared. The #t parameter then
       specifies how many more levels of the trie are to be
       constructed. A higher value results in higher memory
       consumption but allows a more accurate comparison of subtrees
       to analyse a possible correction. Default values are 21 to 24.

-c #n: Cutoff value. Specify the threshold of node counts for an error
       (default 5). This is the most crucial parameter to set right,
       because it depends on the expected coverage of the input data.

-d #n: Parallelisation depth. Specify the depth at which the suffix
       trie is to be divided (higher values for machines with small
       memory; default 3).

-f x:  Specify the input file format (fasta or fastq; default fasta).

-p #n: Number of threads to run simultaneously (default 2).

Required Parameters:

<input reads> - Path and name of the input file containing the reads
in FASTA format, e.g.

    >Read_1
    TGGCAAAGTATGTGTGTCCTATGTCCTCAAGAC
    >Read_2
    CCCCATACACTTCAAAAAACAAAAAACCCTAGA
    .....

With the -f option, FASTQ files can be handled as well. In the current
SHREC version, reads must be of equal length and contain only the
letters {A,C,G,T}.

<corrected reads output> - Path and name of the output file containing
all reads from <input reads> that SHREC has detected as erroneous and
corrected.

<discarded reads output> - Path and name of the output file containing
all reads from <input reads> that SHREC has detected as erroneous but
NOT corrected.

--------------
Configuration:
--------------

A few tips to configure Shrec for the machine in use:

First, identify how much main memory the computer has and make most of
it available to the Java virtual machine. For example, if the machine
has 32 GB, you might run

    java -Xmx30g Shrec ...

(don't take the full amount; that might cause the computer to crash).

Next, decide how many processors you would like to use in parallel to
speed things up. For example, on an 8-core machine use -p 8.

The next parameter to set is the parallelisation depth (-d). If the
default value runs into memory problems (out of memory exception or
swapping - check the memory usage: the reserved memory should be lower
than the virtual to run efficiently), try a higher -d value. If the
reserved memory is significantly lower than the available memory (set
above), try a lower -d value, because this will run faster.
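
Before spending time on memory and thread settings for a long run, it
may be worth confirming that the input meets the constraints listed
under Required Parameters (reads of equal length, only the letters
A, C, G, T). The following is a minimal sketch of such a pre-check. It
is not part of SHREC: the class and method names are hypothetical, and
it assumes FASTA input in which each read sequence occupies a single
line after its ">" header. It also treats lowercase letters as invalid,
which is an assumption about how strictly the constraint is meant.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-check, not part of SHREC.
class FastaCheckSketch {

    // Returns null if every read has the same length and uses only A, C, G, T;
    // otherwise returns a short description of the first problem found.
    // Assumption: each read sequence is a single line following its '>' header.
    static String firstProblem(Iterable<String> lines) {
        int expectedLength = -1; // length of the first read seen
        int lineNo = 0;
        for (String line : lines) {
            lineNo++;
            if (line.isEmpty() || line.charAt(0) == '>') {
                continue; // skip blank lines and FASTA headers
            }
            if (!line.matches("[ACGT]+")) {
                return "line " + lineNo + ": characters other than A, C, G, T";
            }
            if (expectedLength < 0) {
                expectedLength = line.length();
            } else if (line.length() != expectedLength) {
                return "line " + lineNo + ": length " + line.length()
                        + ", expected " + expectedLength;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The two sample reads shown under Required Parameters.
        List<String> sample = Arrays.asList(
                ">Read_1", "TGGCAAAGTATGTGTGTCCTATGTCCTCAAGAC",
                ">Read_2", "CCCCATACACTTCAAAAAACAAAAAACCCTAGA");
        String problem = firstProblem(sample);
        System.out.println(problem == null ? "input looks usable" : problem);
    }
}
```

For a real file, the lines could be supplied with
java.nio.file.Files.readAllLines, at the cost of holding the whole file
in memory.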
--------
Example:
--------

On the SHREC homepage you will find a dataset (in the folder "Sample
Data") of simulated reads obtained by randomly sampling 1.1M reads of
35 bases from the 576,869 bp S.cer5 (NC_001137) genome. A base error
rate of 1% has been introduced uniformly into the read sequences.
SHREC can be called on this dataset as follows:

    java -Xmx4g Shrec NC_001137_generated_reads.fas NC_001137_corrected_reads.fas NC_001137_discarded_reads.fas
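
Because the cutoff (-c) depends on the expected coverage of the input
data, it can be useful to estimate that coverage first. The sketch
below applies the usual estimate, number of reads times read length
divided by genome length, to the sample dataset; the result is roughly
66.7-fold. The class and method names are hypothetical and not part of
SHREC, and the README does not give a rule for deriving -c from this
number, so treat it only as an input to that choice.

```java
import java.util.Locale;

// Hypothetical helper, not part of SHREC.
class CoverageSketch {

    // Expected coverage = (number of reads * read length) / genome length.
    static double coverage(long readCount, int readLength, long genomeLength) {
        return (double) readCount * readLength / genomeLength;
    }

    public static void main(String[] args) {
        // Sample dataset from the Example section:
        // 1.1M reads of 35 bases, 576,869 bp genome.
        double c = coverage(1_100_000L, 35, 576_869L);
        System.out.println(String.format(Locale.ROOT, "Expected coverage: %.1fx", c));
    }
}
```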
Source: README.txt, updated 2011-03-09