Atlas-SNP

Atlas-SNP2 is designed to evaluate and distinguish true SNPs from sequencing and mapping errors in whole-exome capture sequencing (WECS) data.

For 454/Illumina data

    ruby Atlas-SNP2.rb -i [in.sorted.bam] -r [reference.fa] -o [output file] -n [sample name] [platform]

Atlas-SNP2 is written in Ruby. Basic usage is printed when the program is run without any arguments.

    -i FILE   BAM format alignment file (Required; must be sorted by start position)
    -r FILE   FASTA format reference sequence file (Required)
    -o STR    name of the output result file (Required)
    -n STR    sample name used in the VCF file (Required)
    -t STR    only call SNPs in the given target region (Optional; see "samtools view" for the target region format)
    -a FILE   file listing sites that will always be included (Optional)
    -w        only evaluate sites in the list (use with -a; Optional)
    -F        include filtered lines that have a QUAL of at least 1 in the output

Choosing platform

    --Illumina   Illumina
    --454_FLX    454 FLX
    --454_XLR    454 Titanium

Setting up prior probability

    -e FLT   Prior(error|c) when the variant coverage is above 2, for 454 and Illumina data (Default is 0.1)
    -l FLT   Prior(error|c) when the variant coverage is 1 or 2, for 454 data (Default is 0.9)

Setting up filters

    -c FLT   posterior probability cutoff (Default is 0.95)
    -y INT   minimum coverage required for high-confidence SNP calls (Default is 6)
    -m FLT   maximum percentage of substitution bases allowed in the alignment (Default is 5.0)
    -g FLT   maximum percentage of insertion and deletion bases allowed in the alignment (Default is 5.0)
    -f INT   maximum number of alignments allowed to be piled up on a site (Default is 1024)
    -p INT   insertion size for paired-end re-sequencing data (Default is OFF)

Example

    # Call SNPs of one Illumina BAM only on chr1 and output the SNP results in VCF file ~/NA12275.chr1.snp.vcf
    ruby Atlas-SNP2.rb -i NA12275.bam -r ~/refs/human_g1k_v37.fasta -o NA12275.chr1.snp --Illumina -t chr1 -v -n NA12275

For SOLiD data

    SOLiD-SNP-caller <in.bam> <ref.fa> [.bed region] > [output.vcf]

SOLiD-SNP-caller is written in C++. Basic usage is printed when the program is run without any arguments.

    <in.bam>  FILE  BAM format alignment file (Required; must be sorted by start position)
    <ref.fa>  FILE  FASTA format reference sequence file (Required)
    <.bed>    FILE  only call SNPs on regions defined in BED format (Optional)

Example

    # Call SNPs of one SOLiD BAM only on coding regions and output the SNP results in VCF file NA20532.ontarget.vcf
    SOLiD-SNP-caller NA20532.bam ~/refs/human_g1k_v37.fasta ccds.bed > NA20532.ontarget.vcf

For Ion Torrent data

The Atlas2 Ion software incorporates the well-established algorithms of current Atlas2 builds. It also targets the specific systematic error modes of the platform's base caller in order to return the highest-quality results. Existing analysis tools, such as GATK, have already been adapted to call Ion data to a good standard. We aim to surpass these projects by building an Ion-specific platform. This should give end users greater confidence in their data and in the Ion Torrent platform itself. It will also broaden the scope of the Atlas2 project, which aims to remain the leading platform of its type.

Analysis capabilities for the Ion Torrent PGM are in the late stages of development. The features are being fine-tuned prior to the software's version 1 release. Several new algorithms have been developed that focus on Ion-specific error modes, such as those related to read length, homopolymer length and GC content. Other, more generally applicable algorithms are also being calibrated for the Ion platform. Atlas2 Ion will initially be made available as a stand-alone product, with possible future integration into the main Atlas2 suite. An announcement will be made once Atlas2 Ion Version 1 is released, and comprehensive usage documentation will be added to this wiki page at that time.
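The optional [.bed region] argument of SOLiD-SNP-caller restricts calling to the listed intervals. BED coordinates are 0-based with an exclusive end, while VCF positions are 1-based. This mismatch is a common source of off-by-one errors when checking whether a call is on target. The Ruby sketch below shows that check under these assumptions:

- `parse_bed`, `on_target?` and `BedInterval` are hypothetical helpers, not part of Atlas2.
- The parser reads only the first three tab-separated columns.
- The code uses `filter_map`, so it needs Ruby 2.7 or later.

```ruby
# Hypothetical helpers (not part of Atlas2) for checking whether a 1-based
# VCF position falls inside a BED interval. A BED start is 0-based and its
# end is exclusive, so 1-based position p is covered when start < p <= stop.

BedInterval = Struct.new(:chrom, :start, :stop)

# Parse BED text, skipping blank, comment, "track" and "browser" lines.
# Only the first three columns (chrom, start, end) are used.
def parse_bed(text)
  text.each_line.filter_map do |line|
    next if line.strip.empty? || line.start_with?("#", "track", "browser")
    chrom, start, stop = line.chomp.split("\t")
    BedInterval.new(chrom, Integer(start), Integer(stop))
  end
end

# True if the 1-based position pos1 on chrom lies inside any interval.
def on_target?(intervals, chrom, pos1)
  intervals.any? { |iv| iv.chrom == chrom && iv.start < pos1 && pos1 <= iv.stop }
end
```

For example, the BED line `chr1 100 200` covers VCF positions 101 through 200 on chr1. Position 100 falls outside it.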
Last updated: 2014-08-28
Full Commit List: https://sourceforge.net/p/atlas2/code/commit_browser

Trunk
- VcfPrinter: New flag "--indel" added to handle indels when merging VCFs.
- VcfPrinter: Added unit test cases for the VcfLine class.

Stable Version 1.4.3 (03-01-2013)
Atlas-SNP2
- Added an option "-w" to only evaluate the sites in a given VIP list.
- Atlas-SNP2 now evaluates VIP sites with extra-high coverage regardless of the "maximum coverage" setting. These VIP sites are marked as "high_coverage" in the FILTER column if their coverage is higher than the "maximum coverage". (In previous versions, these sites were skipped.)
- Users are now required to set the sequencing platform. The labels are "--Illumina", "--454_FLX" and "--454_XLR". For compatibility with users' existing submission scripts, "-s" for Illumina data still works.
Atlas-INDEL2
- Fixed a bug in the -always-include option that caused included low-quality sites to be printed with a QUAL of 'false' and without the P value. See ticket.
- Removed the ReqIncl filter; this is now indicated only in the INFO column.
VcfPrinter
- New "--fast" option. It stores variant information for all samples in memory. Use it only to merge a small number of sites (~50,000) across a small number of samples (~20).
- New option "--cluster", designed for use in an HPC environment. It is useful when merging millions of variants across thousands of samples.
- Logging implemented.
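The 1.4.3 changes to VIP-site handling interact with the options listed under "Setting up filters". The Ruby sketch below shows one way these decisions could fit together. It is not the Atlas-SNP2 source, and it rests on these assumptions:

- The "maximum coverage" in these notes corresponds to -f (default 1024).
- A site passes the -c cutoff when its posterior is at least 0.95.
- Non-VIP sites above the maximum are still skipped.
- The `Site` struct and the "low_posterior" and "low_coverage" labels are hypothetical. Only "high_coverage" comes from the notes.

```ruby
# Sketch of the site-level decisions described for Atlas-SNP2 1.4.3.
# NOT the Atlas-SNP2 source: Site, classify, "low_posterior" and
# "low_coverage" are hypothetical; "high_coverage" comes from the notes.

Site = Struct.new(:chrom, :pos, :coverage, :posterior)

# Defaults taken from the -c, -y and -f options documented above.
DEFAULTS = { posterior_cutoff: 0.95, min_coverage: 6, max_coverage: 1024 }.freeze

# Returns nil when the site is not evaluated, otherwise an array of
# filter labels (an empty array means PASS).
def classify(site, vip: false, vip_only: false, opts: DEFAULTS)
  # -w (used with -a): sites outside the always-include list are not evaluated.
  return nil if vip_only && !vip

  filters = []
  if site.coverage > opts[:max_coverage]
    # Assumed: non-VIP sites above the maximum are skipped, as before 1.4.3.
    return nil unless vip
    filters << "high_coverage"
  end
  filters << "low_posterior" if site.posterior < opts[:posterior_cutoff] # hypothetical label
  filters << "low_coverage" if site.coverage < opts[:min_coverage]       # hypothetical label
  filters
end
```

Consider a site with coverage 2000 and posterior 0.99. If it is in the VIP list, it is kept with the single label "high_coverage". Outside the VIP list, the same site returns nil and is skipped.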
Version 1.4.1 (09-06-2012)
Atlas-SNP2
- Added always-include option
- Added show-filtered option
- Added version information and the running command to the VCF header
Atlas-Indel2
- Added always-include option
- Added show-filtered option
- Fixed bug caused by passing a non-FASTA reference genome
- Fixed bug occasionally returning an infinite P value in the INFO column
- Fixed bug caused by reads mapping past the end of the reference genome
- Made Atlas-Indel2 more tolerant of malformed SAM lines

For Illumina/454 platforms
Version 1.3 (08-18-2011)
Atlas-SNP2
- Added a new option to call SNPs on given regions or by chromosome
- Changed the default maximum coverage for SNP calling to 1024
- For paired-end data, added an option to use insertion size for mapping quality control
- Improved the performance of crossmatch2SAM

Version 1.2 (01-18-2011)
This is a major upgrade of Atlas2.
Atlas-SNP2
New features
- one-stop running: takes sorted BAM files and a reference file as input and outputs SNP genotypes in VCF format
- uses mapping quality score for alignment quality control
- uses insertion size for mapping quality control of paired-end re-sequencing data
- more filters are integrated for higher-quality SNP calls
Performance
- whole-genome SNP calling is now feasible on a typical PC with 4 GB of memory.
  In our test, it processed 1 million reads every 5 minutes for whole-exome SNP calling, using only one core of a Xeon 5520 CPU and 4 GB of memory.
Bugs fixed and compatibility
- more robust to alignment errors
- crossmatch2SAM tool is now compatible with Ruby 1.9.x
- a few minor bugs fixed

Version 1.1 (04-26-2010)
Atlas-SNP2
- added a heuristics-based genotyping module
- added a "numRefReads_afterFilter" column to the Atlas-SNP2 result file
- revised the header line in the Atlas-SNP2 output file to be more explicit
- skipped duplicate reads masked in the BAM files when processing
- added an option for the user to set the maximum number of alignments allowed to be piled up at a particular site
- printed more running information and more detailed alignment statistics
- more robust to various alignment errors
- fixed several bugs

Version 1.0 (01-20-2010)
Atlas-SNP2
- added Illumina platform support
- all calculations are now based on required SAM fields for maximum compatibility
- added CIGAR and reference sequence test code
- used pileup number to calculate TotalCoverage
- improved performance
- migrated to Ruby 1.9
- many minor improvements

Draft release version 0.1 (12-10-2009)
- initial implementation
- initial support of SAM files

For SOLiD platform (08-18-2011)
- Major SNP calling model update
- Support for GATK base-quality-recalibrated BAMs by using OQ tags
- Call SNPs only on regions defined in a BED format file
- Output the SNP calls in VCF format directly

Atlas-Indel2
- updated SOLiD model and adjusted P cutoffs
- changed -P cutoff to apply to both 1bp insertions and deletions (rather than just 1bp deletions)

Version 0.3.1
Atlas-Indel2
- added options to use original base quality
- fixed bug that sometimes returned a success exit code when there was a failure
- fixed bug in simple_genotyper that caused samples with a variant read ratio of exactly 0.05 to be genotyped 0/0
- fixed bug in simple_genotyper that caused genotypes to occasionally read ./.
- fixed bug in bed_filter that was filtering out some on-target reads in very small target regions

Version 0.3
Atlas-Indel2
- updated SOLiD and Illumina models and recalibrated default settings
- implemented the ability to input a BED file to call only on-target indels
- switched from using z cutoffs to using p cutoffs
- modified 1bp p cutoff to only filter 1bp deletions
- fixed bug where the strand direction filter failed to be enabled
- added check for proper Ruby version
- fixed bug that occasionally allowed an indel quality of 110 (max should be 100)
- minor code-structure changes

Version 0.2.1
Atlas-Indel2
- added read_level model and improved site level model for SOLiD data
- adjusted default SOLiD z cutoff to 0.0 (to reflect new model)
- added check for proper Ruby version
- minor code-structure changes
- added an additional heuristic filter that allows a stricter z cutoff for 1bp indels, very useful for SOLiD data
- integrated heuristic genotyping
- fixed bug where Atlas-Indel2 crashed if a BAM chromosome was not in the reference
- now keeps 'chr' in the chromosome label if it is in the BAM
- the deprecated script "Atlas-Indel2-Illum-Exome.rb" has been removed. Please use Atlas-Indel2.rb with the -I flag instead

Version 0.2
Atlas-Indel2
- Implemented regression model for SOLiD data. You must now specify a regression model with -S or -I.
- Renamed main script to Atlas-Indel.rb.
- Modified the reference sequence class to allow for unsorted reference genomes.
- Added the indel z to the INFO column of the VCF output (not included after running VcfPrinter).
- Now echoes all settings back onto the command line.
- Fixed a bug that caused loss of precision in the normalized variant square variable of the Illumina site model.
- Fixed a bug in the depth coverage algorithm that caused reads not to be counted in total depth at deleted sites.
- Fixed the sample column order to be compatible with VcfPrinter.
- Removed the "x flagged lines skipped" message at the end of a run.
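The Version 0.3.1 fix to simple_genotyper concerns a boundary case: a variant read ratio of exactly 0.05 should not be called 0/0. The comparison therefore has to be inclusive. The Ruby sketch below illustrates that style of heuristic genotyper. It is not the Atlas-Indel2 simple_genotyper:

- Only the 0.05 boundary comes from these notes.
- The 0.9 homozygous threshold is a hypothetical value.
- The rule that zero reads gives ./. is an assumption.

```ruby
# Sketch of a heuristic, ratio-based genotyper. NOT the Atlas-Indel2
# simple_genotyper: only the 0.05 boundary comes from the release notes;
# HOM_MIN_RATIO and the "no reads => ./." rule are assumptions.

HET_MIN_RATIO = 0.05 # from the notes: a ratio of exactly 0.05 is not 0/0
HOM_MIN_RATIO = 0.9  # hypothetical threshold for a homozygous variant call

def simple_genotype(variant_reads, total_reads)
  return "./." if total_reads.zero? # no reads at the site: genotype unknown

  ratio = variant_reads.to_f / total_reads
  if ratio >= HOM_MIN_RATIO
    "1/1"
  elsif ratio >= HET_MIN_RATIO # inclusive comparison avoids the 0.05 bug
    "0/1"
  else
    "0/0"
  end
end
```

With these thresholds, 1 variant read out of 20 (ratio 0.05) is called 0/1, while 1 out of 21 is called 0/0. Using `>` instead of `>=` would reproduce the bug described above.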
Last updated: 2013-06-19