Atlas-Indel

Uday Evani Atlas2Team

Atlas-Indel2 is designed to evaluate and distinguish true insertions and deletions (indels) from sequencing and mapping errors in whole-exome capture sequencing (WECS) data.

    ruby Atlas-Indel2.rb -b [input_bam] -r [reference] -o [outfile] [-S | -I]

Basic usage information may be viewed at any time by running Atlas-Indel2 without any arguments.

Mandatory arguments

-b, --bam=FILE The input BAM file. It must be sorted. It does not need to be indexed. A read mask of 1796 is used on the bitwise flag.
-r, --reference=FILE The reference sequence to be used in FASTA format. This must be the same version used in mapping the sequence. It does not need to be indexed.
-o, --outfile=FILENAME The name of the output VCF file. Output is a simple VCFv4 file with a single sample. These files can be merged into a more complete VCF file using the vcfPrinter (included). If the file already exists, it will be overwritten. NOTE: For use with vcfPrinter, you should name your VCF file the same as your BAM file, simply replacing ".bam" with ".vcf".
-I or -S You must include one of these flags to specify either the Illumina or SOLiD regression model to be used.
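
The read mask of 1796 mentioned under -b is the sum of four standard SAM flag bits: 4 (read unmapped), 256 (secondary alignment), 512 (failed quality checks) and 1024 (PCR or optical duplicate). The sketch below shows how such a mask is typically tested against a read's FLAG value. The constant and method names are illustrative and are not taken from the Atlas-Indel2 source; that masked reads are skipped is an assumption based on the usual meaning of a read mask.

```ruby
# SAM flag bits that add up to the 1796 read mask (bit values from the SAM specification).
READ_MASK_BITS = {
  unmapped:  0x4,    # 4
  secondary: 0x100,  # 256
  qc_fail:   0x200,  # 512
  duplicate: 0x400   # 1024
}.freeze

# Combine the individual bits into a single mask (evaluates to 1796).
READ_MASK = READ_MASK_BITS.values.reduce(:|)

# True when a read's FLAG has any masked bit set (assumed here to mean the read is skipped).
def masked_read?(flag, mask = READ_MASK)
  (flag & mask) != 0
end

# Names of the masked properties that are set on a given FLAG value.
def masked_reasons(flag)
  READ_MASK_BITS.select { |_name, bit| (flag & bit) != 0 }.keys
end
```

For example, a properly paired read with FLAG 99 has no masked bit set, while the same read marked as a duplicate (FLAG 1123) does.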

Optional arguments

Note: Different platform modes have different defaults.

-s, --sample=STRING The name of the sample to be listed in the output VCF file. If not specified the sample name is harvested from the input BAM file name, taking the first group of characters before a . (dot) is found. For example, with the filename "NA12275.chrom1.bam" the sample name would be "NA12275".
-p, --p-cutoff=FLOAT Defaults: Illumina:0.5, SOLiD:0.5 The indel probability (p) cutoff value for the logistic regression model. Indels with a p less than this cutoff will not be called. Increasing this cutoff will increase specificity, but will lower sensitivity. If you adjust this cutoff, you should usually also adjust the p-1bp-cutoff (see below).
-P, --p-1bp-cutoff Defaults: Illumina:0.5, SOLiD:0.88 The indel probability (p) cutoff value for 1bp deletions. This may be set to a stricter standard than the normal p-cutoff to increase callset specificity. This is very useful for SOLiD data, but should not generally be needed for Illumina data.
-B, --bed=FILE Here you may specify a BED file which contains the region you wish to limit your indel calling to. Only reads inside the region will be processed, which can significantly shorten the runtime.
-a, --always-include A file of sites with annotation to always include in the VCF.
-F, --show-filtered Include filtered indels with a QUAL>=1 in the VCF.
-O, --orig-base-qual This is the default for SOLiD; it is not recommended for Illumina data. With this option, the algorithm uses the original base qualities as specified in the OQ tag, if the BAM file includes it. If the BAM file does not include OQ tags, the normal base quality is used.
-N, --norm-base-qual This is the default for Illumina; it is not recommended for SOLiD data. With this option, the algorithm uses the normal base qualities, as specified in the QUAL column of the BAM file.
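
Two of the options above are easy to misread, so here is a minimal Ruby sketch of the default sample name rule (-s) and of the probability cutoffs (-p and -P). The method names are hypothetical. The sketch assumes that the 1bp cutoff replaces the normal cutoff for 1bp deletions; the text above does not say whether both are applied.

```ruby
# Default sample name: the characters of the BAM file name before the first dot.
def default_sample_name(bam_path)
  File.basename(bam_path).split('.').first
end

# Decide whether an indel with probability `prob` clears the cutoff.
# Indels with a probability below the cutoff are not called.
# Assumption: 1bp deletions are checked against the -P cutoff instead of the -p cutoff.
def passes_p_cutoff?(prob, one_bp_deletion:, p_cutoff: 0.5, p_1bp_cutoff: 0.5)
  cutoff = one_bp_deletion ? p_1bp_cutoff : p_cutoff
  prob >= cutoff
end
```

With the SOLiD default of 0.88 for -P, a 1bp deletion with p = 0.6 would be rejected under this reading, while any other indel with the same p would pass the default -p of 0.5.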

Heuristic Cutoffs

Most of these variables have already been considered by the regression model, so you shouldn't usually need to alter them. However, you are free to change them to meet your specific project requirements.

-t, --min-total-depth=INT Defaults: Illumina:2, SOLiD:2 The minimum total depth coverage required at an indel site. Indels at a site with less depth coverage will not be called. Increasing this value will increase specificity, but lower sensitivity.
Suggested range: 2-12
-m, --min-var-reads=INT Defaults: Illumina:2, SOLiD:2 The minimum number of variant reads required for an indel to be called. Increasing this number may increase specificity but will lower sensitivity.
Suggested range: 1-5
-v, --min-var-ratio=FLOAT Defaults: Illumina:0.06, SOLiD:0.05 The variant-reads/total-reads cutoff. Indels with a ratio less than the specified value will not be called. Increasing this value may increase specificity, but will lower sensitivity.
Suggested range: 0-0.1
-f, --strand-dir-filter Default: Illumina:disabled, SOLiD:disabled When included, requires indels to have at least one variant read in each strand direction. This filter is effective at increasing the specificity, but also carries a heavy sensitivity cost.
-n, --near-read_end_ratio=FLOAT Default: Illumina:0.8, SOLiD:1.0 (disabled) The read end ratio is defined as the number of variant reads where the variant is within 5bp of a read end divided by the total variant read depth. If this ratio is greater than the specified value, the indel is filtered.
Suggested range: 0.7-1.0
-h, --homo-var-cutoff Default: Illumina:0.6, SOLiD:0.5 The homozygous variant cutoff. This cutoff is used in the preliminary genotyping performed by Atlas-Indel2. If the variant reads divided by (variant reads + reference reads) is greater than this cutoff, the indel is marked as a homozygote; otherwise it is marked as a heterozygote.
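
The cutoffs above can be read as a chain of simple checks on per-site read counts. The following Ruby sketch applies them in the order listed, using the Illumina defaults. It is an illustration of the descriptions only, not the Atlas-Indel2 implementation: the hash keys, the `site` structure, the returned symbols and the order of evaluation are all assumptions, as is treating "total reads" in the variant ratio as the total depth.

```ruby
# Illumina-mode defaults as listed above (key names are illustrative).
ILLUMINA_DEFAULTS = {
  min_total_depth: 2,
  min_var_reads: 2,
  min_var_ratio: 0.06,
  strand_dir_filter: false,
  near_read_end_ratio: 0.8,
  homo_var_cutoff: 0.6
}.freeze

# site is a hypothetical Hash of counts at one indel site:
#   :total_depth, :var_reads, :ref_reads, :var_fwd, :var_rev, :var_near_end
# Returns :pass, or a symbol naming the first check that failed.
def heuristic_filter(site, opts = ILLUMINA_DEFAULTS)
  return :low_total_depth if site[:total_depth] < opts[:min_total_depth]
  return :low_var_reads   if site[:var_reads] < opts[:min_var_reads]
  return :low_var_ratio   if site[:var_reads].to_f / site[:total_depth] < opts[:min_var_ratio]
  # Strand filter: require at least one variant read in each direction.
  if opts[:strand_dir_filter] && (site[:var_fwd].zero? || site[:var_rev].zero?)
    return :single_strand
  end
  # Read end ratio: variant reads with the variant within 5bp of a read end / variant reads.
  return :near_read_end if site[:var_near_end].to_f / site[:var_reads] > opts[:near_read_end_ratio]
  :pass
end

# Preliminary genotype from the homozygous variant cutoff.
def preliminary_genotype(site, opts = ILLUMINA_DEFAULTS)
  ratio = site[:var_reads].to_f / (site[:var_reads] + site[:ref_reads])
  ratio > opts[:homo_var_cutoff] ? :homozygous : :heterozygous
end
```

A site with 20 total reads, 8 variant reads (2 of them near a read end) and 12 reference reads passes every check under these defaults and is genotyped as a heterozygote, since 8 / 20 = 0.4 does not exceed 0.6.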

Example

    ruby Atlas-Indel2.rb -b seq1.10.2010.chrom1.bam -r ~/refs/human_g1k_v37.fasta -o ~/NA12275.chrom1.vcf -t 10 -m 5 -s NA12275 -B target_region.bed -S
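
When running many samples, a command like the one above can be assembled programmatically. The Ruby sketch below builds the argument list using only flags documented on this page and derives the output name by replacing ".bam" with ".vcf", as recommended for vcfPrinter. The helper name `atlas_indel2_args` and its keyword arguments are hypothetical, and the file names are placeholders.

```ruby
require 'shellwords'

# Build the argument list for one Atlas-Indel2 run.
# platform: :illumina (-I) or :solid (-S); extra: optional flag => value pairs.
def atlas_indel2_args(bam:, reference:, platform:, extra: {})
  # Guard against a non-.bam input: the derived output name would otherwise equal
  # the input path, and an existing output file is overwritten.
  raise ArgumentError, "expected a .bam file: #{bam}" unless bam.end_with?('.bam')

  platform_flag = { illumina: '-I', solid: '-S' }.fetch(platform)
  vcf = bam.sub(/\.bam\z/, '.vcf') # same name as the BAM file, ".bam" replaced by ".vcf"
  args = ['ruby', 'Atlas-Indel2.rb', '-b', bam, '-r', reference, '-o', vcf]
  extra.each { |flag, value| args.push(flag, value.to_s) }
  args << platform_flag
end

args = atlas_indel2_args(
  bam: 'NA12275.chrom1.bam',
  reference: 'human_g1k_v37.fasta',
  platform: :solid,
  extra: { '-t' => 10, '-m' => 5, '-B' => 'target_region.bed' }
)
puts Shellwords.join(args) # print the command line for inspection
```

Passing the array to `system(*args)` would run the command without going through a shell, which avoids quoting problems with unusual file names.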

Related

Wiki: Atlas2 Suite