<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to Variants</title><link>https://sourceforge.net/p/bamformatics/wiki/Variants/</link><description>Recent changes to Variants</description><atom:link href="https://sourceforge.net/p/bamformatics/wiki/Variants/feed" rel="self"/><language>en</language><lastBuildDate>Sun, 27 Jul 2014 06:40:21 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/bamformatics/wiki/Variants/feed" rel="self" type="application/rss+xml"/><item><title>Variants modified by Tomasz</title><link>https://sourceforge.net/p/bamformatics/wiki/Variants/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v3
+++ v4
@@ -10,14 +10,15 @@
    java -jar Bamformatics.jar callvariants
    --bam alignment.bam --output mycalls.vcf.gz options

-The last field, *options*, refers to details/thresholds used during variant calling. These options can be skipped as long as a [default] reference genome has been set.
+The last field, *options*, refers to details/thresholds used during variant calling. These options can be skipped as long as a [default] reference genome has been set. To see a complete list of the available options, run the *callvariants* tool without any arguments. 

+By default, the caller reports variants in moderately-to-well mappable regions, requires several high-quality reads to document the variant, low strand bias, etc. The caller handles several idiosyncrasies such as overlapping reads. It does not assume a ploidy model, so it can be used to process data from haploid, diploid, or polyploid samples, including mixtures. During indel calling, the caller performs some local realignment. The output is compatible with the vcf format.

-*Comment:* Several other variant calling programs are available (see [Resources]). However, the Bamformatics caller includes some interesting and distinguishing features (see [Features]).
+*Comment:* Several other variant calling programs are available (see [Resources]). However, the Bamformatics caller includes some interesting and distinguishing features (see [Features]). 

 *Comment:* The variant call quality scores output by this program are positive real numbers wherein large values indicate higher call confidence. However, the values are not phred representations of p-values and should not be interpreted as such.

-*Comment:* The GT field in the last column of the vcf output is indicative of whether a variant is hetero- or homo-zygous. However, no attempt is made to estimate the actual variant genotype, i.e. parental origin of heterozygous variants.
+*Comment:* The GT field in the last column of the vcf output is indicative of whether a variant is hetero- or homo-zygous. However, no attempt is made to estimate the actual variant haplotype, i.e. parental origin of heterozygous variants.

 *Comment:* The output is automatically compressed if the extension ends in .gz or .bz2

&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Tomasz</dc:creator><pubDate>Sun, 27 Jul 2014 06:40:21 -0000</pubDate><guid>https://sourceforge.net69cce2ef3742de9a7b2b9ee066218e74c8bc3166</guid></item><item><title>Variants modified by Tomasz</title><link>https://sourceforge.net/p/bamformatics/wiki/Variants/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v2
+++ v3
@@ -77,4 +77,4 @@

 *Comment:* The program can merge information about variants from several samples – just specify several *label/vcf/bam* sets.

-The variant details can be used to find differences in related samples, for example somatic mutations. Because the data is presented in a clear tabular format, all control over mutation calling and scoring is left to the follow-up analysis. An R function to score changes between samples, for example somatic mutations, can be found here.
+The variant details can be used to find differences in related samples, for example somatic mutations. Because the data is presented in a clear tabular format, all control over mutation calling and scoring is left to the follow-up analysis. For example, this R [function](http://sourceforge.net/projects/bamformatics/files/bamformatics/Rscripts/getVarChangeEvents.r/download) scores changes between samples, for example somatic mutations.
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Tomasz</dc:creator><pubDate>Thu, 17 Jul 2014 11:28:55 -0000</pubDate><guid>https://sourceforge.netd7f2e0135e59a5c0e3227490c6b843301f4c3b79</guid></item><item><title>Variants modified by Tomasz</title><link>https://sourceforge.net/p/bamformatics/wiki/Variants/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v1
+++ v2
@@ -62,7 +62,7 @@
 &lt;br /&gt;
 ####Variant details####

-For custom post-processing of variant calls, a separate program can extract more information about variants than is encoded in the vcf file. In particular, this program can make explicit the read counts used in determining variant calls/qualities as well as indicate how many bases at a locus were observed but ignored.
+For custom post-processing of variant calls, a separate program can extract more information about variants than is encoded in the vcf file. In particular, this program can make explicit the read counts used in determining variant calls/qualities.

 To obtain tables with variant details, use

@@ -77,3 +77,4 @@

 *Comment:* The program can merge information about variants from several samples – just specify several *label/vcf/bam* sets.

+The variant details can be used to find differences in related samples, for example somatic mutations. Because the data is presented in a clear tabular format, all control over mutation calling and scoring is left to the follow-up analysis. An R function to score changes between samples, for example somatic mutations, can be found here.
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Tomasz</dc:creator><pubDate>Tue, 15 Jul 2014 20:02:48 -0000</pubDate><guid>https://sourceforge.net0bf58ea49b602b775fb28e2e96479563b6cbee11</guid></item><item><title>WikiPage Variants modified by Tomasz</title><link>https://sourceforge.net/p/bamformatics/wiki/Variants/</link><description>&lt;div class="markdown_content"&gt;&lt;h2 id="variants"&gt;Variants&lt;/h2&gt;
&lt;p&gt;The Bamformatics toolkit contains several programs related to the identification and characterization of genetic variants in biological samples.&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h4 id="calling"&gt;Calling&lt;/h4&gt;
&lt;p&gt;Calling of variants starting from an alignment (bam) file is performed using the command&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;java&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;jar&lt;/span&gt; &lt;span class="n"&gt;Bamformatics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jar&lt;/span&gt; &lt;span class="n"&gt;callvariants&lt;/span&gt;
&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;bam&lt;/span&gt; &lt;span class="n"&gt;alignment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bam&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="n"&gt;mycalls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vcf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gz&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;The last field, &lt;em&gt;options&lt;/em&gt;, refers to details/thresholds used during variant calling. These options can be skipped as long as a &lt;span&gt;[default]&lt;/span&gt; reference genome has been set.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Comment:&lt;/em&gt; Several other variant calling programs are available (see &lt;span&gt;[Resources]&lt;/span&gt;). However, the Bamformatics caller includes some interesting and distinguishing features (see &lt;span&gt;[Features]&lt;/span&gt;).&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Comment:&lt;/em&gt; The variant call quality scores output by this program are positive real numbers; larger values indicate higher call confidence. However, the values are not phred representations of p-values and should not be interpreted as such.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Comment:&lt;/em&gt; The GT field in the last column of the vcf output is indicative of whether a variant is hetero- or homo-zygous. However, no attempt is made to estimate the actual variant genotype, i.e. parental origin of heterozygous variants.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Comment:&lt;/em&gt; The output is automatically compressed if the output file name ends in .gz or .bz2.&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h4 id="annotation"&gt;Annotation&lt;/h4&gt;
&lt;p&gt;Output obtained from the calling procedure is independent of any database, e.g. dbSNP. To incorporate such annotations into a table of called variants, use the command&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;java&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;jar&lt;/span&gt; &lt;span class="n"&gt;Bamformatics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jar&lt;/span&gt; &lt;span class="n"&gt;annotatevariants&lt;/span&gt;
&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;vcf&lt;/span&gt; &lt;span class="n"&gt;mycalls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vcf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gz&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="n"&gt;annotatedcalls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vcf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gz&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;database&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;dbSNP&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vcf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gz&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;&lt;em&gt;Comment:&lt;/em&gt; This command replaces existing ID and INFO columns with values from the annotation database.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Comment:&lt;/em&gt; In cases where a position in the variant table matches a position in the database, but the variant information does not (e.g. replacement A to G in vcf file vs. A to C in database), the ID code of the variant is enclosed in square brackets (e.g. [rs00000]).&lt;/p&gt;
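&lt;p&gt;The bracket convention described above can be illustrated with a minimal Python sketch. It is not taken from the Bamformatics source: the function name &lt;em&gt;annotated_id&lt;/em&gt; and its arguments are hypothetical, and it assumes the call and the database record have already been matched by position.&lt;/p&gt;

```python
def annotated_id(call_ref, call_alt, db_ref, db_alt, db_id):
    """Return the ID to report for a call whose position matches a database record.

    Hypothetical helper, for illustration only: the database ID is reported
    as-is when the alleles agree, and in square brackets when only the
    position agrees.
    """
    if (call_ref, call_alt) == (db_ref, db_alt):
        return db_id
    return "[" + db_id + "]"


# Same position, same substitution (A to G): plain ID
print(annotated_id("A", "G", "A", "G", "rs00000"))
# Same position, different substitution (A to G vs. A to C): bracketed ID
print(annotated_id("A", "G", "A", "C", "rs00000"))
```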
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h4 id="filtering"&gt;Filtering&lt;/h4&gt;
&lt;p&gt;Output obtained from the calling procedure should be considered in raw form and contains “.” in the FILTER column. The raw calls can be filtered in two distinct ways, by key/threshold pair or by genomic region.&lt;/p&gt;
&lt;p&gt;To filter by a key/threshold pair, use a command such as&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;java&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;jar&lt;/span&gt; &lt;span class="n"&gt;Bamformatics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jar&lt;/span&gt; &lt;span class="n"&gt;filtervariants&lt;/span&gt;
&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;vcf&lt;/span&gt; &lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vcf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gz&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filtered&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vcf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gz&lt;/span&gt; 
&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;filter&lt;/span&gt; &lt;span class="n"&gt;strand&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt; "&lt;span class="n"&gt;SF&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;12"
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Here the string &lt;em&gt;strand&lt;/em&gt; is the label written to the FILTER column to mark matching variants. The string passed to the &lt;em&gt;key&lt;/em&gt; option instructs the program to detect variants in which the value of the &lt;em&gt;SF&lt;/em&gt; field is greater than 12. In this particular example, the program flags variants for which the Fisher test for strand bias gives a p-value with phred score above 12.&lt;/p&gt;
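&lt;p&gt;To get a feel for what a phred threshold of 12 means as a p-value, the short Python sketch below converts between the two. It assumes the conventional phred scaling, phred = -10 log10(p), which this page does not spell out for the SF field; the function names are illustrative only.&lt;/p&gt;

```python
import math


def phred_to_pvalue(phred):
    """Convert a phred-scaled score to a p-value, assuming p = 10^(-phred/10)."""
    return 10 ** (-phred / 10)


def pvalue_to_phred(pvalue):
    """Convert a p-value to a phred-scaled score, assuming phred = -10*log10(p)."""
    return -10 * math.log10(pvalue)


# A phred score of 12 corresponds to a p-value of about 0.063
print(round(phred_to_pvalue(12), 4))  # prints 0.0631
```

&lt;p&gt;Under that assumption, the example filter marks variants whose strand-bias p-value is below roughly 0.063.&lt;/p&gt;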
&lt;p&gt;To filter by genomic region, use a command such as&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;java&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;jar&lt;/span&gt; &lt;span class="n"&gt;Bamformatics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jar&lt;/span&gt; &lt;span class="n"&gt;filtervariants&lt;/span&gt;
&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;vcf&lt;/span&gt; &lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vcf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gz&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filtered&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vcf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gz&lt;/span&gt;
&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;filter&lt;/span&gt; &lt;span class="n"&gt;repeats&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;bed&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;bed&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;repeats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gz&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Here, the last argument should be a definition of a genomic region in bed format. In this particular example, the aim is to flag variants in repetitive regions of the genome using the string “repeats”.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Comment:&lt;/em&gt; Multiple filters can be applied to a vcf file simultaneously – just specify several filter/key or filter/bed arguments in the same command. Make sure, however, to specify names of key-based filters before region-based filters.&lt;/p&gt;
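&lt;p&gt;When several filters are combined, the ordering rule above is easy to get wrong by hand. The Python sketch below assembles the argument list so that key-based filters always precede region-based ones. The helper &lt;em&gt;build_filter_command&lt;/em&gt; is hypothetical and not part of Bamformatics; it uses only the flags shown in the examples on this page.&lt;/p&gt;

```python
def build_filter_command(vcf, output, key_filters, bed_filters):
    """Build a filtervariants argument list (hypothetical helper).

    key_filters: list of (label, key) pairs, each a FILTER label and a key/threshold string
    bed_filters: list of (label, bed_path) pairs, each a FILTER label and a bed file
    Key-based filters are emitted before region-based filters.
    """
    cmd = ["java", "-jar", "Bamformatics.jar", "filtervariants",
           "--vcf", vcf, "--output", output]
    for label, key in key_filters:
        cmd += ["--filter", label, "--key", key]
    for label, bed in bed_filters:
        cmd += ["--filter", label, "--bed", bed]
    return cmd
```

&lt;p&gt;The returned list can be passed directly to &lt;em&gt;subprocess.run&lt;/em&gt;, which avoids shell quoting problems with key strings that contain comparison characters.&lt;/p&gt;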
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h4 id="variant-details"&gt;Variant details&lt;/h4&gt;
&lt;p&gt;For custom post-processing of variant calls, a separate program can extract more information about variants than is encoded in the vcf file. In particular, this program can make explicit the read counts used in determining variant calls/qualities as well as indicate how many bases at a locus were observed but ignored.&lt;/p&gt;
&lt;p&gt;To obtain tables with variant details, use&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;java&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;jar&lt;/span&gt; &lt;span class="n"&gt;Bamformatics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jar&lt;/span&gt; &lt;span class="n"&gt;variantdetails&lt;/span&gt;
&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;details&lt;/span&gt;
&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="n"&gt;mysample&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;vcf&lt;/span&gt; &lt;span class="n"&gt;mycalls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vcf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gz&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;bam&lt;/span&gt; &lt;span class="n"&gt;myalignment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bam&lt;/span&gt;
&lt;span class="n"&gt;options&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Here, the string &lt;em&gt;/path/to/details&lt;/em&gt; is interpreted as a prefix for output files, which will include a log file, a summary file, and two tables, one for single-base substitution variants and one for indels. The &lt;em&gt;vcf&lt;/em&gt; and &lt;em&gt;bam&lt;/em&gt; arguments should refer to matching call and alignment files, and the &lt;em&gt;label&lt;/em&gt; field should be a short string describing the sample – it will be used to identify columns in the output tables. &lt;/p&gt;
&lt;p&gt;The &lt;em&gt;options&lt;/em&gt; field refers to the same options that are used during variant calling (see previous sections). For convenience and to promote consistency, &lt;span&gt;[default]&lt;/span&gt; values for these options can be set separately.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Comment:&lt;/em&gt; The program can merge information about variants from several samples – just specify several &lt;em&gt;label/vcf/bam&lt;/em&gt; sets.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Tomasz</dc:creator><pubDate>Sun, 03 Mar 2013 22:01:43 -0000</pubDate><guid>https://sourceforge.netf28cafe4be3dc64862c0291db03d08bfee44f4f9</guid></item></channel></rss>