<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to PoPOOLationWalkthrough</title><link>https://sourceforge.net/p/popoolation/wiki/PoPOOLationWalkthrough/</link><description>Recent changes to PoPOOLationWalkthrough</description><atom:link href="https://sourceforge.net/p/popoolation/wiki/PoPOOLationWalkthrough/feed" rel="self"/><language>en</language><lastBuildDate>Mon, 16 Mar 2015 14:28:49 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/popoolation/wiki/PoPOOLationWalkthrough/feed" rel="self" type="application/rss+xml"/><item><title>PoPOOLationWalkthrough modified by Anonymous</title><link>https://sourceforge.net/p/popoolation/wiki/PoPOOLationWalkthrough/</link><description>&lt;div class="markdown_content"&gt;&lt;ul&gt;
&lt;li&gt;Data set&lt;/li&gt;
&lt;li&gt;Requirements&lt;/li&gt;
&lt;li&gt;Walkthrough&lt;ul&gt;
&lt;li&gt;Download PoPoolation&lt;/li&gt;
&lt;li&gt;Trimming of the reads&lt;/li&gt;
&lt;li&gt;Resulting trimming statistics&lt;/li&gt;
&lt;li&gt;Mapping of reads (BWA)&lt;/li&gt;
&lt;li&gt;Prepare reference sequence&lt;/li&gt;
&lt;li&gt;Mapping using bwa&lt;/li&gt;
&lt;li&gt;Filter reads by mapping quality and convert to a pileup file&lt;/li&gt;
&lt;li&gt;Convert sam-file into a sorted bam-file&lt;/li&gt;
&lt;li&gt;Crosscheck&lt;/li&gt;
&lt;li&gt;Convert the sorted bam file into a pileup file&lt;/li&gt;
&lt;li&gt;Run PoPoolation&lt;/li&gt;
&lt;li&gt;Run Variance-sliding.pl&lt;/li&gt;
&lt;li&gt;Visualise output of Variance-sliding.pl&lt;/li&gt;
&lt;li&gt;Run Variance-at-position&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id="data-set"&gt;Data set&lt;/h1&gt;
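&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;file&lt;/th&gt;&lt;th&gt;read number&lt;/th&gt;&lt;th&gt;read count&lt;/th&gt;&lt;th&gt;read length&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;s_7_1_sequence_Jul2009.fastq&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;17957720&lt;/td&gt;&lt;td&gt;75&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;s_7_2_sequence_Jul2009.fastq&lt;/td&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt;17957720&lt;/td&gt;&lt;td&gt;75&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;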
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: The data are available from the Short Read Archive: &lt;a href="http://trace.ddbj.nig.ac.jp/DRASearch/submission?acc=SRA023610" rel="nofollow"&gt;http://trace.ddbj.nig.ac.jp/DRASearch/submission?acc=SRA023610&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: A very small sample data set can alternatively be found in the Quick Guide: &lt;a href="http://code.google.com/p/popoolation/wiki/TeachingPoPoolation" rel="nofollow"&gt;http://code.google.com/p/popoolation/wiki/TeachingPoPoolation&lt;/a&gt;&lt;/p&gt;
&lt;h1 id="requirements"&gt;Requirements&lt;/h1&gt;
&lt;p&gt;see &lt;a class="" href="../Manual"&gt;Manual&lt;/a&gt;&lt;/p&gt;
&lt;h1 id="walkthrough"&gt;Walkthrough&lt;/h1&gt;
&lt;h2 id="download-popoolation"&gt;Download PoPoolation&lt;/h2&gt;
&lt;p&gt;PoPoolation may be obtained directly from the Subversion repository. On the command line, change to the directory where you want to install PoPoolation and enter: &lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;svn&lt;/span&gt; &lt;span class="n"&gt;checkout&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="c1"&gt;//popoolation.googlecode.com/svn/trunk/ popoolation&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To update your copy of PoPoolation with the latest improvements, change to your PoPoolation directory and enter: &lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;svn&lt;/span&gt; &lt;span class="n"&gt;update&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Alternatively, PoPoolation may be downloaded from the project main page: &lt;a href="http://code.google.com/p/popoolation" rel="nofollow"&gt;http://code.google.com/p/popoolation/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;However, we recommend using Subversion, as bug fixes are immediately available in the repository.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="trimming-of-the-reads"&gt;Trimming of the reads&lt;/h2&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;perl&lt;/span&gt; &lt;span class="n"&gt;trim&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;fastq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;input1&lt;/span&gt; &lt;span class="n"&gt;s_7_1_sequence_Jul2009&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fastq&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;input2&lt;/span&gt; &lt;span class="n"&gt;s_7_2_sequence_Jul2009&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fastq&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;quality&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;threshold&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;min&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;length&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="n"&gt;dmel_trimed&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id="resulting-trimming-statistics"&gt;Resulting trimming statistics&lt;/h3&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;FINISHED&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="n"&gt;statistics&lt;/span&gt;
&lt;span class="n"&gt;Read&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;pairs&lt;/span&gt; &lt;span class="n"&gt;processed&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;17957720&lt;/span&gt;
&lt;span class="n"&gt;Read&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;pairs&lt;/span&gt; &lt;span class="n"&gt;trimmed&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pairs&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;13495995&lt;/span&gt;
&lt;span class="n"&gt;Read&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;pairs&lt;/span&gt; &lt;span class="n"&gt;trimmed&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;singles&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3802238&lt;/span&gt;

&lt;span class="n"&gt;FIRST&lt;/span&gt; &lt;span class="n"&gt;READ&lt;/span&gt; &lt;span class="n"&gt;STATISTICS&lt;/span&gt;
&lt;span class="n"&gt;First&lt;/span&gt; &lt;span class="n"&gt;reads&lt;/span&gt; &lt;span class="n"&gt;passing&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;15656946&lt;/span&gt;
&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="n"&gt;poly&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="n"&gt;sequences&lt;/span&gt; &lt;span class="n"&gt;trimmed&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;16377&lt;/span&gt;
&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="n"&gt;poly&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="n"&gt;sequences&lt;/span&gt; &lt;span class="n"&gt;trimmed&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;61990&lt;/span&gt;
&lt;span class="n"&gt;Reads&lt;/span&gt; &lt;span class="n"&gt;discarded&lt;/span&gt; &lt;span class="n"&gt;during&lt;/span&gt; &lt;span class="s1"&gt;'remaining N filtering'&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;Reads&lt;/span&gt; &lt;span class="n"&gt;discarded&lt;/span&gt; &lt;span class="n"&gt;during&lt;/span&gt; &lt;span class="n"&gt;length&lt;/span&gt; &lt;span class="n"&gt;filtering&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2300774&lt;/span&gt;
&lt;span class="n"&gt;Count&lt;/span&gt; &lt;span class="n"&gt;sequences&lt;/span&gt; &lt;span class="n"&gt;trimed&lt;/span&gt; &lt;span class="n"&gt;during&lt;/span&gt; &lt;span class="n"&gt;quality&lt;/span&gt; &lt;span class="n"&gt;filtering&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8551460&lt;/span&gt;

&lt;span class="n"&gt;Read&lt;/span&gt; &lt;span class="n"&gt;length&lt;/span&gt; &lt;span class="n"&gt;distribution&lt;/span&gt; &lt;span class="n"&gt;first&lt;/span&gt; &lt;span class="n"&gt;read&lt;/span&gt;
&lt;span class="n"&gt;length&lt;/span&gt;  &lt;span class="n"&gt;count&lt;/span&gt;
&lt;span class="mi"&gt;50&lt;/span&gt;  &lt;span class="mi"&gt;110528&lt;/span&gt;
&lt;span class="mi"&gt;51&lt;/span&gt;  &lt;span class="mi"&gt;112155&lt;/span&gt;
&lt;span class="mi"&gt;52&lt;/span&gt;  &lt;span class="mi"&gt;115546&lt;/span&gt;
&lt;span class="mi"&gt;53&lt;/span&gt;  &lt;span class="mi"&gt;119517&lt;/span&gt;
&lt;span class="mi"&gt;54&lt;/span&gt;  &lt;span class="mi"&gt;120622&lt;/span&gt;
&lt;span class="mi"&gt;55&lt;/span&gt;  &lt;span class="mi"&gt;123213&lt;/span&gt;
&lt;span class="mi"&gt;56&lt;/span&gt;  &lt;span class="mi"&gt;127909&lt;/span&gt;
&lt;span class="mi"&gt;57&lt;/span&gt;  &lt;span class="mi"&gt;132907&lt;/span&gt;
&lt;span class="mi"&gt;58&lt;/span&gt;  &lt;span class="mi"&gt;139580&lt;/span&gt;
&lt;span class="mi"&gt;59&lt;/span&gt;  &lt;span class="mi"&gt;143970&lt;/span&gt;
&lt;span class="mi"&gt;60&lt;/span&gt;  &lt;span class="mi"&gt;151528&lt;/span&gt;
&lt;span class="mi"&gt;61&lt;/span&gt;  &lt;span class="mi"&gt;166721&lt;/span&gt;
&lt;span class="mi"&gt;62&lt;/span&gt;  &lt;span class="mi"&gt;178000&lt;/span&gt;
&lt;span class="mi"&gt;63&lt;/span&gt;  &lt;span class="mi"&gt;185352&lt;/span&gt;
&lt;span class="mi"&gt;64&lt;/span&gt;  &lt;span class="mi"&gt;192283&lt;/span&gt;
&lt;span class="mi"&gt;65&lt;/span&gt;  &lt;span class="mi"&gt;202482&lt;/span&gt;
&lt;span class="mi"&gt;66&lt;/span&gt;  &lt;span class="mi"&gt;221528&lt;/span&gt;
&lt;span class="mi"&gt;67&lt;/span&gt;  &lt;span class="mi"&gt;247456&lt;/span&gt;
&lt;span class="mi"&gt;68&lt;/span&gt;  &lt;span class="mi"&gt;270715&lt;/span&gt;
&lt;span class="mi"&gt;69&lt;/span&gt;  &lt;span class="mi"&gt;312185&lt;/span&gt;
&lt;span class="mi"&gt;70&lt;/span&gt;  &lt;span class="mi"&gt;364457&lt;/span&gt;
&lt;span class="mi"&gt;71&lt;/span&gt;  &lt;span class="mi"&gt;461386&lt;/span&gt;
&lt;span class="mi"&gt;72&lt;/span&gt;  &lt;span class="mi"&gt;830395&lt;/span&gt;
&lt;span class="mi"&gt;73&lt;/span&gt;  &lt;span class="mi"&gt;1220619&lt;/span&gt;
&lt;span class="mi"&gt;74&lt;/span&gt;  &lt;span class="mi"&gt;9405892&lt;/span&gt;

&lt;span class="n"&gt;SECOND&lt;/span&gt; &lt;span class="n"&gt;READ&lt;/span&gt; &lt;span class="n"&gt;STATISTICS&lt;/span&gt;
&lt;span class="n"&gt;Second&lt;/span&gt; &lt;span class="n"&gt;reads&lt;/span&gt; &lt;span class="n"&gt;passing&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;15137282&lt;/span&gt;
&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="n"&gt;poly&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="n"&gt;sequences&lt;/span&gt; &lt;span class="n"&gt;trimmed&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;99060&lt;/span&gt;
&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="n"&gt;poly&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="n"&gt;sequences&lt;/span&gt; &lt;span class="n"&gt;trimmed&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;35878&lt;/span&gt;
&lt;span class="n"&gt;Reads&lt;/span&gt; &lt;span class="n"&gt;discarded&lt;/span&gt; &lt;span class="n"&gt;during&lt;/span&gt; &lt;span class="s1"&gt;'remaining N filtering'&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;Reads&lt;/span&gt; &lt;span class="n"&gt;discarded&lt;/span&gt; &lt;span class="n"&gt;during&lt;/span&gt; &lt;span class="n"&gt;length&lt;/span&gt; &lt;span class="n"&gt;filtering&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2820438&lt;/span&gt;
&lt;span class="n"&gt;Count&lt;/span&gt; &lt;span class="n"&gt;sequences&lt;/span&gt; &lt;span class="n"&gt;trimed&lt;/span&gt; &lt;span class="n"&gt;during&lt;/span&gt; &lt;span class="n"&gt;quality&lt;/span&gt; &lt;span class="n"&gt;filtering&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;9051269&lt;/span&gt;

&lt;span class="n"&gt;Read&lt;/span&gt; &lt;span class="n"&gt;length&lt;/span&gt; &lt;span class="n"&gt;distribution&lt;/span&gt; &lt;span class="n"&gt;second&lt;/span&gt; &lt;span class="n"&gt;read&lt;/span&gt;
&lt;span class="n"&gt;length&lt;/span&gt;  &lt;span class="n"&gt;count&lt;/span&gt;
&lt;span class="mi"&gt;50&lt;/span&gt;  &lt;span class="mi"&gt;103225&lt;/span&gt;
&lt;span class="mi"&gt;51&lt;/span&gt;  &lt;span class="mi"&gt;102568&lt;/span&gt;
&lt;span class="mi"&gt;52&lt;/span&gt;  &lt;span class="mi"&gt;108397&lt;/span&gt;
&lt;span class="mi"&gt;53&lt;/span&gt;  &lt;span class="mi"&gt;117810&lt;/span&gt;
&lt;span class="mi"&gt;54&lt;/span&gt;  &lt;span class="mi"&gt;130272&lt;/span&gt;
&lt;span class="mi"&gt;55&lt;/span&gt;  &lt;span class="mi"&gt;146265&lt;/span&gt;
&lt;span class="mi"&gt;56&lt;/span&gt;  &lt;span class="mi"&gt;153847&lt;/span&gt;
&lt;span class="mi"&gt;57&lt;/span&gt;  &lt;span class="mi"&gt;166468&lt;/span&gt;
&lt;span class="mi"&gt;58&lt;/span&gt;  &lt;span class="mi"&gt;182299&lt;/span&gt;
&lt;span class="mi"&gt;59&lt;/span&gt;  &lt;span class="mi"&gt;191046&lt;/span&gt;
&lt;span class="mi"&gt;60&lt;/span&gt;  &lt;span class="mi"&gt;199036&lt;/span&gt;
&lt;span class="mi"&gt;61&lt;/span&gt;  &lt;span class="mi"&gt;192029&lt;/span&gt;
&lt;span class="mi"&gt;62&lt;/span&gt;  &lt;span class="mi"&gt;223088&lt;/span&gt;
&lt;span class="mi"&gt;63&lt;/span&gt;  &lt;span class="mi"&gt;224462&lt;/span&gt;
&lt;span class="mi"&gt;64&lt;/span&gt;  &lt;span class="mi"&gt;190667&lt;/span&gt;
&lt;span class="mi"&gt;65&lt;/span&gt;  &lt;span class="mi"&gt;183919&lt;/span&gt;
&lt;span class="mi"&gt;66&lt;/span&gt;  &lt;span class="mi"&gt;190513&lt;/span&gt;
&lt;span class="mi"&gt;67&lt;/span&gt;  &lt;span class="mi"&gt;202838&lt;/span&gt;
&lt;span class="mi"&gt;68&lt;/span&gt;  &lt;span class="mi"&gt;230031&lt;/span&gt;
&lt;span class="mi"&gt;69&lt;/span&gt;  &lt;span class="mi"&gt;269701&lt;/span&gt;
&lt;span class="mi"&gt;70&lt;/span&gt;  &lt;span class="mi"&gt;360601&lt;/span&gt;
&lt;span class="mi"&gt;71&lt;/span&gt;  &lt;span class="mi"&gt;460748&lt;/span&gt;
&lt;span class="mi"&gt;72&lt;/span&gt;  &lt;span class="mi"&gt;773657&lt;/span&gt;
&lt;span class="mi"&gt;73&lt;/span&gt;  &lt;span class="mi"&gt;1181002&lt;/span&gt;
&lt;span class="mi"&gt;74&lt;/span&gt;  &lt;span class="mi"&gt;8852793&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
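&lt;p&gt;The totals in this report can be cross-checked against each other. The following Python sketch copies the numbers from the report above and tests four relations. The relations are an interpretation of the report (each read either passes or is discarded by the length filter, and each passing read ends up in a pair or as a single), not a documented guarantee of trim-fastq.pl.&lt;/p&gt;

```python
# Numbers copied from the trim-fastq.pl report above.
pairs_processed = 17957720
trimmed_in_pairs = 13495995
trimmed_as_singles = 3802238

first_passing = 15656946
first_length_discarded = 2300774
second_passing = 15137282
second_length_discarded = 2820438

# Read length distribution of the first read (lengths 50 to 74).
first_length_counts = [
    110528, 112155, 115546, 119517, 120622, 123213, 127909, 132907,
    139580, 143970, 151528, 166721, 178000, 185352, 192283, 202482,
    221528, 247456, 270715, 312185, 364457, 461386, 830395, 1220619,
    9405892,
]


def report_is_consistent():
    """Return True if the totals in the report agree with each other."""
    checks = [
        # Assumed relation: a read either passes or fails the length filter.
        first_passing + first_length_discarded == pairs_processed,
        second_passing + second_length_discarded == pairs_processed,
        # Assumed relation: passing reads are kept as a pair (two reads)
        # or as a single (one read).
        2 * trimmed_in_pairs + trimmed_as_singles
        == first_passing + second_passing,
        # The length histogram should account for every passing first read.
        sum(first_length_counts) == first_passing,
    ]
    return all(checks)


print(report_is_consistent())
```

&lt;p&gt;If your own run reports different totals, replacing the constants with your values gives a quick sanity check before mapping.&lt;/p&gt;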
&lt;h2 id="mapping-of-reads-bwa"&gt;Mapping of reads (BWA)&lt;/h2&gt;
&lt;h3 id="prepare-reference-sequence"&gt;Prepare reference sequence&lt;/h3&gt;
&lt;p&gt;First obtain a reference genome of D. melanogaster from &lt;a href="http://flybase.org" rel="nofollow"&gt;http://flybase.org/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We used: dmel-all-chromosome-&lt;span&gt;[r5]&lt;/span&gt;.22.fasta.gz &lt;/p&gt;
&lt;p&gt;Remove everything after the first whitespace from each header of the reference genome. This is a precautionary measure, as some mappers and downstream software have difficulties with FASTA IDs containing whitespace. &lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;awk&lt;/span&gt; &lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;print&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt; &lt;span class="n"&gt;dmel&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;all&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;chromosome&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;r5&lt;/span&gt;&lt;span class="mf"&gt;.22&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fasta&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;dmel&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="kt"&gt;short&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fa&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
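&lt;p&gt;To illustrate what the awk command does, here is a minimal Python sketch of the same operation: every line is reduced to its first whitespace-delimited field. Sequence lines contain no whitespace, so only the headers change. The header in the example is made up for illustration and is not copied from the FlyBase file.&lt;/p&gt;

```python
def first_field(line):
    """Return the first whitespace-delimited field, like awk's $1."""
    fields = line.split()
    return fields[0] if fields else ""


def shorten_headers(lines):
    """Apply first_field to every line of a FASTA file."""
    return [first_field(line) for line in lines]


# Hypothetical two-line FASTA record, for illustration only.
example = [
    ">2L type=chromosome_arm; length=100;",
    "CGACAATGCACGACAGAGGAAGCAGAACAGATATTTAGATTGCC",
]

for line in shorten_headers(example):
    print(line)
```

&lt;p&gt;The header becomes "&amp;gt;2L" while the sequence line is printed unchanged.&lt;/p&gt;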
&lt;p&gt;Index the reference sequence &lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;bwa&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="n"&gt;dmel&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="kt"&gt;short&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fa&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id="mapping-using-bwa"&gt;Mapping using bwa&lt;/h3&gt;
&lt;p&gt;Assuming the reference sequence is in the folder 'wg', the following commands map the reads that were trimmed as a pair to the reference: &lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;bwa&lt;/span&gt; &lt;span class="n"&gt;aln&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;dmel&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="kt"&gt;short&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fa&lt;/span&gt; &lt;span class="n"&gt;dmel_trimed_1&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;dmel_trimed_1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sai&lt;/span&gt;
&lt;span class="n"&gt;bwa&lt;/span&gt; &lt;span class="n"&gt;aln&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;dmel&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="kt"&gt;short&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fa&lt;/span&gt; &lt;span class="n"&gt;dmel_trimed_2&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;dmel_trimed_2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sai&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Convert the mapping results to a SAM file: &lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;bwa&lt;/span&gt; &lt;span class="n"&gt;sampe&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;dmel&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="kt"&gt;short&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fa&lt;/span&gt; &lt;span class="n"&gt;dmel_trimed_1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sai&lt;/span&gt; &lt;span class="n"&gt;dmel_trimed_2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sai&lt;/span&gt; &lt;span class="n"&gt;dmel_trimed_1&lt;/span&gt; &lt;span class="n"&gt;dmel_trimed_2&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;dmel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sam&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id="filter-reads-by-mapping-quality-and-convert-to-a-pileup-file"&gt;Filter reads by mapping quality and convert to a pileup file&lt;/h2&gt;
&lt;h3 id="convert-sam-file-into-a-sorted-bam-file"&gt;Convert sam-file into a sorted bam-file&lt;/h3&gt;
&lt;p&gt;Filter by a mapping quality of 20 and convert the SAM file into a sorted BAM file. Filtering by a mapping quality of 20 removes ambiguously mapped reads. &lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;samtools&lt;/span&gt; &lt;span class="n"&gt;view&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;S&lt;/span&gt; &lt;span class="n"&gt;dmel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sam&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;samtools&lt;/span&gt; &lt;span class="n"&gt;sort&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;dmel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
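&lt;p&gt;Mapping quality in the SAM format is Phred-scaled: Q = -10 * log10(P), where P is the probability that the mapping position is wrong. The short Python sketch below converts a threshold into that probability and mimics the filter, which keeps reads with a mapping quality of at least the threshold. The function names are illustrative and not part of samtools.&lt;/p&gt;

```python
def mapq_to_error_probability(mapq):
    """Convert a Phred-scaled mapping quality to an error probability."""
    return 10 ** (-mapq / 10)


def passes_filter(mapq, threshold=20):
    """Mimic 'samtools view -q': keep reads with MAPQ of at least threshold."""
    return mapq >= threshold


print(mapq_to_error_probability(20))
print([q for q in (0, 10, 20, 37) if passes_filter(q)])
```

&lt;p&gt;A threshold of 20 therefore corresponds to an estimated chance of about 1 in 100 that a retained read is placed incorrectly.&lt;/p&gt;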
&lt;h3 id="crosscheck"&gt;Crosscheck&lt;/h3&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;samtools&lt;/span&gt; &lt;span class="n"&gt;flagstat&lt;/span&gt; &lt;span class="n"&gt;dmel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bam&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The result should look like this: &lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="mi"&gt;22467169&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;
&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="n"&gt;QC&lt;/span&gt; &lt;span class="n"&gt;failure&lt;/span&gt;
&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="n"&gt;duplicates&lt;/span&gt;
&lt;span class="mi"&gt;22467156&lt;/span&gt; &lt;span class="n"&gt;mapped&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;100.00&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="mi"&gt;22467169&lt;/span&gt; &lt;span class="n"&gt;paired&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sequencing&lt;/span&gt;
&lt;span class="mi"&gt;11234128&lt;/span&gt; &lt;span class="n"&gt;read1&lt;/span&gt;
&lt;span class="mi"&gt;11233041&lt;/span&gt; &lt;span class="n"&gt;read2&lt;/span&gt;
&lt;span class="mi"&gt;22324803&lt;/span&gt; &lt;span class="n"&gt;properly&lt;/span&gt; &lt;span class="n"&gt;paired&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;99.37&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="mi"&gt;22439809&lt;/span&gt; &lt;span class="n"&gt;with&lt;/span&gt; &lt;span class="n"&gt;itself&lt;/span&gt; &lt;span class="n"&gt;and&lt;/span&gt; &lt;span class="n"&gt;mate&lt;/span&gt; &lt;span class="n"&gt;mapped&lt;/span&gt;
&lt;span class="mi"&gt;27347&lt;/span&gt; &lt;span class="n"&gt;singletons&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.12&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="mi"&gt;73897&lt;/span&gt; &lt;span class="n"&gt;with&lt;/span&gt; &lt;span class="n"&gt;mate&lt;/span&gt; &lt;span class="n"&gt;mapped&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;different&lt;/span&gt; &lt;span class="n"&gt;chr&lt;/span&gt;
&lt;span class="mi"&gt;73897&lt;/span&gt; &lt;span class="n"&gt;with&lt;/span&gt; &lt;span class="n"&gt;mate&lt;/span&gt; &lt;span class="n"&gt;mapped&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;different&lt;/span&gt; &lt;span class="n"&gt;chr&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mapQ&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
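&lt;p&gt;The flagstat numbers can be checked for internal consistency. The Python sketch below uses the values shown above: read1 and read2 should add up to the total, reads mapped together with their mate plus singletons should add up to the mapped reads, and the percentages should follow from the counts. The variable names are chosen here for illustration.&lt;/p&gt;

```python
# Values copied from the flagstat output above.
total = 22467169
mapped = 22467156
read1 = 11234128
read2 = 11233041
properly_paired = 22324803
both_mapped = 22439809
singletons = 27347


def percent(part, whole):
    """Percentage rounded to two decimals, as printed by flagstat."""
    return round(100.0 * part / whole, 2)


def flagstat_checks():
    """Return the consistency checks as a dictionary of booleans."""
    return {
        "read1 + read2 == total": read1 + read2 == total,
        "both mapped + singletons == mapped": both_mapped + singletons == mapped,
        "properly paired is 99.37%": percent(properly_paired, total) == 99.37,
        "singletons are 0.12%": percent(singletons, total) == 0.12,
    }


for name, ok in flagstat_checks().items():
    print(name, ok)
```

&lt;p&gt;Note that the 100.00% mapped figure is a rounded value: 13 of the reads that passed the quality filter are still flagged as unmapped.&lt;/p&gt;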
&lt;h3 id="convert-the-sorted-bam-file-into-a-pileup-file"&gt;Convert the sorted bam file into a pileup file&lt;/h3&gt;
&lt;p&gt;Convert the bam file into a pileup file: &lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;samtools&lt;/span&gt; &lt;span class="n"&gt;pileup&lt;/span&gt; &lt;span class="n"&gt;dmel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bam&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;dmel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pileup&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id="run-popoolation"&gt;Run PoPoolation&lt;/h2&gt;
&lt;h3 id="run-variance-slidingpl"&gt;Run Variance-sliding.pl&lt;/h3&gt;
&lt;p&gt;Run the script Variance-sliding.pl on the dmel.pileup file, requesting Tajima's Pi. We use a window size and a step size of 10000. &lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;perl&lt;/span&gt; &lt;span class="n"&gt;Variance&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;sliding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="n"&gt;dmel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pileup&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="n"&gt;dmel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pi&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;measure&lt;/span&gt; &lt;span class="n"&gt;pi&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;min&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;min&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;coverage&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;max&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;coverage&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;min&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;qual&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span 
class="n"&gt;pool&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We assume a pool size of 500 individuals. Furthermore, we require a minimum allele count of 2, a minimum base quality of 20, a minimum coverage of 4 and a maximum coverage of 400. &lt;/p&gt;
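&lt;p&gt;Because the window size equals the step size, the windows tile each chromosome without overlapping. The Python sketch below lists window boundaries for a given chromosome length. It is only an illustration of the sliding-window idea: the 1-based inclusive coordinates and the omission of a trailing partial window are assumptions of this sketch, not a description of how Variance-sliding.pl is implemented.&lt;/p&gt;

```python
def window_bounds(chrom_length, window_size=10000, step_size=10000):
    """Return 1-based inclusive (start, end) pairs for all full windows.

    Assumption of this sketch: a trailing window shorter than
    window_size is not reported.
    """
    bounds = []
    for start in range(1, chrom_length - window_size + 2, step_size):
        bounds.append((start, start + window_size - 1))
    return bounds


# Hypothetical chromosome of 35000 bp.
print(window_bounds(35000))
# A step size smaller than the window size gives overlapping windows.
print(len(window_bounds(30000, window_size=10000, step_size=5000)))
```

&lt;p&gt;With the default values, the 35000 bp example yields three windows ending at positions 10000, 20000 and 30000. Halving the step size to 5000 on a 30000 bp sequence yields five overlapping windows.&lt;/p&gt;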
&lt;h3 id="visualise-output-of-variance-slidingpl"&gt;Visualise output of Variance-sliding.pl&lt;/h3&gt;
&lt;h4 id="create-an-overview"&gt;Create an overview&lt;/h4&gt;
&lt;p&gt;Create the overview: &lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;perl&lt;/span&gt; &lt;span class="n"&gt;Visualise&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="n"&gt;dmel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pi&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="n"&gt;dmel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pdf&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;ylab&lt;/span&gt; &lt;span class="n"&gt;pi&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;chromosomes&lt;/span&gt; &lt;span class="s"&gt;"X 2L 2R 3L 3R 4"&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;and check out the result: &lt;/p&gt;
&lt;p&gt;&lt;a href="http://popoolation.googlecode.com/files/dmel.pi.pdf" rel="nofollow"&gt;http://popoolation.googlecode.com/files/dmel.pi.pdf&lt;/a&gt;&lt;/p&gt;
&lt;h4 id="use-igv"&gt;Use IGV&lt;/h4&gt;
&lt;p&gt;The IGV may be downloaded from: &lt;a href="http://www.broadinstitute.org/igv" rel="nofollow"&gt;http://www.broadinstitute.org/igv/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;First convert the Drosophila melanogaster pi-file into a wiggle file &lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;perl&lt;/span&gt; &lt;span class="o"&gt;~/&lt;/span&gt;&lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;PopGenTools&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;VarSliding2Wiggle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="n"&gt;dmel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pi&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;trackname&lt;/span&gt; &lt;span class="s"&gt;"dmel Pi"&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="n"&gt;dmel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wig&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then index the BAM file: &lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;samtools&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="n"&gt;dmel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bam&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then: &lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Open the IGV. &lt;/li&gt;
&lt;li&gt;Switch to the genome D. melanogaster reference 5.22. &lt;/li&gt;
&lt;li&gt;Load the sorted BAM file: dmel.sort.bam &lt;/li&gt;
&lt;li&gt;Load the wiggle file: dmel.pi.wig &lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;and check out the results &lt;/p&gt;
&lt;p&gt;&lt;a href="http://popoolation.googlecode.com/files/igv_ex1.pdf" rel="nofollow"&gt;http://popoolation.googlecode.com/files/igv_ex1.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://popoolation.googlecode.com/files/igv_ex2.pdf" rel="nofollow"&gt;http://popoolation.googlecode.com/files/igv_ex2.pdf&lt;/a&gt;&lt;/p&gt;
&lt;h3 id="run-variance-at-position"&gt;Run Variance-at-position&lt;/h3&gt;
&lt;h4 id="obtain-and-prepare-a-annotation"&gt;Obtain and prepare a annotation&lt;/h4&gt;
&lt;p&gt;Go to the FlyBase homepage (&lt;a href="http://flybase.org" rel="nofollow"&gt;http://flybase.org/&lt;/a&gt;) and get the annotation: dmel-all-&lt;span&gt;[r5]&lt;/span&gt;.22.gff.gz &lt;/p&gt;
&lt;p&gt;Unzip the file. &lt;/p&gt;
&lt;p&gt;Filter for exons and convert them into a gtf file: &lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;cat&lt;/span&gt; &lt;span class="n"&gt;dmel&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;all&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;r5&lt;/span&gt;&lt;span class="mf"&gt;.22&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gff&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;awk&lt;/span&gt; &lt;span class="err"&gt;'$&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="s"&gt;"FlyBase"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="s"&gt;"exon"&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;perl&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;pe&lt;/span&gt; &lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="o"&gt;^:&lt;/span&gt;&lt;span class="p"&gt;;]&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="p"&gt;)([&lt;/span&gt;&lt;span class="o"&gt;^&lt;/span&gt;&lt;span class="p"&gt;;]&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;.&lt;/span&gt;&lt;span class="err"&gt;*/&lt;/span&gt;&lt;span class="n"&gt;gene_id&lt;/span&gt; &lt;span class="s"&gt;"$1"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;transcript_id&lt;/span&gt; &lt;span class="s"&gt;"$1:1"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; 
&lt;span class="n"&gt;exons&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gtf&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; the script Variance-at-position.pl will group all exons with the same gene_id; the transcript_id is not considered &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; UCSC (the Tables section) already provides gtf-formatted annotations; if you use those together with the matching reference sequence, no reformatting is necessary. &lt;/p&gt;
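&lt;p&gt;Because exons are grouped by gene_id, it can be worth checking how many exons each gene_id collects before running the script. The following is only a sketch: the file sample-exons.gtf and its gene ids are made up for illustration, and the one-liner assumes the attribute column has the form written by the conversion above. &lt;/p&gt;

```shell
# Hypothetical three-exon sample in the format produced by the conversion above
# (tab-separated; the last column holds gene_id and transcript_id).
printf '2L\tFlyBase\texon\t100\t200\t.\t+\t.\tgene_id "FBgn0000001"; transcript_id "FBgn0000001:1";\n' > sample-exons.gtf
printf '2L\tFlyBase\texon\t300\t400\t.\t+\t.\tgene_id "FBgn0000001"; transcript_id "FBgn0000001:1";\n' >> sample-exons.gtf
printf '2R\tFlyBase\texon\t500\t600\t.\t-\t.\tgene_id "FBgn0000002"; transcript_id "FBgn0000002:1";\n' >> sample-exons.gtf

# Split every line on double quotes, so that field 2 is the gene_id value,
# and count how many exon lines each gene_id has.
awk -F'"' '{ n[$2]++ } END { for (g in n) print g "\t" n[g] }' sample-exons.gtf | sort > exons-per-gene.txt
cat exons-per-gene.txt
```

&lt;p&gt;To inspect the real annotation, replace sample-exons.gtf with exons.gtf. &lt;/p&gt;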
&lt;p&gt;Run Variance-at-position.pl &lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;perl&lt;/span&gt; &lt;span class="n"&gt;Variance&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;at&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;min&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;qual&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;min&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;coverage&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;min&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;max&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;coverage&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;pileup&lt;/span&gt; &lt;span class="n"&gt;dmel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pileup&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;gtf&lt;/span&gt; &lt;span class="n"&gt;exons&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gtf&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="n"&gt;dmel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;genes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pi&lt;/span&gt; &lt;span 
class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;measure&lt;/span&gt; &lt;span class="n"&gt;pi&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We set the pool size to 500, request pi as the population genetic estimator, and use the previously generated pileup file together with the reformatted exon annotation. &lt;/p&gt;
&lt;p&gt;Output will be something like the following: &lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;FBgn0042083&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;   &lt;span class="mf"&gt;0.961&lt;/span&gt;   &lt;span class="mf"&gt;0.002524220&lt;/span&gt;
&lt;span class="n"&gt;FBgn0027066&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;   &lt;span class="mf"&gt;0.970&lt;/span&gt;   &lt;span class="mf"&gt;0.001810036&lt;/span&gt;
&lt;span class="n"&gt;FBgn0033100&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;   &lt;span class="mf"&gt;0.989&lt;/span&gt;   &lt;span class="mf"&gt;0.001972413&lt;/span&gt;
&lt;span class="n"&gt;FBgn0033101&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;   &lt;span class="mf"&gt;0.664&lt;/span&gt;   &lt;span class="mf"&gt;0.001424038&lt;/span&gt;
&lt;span class="n"&gt;CG9438&lt;/span&gt;  &lt;span class="mi"&gt;8&lt;/span&gt;   &lt;span class="mf"&gt;0.900&lt;/span&gt;   &lt;span class="mf"&gt;0.003091130&lt;/span&gt;
&lt;span class="n"&gt;FBgn0085421&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;  &lt;span class="mf"&gt;0.936&lt;/span&gt;   &lt;span class="mf"&gt;0.003365280&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;column 1: the gene id (or id of the region) &lt;/li&gt;
&lt;li&gt;column 2: the number of SNPs found in the gene &lt;/li&gt;
&lt;li&gt;column 3: covered fraction, i.e. the fraction of the gene that is sufficiently covered in the pileup file; values range from 0 (not a single base of the gene has the required coverage) to 1 (all bases of the gene have the required coverage) &lt;/li&gt;
&lt;li&gt;column 4: the measure; here Tajima's pi &lt;/li&gt;
&lt;/ul&gt;
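&lt;p&gt;Given these columns, a quick way to look at the output is to drop poorly covered genes and rank the rest by pi. This is a minimal sketch: the rows are copied from the example output above into a hypothetical file sample.genes.pi, and the cutoff of 0.8 for the covered fraction is an arbitrary choice, not a recommendation of PoPoolation. &lt;/p&gt;

```shell
# Sample rows copied from the example output above (tab-separated).
printf 'FBgn0042083\t7\t0.961\t0.002524220\n' > sample.genes.pi
printf 'FBgn0027066\t9\t0.970\t0.001810036\n' >> sample.genes.pi
printf 'FBgn0033100\t3\t0.989\t0.001972413\n' >> sample.genes.pi
printf 'FBgn0033101\t2\t0.664\t0.001424038\n' >> sample.genes.pi
printf 'CG9438\t8\t0.900\t0.003091130\n' >> sample.genes.pi
printf 'FBgn0085421\t14\t0.936\t0.003365280\n' >> sample.genes.pi

# Keep genes whose covered fraction (column 3) is at least 0.8 (arbitrary cutoff),
# then sort by the measure in column 4, highest pi first.
awk '$3 >= 0.8' sample.genes.pi | LC_ALL=C sort -k4,4nr > sample.genes.filtered.pi
cat sample.genes.filtered.pi
```

&lt;p&gt;For the real data, replace sample.genes.pi with dmel.genes.pi. &lt;/p&gt;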
&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; This script loads the whole annotation into memory and internally converts it into a per-base hash. For very large gtf files this can be quite memory demanding. If you do not have enough memory, you may split your gtf file and analyse the parts separately. As long as you do not split genes, this will not affect the outcome. For example, the following command will retrieve only the entries from chromosome 2R: &lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;cat&lt;/span&gt; &lt;span class="n"&gt;exons&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gtf&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;awk&lt;/span&gt; &lt;span class="err"&gt;'$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="s"&gt;"2R"&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;R&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;exons&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gtf&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
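&lt;p&gt;The same idea can be extended to write one gtf file per chromosome in a single pass. Splitting by chromosome never splits a gene, since every exon of a gene lies on the same chromosome. The sketch below uses a made-up file sample-exons.gtf, and the output naming scheme is an assumption chosen for illustration. &lt;/p&gt;

```shell
# Hypothetical sample annotation with exons on two chromosomes (tab-separated).
printf '2L\tFlyBase\texon\t100\t200\t.\t+\t.\tgene_id "FBgn0000001"; transcript_id "FBgn0000001:1";\n' > sample-exons.gtf
printf '2L\tFlyBase\texon\t300\t400\t.\t+\t.\tgene_id "FBgn0000001"; transcript_id "FBgn0000001:1";\n' >> sample-exons.gtf
printf '2R\tFlyBase\texon\t500\t600\t.\t-\t.\tgene_id "FBgn0000002"; transcript_id "FBgn0000002:1";\n' >> sample-exons.gtf

# Write each line to a file named after its chromosome (column 1),
# e.g. 2L-sample-exons.gtf and 2R-sample-exons.gtf.
awk '{ out = $1 "-sample-exons.gtf"; print > out }' sample-exons.gtf
ls 2L-sample-exons.gtf 2R-sample-exons.gtf
```

&lt;p&gt;Each resulting file can then be passed to Variance-at-position.pl via --gtf in a separate run. &lt;/p&gt;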
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Anonymous</dc:creator><pubDate>Mon, 16 Mar 2015 14:28:49 -0000</pubDate><guid>https://sourceforge.net7b1e45d81812ea38c150d53c7957ca46a1e1021c</guid></item></channel></rss>