<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to FlankDesign</title><link>https://sourceforge.net/p/cuscoquality/wiki/FlankDesign/</link><description>Recent changes to FlankDesign</description><atom:link href="https://sourceforge.net/p/cuscoquality/wiki/FlankDesign/feed" rel="self"/><language>en</language><lastBuildDate>Wed, 15 Sep 2021 08:46:45 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/cuscoquality/wiki/FlankDesign/feed" rel="self" type="application/rss+xml"/><item><title>FlankDesign modified by Filip Wierzbicki</title><link>https://sourceforge.net/p/cuscoquality/wiki/FlankDesign/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v1
+++ v2
@@ -11,6 +11,8 @@
 The approach requires a reference genome with annotations of your regions of interest.
 In this walkthrough, we use the reference genome of *A. thaliana* TAIR10 (genbank assembly accession: GCA_000001735.1) and annotations of 10 KEEs  (DOI: https://doi.org/10.1016/j.molcel.2014.07.009) in bed format. 
 Both input files can be found here: https://sourceforge.net/projects/cuscoquality/files/Walkthrough/FlankDesign/
+
+Note that names of the reference fasta and annotations (e.g. piRNA cluster IDs) must not contain dashes, because "-" is reserved as a field separator in the step "Converting the sam file...".

 #Obtaining annotations of flanks
 Based on piRNA cluster annotations, we create a file that contains the information on the flanks using the script flankbeder.sh.
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Filip Wierzbicki</dc:creator><pubDate>Wed, 15 Sep 2021 08:46:45 -0000</pubDate><guid>https://sourceforge.net46cd4c2ea8f1480b1c49439356a466381af5fc10</guid></item><item><title>FlankDesign modified by Filip Wierzbicki</title><link>https://sourceforge.net/p/cuscoquality/wiki/FlankDesign/</link><description>&lt;div class="markdown_content"&gt;&lt;h1 id="introduction"&gt;Introduction&lt;/h1&gt;
&lt;p&gt;This walkthrough shows how to obtain flanking sequences of certain regions that can be used by CUSCO.&lt;/p&gt;
&lt;h2 id="requirements"&gt;Requirements&lt;/h2&gt;
&lt;p&gt;For this walkthrough, you need to install the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;bwa (alignment algorithm)&lt;/li&gt;
&lt;li&gt;samtools&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id="preparatory-work"&gt;Preparatory work&lt;/h1&gt;
&lt;p&gt;The approach requires a reference genome with annotations of your regions of interest.&lt;br/&gt;
In this walkthrough, we use the reference genome of &lt;em&gt;A. thaliana&lt;/em&gt; TAIR10 (GenBank assembly accession: GCA_000001735.1) and annotations of 10 KEEs (DOI: &lt;a href="https://doi.org/10.1016/j.molcel.2014.07.009" rel="nofollow"&gt;https://doi.org/10.1016/j.molcel.2014.07.009&lt;/a&gt;) in BED format.&lt;br/&gt;
Both input files can be found here: &lt;a href="https://sourceforge.net/projects/cuscoquality/files/Walkthrough/FlankDesign/"&gt;https://sourceforge.net/projects/cuscoquality/files/Walkthrough/FlankDesign/&lt;/a&gt;&lt;/p&gt;
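&lt;p&gt;The conversion step further below splits flank names with &lt;code&gt;awk -F'[+-]'&lt;/code&gt;, so an ID that itself contains "-" or "+" would be split in the wrong place. A minimal sketch of a pre-flight check follows. It assumes a tab-separated BED file with the region ID in column 4, and &lt;code&gt;check_bed_ids&lt;/code&gt; is a hypothetical helper, not part of CUSCO.&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;# Hypothetical helper: exit status 0 if no ID in column 4 contains "-" or "+".
# Assumes a tab-separated BED file with the region ID in column 4.
check_bed_ids() {
    awk -F'\t' '$4 ~ /[+-]/ { print "offending ID: " $4; bad = 1 } END { exit bad + 0 }' "$1"
}
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Calling &lt;code&gt;check_bed_ids KEEs.bed&lt;/code&gt; prints any offending IDs and returns a non-zero status if it finds one.&lt;/p&gt;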
&lt;h1 id="obtaining-annotations-of-flanks"&gt;Obtaining annotations of flanks&lt;/h1&gt;
&lt;p&gt;Based on piRNA cluster annotations, we create a file that contains the information on the flanks using the script flankbeder.sh.&lt;br/&gt;
For each piRNA cluster, it writes entries for five 1 kb flanks at both ends.&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;bash flankbeder.sh KEEs.bed .
&lt;/pre&gt;&lt;/div&gt;
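&lt;p&gt;To illustrate what five 1 kb flanks at both ends could look like in coordinates, here is a sketch in awk. It is not the flankbeder.sh implementation: the function name &lt;code&gt;flank_windows&lt;/code&gt;, the output ID scheme (name, side, index) and the placement of the flanks outside the region are assumptions made for illustration, and window starts are not clamped at 0.&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;# Hypothetical sketch: list n windows of w bp on each side of every BED interval.
# Assumes 0-based, half-open BED coordinates, the region ID in column 4,
# and flanks lying outside the annotated region.
flank_windows() {
    awk -F'\t' -v n=5 -v w=1000 'BEGIN { OFS = "\t" }
    {
        for (i = 1; i != n + 1; i++) {
            print $1, $2 - i * w, $2 - (i - 1) * w, $4 "_l" i   # i-th window left of the start
            print $1, $3 + (i - 1) * w, $3 + i * w, $4 "_r" i   # i-th window right of the end
        }
    }' "$1"
}
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Running &lt;code&gt;flank_windows KEEs.bed&lt;/code&gt; would print ten windows per region, alternating between the left and right side.&lt;/p&gt;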


&lt;h1 id="obtaining-the-sequences-of-flanks-using-samtools"&gt;Obtaining the sequences of flanks using samtools&lt;/h1&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;bash flankparser.sh IDs flanks.bed TAIR10.fasta flank-fasta
&lt;/pre&gt;&lt;/div&gt;


&lt;h1 id="aligning-flanks-back-to-the-reference-genome-using-bwa"&gt;Aligning flanks back to the reference genome using bwa&lt;/h1&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;bwa index TAIR10.fasta
bwa bwasw TAIR10.fasta resources/flanks.fasta &amp;gt; resources/TAIR10.sam
&lt;/pre&gt;&lt;/div&gt;


&lt;h1 id="converting-the-sam-file-to-a-custom-format-and-filtering-for-unique-flanks-min-mq-5"&gt;Converting the sam file to a custom format and filtering for unique flanks (min. mq 5)&lt;/h1&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;cat resources/TAIR10.sam|grep -v '^@'|awk '$5&amp;gt;4'|awk '{print $1,$2,$3,$4,$5}'|awk -F'[+-]' '{print $1,$2,$3,$4,$5,$6,$0}'|awk '{print $1,$2,$3,$4,$5,$6,$7}'|grep '\-l\|\+r' &amp;gt; resources/TAIR10.mod
&lt;/pre&gt;&lt;/div&gt;


&lt;h1 id="validating-flanks-using-flank_validationpy"&gt;Validating flanks using flank_validation.py&lt;/h1&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;python flank_validation.py --bed KEEs.bed --modsam resources/TAIR10.mod --inner 100 --outer 5000 &amp;gt; resources/TAIR10.validated.tmp
&lt;/pre&gt;&lt;/div&gt;


&lt;h1 id="generating-the-cluster-definition-file-for-cusco"&gt;Generating the cluster definition file for cusco&lt;/h1&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;cat resources/TAIR10.validated.tmp|grep -v '&amp;lt;class'|awk '{print $1,$0}'|sort -u -k1,1|awk '{print $2,$14,$10,$11,$8,$9,$9+1000,"+",$13,$12,$12+1000,"+"}' &amp;gt; resources/cluster-definition-file
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Now, we use the files resources/flanks.fasta and resources/cluster-definition-file to run CUSCO. The files should be identical to those provided here: &lt;a href="https://sourceforge.net/projects/cuscoquality/files/Walkthrough/FlankDesign/output/"&gt;https://sourceforge.net/projects/cuscoquality/files/Walkthrough/FlankDesign/output/&lt;/a&gt;&lt;br/&gt;
md5sum:&lt;br/&gt;
flank.fasta 0052aa4cc2012b389a5004918065890b&lt;br/&gt;
cluster-definition-file 31100240013fa20b1be37f0367ae617c&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Filip Wierzbicki</dc:creator><pubDate>Thu, 15 Apr 2021 12:13:22 -0000</pubDate><guid>https://sourceforge.netd4eb73883891d4decb4f3533daa76011c4150d40</guid></item></channel></rss>