<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to Miscellaneous</title><link>https://sourceforge.net/p/geneticthesaurus/wiki/Miscellaneous/</link><description>Recent changes to Miscellaneous</description><atom:link href="https://sourceforge.net/p/geneticthesaurus/wiki/Miscellaneous/feed" rel="self"/><language>en</language><lastBuildDate>Sat, 03 May 2014 06:54:34 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/geneticthesaurus/wiki/Miscellaneous/feed" rel="self" type="application/rss+xml"/><item><title>Miscellaneous modified by Tomasz</title><link>https://sourceforge.net/p/geneticthesaurus/wiki/Miscellaneous/</link><description>&lt;div class="markdown_content"&gt;&lt;h2 id="tips-and-tricks"&gt;Tips and Tricks&lt;/h2&gt;
&lt;h4 id="index-your-genome"&gt;Index your genome&lt;/h4&gt;
&lt;p&gt;Many of the programs making up GeneticThesaurus require information about the reference genome, provided through the --genome argument. For better performance, index your genome FASTA file first (for example, samtools faidx your.genome.fa writes the index to your.genome.fa.fai); see the samtools documentation for details.&lt;/p&gt;
&lt;h4 id="target-regions"&gt;Target regions&lt;/h4&gt;
&lt;p&gt;Instead of using a whole-genome thesaurus, you may want to study only selected regions. To avoid manipulating very large files, you can create a smaller version of the thesaurus containing only the entries pertinent to you, for example:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;java&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;jar&lt;/span&gt; &lt;span class="n"&gt;GeneticThesaurus&lt;/span&gt; &lt;span class="n"&gt;subset&lt;/span&gt;
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;thesaurs&lt;/span&gt; &lt;span class="n"&gt;thesaurus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tsv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gz&lt;/span&gt;
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="n"&gt;small&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;thesaurus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tsv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gz&lt;/span&gt;
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt; &lt;span class="n"&gt;chr1&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2000000&lt;/span&gt;
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;bed&lt;/span&gt; &lt;span class="n"&gt;mybed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bed&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This should produce a small thesaurus file containing only the entries pertaining to the one-megabase segment chr1:1000000-2000000 and to the regions listed in mybed.bed.&lt;/p&gt;
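&lt;p&gt;The file given to --bed is an ordinary BED file: tab-separated, one interval per line, with the chromosome, start and end in the first three columns. BED coordinates are 0-based and half-open, so the line chr2 5000000 6000000 covers the 1-based positions 5,000,001 to 6,000,000. As a minimal sketch, such a file can be written from the shell as follows; the two intervals are hypothetical placeholders, and the chromosome names are assumed to need to match those used in your genome and thesaurus (chr-prefixed here).&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;# Write two hypothetical target intervals (chromosome, start, end) to mybed.bed.
# Chromosome names are assumed to match the reference (chr-prefixed here).
# tee also echoes the lines so you can check them before running subset.
printf 'chr2\t5000000\t6000000\nchrX\t100000\t250000\n' | tee mybed.bed
&lt;/pre&gt;&lt;/div&gt;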
&lt;h4 id="summarize-thesaurus-regions"&gt;Summarize thesaurus regions&lt;/h4&gt;
&lt;p&gt;You can obtain a summary of all the regions that can be annotated with the thesaurus resource:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;java&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;jar&lt;/span&gt; &lt;span class="n"&gt;GeneticThesaurus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jar&lt;/span&gt; &lt;span class="n"&gt;summarize&lt;/span&gt;
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;genome&lt;/span&gt; &lt;span class="n"&gt;your&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;genome&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fa&lt;/span&gt;
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;thesaurus&lt;/span&gt; &lt;span class="n"&gt;thesaurus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tsv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gz&lt;/span&gt;
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="n"&gt;thesaurus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;align&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bed&lt;/span&gt;
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;what&lt;/span&gt; &lt;span class="n"&gt;align&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This should create a BED file of the intervals described in the thesaurus, that is, the intervals onto which reads were mapped during thesaurus generation.&lt;/p&gt;
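&lt;p&gt;To get a rough idea of how much of the genome the thesaurus covers, you can add up the interval lengths in this BED file. The shell function below is a minimal sketch for doing so; its name, bed_total_bp, is hypothetical and not part of GeneticThesaurus. It skips header lines and assumes standard BED columns and non-overlapping intervals; if the file may contain overlaps, merge them first (for example with bedtools merge), otherwise shared bases are counted more than once.&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;# Hypothetical helper: total length (bp) of the intervals in one or more BED files.
# BED is 0-based and half-open, so end - start is the interval length.
# Header lines (#, track, browser) are skipped; overlapping intervals are NOT merged.
# printf "%.0f" keeps large totals out of scientific notation in some awk versions.
bed_total_bp() {
    awk '/^(#|track|browser)/ { next } { total += $3 - $2 } END { printf "%.0f\n", total }' "$@"
}
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For example, bed_total_bp thesaurus.align.bed prints the number of bases covered by the intervals in the track above. Applied to both this track and the one described in the next paragraph, it gives a quick first check of mapping symmetry: equal totals are necessary, but not sufficient, for the two tracks to match.&lt;/p&gt;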
&lt;p&gt;You can also obtain a similar track showing the intervals from which reads originated but were also mapped elsewhere. Ideally, this track would be identical to the one described above (mapping symmetry). In practice this is not always the case, because read mapping is imperfect. However, the discrepancies affect only small repetitive segments; similar fragments substantially longer than the read length should be captured correctly.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Tomasz</dc:creator><pubDate>Sat, 03 May 2014 06:54:34 -0000</pubDate><guid>https://sourceforge.netf981aef0df9f48cab594014bade1f3a5f0fa84dc</guid></item></channel></rss>