RepeatMap Code

Status: Beta

Brought to you by: aarvey

File	Date	Author	Commit
.externalToolBuilders	2008-10-21	aarvey	[r1] Adding all files.
.settings	2008-10-21	aarvey	[r1] Adding all files.
bin	2008-10-21	aarvey	[r1] Adding all files.
condor	2008-10-21	aarvey	[r1] Adding all files.
data	2009-06-29	aarvey	[r6] Outputs dictionary if argument given to dict se...
introns	2008-10-21	aarvey	[r1] Adding all files.
lib	2008-10-21	aarvey	[r1] Adding all files.
nbproject	2008-10-21	aarvey	[r1] Adding all files.
scripts	2009-07-31	aarvey	[r7] Should be final version prior to paper submission
src	2009-06-29	aarvey	[r6] Outputs dictionary if argument given to dict se...
.classpath	2008-10-21	aarvey	[r1] Adding all files.
.cvsignore	2008-10-21	aarvey	[r1] Adding all files.
.project	2008-10-21	aarvey	[r1] Adding all files.
LICENSE	2008-10-21	aarvey	[r2] Added test data file, also removed classpath fr...
README	2008-10-21	aarvey	[r1] Adding all files.
build-user.xml	2008-10-21	aarvey	[r1] Adding all files.
build.xml	2008-10-21	aarvey	[r1] Adding all files.
execute.xml	2008-10-21	aarvey	[r1] Adding all files.
make_dictionary.sh	2009-07-31	aarvey	[r7] Should be final version prior to paper submission
manifest.mf	2008-10-21	aarvey	[r1] Adding all files.
svn-commit.tmp	2008-10-21	aarvey	[r1] Adding all files.

Read Me

Authors: Aaron Arvey and Eugene Ie

####################################################
Installing
####################################################

There is currently no real installation method.  You'll need to have
Apache Ant (preferably a recent version; we didn't back-test) to
compile the package.  Binary distributions of Ant are available from
the Apache Software Foundation.

To get the repeatmap code, download it from SourceForge via

$> 

and compile using ant:

$> ant jar 

####################################################
Creating Dictionaries
####################################################

Start with a simple test.  To test the code, try

$> ./make_dictionary.sh data/test/test.fasta 20 2 

Now look at the files and see if they make sense.  You may want to
start a server to "look" at the file contents (see below). 

If the files make sense, then you should continue and make a
dictionary for your desired genome.  

You should run 

$> ./make_dictionary.sh

without any arguments to better understand the arguments.  Making a
dictionary can require *a lot* of memory and a very long time
(depending on your computation power).  YOU NEED TO UNDERSTAND HOW THE
SCRIPT WORKS BEFORE RUNNING IT!  IT CAN EASILY CRASH YOUR COMPUTER AND
DO IRREPARABLE DAMAGE TO YOUR DISK, CPU, AND/OR OTHER HARDWARE
COMPONENTS OF YOUR COMPUTER!

Once you have a solid understanding of the individual steps involved
in making a dictionary, you can make one for your genome.  An
appropriate way to do this may be:

$> ./make_dictionary.sh data/yourgenome/genome.fasta 20 10

If you have a bunch of chromosome files, perhaps the easiest way to
combine them is

$> cat chr*.fasta > genome.fasta

It is quick and dirty, but it'll work as long as you have the space on
your hard drive.
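
If you want to check the space first, here is a minimal sketch.  It
assumes a POSIX shell with wc, df and awk available.  The function name
enough_space_for is made up for illustration and is not part of
RepeatMap.

```shell
#!/bin/sh
# Hypothetical helper (not part of RepeatMap): succeeds only if the
# current directory has room for a concatenated copy of the given files.
enough_space_for() {
    total=0
    for f in "$@"; do
        # wc -c < file prints the byte count without the file name.
        size=$(wc -c < "$f") || return 1
        total=$((total + size))
    done
    # df -Pk prints POSIX-format output in 1024-byte blocks; the fourth
    # column of the second line is the available space.
    free_kb=$(df -Pk . | awk 'NR == 2 { print $4 }')
    [ $((total / 1024 + 1)) -le "$free_kb" ]
}

# Example use:
#   enough_space_for chr*.fasta && cat chr*.fasta > genome.fasta
```

The check only covers the concatenated file itself.  The dictionary
files written later by make_dictionary.sh need additional space.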


####################################################
Starting Dictionary Servers
####################################################

First things first, make sure your computer allows server ports to be
opened.  You shouldn't need any special privileges (at least on Unix
systems).  However, if your computer has strict port restrictions (as
on many online servers), you may need to open holes in the firewall.
Depending on how you configured your machine, you may need to adjust
iptables (Linux) or the Windows equivalent.

Assuming you get the system configured appropriately, starting a
server should be very easy.  You can start the server on your desktop
at work, a laptop at home, or a fancy managed enterprise server.  The
machine's memory (RAM) should be at least twice the size of the
dictionaries you wish to put in memory.  The size of a dictionary can
be determined by

$> ls -lh repeats.bin counts.bin

The sum of the sizes of these two files is the size of the complete
dictionary.  So if both files are 100MB, resulting in a 200MB
dictionary, you need at least 400MB of RAM.
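
The same arithmetic can be scripted.  This is only a sketch, assuming a
POSIX shell and wc.  The function name required_ram_bytes is made up
for illustration and is not part of RepeatMap.

```shell
#!/bin/sh
# Hypothetical helper (not part of RepeatMap): prints the RAM, in bytes,
# suggested by the "twice the dictionary size" rule of thumb.
required_ram_bytes() {
    repeats_file=$1
    counts_file=$2
    # wc -c < file prints the byte count without the file name.
    repeats_size=$(wc -c < "$repeats_file") || return 1
    counts_size=$(wc -c < "$counts_file") || return 1
    echo $(( (repeats_size + counts_size) * 2 ))
}

# Example use:
#   required_ram_bytes repeats.bin counts.bin
```

Compare the printed number against the memory actually free on the
machine, not just the installed total.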

You now need to create a dictionary server configuration file.
Examples are provided in the ./data directory.  The first line is the
port.  You probably want to use a port above 1024 (lower ports are
meant for root/admin purposes), so we'll use port 5454 in this
example.  Each following line specifies the kmer-size, the path to
repeats.bin, the path to the counts file, and a very brief description
of the dictionary.  So an example might be:

5454
16, ./dmel/dmel16/repeats.bin, dmel/dmel16/kmers_count.txt, D melanogaster
20, ./dmel/dmel20/repeats.bin, dmel/dmel20/kmers_count.txt, D melanogaster

This starts 2 dictionary servers, both listening for incoming queries
on port 5454.  Both are for D. melanogaster, but they use different
kmer-sizes (16 and 20).  The client program (e.g. ProbeDesigner) can
then query the server on port 5454 to determine kmer-counts.
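
Before starting a server it can help to sanity-check the configuration
file.  The sketch below assumes the layout shown in the example above
(port on the first line, then one comma-separated line per dictionary)
and a POSIX shell.  The function name check_dict_config is made up for
illustration.  It is not part of RepeatMap and does not reproduce how
the server itself parses the file.

```shell
#!/bin/sh
# Hypothetical helper (not part of RepeatMap): prints the port and each
# dictionary entry found in a configuration file laid out like the
# example above.
check_dict_config() {
    config=$1
    # First line: the port the server listens on.
    port=$(head -n 1 "$config") || return 1
    case $port in
        ''|*[!0-9]*) echo "bad port: $port" >&2; return 1 ;;
    esac
    if [ "$port" -lt 1024 ]; then
        echo "warning: port $port may need root/admin rights" >&2
    fi
    echo "port=$port"
    # Remaining lines: kmer-size, repeats path, counts path, description.
    tail -n +2 "$config" |
    while IFS=, read -r kmer repeats counts desc || [ -n "$kmer" ]; do
        [ -z "$kmer" ] && continue   # skip blank lines
        # Drop the single space that follows each comma in the example.
        repeats=${repeats# }
        counts=${counts# }
        desc=${desc# }
        echo "k=$kmer repeats=$repeats counts=$counts desc=$desc"
    done
}

# Example use (the file name is a placeholder):
#   check_dict_config data/your_server.conf
```

The function only echoes what it reads.  Checking that the listed
files exist depends on the directory the server is started from, so
that is left out here.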

####################################################
Testing
####################################################

You can edit code and recompile, erase old runs, and rerun with:

$> ant jar && rm -rf data/test/20/ && ./make_dictionary.sh data/test/test.fasta 20 2
