RepeatMap Code
Status: Beta
Authors: Aaron Arvey and Eugene Ie

####################################################
Installing
####################################################

There is currently no real installation method. You'll need Apache Ant to compile the package (preferably a recent version; we did not test against older ones). Binary distributions of Ant are available from the Apache Software Foundation.

To get the RepeatMap code, download it from SourceForge via

$>

and compile using ant:

$> ant jar

####################################################
Creating Dictionaries
####################################################

Start with a simple test. To test the code, try

$> ./make_dictionary.sh data/test/test.fasta 20 2

Now look at the output files and see if they make sense. You may want to start a server to "look" at the file contents (see below). If the files make sense, continue and make a dictionary for your desired genome.

To better understand the arguments, run the script without any:

$> ./make_dictionary.sh

Making a dictionary can require *a lot* of memory and a very long time (depending on your computation power).

YOU NEED TO UNDERSTAND HOW THE SCRIPT WORKS BEFORE RUNNING IT! IT CAN EASILY CRASH YOUR COMPUTER AND DO IRREPARABLE DAMAGE TO YOUR DISK, CPU, AND/OR OTHER HARDWARE COMPONENTS OF YOUR COMPUTER!

Once you have a solid understanding of the individual steps involved, you can make a dictionary. One appropriate way to do this may be:

$> ./make_dictionary.sh data/yourgenome/genome.fasta 20 10

If you have a bunch of chromosome files, perhaps the easiest way to combine them is

$> cat chr*.fasta > genome.fasta

It is quick and dirty, but it will work as long as you have the space on your hard drive.

####################################################
Starting Dictionary Servers
####################################################

First things first, make sure your computer allows server ports to be created.
Amazingly, you shouldn't need any special privileges (at least on Unix systems). However, if your computer has strict port restrictions (as on many online servers), you may need to open holes in the firewall. Depending on how you configured your machine, you may need to toy with iptables (Linux) or the Windows equivalent.

Assuming you get the system configured appropriately, starting a server should be very easy. You can start the server on your desktop at work, a laptop at home, or a fancy managed enterprise server. The amount of memory (RAM) should be at least twice the size of the dictionaries you wish to put in memory. The size of a dictionary can be determined by

$> ls -lh repeats.bin counts.bin

The sum of these two files is the complete dictionary. So if both files are 100MB, resulting in a 200MB dictionary, you need at least 400MB of RAM.

You now need to create a dictionary server configuration file. Examples are provided in the ./data directory. You probably want to use a port above 1024 (ports below 1024 are reserved for root/admin purposes), so we'll use port 5454 in this example. After the port, we specify for each dictionary the kmer-size, the path to repeats.bin, the path to counts.bin, and a very brief description of the dictionary. So an example might be:

5454
16, ./dmel/dmel16/repeats.bin, dmel/dmel16/kmers_count.txt, D melanogaster
20, ./dmel/dmel20/repeats.bin, dmel/dmel20/kmers_count.txt, D melanogaster

This results in 2 dictionary servers being started on port 5454. Both are for D. melanogaster, but they use different kmer-sizes. Both dictionaries listen for incoming queries on 5454. The client program (e.g. ProbeDesigner) can then query the server on port 5454 to determine kmer-counts.

####################################################
Testing
####################################################

You can edit code and recompile, erase old runs, and rerun with:

$> ant jar && rm -rf data/test/20/ && ./make_dictionary.sh data/test/test.fasta 20 2
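
As a quick sanity check before starting a server, the memory rule of thumb from the server section (RAM of at least twice the combined size of the two dictionary files) can be sketched in a few lines of Python. This is only an illustration and not part of the RepeatMap distribution: the function names dictionary_size_bytes and required_ram_bytes are hypothetical, and the factor of two is simply the one stated in this README.

```python
import os

# Hypothetical helper, NOT part of RepeatMap. The factor of 2 is the
# rule of thumb given in this README: RAM >= 2 x dictionary size.
RAM_FACTOR = 2


def dictionary_size_bytes(repeats_path, counts_path):
    """Return the combined on-disk size (in bytes) of the two dictionary files."""
    return os.path.getsize(repeats_path) + os.path.getsize(counts_path)


def required_ram_bytes(dictionary_bytes, factor=RAM_FACTOR):
    """Return the suggested minimum RAM (in bytes) for a dictionary of this size."""
    return factor * dictionary_bytes


# Example from the text: two 100MB files give a 200MB dictionary,
# which calls for at least 400MB of RAM.
MB = 1024 * 1024
example_dictionary = 100 * MB + 100 * MB
example_ram = required_ram_bytes(example_dictionary)
```

For a real dictionary, one would pass the paths of its two files to dictionary_size_bytes and feed the result to required_ram_bytes; if several dictionaries are loaded by one server, their requirements presumably add up.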