README
Mobster
Accurate Detection of Mobile Element Insertions (MEIs)
Version 0.1.6
Mobster is used to detect novel (non-reference) Mobile Element Insertion (MEI)
events in BAM files.
Mobster is released under a GNU GPLv2 licence.
Date: April 13 2013
Author: Djie Tjwan Thung
Contact: djieth@users.sourceforge.net
Pre-requisites
==============
Mobster is a Java-based program and requires a Java Runtime Environment (JRE) of version 1.6 or higher.
It has been tested on a 64-bit Linux (CentOS) machine.
Furthermore, it requires a mapper to map discordant reads to the mobilome.
This mapper can be either MOSAIK (tested with v2.1.33)
or bwa (tested with v0.5.9).
Optionally, Picard (picard.sourceforge.net) can be used to estimate
insert size metrics, which aids in clustering discordant reads. Picard is
provided in the release.
Installation
============
#Unzip the Mobster release
unzip Mobster-<VERSION>.zip
#Install MOSAIK (we have tested Mobster with MOSAIK v 2.1.33, but newer versions should be compatible)
#Either use the precompiled binaries directly or build from source
wget https://mosaik-aligner.googlecode.com/files/MOSAIK-2.1.33-Linux-x64.tar
wget https://mosaik-aligner.googlecode.com/files/MOSAIK-2.1.33-source.tar
#Even if the binaries work, untar the source archive (tar -xf MOSAIK-2.1.33-source.tar) and copy
#the two files 2.1.26.pe.100.0065.ann and 2.1.26.se.100.005.ann from its networkFile folder
#into the Mobster/MOSAIK folder.
#Finally, add the directory containing the MOSAIK binaries (e.g. MosaikBuild) to your $PATH, for example:
cd
vim .bash_profile #or .profile
#Add the directory containing the MOSAIK binaries to the PATH= line.
#Alternatively, set MOBIOME_MAPPING_CMD in the ./Mobster/jars/Mobster.properties file to point to the locations of both MosaikBuild and MosaikAlign.
#You are ready to go!
#Test Mobster by running:
cd ./Mobster/jars/
java -jar Mobster.jar -properties Mobster.properties
After a few minutes, MEI predictions will appear in ./Mobster/test-out/
Input files
===========
Mobster calls MEIs from BAM files. Reads must carry either an X0 tag (as written by BWA)
or a ZA tag (as written by MOSAIK), which Mobster uses to identify uniquely mapped reads.
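To check whether your BAM file carries these tags, you can inspect the optional fields of a few records (for example in samtools view output). The Java sketch below parses one SAM text line and reports whether an X0 or ZA tag is present. The class and method names are hypothetical and not part of Mobster. It treats X0:i:1 as a unique hit, following BWA's convention that X0 counts the best hits; how Mobster interprets the MOSAIK ZA tag is not described here, so the sketch only checks for its presence.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not Mobster code): looks for X0 (BWA) or ZA (MOSAIK) tags in a SAM text record. */
public class UniqueTagCheck {

    /** Parses the optional TAG:TYPE:VALUE fields (SAM column 12 onwards) into a map TAG -> VALUE. */
    static Map<String, String> optionalFields(String samLine) {
        String[] cols = samLine.split("\t");
        Map<String, String> tags = new HashMap<String, String>();
        for (int i = 11; i < cols.length; i++) {
            String[] parts = cols[i].split(":", 3);
            if (parts.length == 3) {
                tags.put(parts[0], parts[2]);
            }
        }
        return tags;
    }

    /** True if the record carries one of the tags Mobster relies on (X0 or ZA). */
    static boolean hasUniquenessTag(String samLine) {
        Map<String, String> tags = optionalFields(samLine);
        return tags.containsKey("X0") || tags.containsKey("ZA");
    }

    /** BWA convention: X0 is the number of best hits, so X0:i:1 marks a unique best hit. */
    static boolean isUniqueBwa(String samLine) {
        String x0 = optionalFields(samLine).get("X0");
        return x0 != null && Integer.parseInt(x0) == 1;
    }

    public static void main(String[] args) {
        String bwaRead = "r1\t99\tchr1\t100\t37\t100M\t=\t300\t300\t*\t*\tX0:i:1\tX1:i:0";
        String noTag = "r2\t99\tchr1\t100\t60\t100M\t=\t300\t300\t*\t*\tNM:i:0";
        System.out.println(hasUniquenessTag(bwaRead)); // true
        System.out.println(isUniqueBwa(bwaRead));      // true
        System.out.println(hasUniquenessTag(noTag));   // false
    }
}
```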
Running Mobster
===================
Mobster can be run in either of two ways:
(1) Run the master jar file (Mobster.jar) with a single properties file.
In this case, MEI predictions are made with a single command.
(2) Run multiple (three) jar files, configured with either properties files or
command-line arguments. This eases integration into existing pipelines.
Run Mobster (1): Through Master jar file
===========================================
cd ./Mobster/jars
java -jar Mobster.jar -properties Mobster.properties -in in_file.bam -out out_file_prefix -sn sample_name
#Note: you will probably need to reserve more memory for Mobster to work. For instance, on the CEU trio (~300GB
#BAM files), Mobster was run with 8GB of memory reserved: java -Xmx8G -jar Mobster.jar
#Note 2: all Mobster properties are explained in detail
#in the comment lines of the Mobster.properties file itself.
#Note 3: you may need to change some of the default Mobster.properties values for your specific BAM file.
#An important property to set is MAPPING_TOOL: use bwa if your original BAM file was aligned with bwa,
#or mosaik if it was aligned with MOSAIK.
#Another property to check is READ_LENGTH.
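Because MAPPING_TOOL and READ_LENGTH are easy to get wrong, a small pre-flight check can catch typos before a long run. The hypothetical PropertiesCheck class below assumes Mobster.properties uses the standard java.util.Properties KEY=value syntax (as the MULTIPLE_SAMPLE_CALLING=true lines further down suggest). It flags a MAPPING_TOOL other than bwa or mosaik and a READ_LENGTH that is not a positive integer. These validation rules are assumptions drawn from the notes above; the authoritative description of each property is in the comments of Mobster.properties.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical pre-flight check for two Mobster properties; the rules are assumptions, not Mobster's own validation. */
public class PropertiesCheck {

    /** Returns the problems found; an empty list means both properties look sane. */
    static List<String> check(Properties props) {
        List<String> problems = new ArrayList<String>();
        String tool = props.getProperty("MAPPING_TOOL", "").trim();
        if (!tool.equals("bwa") && !tool.equals("mosaik")) {
            problems.add("MAPPING_TOOL should be bwa or mosaik, found: '" + tool + "'");
        }
        String readLength = props.getProperty("READ_LENGTH", "").trim();
        try {
            if (Integer.parseInt(readLength) <= 0) {
                problems.add("READ_LENGTH must be positive, found: " + readLength);
            }
        } catch (NumberFormatException e) {
            problems.add("READ_LENGTH is not an integer: '" + readLength + "'");
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        InputStream in = new FileInputStream(args.length > 0 ? args[0] : "Mobster.properties");
        try {
            props.load(in); // standard KEY=value parsing; '#' lines are comments
        } finally {
            in.close();
        }
        for (String problem : check(props)) {
            System.out.println(problem);
        }
    }
}
```

Usage: java PropertiesCheck Mobster.properties prints nothing when both properties pass these checks.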
Run Mobster (2): Through multiple jar files
===========================================
cd ./Mobster/jars
Run the following steps in order. First execute each jar without arguments to see its help, then run it with either a properties file or command-line arguments (don't forget to reserve more memory, e.g. -Xmx8G):
Step 1. java -jar PotentialMEIReadFinder.jar
Step 2. Map the resulting .fq file to the mobilome with MOSAIK
Step 3. java -jar RefAndMEPairFinder.jar
Step 4. java -jar AnchorClusterer.jar
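For pipelines, the jar steps can be wrapped in a small driver that stops at the first failing step. The sketch below is hypothetical: the class name, the before-mapping/after-mapping phases and the 8G heap are assumptions, and the argument lists are deliberately left empty because each jar's real options are printed by its own help. Step 2 is not a jar, so the MOSAIK mapping runs separately between the two phases. ProcessBuilder.inheritIO requires Java 7 or later.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical driver for the multi-jar route; step 2 (MOSAIK mapping) runs outside it. */
public class MobsterSteps {

    /** Builds "java -Xmx<heap> -jar <jar> <jarArgs...>". */
    static List<String> javaCommand(String heap, String jar, List<String> jarArgs) {
        List<String> cmd = new ArrayList<String>(Arrays.asList("java", "-Xmx" + heap, "-jar", jar));
        cmd.addAll(jarArgs);
        return cmd;
    }

    /** Runs one command with this process's stdout/stderr; fails fast on a non-zero exit code. */
    static void runStep(List<String> cmd) throws IOException, InterruptedException {
        int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
        if (exit != 0) {
            throw new IllegalStateException("Step failed (exit " + exit + "): " + cmd);
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder: fill in the properties file or options listed in each jar's help output.
        List<String> jarArgs = new ArrayList<String>();
        List<List<String>> beforeMapping = Arrays.asList(
                javaCommand("8G", "PotentialMEIReadFinder.jar", jarArgs));
        List<List<String>> afterMapping = Arrays.asList(
                javaCommand("8G", "RefAndMEPairFinder.jar", jarArgs),
                javaCommand("8G", "AnchorClusterer.jar", jarArgs));
        String phase = args.length > 0 ? args[0] : "";
        List<List<String>> steps = phase.equals("before-mapping") ? beforeMapping
                : phase.equals("after-mapping") ? afterMapping : null;
        if (steps == null) {
            System.out.println("Usage: java MobsterSteps before-mapping|after-mapping");
            return;
        }
        for (List<String> cmd : steps) {
            runStep(cmd);
        }
    }
}
```

Usage: run java MobsterSteps before-mapping, map the .fq file to the mobilome with MOSAIK, then run java MobsterSteps after-mapping.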
Frequently asked questions
==========================
Q: I am getting the following error: Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
A: Try giving Mobster more memory through the -Xmx argument, e.g. java -Xmx8G -jar Mobster.jar to reserve 8GB of memory.
If increasing the reserved memory does not help, some of the read groups in your BAM file may be of low quality:
if a relatively high percentage of reads are unmapped or discordantly mapped, they will fill up the memory. You can check the
mapping statistics using Picard or samtools and then remove the affected read groups from your BAM file.
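The read-group check described above can be sketched as follows: per read group, count the fraction of primary reads that are unmapped (SAM flag 0x4) or paired but not in a proper pair (0x1 set, 0x2 unset), then flag groups above a threshold. This is an illustration under assumptions, not a Mobster feature: the class name, the use of the proper-pair flag as a proxy for discordant mapping and the 0.5 example threshold are all hypothetical. In practice the flags and RG tags would come from samtools view output or a SAM/BAM library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-read-group QC: fraction of primary reads that are unmapped or not properly paired. */
public class ReadGroupQc {
    static final int PAIRED = 0x1;
    static final int PROPER_PAIR = 0x2;
    static final int UNMAPPED = 0x4;
    static final int SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800;

    // read group -> {primary reads seen, reads that are unmapped or not properly paired}
    private final Map<String, long[]> counts = new TreeMap<String, long[]>();

    void add(String readGroup, int flag) {
        if ((flag & SECONDARY_OR_SUPPLEMENTARY) != 0) {
            return; // count each read once, via its primary record
        }
        long[] c = counts.get(readGroup);
        if (c == null) {
            c = new long[2];
            counts.put(readGroup, c);
        }
        c[0]++;
        boolean unmapped = (flag & UNMAPPED) != 0;
        boolean notProperPair = (flag & PAIRED) != 0 && (flag & PROPER_PAIR) == 0;
        if (unmapped || notProperPair) {
            c[1]++;
        }
    }

    double suspiciousFraction(String readGroup) {
        long[] c = counts.get(readGroup);
        return (c == null || c[0] == 0) ? 0.0 : (double) c[1] / c[0];
    }

    /** Read groups whose suspicious fraction exceeds the threshold, in sorted order. */
    List<String> flagged(double threshold) {
        List<String> result = new ArrayList<String>();
        for (String rg : counts.keySet()) {
            if (suspiciousFraction(rg) > threshold) {
                result.add(rg);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ReadGroupQc qc = new ReadGroupQc();
        int[] goodFlags = {99, 147, 99, 147};    // proper pairs
        int[] badFlags = {99, 147, 77, 141, 97}; // one proper pair, one unmapped pair, one non-proper read
        for (int f : goodFlags) qc.add("rg_good", f);
        for (int f : badFlags) qc.add("rg_bad", f);
        System.out.println(qc.flagged(0.5)); // [rg_bad]
    }
}
```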
Q: I am interested in doing a trio analysis / multi-sample analysis. Is this possible?
A: Multi-sample analysis is possible on merged BAM files with an appropriate header: read groups (@RG) must be defined,
and each @RG line must carry an appropriate sample name (SM field). Multi-sample calling is then activated by setting
the following properties in the Mobster.properties file:
MULTIPLE_SAMPLE_CALLING=true
MULTIPLE_SAMPLE_CALLING_STRINGENT=false
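To illustrate why the SM field matters: in a merged BAM file each read only carries its read group (RG tag), and the sample is found by looking that read group up in the @RG header lines. The hypothetical sketch below builds that lookup and counts supporting reads per sample for one event. It shows the data model only and is not Mobster's implementation; the UNKNOWN bucket is an illustrative choice.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: resolve read groups to samples via @RG SM fields and count support per sample. */
public class SampleSupport {

    /** Maps read group ID -> sample name, using the ID and SM fields of @RG header lines. */
    static Map<String, String> readGroupToSample(String headerText) {
        Map<String, String> rgToSample = new HashMap<String, String>();
        for (String line : headerText.split("\n")) {
            if (!line.startsWith("@RG")) {
                continue;
            }
            String id = null;
            String sample = null;
            for (String field : line.split("\t")) {
                if (field.startsWith("ID:")) {
                    id = field.substring(3);
                } else if (field.startsWith("SM:")) {
                    sample = field.substring(3);
                }
            }
            if (id != null && sample != null) {
                rgToSample.put(id, sample);
            }
        }
        return rgToSample;
    }

    /** Counts supporting reads per sample, given the RG tag of each supporting read. */
    static Map<String, Integer> supportPerSample(Map<String, String> rgToSample, String[] readGroupsOfSupportingReads) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String rg : readGroupsOfSupportingReads) {
            String sample = rgToSample.get(rg);
            if (sample == null) {
                sample = "UNKNOWN"; // read group absent from the header or lacking an SM field
            }
            Integer previous = counts.get(sample);
            counts.put(sample, previous == null ? 1 : previous + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        String header = "@HD\tVN:1.4\n"
                + "@RG\tID:rg1\tSM:child\n"
                + "@RG\tID:rg2\tSM:child\n"
                + "@RG\tID:rg3\tSM:mother\n";
        Map<String, String> rgToSample = readGroupToSample(header);
        String[] supporting = {"rg1", "rg3", "rg2", "rg1"};
        System.out.println(supportPerSample(rgToSample, supporting)); // {child=3, mother=1}
    }
}
```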
Q: I want to run Mobster on a BAM file mapped with an aligner other than bwa or MOSAIK. Is this possible?
A: Currently this is only possible when the reads in the BAM file have ZA or X0 tags.
A future release is planned to support all BAM files by identifying uniquely mapped reads by their MAPQ value
instead of the ZA or X0 tag.
Known Issues
===========================================
Reference sequences in the original BAM file that are not in the following dictionary (with or without the chr prefix on the chromosome name):
{"chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr20", "chr21", "chr22", "chrX", "chrY", "chrM"}
will trigger the following warning during the AnchorClusterer phase: "You should not compare regions with invalid chromosomes!"
You can ignore this warning; predictions will still be made on these reference sequences.
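The dictionary rule above can be expressed as a short check, which helps predict which contigs of your reference will trigger the warning. This is a sketch of the rule as stated here (optional chr prefix; chromosomes 1 to 22, X, Y and M), not Mobster's own code; the class name is hypothetical. Contig names such as GL000192.1 or chrUn_gl000220 fall outside the dictionary.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical check of a reference name against the chromosome dictionary listed above. */
public class ChromosomeCheck {
    private static final Set<String> DICTIONARY = new HashSet<String>();

    static {
        for (int i = 1; i <= 22; i++) {
            DICTIONARY.add("chr" + i);
        }
        DICTIONARY.add("chrX");
        DICTIONARY.add("chrY");
        DICTIONARY.add("chrM");
    }

    /** True if the name is in the dictionary, with or without the "chr" prefix. */
    static boolean inDictionary(String referenceName) {
        String prefixed = referenceName.startsWith("chr") ? referenceName : "chr" + referenceName;
        return DICTIONARY.contains(prefixed);
    }

    public static void main(String[] args) {
        String[] names = {"1", "chrX", "GL000192.1", "chrUn_gl000220"};
        for (String name : names) {
            System.out.println(name + (inDictionary(name) ? ": ok" : ": warning expected"));
        }
    }
}
```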
Updates
==========================================
January 5 2014:
Updated the merging algorithm in AnchorClusterer for merging discordant and split clusters. Predictions are now merged, where possible,
when their prediction windows overlap, resulting in almost no redundant predictions.
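Merging predictions whose windows overlap is an instance of classic interval merging: sort the windows by chromosome and start, then extend the current window while the next one starts before it ends. The sketch below shows only that generic technique, assuming inclusive coordinates; the class and field names are hypothetical. The actual merging of discordant and split clusters in AnchorClusterer may involve more than window overlap and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Generic interval merging of prediction windows; a sketch, not AnchorClusterer's code. */
public class WindowMerger {

    /** A prediction window on one chromosome; start and end are inclusive. */
    static final class Window {
        final String chrom;
        final int start;
        final int end;

        Window(String chrom, int start, int end) {
            this.chrom = chrom;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return chrom + ":" + start + "-" + end;
        }
    }

    /** Merges windows on the same chromosome that share at least one base. */
    static List<Window> merge(List<Window> windows) {
        List<Window> sorted = new ArrayList<Window>(windows);
        Collections.sort(sorted, new Comparator<Window>() {
            public int compare(Window a, Window b) {
                int byChrom = a.chrom.compareTo(b.chrom);
                if (byChrom != 0) {
                    return byChrom;
                }
                return a.start < b.start ? -1 : (a.start == b.start ? 0 : 1);
            }
        });
        List<Window> merged = new ArrayList<Window>();
        for (Window w : sorted) {
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last).chrom.equals(w.chrom) && w.start <= merged.get(last).end) {
                Window prev = merged.get(last);
                merged.set(last, new Window(prev.chrom, prev.start, Math.max(prev.end, w.end)));
            } else {
                merged.add(w);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Window> windows = Arrays.asList(
                new Window("chr1", 150, 300),
                new Window("chr2", 100, 200),
                new Window("chr1", 100, 200),
                new Window("chr1", 400, 500));
        System.out.println(merge(windows)); // [chr1:100-300, chr1:400-500, chr2:100-200]
    }
}
```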
March 26 2014:
Added support for multi-sample analysis, which increases sensitivity: for each predicted MEI event, the number of supporting reads from each sample is reported.
References / Acknowledgements
=============================
The authors wish to thank the authors of the following tools, which make Mobster possible:
1) Developers of Picard tools and the sam JDK (picard.sourceforge.net).
2) The authors of the MOSAIK alignment software. Our software of choice for mapping reads against mobile elements:
MOSAIK: A Hash-Based Algorithm for Accurate Next-Generation Sequencing Short-Read Mapping by Wan-Ping Lee, Michael P. Stromberg, Alistair Ward, Chip Stewart, Erik P. Garrison, Gabor T. Marth
http://dx.plos.org/10.1371/journal.pone.0090581