Junho Kim

SoloDel

SoloDel (Somatic Low-frequent Deletion call model) is a Java-based somatic deletion caller for whole-genome sequencing (WGS) data from unmatched samples. It is designed for two common sampling issues: a low mutational frequency in the cell population and the absence of a matched control sample.
The key features of SoloDel are:

  • Estimation of the mutational frequency in a mixed disease sample without a matched control
  • Improved somatic deletion calling based on a probabilistic model with parameter estimation

Get the most recent version of SoloDel here:



Prerequisite software

Installation of SoloDel

SoloDel was developed with Java JDK 7 (64-bit). Running SoloDel requires Java Runtime Environment (JRE) 1.7.x or later. Download the most recent version of SoloDel from the code page and extract the gzipped archive:

tar -zxvf SoloDel-1.x.y.tar.gz

The archive contains the following:

  1. SoloDel.jar
    : Executable JAR file of SoloDel
  2. README.txt
    : Nothing much in this file currently. Please refer to this page instead.
  3. config.txt
    : Configuration file for SoloDel. Parameters can also be set with command-line options, which take precedence over the values in config.txt.
  4. hg18/hg19_gaps.txt
    : Files for gap information
  5. lib/
    : Directory containing the additional JAR libraries used by SoloDel

Running SoloDel

To run SoloDel, execute the JAR file:

java -jar SoloDel.jar

The -h or -? option prints the following usage message. If you see it, you are ready to run SoloDel.


SoloDel: Somatic Low-frequent Deletion Caller model.
Version: 1.0.0

Usage: java -jar SoloDel.jar -R <reference.fa> -D <mixed_disease_sample.bam> -d <initial_disease_deletion_list> [options]

-R STRING   human genome reference fasta
-D STRING   indexed bam file for disease input
-d STRING   initial deletion list from disease bam file [breakdancer .out or DELLY .vcf]

Input options:

-N STRING   indexed bam file for matched control input
-n STRING   initial deletion list from matched control bam file [breakdancer .out, DELLY .vcf]
-w PATH     working directory [directory of disease input]
-b PATH     path for initial deletion list
-f PATH     path for fastahack
-t PATH     path for blat
-g STRING   human genome build [hg18/hg19]
-i INT      insert size
-l INT      read length
-q INT      mapping quality [20]

Sampling options:

-p FLOAT    mutation frequency thrd for homozygous deletions [0.6]
-c INT      minimum depth of coverage for deletion call [0]
-C INT      maximum depth of coverage for deletion call (duplicate threshold) [1000]
-m FLOAT    minimum deviation between mean values of mixture model [0.05]

Output options:

-o FILE     header of report files [<brainsample.bam>]


Inputs and options

Mandatory inputs:

SoloDel requires three inputs. They can be given either as command-line options or in the configuration file (config.txt).

Input | Option | Description
Reference sequence | -R | FASTA-formatted reference sequence file. The reference must be indexed.
Mixed disease data | -D | BAM-formatted alignment file for the mixed disease sample. The BAM file must be coordinate-sorted and indexed.
Initial deletion list | -d | BreakDancer SV call file (.out) or DELLY SV call file (.vcf).
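Before starting a long run, it can help to confirm that the mandatory inputs are in place. The function below is a minimal sketch, not part of SoloDel: the name check_solodel_inputs is hypothetical, and it assumes the usual samtools index naming (reference.fa.fai for the FASTA index, sample.bam.bai or sample.bai for the BAM index). SoloDel may locate its indexes differently. Missing indexes can typically be created with samtools faidx (reference) and samtools sort followed by samtools index (BAM).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the three mandatory SoloDel inputs.
# Assumes samtools-style index file names; adjust if yours differ.
check_solodel_inputs() {
  local ref="$1" bam="$2" calls="$3" ok=0

  # -R: reference FASTA and its .fai index
  if [ ! -f "$ref" ]; then
    echo "missing reference: $ref" >&2; ok=1
  elif [ ! -f "$ref.fai" ]; then
    echo "reference not indexed (no $ref.fai)" >&2; ok=1
  fi

  # -D: disease BAM and its index (sample.bam.bai or sample.bai)
  if [ ! -f "$bam" ]; then
    echo "missing BAM: $bam" >&2; ok=1
  elif [ ! -f "$bam.bai" ] && [ ! -f "${bam%.bam}.bai" ]; then
    echo "BAM not indexed (no $bam.bai)" >&2; ok=1
  fi

  # -d: initial deletion list; SoloDel supports BreakDancer .out and DELLY .vcf
  case "$calls" in
    *.out|*.vcf) [ -f "$calls" ] || { echo "missing deletion list: $calls" >&2; ok=1; } ;;
    *) echo "unexpected deletion list format: $calls" >&2; ok=1 ;;
  esac

  # 0 if everything looks usable, 1 otherwise
  return "$ok"
}
```

A typical use is to guard the actual run, e.g. check_solodel_inputs hg19.fa disease.bam disease.out && java -jar SoloDel.jar -R hg19.fa -D disease.bam -d disease.out (file names are placeholders).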


Options:

SoloDel accepts several optional parameters that can make its calls more accurate.

Option | Default value | Description
-N | | Matched control data: BAM-formatted alignment file for the matched control sample. The BAM file must be coordinate-sorted and indexed. (optional)
-n | | Initial deletion list from the matched control data: BreakDancer SV call file (.out) or DELLY SV call file (.vcf). (optional)
-w | Directory containing the alignment file | Working directory of SoloDel. All outputs are saved in this directory.
-b | Directory containing the initial deletion list file | Path to the initial deletion list files.
-f | | Path to fastahack. Set the full path to the executable.
-t | | Path to blat. Set the full path to the executable.
-g | | Human genome build [hg18/hg19].
-i | | Insert size.
-l | | Read length.
-q | 20 | Mapping quality threshold used to filter out low-confidence deletion candidates.
-p | 0.6 | Mutation frequency threshold used to filter homozygous deletions.
-c | 0 | Minimum depth of coverage for a deletion call. Candidates whose flanking regions have a read depth below this value are not considered.
-C | 1000 | Maximum depth of coverage for a deletion call (duplicate threshold). Candidates whose flanking regions have a read depth above this value are not considered.
-m | 0.05 | Minimum deviation between the mean values of the mixture model. A mixture model whose means differ by less than this value is rejected.
-o | | Header (prefix) of the report files.
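For reference, a complete command combining the mandatory inputs with some of the options above might look like the sketch below. The file names are placeholders, the option values are either the documented defaults or examples, and the run_solodel function and its DRY_RUN switch are hypothetical conveniences rather than part of SoloDel.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: assembles a SoloDel command line as a bash array.
# File names are placeholders; values are documented defaults or examples.
run_solodel() {
  local -a args
  args=(
    -R hg19.fa                  # reference FASTA (must be indexed)
    -D disease.bam              # coordinate-sorted, indexed disease BAM
    -d disease.breakdancer.out  # initial deletion calls (BreakDancer .out or DELLY .vcf)
    -g hg19                     # human genome build [hg18/hg19]
    -q 20                       # mapping quality threshold (default 20)
    -p 0.6                      # homozygous-deletion frequency threshold (default 0.6)
    -m 0.05                     # minimum deviation between mixture-model means (default 0.05)
    -w results                  # working directory; all outputs are written here
    -o disease                  # header (prefix) of the report files
  )
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print the shell-quoted command instead of executing it
    printf '%q ' java -jar SoloDel.jar "${args[@]}"
    echo
  else
    java -jar SoloDel.jar "${args[@]}"
  fi
}
```

Running DRY_RUN=1 run_solodel prints the command without starting Java. Keeping the arguments in an array preserves paths that contain spaces and leaves room for one comment per option.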


Output

SoloDel outputs the following files:

  1. OutputHeader.somatic.call
    : List of final somatic candidates with somatic probability scores. This file will be empty if the mixture model is rejected.
  2. OutputHeader.EM.summary
    : Parameters estimated by the EM algorithm, together with the log-likelihoods of the germline and mixture models.
  3. OutputHeader.germLL
    : Probability of each deletion candidate under the germline model.
  4. OutputHeader.mixedLL
    : Probability of each deletion candidate under the mixture model.
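Because OutputHeader.somatic.call is empty when the mixture model is rejected, a quick post-run check can tell the two outcomes apart. The helper below is a hypothetical sketch (summarize_solodel is not part of SoloDel); it only counts lines, since the exact column layout of the call file is not described here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a SoloDel run produced somatic calls.
# An empty <header>.somatic.call means the mixture model was rejected.
summarize_solodel() {
  local header="$1"
  local calls="$header.somatic.call"
  if [ ! -e "$calls" ]; then
    echo "no output found: $calls" >&2
    return 2
  elif [ -s "$calls" ]; then
    # Non-empty file: report its size in lines
    echo "$calls: $(wc -l < "$calls" | tr -d ' ') line(s)"
  else
    echo "mixture model rejected: no somatic calls"
    echo "see $header.EM.summary for the germline vs. mixture log-likelihoods"
  fi
}
```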

The following directories and files are generated by SoloDel for its internal use:

  1. fasta/
    : Sequences extracted to check for false-positive deletion calls; used as blat queries.
  2. psl/
    : blat output used to filter false-positive deletion calls by sequence homology.

Notes

  • SoloDel separates low-frequency somatic deletions from germline deletions among the calls made by an external SV caller. Raw SV calls from that tool are therefore required. Currently, SoloDel supports BreakDancer and DELLY.
  • Because SoloDel selects somatic deletions only from the initial call set, the WGS data must have enough coverage for the external caller to detect low-frequency somatic deletions in the first place. SV detection generally requires more than three supporting anomalous reads, so ~70x coverage is needed to detect somatic deletions present in less than 10% of the cell population.
  • SoloDel does not detect deletions itself; it classifies the calls of an initial SV caller as germline or somatic. We therefore recommend adding SoloDel to your existing SV analysis pipeline as a downstream step for somatic deletion analysis.
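The ~70x figure can be checked with a back-of-the-envelope calculation. Assume (our assumption, not stated in the notes) that the somatic deletion is heterozygous, so cells making up a fraction f of the sample contribute about f/2 of the reads at the locus, and approximate the depth of anomalous read pairs by the sequencing coverage C. With a threshold of k = 3 supporting reads and f = 0.10:

```latex
\mathbb{E}[\text{supporting reads}] \approx C \cdot \frac{f}{2},
\qquad
C \cdot \frac{f}{2} \ge k \;\Rightarrow\; C \ge \frac{2k}{f} = \frac{2 \times 3}{0.10} = 60,
\qquad
70 \times \frac{0.10}{2} = 3.5
```

Under these assumptions, 60x is the bare minimum and ~70x gives a small margin above it. Actual read counts fluctuate around this mean, so lower coverage or lower cell fractions will cause more deletions to be missed by the initial caller.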
