qSV is a sensitive multi-method structural variant (SV) detection tool developed for whole-genome paired-end or mate-pair sequencing. This implementation of qSV integrates independent findings from soft clipping and discordant mapped pair analyses. It improves the accuracy of breakpoint, microhomology and non-template sequence detection by incorporating a localized de novo assembly of abnormal reads and split contig alignment.
qSV requires Java 7 and a machine with 8 cores (hyperthreaded) and at least 40 GB of RAM.
Download the qsv tar file
Untar the tar file into a directory of your choice
You should see jar files for qsv and its dependencies:
[oholmes@minion0 qsv]$ tar xjvf qsv-0.3.tar.bz2
x antlr-3.2.jar
x htsjdk-1.140.jar
x ini4j-0.5.2-SNAPSHOT.jar
x jopt-simple-4.6.jar
x picard-lib.jar
x qbamannotate-0.3pre.jar
x qbamfilter-1.2.jar
x qcommon-0.3.jar
x qio-0.1pre.jar
x qpicard-1.1.jar
x qsv-0.3.jar
x trove-3.1a1.jar
[oholmes@minion0 qsv]$
Before running qSV, you must install the BLAT software suite.
qSV requires two arguments in order to run: -ini, the path to the ini file, and -tmp, the path to a temporary directory.
Run the following command to start execution.
java -Xmx40g -jar /full/path/qsv-0.3.jar -ini /full/path/qsv.demo.ini -tmp /tmp/directory
qSV has two modes, discordant pair and soft clipping, which can be run separately or together (see the sv_analysis option: pair, clip or both).
An example ini file follows:
[general]
log = name of log file
loglevel = INFO or DEBUG
sample = donor or patient id
sv_analysis = Type of sv analysis: pair, clip, both
output = output directory
reference = path to fasta reference file
platform = solid or illumina
min_insert_size = minimum size of SV insert. Default 50.
isize_records = number of records per read group used to calculate isize
range = specify one or more chromosomes or inter for translocations
repeat_cutoff = specified number of clipped reads to define a potential repeat region
[pair]
pairing_type = type of reads: lmp (solid Long Mate Pair), pe (Paired End) or imp (illumina mate-pair)
mapper = mapping tool, e.g. bioscope, lifescope, bwa, bwa-mem, novoalign
pair_query = Filtering query for discordant pairs eg. and(Cigar_M> 34, option_SM>10, MD_mismatch < 3, Flag_DuplicateRead == false)
cluster_size = number of discordant reads required to define a cluster
filter_size = number of control reads in a cluster to classify it germline
[clip]
clip_query = Filtering query for clips: eg and(Cigar_M> 34, MD_mismatch < 3, MAPQ >0,Flag_DuplicateRead == false)
clip_size = number of reads required to proceed with soft clip SV signature detection
consensus_length = minimum length of soft clip consensus sequence
blatpath = path to blat executable, e.g. /home/Software/BLAT
blatserver = name of blat server eg:localhost
blatport = port for blat server
single_side_clip = If SV signatures with soft clip evidence at one breakpoint should be included
[test]
name = id for the sample
input_file = location of the test/disease bam. Must be co-ordinate sorted
[test/size_1]
rgid = Read Group ID
lower = lower insert size
upper = upper insert size
[control]
name = id for the control sample
input_file = location of the control sample
[control/size_1]
rgid = Read Group ID
lower = lower insert size
upper = upper insert size
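The template above can also be generated programmatically. The following is a minimal sketch, not part of qSV: the function name `build_qsv_ini`, the sample names and all paths are hypothetical, the option values are taken from the examples and defaults quoted in this document, and it has not been verified that qSV accepts the exact whitespace that Python's `configparser` writes.

```python
import configparser
import io


def build_qsv_ini(sample, output_dir, reference, test_bam, control_bam, size_ranges):
    """Return qSV-style ini text (illustrative helper, not shipped with qSV).

    size_ranges maps "test" / "control" to a list of
    (read_group_id, lower_isize, upper_isize) tuples, one per read group.
    """
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["general"] = {
        "sample": sample,
        "sv_analysis": "both",
        "output": output_dir,
        "reference": reference,
        "platform": "illumina",
    }
    cfg["pair"] = {"pairing_type": "pe", "mapper": "bwa"}
    cfg["clip"] = {
        "blatpath": "/home/Software/BLAT",
        "blatserver": "localhost",
        "blatport": "8000",
    }
    samples = (("test", "tumour", test_bam), ("control", "normal", control_bam))
    for label, name, bam in samples:
        cfg[label] = {"name": name, "input_file": bam}
        # One [label/size_N] section per read group, numbered from 1.
        for i, (rgid, lower, upper) in enumerate(size_ranges[label], start=1):
            cfg[f"{label}/size_{i}"] = {
                "rgid": rgid,
                "lower": str(lower),
                "upper": str(upper),
            }
    buffer = io.StringIO()
    cfg.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample id, paths and read groups, for illustration only.
    print(build_qsv_ini(
        "PatientA", "/home/test/qsv", "/path/to/reference.fa",
        "/path/to/tumour.bam", "/path/to/normal.bam",
        {"test": [("rg_t1", 200, 400)], "control": [("rg_n1", 210, 390)]},
    ))
```

Generating the size sections in a loop keeps the test/size_N and control/size_N numbering consistent when a BAM contains several read groups.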
A more detailed description of the ini file options is listed in the table below:
Section | Option | Required/Optional | Description [Default value] |
---|---|---|---|
general | log | optional | Name of log file [sample_name.log] |
general | loglevel | optional | Logging level required, e.g. INFO, DEBUG [INFO] |
general | sample | required | Donor/sample id, e.g. PatientA |
general | sv_analysis | optional | Type of SV analysis carried out by qSV. pair: discordant pair SV detection; clip: soft clipping SV detection; both: SV detection using both discordant pairs and soft clips [both] |
general | output | required | Output directory. A results folder for the analysis is created automatically, named after the sample and date. E.g. if the output directory is /home/test/qsv, results will be written to /home/test/qsv/qSV_patientA_20121025_1111 |
general | reference | required | Path to the reference genome file. Must also have a .fai index file, which can be generated using the samtools faidx program [1] |
general | platform | required | Platform used for sequencing: solid or illumina [illumina] |
general | min_insert_size | optional | Minimum size of insert for potential SVs [50] |
general | range | optional | Specify one or more chromosomes. Specify inter for translocations. |
general | repeat_cutoff | optional | The number of clipped reads that will define a potential repeat region (see SV category 5) [1000] |
pair | pairing_type | required for discordant pair | Type of read pairing used. lmp: SOLiD long mate pair; imp: Illumina mate pair; pe: paired end [pe] |
pair | mapper | required for discordant pair | Mapping tool used to map reads. For lmp: bioscope or lifescope; for pe: bwa [bwa] |
pair | pair_query | optional | The filtering query applied to the discordant pair reads (see FILTER OPTIONS). If minimal/no filtering is required, use: Flag_DuplicateRead == false [for lmp: and(Cigar_M> 35, option_SM> 14, MD_mismatch< 3, Flag_DuplicateRead == false); for pe or imp: and(Cigar_M> 35, option_SM> 10, MD_mismatch< 3, Flag_DuplicateRead == false)] |
pair | cluster_size | optional | Number of reads required to define a cluster [3] |
pair | filter_size | optional | Number of control reads in a cluster required to call the cluster germline [1] |
clip | clip_query | optional | The filtering query applied to the soft clipped reads (see FILTER OPTIONS) [and(Cigar_M> 34, MD_mismatch < 3, MAPQ >0, Flag_DuplicateRead == false)] |
clip | clip_size | optional | Number of clipped reads required to proceed with soft clip SV signature detection [3] |
clip | consensus_length | optional | Minimum length of soft clip consensus sequence [20] |
clip | blatpath | required for soft clipping | Path to blat executable, e.g. /home/Software/BLAT |
clip | blatserver | required for soft clipping | Name of blat server, e.g. localhost |
clip | blatport | required for soft clipping or local split read contig | Port for blat server, e.g. 8000 |
clip | single_side_clip | optional | Set to true if SV signatures with soft clip evidence at one breakpoint should be identified |
test | name | required | Name of the test sample, e.g. tumour |
test | input_file | required | Path to the test bam. Must be co-ordinate sorted |
test/size (use nomenclature test/size_number, e.g. test/size_1, test/size_2) | rgid | required | Read Group ID. Found in the header of the bam |
test/size | lower | required | Lower insert size |
test/size | upper | required | Upper insert size |
control | name | required | Name of the control sample, e.g. normal |
control | input_file | required | Path to the control bam. Must be co-ordinate sorted |
control/size (use nomenclature control/size_number, e.g. control/size_1, control/size_2) | rgid | required | Read Group ID. Found in the header of the bam |
control/size | lower | required | Lower insert size |
control/size | upper | required | Upper insert size |
A filter query (see FILTER OPTIONS) takes the form: "operator (condition, condition, query)"
For example: "and (Cigar_M>35, or (MAPQ> 50, option_SM ==1), Flag_DuplicateRead == false)"
Each condition takes the form: "key comparator value"
The table below lists the options currently available for important BAM fields:
BAM Field | Key | Comparator | Value | Examples |
---|---|---|---|---|
Flag | flag_ReadPaired, flag_ProperPair, flag_ReadUnmapped, flag_Mateunmapped, flag_ReadNegativeStrand, flag_MateNegativeStrand, flag_FirstOfpair, flag_SecondOfpair, flag_NotprimaryAlignment, flag_ReadFailsVendorQuality, flag_DuplicateRead | "==", "!=" | string | to report all duplicated reads: flag_duplicated == true or flag_duplicated != 0 |
Cigar | Cigar_I, Cigar_D, Cigar_N, Cigar_S, Cigar_H, Cigar_P, Cigar_M | "==", "!=", ">=", "<=", ">", "<" | int | to report all reads with more than 15 mapped bases: Cigar_M>= 16 or Cigar_M> 15 |
MAPQ | MAPQ | "==", "!=", ">=", "<=", ">", "<" | int | to report all reads with mapping quality higher than 16: MAPQ > 16 |
Optional field | option_<tag> | "==", "!=" | string | to report reads whose "RG" tag has the value Tumor: Option_RG == Tumor |
Optional field | option_<tag> | "==", "!=", ">=", "<=", ">", "<" | int | to report reads whose "SM" tag has a value greater than 14: option_SM> 14 |
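To make the query semantics concrete, here is a minimal sketch of how a nested query of the form "operator (condition, condition, query)" could be evaluated against a single read. This is an illustration only and not the parser qSV or qbamfilter uses. The `read` dictionary and its lower-cased keys, the case-insensitive key matching and the `coerce` helper are all assumptions made for the example.

```python
import operator
import re

COMPARATORS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}
# "key comparator value", with optional whitespace around the comparator.
CONDITION = re.compile(r"^\s*(\w+)\s*(==|!=|>=|<=|>|<)\s*(\S+)\s*$")
# "and (...)" or "or (...)" wrapping a comma-separated list of items.
QUERY = re.compile(r"^(and|or)\s*\((.*)\)$", re.IGNORECASE | re.DOTALL)


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def coerce(raw):
    """Turn 'true'/'false' into bool and digit strings into int (assumed rule)."""
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    try:
        return int(raw)
    except ValueError:
        return raw


def evaluate(query, read):
    """Evaluate a query string against a dict of read attributes."""
    query = query.strip()
    nested = QUERY.match(query)
    if nested:
        results = (evaluate(part, read) for part in split_top_level(nested.group(2)))
        return all(results) if nested.group(1).lower() == "and" else any(results)
    condition = CONDITION.match(query)
    if condition is None:
        raise ValueError(f"not a valid condition: {query!r}")
    key, comparator, raw = condition.groups()
    return COMPARATORS[comparator](read[key.lower()], coerce(raw))


if __name__ == "__main__":
    # Hypothetical read attributes, keyed by lower-cased filter key.
    read = {"cigar_m": 50, "mapq": 60, "option_sm": 0, "flag_duplicateread": False}
    query = "and (Cigar_M>35, or (MAPQ> 50, option_SM ==1), Flag_DuplicateRead == false)"
    print(evaluate(query, read))
```

With the hypothetical read above, the query passes because the read has more than 35 mapped bases, satisfies one branch of the inner `or`, and is not flagged as a duplicate.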
Discordant pair mode requires a normal range of expected insert sizes for paired sequencing reads (lower and upper isize). This range can be obtained in either of two ways:
1. Use Picard's CollectInsertSizeMetrics to produce an insert size distribution.
2. Use qProfiler.
Once the expected insert size ranges have been calculated, add them to the ini file (see the test/size and control/size sections of the ini file options).
An upper and lower insert size must be provided for each read group in the input bam file(s).
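As an illustration of turning an insert size distribution into a lower and upper bound, the sketch below takes a list of observed insert sizes for one read group and trims a fixed percentage from each tail. The trimming rule and the `tail_percent` parameter are assumptions chosen for the example, not a cutoff recommended by qSV; the function name `isize_range` is hypothetical.

```python
import statistics


def isize_range(insert_sizes, tail_percent=1):
    """Return (lower, upper) insert sizes after trimming each tail.

    insert_sizes: observed template lengths for one read group.
    tail_percent: integer percentage (1-49) cut from each end (assumed rule).
    """
    # Template lengths can be negative for the reverse mate; 0 means unknown.
    sizes = sorted(abs(size) for size in insert_sizes if size != 0)
    # 99 cut points: index 0 is the 1st percentile, index 98 the 99th.
    cuts = statistics.quantiles(sizes, n=100, method="inclusive")
    lower = cuts[tail_percent - 1]
    upper = cuts[100 - tail_percent - 1]
    return int(round(lower)), int(round(upper))


if __name__ == "__main__":
    # Synthetic insert sizes, for illustration only.
    observed = list(range(200, 401))
    lower, upper = isize_range(observed)
    print(f"lower = {lower}")
    print(f"upper = {upper}")
```

The two printed values correspond to the lower and upper options of a test/size_N or control/size_N section; the calculation would be repeated for each read group.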
qSV takes mapped next-generation sequencing data as input. It has been tested with:
To determine somatic and germline events, qSV requires two co-ordinate sorted input files in BAM format: a test (e.g. tumour) BAM and a control (e.g. normal) BAM.
To identify SVs with a single test sample (no comparison with control sample), qSV requires 1 input file in the BAM file format which is co-ordinate sorted.
For discordant pair mode each sequencing record must contain several fields that are described in the SAM format specification (1):
For soft-clipping mode, we recommend the reads are mapped by BWA. (Other mapping algorithms can be added).
qSV will generate a number of output files.
Log file:
Summary file:
Tab delimited structural variants file:
Header | Description |
---|---|
analysis_id | in format of qSV_sample_date_time |
sv_id | id of the structural variant. sm = somatic; gm = germline |
sv_type | DEL/ITX: deletion/other intrachromosomal; CTX: interchromosomal translocation; DUP/INS/ITX: duplication/insertion/other intrachromosomal; INV/ITX: inversion/other intrachromosomal |
chr1 | chromosome 1 of SV |
pos1 | position 1 of SV |
strand1 | |
chr2 | chromosome 2 of SV |
pos2 | position 2 of SV |
strand2 | |
test_discordant_pairs_count | number of discordant pair reads from the test bam that pass the filter and support the current SV |
control_discordant_pairs_count | number of discordant pair reads from the control bam that pass the filter and support the current SV |
control_low_qual_reads_count | number of low quality discordant pair reads from the control bam for the current SV. These are lower quality reads from the control bam that were excluded by the original filtering parameters. The presence of a large number of these reads may indicate that the event is germline rather than somatic. |
test_clips_count_pos1 | number of high quality soft clipped reads at position 1 from the test/disease bam that support the current SV |
test_clips_count_pos2 | number of high quality soft clipped reads at position 2 from the test/disease bam that support the current SV |
control_clips_count_pos1 | number of high quality soft clipped reads at position 1 from the control bam that support the current SV |
control_clips_count_pos2 | number of high quality soft clipped reads at position 2 from the control bam that support the current SV |
microhomology | bases of microhomology found. If microhomology was tested and none was found, the result will be "not found". If microhomology was not tested, this column will list "not tested" |
non-template | bases of non-template sequence found. If non-template sequence was tested and none was identified, the result will be "not found". If non-template sequence was not tested, the result will be "not tested" |
Category | Evidence for the SV (1-6). 1: High level of evidence, e.g. discordant pair evidence, clipping at both SV breakpoints and local split read contig evidence observed. 2: Medium level of evidence, e.g. discordant pair signature (both breakpoints) and soft clipping signature. 3: Lower level of evidence, e.g. discordant pair signature alone. 4: Possible germline due to the presence of low quality control reads or evidence in the control bam from local split read alignment. 5: Possible repeat region; more than 1000 clips identified in the region of the SV breakpoint(s). 6: Low level evidence, soft clipping signature for one breakpoint. |
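The tab-delimited structural variants file can be post-processed with a few lines of code. The sketch below keeps only the calls in the highest evidence categories. It assumes the file starts with a single header row using the column names listed in the table above, which has not been checked against real qSV output; the function name `high_confidence_svs` and the two data rows are invented for illustration.

```python
import csv
import io


def high_confidence_svs(handle, max_category=2):
    """Return rows whose Category is at most max_category.

    handle: a text file object for the tab-delimited SV file.
    Assumes a single header row containing a "Category" column.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if int(row["Category"]) <= max_category]


if __name__ == "__main__":
    # Two made-up records with a reduced set of columns, for illustration only.
    example = io.StringIO(
        "sv_id\tsv_type\tchr1\tpos1\tchr2\tpos2\tCategory\n"
        "example_1\tDEL/ITX\tchr1\t1000\tchr1\t5000\t1\n"
        "example_2\tCTX\tchr2\t2000\tchr7\t9000\t5\n"
    )
    for row in high_confidence_svs(example):
        print(row["sv_id"], row["sv_type"], row["chr1"], row["pos1"])
```

Raising `max_category` to 3 would also keep calls supported by a discordant pair signature alone.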
Unaligned soft clips:
Verbose output for structural variants.
Sample example files are provided here: Qsv Example Files
Edit the following ini file options in example.ini:
Make sure the BLAT server dependency has been installed and is running.
Run qSV using the following command:
java -jar qsv-0.3.jar -ini example.ini -tmp [path/to/tmp/directory]
Results will be written to the specified output directory and can be found under the directory: