INTEGRATE Wiki
Brought to you by:
jin-wash-u
INTEGRATE version 0.1c
Discover fusions by combining RNA-Seq and WGS data sets*
usage: Integrate <subcommand> [options] list of data sets
Integrate subcommands include:
fusion: call fusions.
mkbwt: build BWTs for reference genome. This has to be run one time before running subcommand fusion.
*Note: Integrate can run with RNA-only data sets.
Create directory:
mkdir directory_to_bwts
Run subcommand mkbwt:
Integrate mkbwt (options) reference.fasta
options:
-mb integer : Sequences in the reference fasta that are shorter than this value
are not included in the evaluation of repetitive reads. default: 10000000
-dir string : directory to store the BWTs. default: ./bwts
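The two steps above can be scripted. What follows is a minimal Python sketch, assuming the Integrate executable is on the PATH. The helper names mkbwt_command and build_bwts are hypothetical and not part of Integrate; only the subcommand, the -mb and -dir options, and their defaults are taken from the help text above.

```python
import os
import subprocess


def mkbwt_command(reference_fasta, bwt_dir="./bwts", mb=10000000):
    """Assemble the argument list for `Integrate mkbwt` (hypothetical helper)."""
    # Options come before the reference, as in the usage line:
    #   Integrate mkbwt (options) reference.fasta
    return ["Integrate", "mkbwt", "-mb", str(mb), "-dir", bwt_dir, reference_fasta]


def build_bwts(reference_fasta, bwt_dir="./bwts", mb=10000000, dry_run=True):
    """Create the BWT directory, then return (and optionally run) the command."""
    # Like `mkdir directory_to_bwts`, but does not fail if the directory exists.
    os.makedirs(bwt_dir, exist_ok=True)
    cmd = mkbwt_command(reference_fasta, bwt_dir, mb)
    if not dry_run:
        # Assumption: Integrate is installed and found on the PATH.
        subprocess.run(cmd, check=True)
    return cmd
```

With dry_run=True (the default in this sketch) nothing is executed, so the returned list can be inspected or logged before the one-time BWT build is started.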
Make sure mkbwt has been run:
Integrate fusion (options) reference.fasta annotation.txt directory_to_bwt accepted_hits.bam unmapped.bam (dna.tumor.bam dna.normal.bam)
options:
-cfn integer : Cutoff of spanning RNA-Seq reads for fusions with non-canonical
exonic boundaries. default: 3
-rt float : Normal dna / tumor dna ratio. If the ratio is less than
this value, then dna reads from the normal dna data set
supporting a fusion candidate are ignored. default: 0.0
-minIntra integer : With RNA reads only, a chimera with two adjacent
genes in order is annotated as intra_chromosomal rather than
read_through if the distance between the two genes is longer than
this value. default: 400000
-minW float : Minimum weight for the encompassing rna reads on an edge. default: 2.0
-mb integer : See subcommand "mkbwt".
This value can be larger than the one used by mkbwt. default: 10000000
-reads string : File to store all the reads. default: reads.txt
-sum string : File to store summary. default: summary.tsv
-ex string : File to store exons for fusions with canonical exonic boundaries. default: exons.tsv
-bk string : File to store breakpoints. default: breakpoints.tsv
This version of Integrate works in the following situations:
(1)having rna tumor, dna tumor, dna normal
(2)having rna tumor, dna tumor
(3)having rna tumor
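The three situations differ only in how many dna bams follow the two rna bams on the command line. A minimal Python sketch of that, assuming the positional order shown in the usage line for subcommand fusion; the helper name fusion_command is hypothetical, and rejecting a normal dna bam without a tumor dna bam is an assumption of this sketch (that case is not among the three listed situations).

```python
def fusion_command(reference_fasta, annotation, bwt_dir,
                   accepted_hits_bam, unmapped_bam,
                   dna_tumor_bam=None, dna_normal_bam=None, options=()):
    """Assemble the argument list for `Integrate fusion` (hypothetical helper).

    options is a flat sequence of option strings, e.g. ["-cfn", "3"].
    """
    if dna_normal_bam is not None and dna_tumor_bam is None:
        # Assumption: the positional order cannot express "normal dna only".
        raise ValueError("dna normal bam given without dna tumor bam")

    cmd = ["Integrate", "fusion", *options,
           reference_fasta, annotation, bwt_dir,
           accepted_hits_bam, unmapped_bam]
    if dna_tumor_bam is not None:
        cmd.append(dna_tumor_bam)       # situations (1) and (2)
    if dna_normal_bam is not None:
        cmd.append(dna_normal_bam)      # situation (1) only
    return cmd


base = ("reference.fasta", "annotation.txt", "directory_to_bwt",
        "accepted_hits.bam", "unmapped.bam")

rna_only = fusion_command(*base)                                   # situation (3)
rna_tumor = fusion_command(*base, dna_tumor_bam="dna.tumor.bam")   # situation (2)
all_three = fusion_command(*base, dna_tumor_bam="dna.tumor.bam",
                           dna_normal_bam="dna.normal.bam")        # situation (1)
```

The resulting lists can be passed to subprocess.run once mkbwt has been run for the same reference.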
Integrate will only use sequences in reference.fasta.
Chr names with and without "chr" are regarded as the same, e.g. chr1 = 1.
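The naming rule above can be illustrated with a short Python sketch. This is not Integrate's own code; the function names normalize_chrom and same_chrom are hypothetical, and the sketch covers only the stated "chr" prefix equivalence, nothing else.

```python
def normalize_chrom(name):
    """Drop a leading "chr" so that "chr1" and "1" map to the same key."""
    return name[3:] if name.startswith("chr") else name


def same_chrom(a, b):
    """True if two sequence names differ at most by the "chr" prefix."""
    return normalize_chrom(a) == normalize_chrom(b)


print(same_chrom("chr1", "1"))     # True
print(same_chrom("chr1", "chr2"))  # False
```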
The rna and dna bams can come from alignments to different reference files, with the sequences in a different order and their names with or without "chr". However, the genome versions should be the same, e.g. hg19 (and the same as in the annotation).
The tumor and normal dna bams should be mapped to the same reference file.
For rna tumor: accepted_hits.bam is a bam file containing the mapped rna reads, and unmapped.bam is a bam file containing the unmapped rna reads. If they have been merged into one bam, just use merged.bam twice in the command line.
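The rule for merged rna bams can be sketched as a small Python helper that returns the two rna arguments for the command line. The helper name rna_bam_arguments is hypothetical; it only encodes the rule stated above (separate files in order, or the merged file given twice).

```python
def rna_bam_arguments(accepted_hits_bam=None, unmapped_bam=None, merged_bam=None):
    """Return the two rna bam arguments for the command line (hypothetical helper)."""
    if merged_bam is not None:
        if accepted_hits_bam is not None or unmapped_bam is not None:
            raise ValueError("give either a merged bam or the two separate bams")
        # Mapped and unmapped reads live in one file: pass it twice.
        return [merged_bam, merged_bam]
    if accepted_hits_bam is None or unmapped_bam is None:
        raise ValueError("both accepted_hits and unmapped bams are required")
    return [accepted_hits_bam, unmapped_bam]


print(rna_bam_arguments(merged_bam="merged.bam"))
# ['merged.bam', 'merged.bam']
```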
For dna bams: If soft-clips are provided, Integrate searches for rearrangement breakpoints; otherwise, only paired reads may be included in the analysis.
If you have only rna normal, or both rna and dna normal data sets, these data sets can be run to find non-somatic events.
e.g. Integrate fusion -normal (options) reference.fasta annotation.txt directory_to_bwt accepted_hits.normal.bam unmapped.normal.bam (dna.normal.bam)
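A minimal Python sketch of the normal-sample command line, assuming the argument order of the example above, with -normal placed directly after the subcommand and before any other options. The helper name normal_fusion_command is hypothetical.

```python
def normal_fusion_command(reference_fasta, annotation, bwt_dir,
                          accepted_hits_normal_bam, unmapped_normal_bam,
                          dna_normal_bam=None, options=()):
    """Assemble `Integrate fusion -normal ...` (hypothetical helper)."""
    cmd = ["Integrate", "fusion", "-normal", *options,
           reference_fasta, annotation, bwt_dir,
           accepted_hits_normal_bam, unmapped_normal_bam]
    if dna_normal_bam is not None:
        # The dna normal bam is optional, as the parentheses in the example indicate.
        cmd.append(dna_normal_bam)
    return cmd


rna_normal_only = normal_fusion_command(
    "reference.fasta", "annotation.txt", "directory_to_bwt",
    "accepted_hits.normal.bam", "unmapped.normal.bam")

rna_and_dna_normal = normal_fusion_command(
    "reference.fasta", "annotation.txt", "directory_to_bwt",
    "accepted_hits.normal.bam", "unmapped.normal.bam",
    dna_normal_bam="dna.normal.bam")
```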