INTEGRATE Wiki

Brought to you by: jin-wash-u

Example

A test data set with a readme file (Example.pdf) was added to the files for download with v0.1c on May 30th, 2014. The input data is still valid, but newer versions of the software produce different outputs. This page goes through the example again with v0.2.0.

The test data set is not small: it is 180 MB after compression, even though it contains only one fusion. This is because I want to use the real coordinates of a real fusion.

If you want to try INTEGRATE with the test data, you first need to download it (go to Files) and install it (go to [Installation]).

The test data contains two small annotation files for the two genes of the fusion (annot.ucsc.test.txt and annot.ensembl.test.txt; see Appendix A of Reference_Manual_0.1c_5_1_2014.pdf from Files for how to create annotation files as input). It also contains a tailored reference genome, so everything you need to run the example is included in the test data.

You can take a look at the following files:

Files:

reference.fasta: a FASTA file containing part of the sequences of chr 6 and chr 19.
annot.ucsc.test.txt: a text file with 9 columns containing the transcript information.
accepted_hits.bam: sorted BAM of mapped tumor RNA reads.
unmapped.bam: unmapped tumor RNA reads.
dna.tumor.bam: sorted tumor DNA BAM with soft-clipping.
dna.normal.bam: sorted normal DNA BAM with soft-clipping.

Note: (1) All the above files contain only data in the regions 6:46000000-47000000 and 19:39000000-40000000 (unmapped.bam contains only reads relevant to the two genes).
(2) reference.fasta covers 6:1-47000000 and 19:1-40000000.
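
Before going further, it can help to confirm that the unpacked test data is complete. The following is only a minimal sketch: the function name missing_inputs is hypothetical, and it assumes all files sit directly in one directory under the names listed above.

```shell
# List the test-data files named above that are missing from a directory.
# Prints one missing file name per line; prints nothing if all are present.
# $1 = directory holding the unpacked test data (default: current directory)
missing_inputs() {
    dir=${1:-.}
    for f in reference.fasta annot.ucsc.test.txt annot.ensembl.test.txt \
             accepted_hits.bam unmapped.bam dna.tumor.bam dna.normal.bam; do
        [ -f "$dir/$f" ] || echo "$f"
    done
}
```

Calling missing_inputs with the path of the unpacked test data should print nothing if every file is in place.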

Preparation:

(1) All the BAM files (except unmapped.bam) have been indexed with samtools.
(2) Create a folder “bwts”:
mkdir bwts
Note: this folder stores the BWTs generated by Integrate.
You can use a name other than “bwts”.
(3) Copy the Integrate executable to this folder.
Note: refer to Reference_Manual_0.1c_5_1_2014.pdf for instructions on compiling the executable.

(4) Build the BWTs:
Integrate mkbwt reference.fasta
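
The preparation steps can be sketched as two small shell functions. This is an illustration rather than the official procedure: the function names are hypothetical, and prepare_bwts assumes that "Integrate mkbwt" writes its BWT files into the current directory, which is why it is run from inside bwts. Check the reference manual if your version behaves differently.

```shell
# Step (1): report sorted BAM files that have no index file next to them.
# unmapped.bam is skipped because it is not indexed in the test data.
# Both index naming styles are accepted: x.bam.bai and x.bai.
unindexed_bams() {
    dir=${1:-.}
    for b in accepted_hits.bam dna.tumor.bam dna.normal.bam; do
        [ -f "$dir/$b.bai" ] || [ -f "$dir/${b%.bam}.bai" ] || echo "$b"
    done
}

# Steps (2)-(4): create bwts, copy the executable there, build the BWTs.
# ASSUMPTION: mkbwt writes into the current directory, so we cd into bwts.
# $1 = path to the compiled Integrate executable
# $2 = directory holding reference.fasta (default: current directory)
prepare_bwts() {
    exe=$1
    data=${2:-.}
    mkdir -p "$data/bwts" || return 1
    cp "$exe" "$data/bwts/Integrate" || return 1
    ( cd "$data/bwts" && ./Integrate mkbwt ../reference.fasta )
}
```

If unindexed_bams prints a file name, running "samtools index" on that file creates the missing index.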

Call fusion:

Example command lines:

(1) RNA-Seq Tumor + WGS Tumor + WGS Normal

Integrate fusion reference.fasta annot.ucsc.test.txt ./bwts accepted_hits.bam unmapped.bam dna.tumor.bam dna.normal.bam

(2) RNA-Seq Tumor + WGS Tumor

Integrate fusion reference.fasta annot.ucsc.test.txt ./bwts accepted_hits.bam unmapped.bam dna.tumor.bam

(3) RNA-Seq Tumor

Integrate fusion reference.fasta annot.ucsc.test.txt ./bwts accepted_hits.bam unmapped.bam

Note: you can run the commands again using annot.ensembl.test.txt in place of annot.ucsc.test.txt.
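
The three command lines differ only in which WGS BAM files are appended. As a rough sketch, a hypothetical helper (fusion_command is not part of INTEGRATE) can print the matching command line depending on which DNA BAMs are present. It only prints the command and does not run it.

```shell
# Print the "Integrate fusion" command line for whichever DNA BAMs exist.
# $1 = annotation file (e.g. annot.ucsc.test.txt)
# $2 = directory holding the test data (default: current directory)
fusion_command() {
    annot=$1
    dir=${2:-.}
    cmd="Integrate fusion reference.fasta $annot ./bwts accepted_hits.bam unmapped.bam"
    if [ -f "$dir/dna.tumor.bam" ]; then
        cmd="$cmd dna.tumor.bam"
        # The normal WGS BAM is only appended after the tumor one, as in (1).
        if [ -f "$dir/dna.normal.bam" ]; then
            cmd="$cmd dna.normal.bam"
        fi
    fi
    echo "$cmd"
}
```

With all files of the test data in place, calling fusion_command with annot.ucsc.test.txt prints command (1); calling it with annot.ensembl.test.txt prints the same command with the Ensembl annotation.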

Outputs:

summary.tsv
One line reports the first fusion called, EIF3K>>CYP39A1. It is not reciprocal. It is a first-tier (refer to our publication) inter-chromosomal event. It has 68 encompassing RNA reads, 185 spanning RNA reads, 18 encompassing WGS reads, and 15 spanning WGS reads. It has no reads from the normal WGS data. The line then lists the number of reads supporting each fusion transcript; there is one transcript with 163 reads.

reads.txt
Judging from the summary, this fusion is very striking, with a large number of supporting reads, so you will want to look at the supporting reads themselves.

Reads are listed in different categories: Encompassing RNA, Spanning RNA, Encompassing DNA, Spanning DNA ... BLAT them to convince yourself that this is a good candidate and to see what functions this fusion may have.

exons.tsv
Now there is reason to validate the fusion. exons.tsv contains the target regions for designing sequencing probes or PCR primers.

breakpoints.tsv
Well, it is real and may be important. Now convey the discovery. This file contains the exact coordinates of the fusion junctions and genomic breakpoints. You can include them in supplementary tables. There are also a BEDPE file and a VCF file for gene fusions and genomic breakpoints, which can easily be added to your own pipelines.
