Big Output Files, Interrupted Index and Splice Index

Brought to you by: amcpherson

Forum: deFuse Help

Creator: A.R. Grosso

Created: 2013-10-30

Updated: 2013-10-31

A.R. Grosso - 2013-10-30

I have several questions:

1) Can we discard some of the big output files, namely cdna.pair.sam, reads.1.fastq, reads.2.fastq and cdna.pair.bam? deFuse is producing around 250 GB for each sample of my dataset.

2) interrupted_index: in the paper you present this ratio as log2, but the values in my output table are all positive. I just want to confirm whether they are in fact already log2. What does the result "-" mean?

3) splice-index: according to the website the definition is: "number of concordant pairs in gene 1 spanning the fusion splice / breakpoint, divided by number of spanning reads supporting the fusion with gene 2". First, does the numerator correspond to the reads in gene 1 spanning the fusion breakpoint in the normal gene (i.e. including the remaining exons of the gene)? Is it in log scale? Are the reads normalized for the length of the covered region? What does a value of "0" or "-" mean? So only when the SI is lower than 1 is the fusion splice junction used more than the "normal" splice junction, right?

Thanks
Ana Rita

Andrew - 2013-10-31

1) It depends on whether you are satisfied with the output provided in the results.* files. If you want to do some additional interrogation of the data, for instance by running get_reads.pl to find the supporting reads, then you will have to keep some of the temporary files (details on the manual page).

2) The output for interrupted_index is actually the ratio and is not log transformed. A "-" signifies no data, usually because one side of the fusion is in a non-genic region.
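If you want the log2 scale used in the paper, you can transform the reported ratio yourself. The following is only a minimal sketch: the helper names `parse_ratio` and `log2_ratio` are hypothetical and not part of deFuse, and it assumes each field arrives as a string that is either a number or "-".

```python
import math
from typing import Optional


def parse_ratio(field: str) -> Optional[float]:
    """Return the ratio as a float, or None when the field is "-" (no data)."""
    field = field.strip()
    if field == "-":
        return None
    return float(field)


def log2_ratio(field: str) -> Optional[float]:
    """Log2-transform a reported ratio; None when it is missing or not positive."""
    ratio = parse_ratio(field)
    # log2 is undefined for 0, so a zero ratio is treated like missing data here
    # (an assumption of this sketch, not deFuse behaviour).
    if ratio is None or ratio <= 0:
        return None
    return math.log2(ratio)


for value in ["4", "0.5", "0", "-"]:
    print(value, log2_ratio(value))
```

Returning None for "-" keeps non-genic fusions distinguishable from fusions whose ratio happens to be small.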

3) Splice index is also reported as a plain ratio. Given a fusion boundary with genomic position x, the numerator counts the number of reads that align with one end to the left of x and one end to the right of x. The denominator is the number of supporting spanning reads. No normalization by length is performed because length is not a factor: for both the numerator and the denominator, reads are counted according to overlap with a single position. One issue with this measurement is that it includes all normal splice variants but not all fusion splice variants. A "-" signifies no data and again is the result of a fusion occurring in a non-genic region. A "0" signifies no wild-type reads. An SI lower than 1, as you say, means we have more fusion reads than normal reads.
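To make the counting concrete, here is a rough sketch of the ratio as described above. It is not the deFuse implementation: the function names are hypothetical, each read pair is simplified to a single coordinate per end, and returning None when there are no fusion-supporting reads is an assumption of the sketch.

```python
from typing import List, Optional, Tuple


def count_straddling(pairs: List[Tuple[int, int]], x: int) -> int:
    """Count pairs with one end to the left of x and the other to the right."""
    return sum(1 for a, b in pairs if min(a, b) < x < max(a, b))


def splice_index(
    wild_type_pairs: List[Tuple[int, int]], x: int, fusion_span_count: int
) -> Optional[float]:
    """Wild-type pairs straddling x divided by fusion-supporting spanning reads."""
    if fusion_span_count == 0:
        return None  # assumption: nothing to compare against
    return count_straddling(wild_type_pairs, x) / fusion_span_count


# Toy data: two of the four pairs straddle position 200.
pairs = [(100, 300), (150, 180), (250, 400), (90, 210)]
si = splice_index(pairs, x=200, fusion_span_count=8)
print(si)  # 0.25, i.e. more fusion reads than wild-type reads at this position
```

Because both counts are taken at a single position, neither depends on the length of the gene or region, which is why no length normalization appears.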
