Big Output Files, Interrupted Index and Splice Index

2013-10-30
2013-10-31
  • A.R. Grosso

    A.R. Grosso - 2013-10-30

    I have several questions:

    1) Can we discard some of the big output files, namely cdna.pair.sam, reads.1.fastq, reads.2.fastq and cdna.pair.bam? deFuse is producing around 250 GB for each sample of my dataset.

    2) interrupted_index - in the paper you present this ratio as log2, but the values in my output table are all positive. I just want to confirm whether they are in fact already log2. What does the result "-" mean?

    3) splice-index - according to the website the definition is: "number of concordant pairs in gene 1 spanning the fusion splice / breakpoint, divided by number of spanning reads supporting the fusion with gene 2". First, does the numerator correspond to the reads in gene 1 that span the fusion breakpoint in the normal gene (i.e. including the remaining exons of the gene)? Is it on a log scale? Are the reads normalized for the length of the covered region? What does a value of "0" or "-" mean? So only when the SI is lower than 1 is the fusion splice junction used more than the "normal" splice junction, right?

    Thanks
    Ana Rita

     
  • Andrew

    Andrew - 2013-10-31

    1) It depends on whether you are satisfied with the output provided in the results.* files. If you want to do some additional interrogation of the data, for instance by running get_reads.pl to find the supporting reads, then you will have to keep some of the temporary files (details are on the manual page).

    2) The output for interrupted_index is actually the ratio and is not log transformed. A "-" signifies no data, usually because one side of the fusion is in a non-genic region.

    3) Splice index is also reported as a plain ratio. Given a fusion boundary with genomic position x, the numerator counts the number of reads that align with one end to the left of x and one end to the right of x. The denominator is the number of supporting spanning reads. No length normalization is performed because length is not a factor: for both the numerator and the denominator, reads are counted according to overlap with a single position. One issue with this measurement is that it includes all normal splice variants but not all fusion splice variants. A "-" signifies no data and, again, is the result of a fusion occurring in a non-genic region. A "0" signifies no wild-type reads. An SI lower than 1, as you say, means we have more fusion reads than normal reads.
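    To make the two definitions above concrete, here is a minimal Python sketch. It is not deFuse code. The function names are hypothetical, each read end is simplified to a single genomic coordinate, and the only facts taken from the answer above are that both indexes are plain ratios, that "-" means no data, and that the splice index divides the count of pairs straddling position x by the count of fusion-supporting spanning reads.

```python
import math
from typing import Iterable, Optional, Tuple


def parse_index(field: str) -> Optional[float]:
    """Parse an index field: '-' means no data, anything else is a plain ratio."""
    field = field.strip()
    if field == "-":
        return None
    return float(field)


def log2_index(field: str) -> Optional[float]:
    """Log2-transform a reported ratio; None when there is no data or the ratio is 0."""
    ratio = parse_index(field)
    if ratio is None or ratio <= 0:
        return None
    return math.log2(ratio)


def count_pairs_spanning(pairs: Iterable[Tuple[int, int]], x: int) -> int:
    """Count pairs with one end left of x and the other end right of x.

    Assumption: each end is represented by one coordinate (a simplification).
    """
    return sum(1 for a, b in pairs if min(a, b) < x < max(a, b))


def splice_index(wildtype_pairs: int, fusion_spanning_reads: int) -> Optional[float]:
    """Ratio of wild-type pairs across x to fusion-supporting spanning reads."""
    if fusion_spanning_reads <= 0:
        return None  # undefined without fusion support
    return wildtype_pairs / fusion_spanning_reads


# Toy data: two of the four pairs straddle x = 100.
pairs = [(90, 120), (95, 99), (101, 130), (80, 150)]
wt = count_pairs_spanning(pairs, 100)
si = splice_index(wt, 8)  # fewer wild-type pairs than fusion reads, so SI < 1
```

    Under these assumptions, an SI below 1 means more fusion reads than wild-type reads at that position, an SI of 0 means no wild-type pairs were counted, and a "-" field should be treated as missing rather than as zero.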

     
