
Description of columns in results.tsv output

eearley
2014-05-07
2014-05-13
  • eearley

    eearley - 2014-05-07

    First, thanks for writing this tool! The more I use it, the better I like it.

    I wanted some help in interpreting the "ORF" column within the results.tsv.

    orf (col 53) - can be Y/N

    4 possible fusion states (TSS = transcription start site):
    1) geneA TSS intact; geneB TSS gone (so geneA ORF is ok, just truncated; geneB could have frameshift or not)
    2) geneA TSS intact; geneB TSS intact
    3) geneA TSS gone; geneB TSS gone
    4) geneA TSS gone; geneB TSS intact (so geneB ORF ok; geneA could have frameshift or not)

    Maybe I'm overthinking this? Does "Y" mean that (1) or (4) is true and there is no frameshift? Something else?

    Thanks!

     
  • Andrew

    Andrew - 2014-05-10

    TSSs are considered only implicitly in the ORF calculation. The annotation is done using alignments to the coding sequence (CDS) of each gene. To be called ORF, a fusion must satisfy two conditions:

    1. The fusion sequence aligns to the same strand of the CDS for gene A and gene B.
    2. The relative phase of the alignment positions in gene A and gene B equals the relative phase of the matched nt in the fusion sequence.

    The second condition simply ensures that there is no frameshift at the fusion boundary, implying that both coding sequences are preserved when fused. The TSS of the 3' gene is not considered; all that matters is that the TSS of the 5' gene and the TTS of the 3' gene are preserved in the fusion.

    The code for the calculation is here:

    https://bitbucket.org/dranew/defuse/src/652d5ed9a5a8c54515aa21158ca61fd91618a2ab/scripts/annotate_fusions.pl?at=master#cl-598
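
    As a rough illustration of the two conditions above, here is a minimal Python sketch. It is not the deFuse code (the linked Perl script remains the reference). The `CdsHit` record, its fields, and the `is_orf` helper are hypothetical names made up for this example, and the handling of minus-strand alignments (fusion offsets are negated so they run in CDS sense) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CdsHit:
    """One aligned nucleotide (hypothetical record, not deFuse's data model)."""
    strand: str      # '+' if the fusion sequence aligns to the CDS sense strand, '-' otherwise
    cds_pos: int     # 0-based position of the nucleotide within the gene's CDS
    fusion_pos: int  # 0-based position of the same nucleotide in the fusion sequence


def is_orf(hit_a: CdsHit, hit_b: CdsHit) -> bool:
    """Return True if the two conditions for an ORF call hold for these hits."""
    # Condition 1: the fusion sequence aligns to the same strand of both CDSs.
    if hit_a.strand != hit_b.strand:
        return False

    # Assumption: on the minus strand the fusion sequence runs antisense,
    # so fusion offsets are negated to express them in CDS sense.
    sign = 1 if hit_a.strand == '+' else -1

    # Condition 2: relative phase in the CDSs equals relative phase in the
    # fusion sequence, i.e. no frameshift across the fusion boundary.
    cds_phase = (hit_b.cds_pos - hit_a.cds_pos) % 3
    fusion_phase = (sign * (hit_b.fusion_pos - hit_a.fusion_pos)) % 3
    return cds_phase == fusion_phase


# Gene A contributes up to CDS position 299 (third base of a codon);
# gene B resumes at CDS position 120 (first base of a codon).
in_frame = is_orf(CdsHit('+', 299, 49), CdsHit('+', 120, 50))
# Shifting gene B by one nucleotide breaks the reading frame.
shifted = is_orf(CdsHit('+', 299, 49), CdsHit('+', 121, 50))
```

    Under these assumptions, the first pair would be reported as orf = Y and the second as orf = N. Any pair of aligned nucleotides (one per gene) gives the same answer as long as each alignment is gap-free.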

     
  • eearley

    eearley - 2014-05-13

    Excellent. Thanks!

     
