TSSs are considered implicitly in the ORF calculation. The annotation is done using alignments to the coding sequences (CDS) of each gene. To be annotated as ORF, a fusion must:
1) align to the same strand of the CDS for both gene a and gene b
2) have alignment positions in gene a and gene b whose relative phase equals the relative phase of the matched nucleotides in the fusion sequence
The second condition simply ensures that there is no frameshift at the fusion boundary, which implies that both coding sequences are preserved when fused. The TSS of the 3 prime gene is not considered; all that matters is that the TSS of the 5 prime gene and the TTS of the 3 prime gene are preserved in the fusion.
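As an illustration only, here is a minimal Python sketch of the two conditions. It is not deFuse's implementation (that is the Perl code in annotate_fusions.pl). It assumes each gene's alignment can be summarized by one matched nucleotide: its 0-based position in the CDS, its 0-based position in the fusion sequence, and the CDS strand it aligns to. `CdsAlignment` and `is_orf` are hypothetical names, and strand-specific coordinate handling is not modeled.

```python
from dataclasses import dataclass


@dataclass
class CdsAlignment:
    """Hypothetical record: one fusion nucleotide matched to a gene's CDS."""
    strand: str      # CDS strand the fusion sequence aligns to: "+" or "-"
    cds_pos: int     # 0-based position of the matched nucleotide in the CDS
    fusion_pos: int  # 0-based position of the same nucleotide in the fusion


def is_orf(a: CdsAlignment, b: CdsAlignment) -> bool:
    """Sketch of the ORF check for gene a (5 prime) fused to gene b (3 prime)."""
    # Condition 1: both alignments must be to the same CDS strand.
    if a.strand != b.strand:
        return False
    # Condition 2: the phase offset between the two CDS positions must equal
    # the phase offset between the matched nucleotides in the fusion sequence;
    # otherwise the reading frame shifts at the fusion boundary.
    cds_phase = (b.cds_pos - a.cds_pos) % 3
    fusion_phase = (b.fusion_pos - a.fusion_pos) % 3
    return cds_phase == fusion_phase
```

For example, if gene a's last matched nucleotide is CDS position 299 (the third base of a codon) at fusion position 299, and gene b's first matched nucleotide is CDS position 150 (the first base of a codon) at fusion position 300, both offsets are 1 modulo 3 and the sketch returns True. Shifting gene b's CDS position to 151 changes the CDS offset to 2, i.e. a frameshift, and it returns False.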
First, thanks for writing this tool! The more I use it, the better I like it.
I'd like some help interpreting the "ORF" column in results.tsv.
orf (col 53) - can be Y/N
4 possible fusion states (TSS = transcription start site):
1) geneA TSS intact; geneB TSS gone (so geneA ORF is ok, just truncated; geneB could have frameshift or not)
2) geneA TSS intact; geneB TSS intact
3) geneA TSS gone; geneB TSS gone
4) geneA TSS gone; geneB TSS intact (so geneB ORF ok; geneA could have frameshift or not)
Maybe I'm overthinking this? Does "Y" mean that (1) or (4) is true and there is no frameshift? Something else?
Thanks!
The code for the calculation is here:
https://bitbucket.org/dranew/defuse/src/652d5ed9a5a8c54515aa21158ca61fd91618a2ab/scripts/annotate_fusions.pl?at=master#cl-598
Excellent, thanks.