
downstream-frameshift-definition.for.SOAPfuse

Authors:

In general, if the upstream fusion part includes part of the upstream gene's peptide chain, the fusion structure does not break it. Translation of the RNA begins from the upstream (5') end, so there is no preceding element that could frame-shift the upstream fusion part.

For the downstream fusion part, however, the situation is different.

Depending on the length of the upstream fusion part, the downstream fusion part may be frame-shifted.
If this is hard to picture, just think of the upstream fusion part as one very long insertion placed in front of the downstream part. The length of that insertion then decides whether the downstream fusion partner is frame-shifted or not.
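The "long insertion" picture can be illustrated with simple modulo-3 arithmetic. This is only a minimal sketch under my own assumptions: both junction points fall inside coding sequence, and the function and parameter names are hypothetical. It is not how SOAPfuse makes the decision (the three steps described on this page compare peptide chains instead).

```python
def downstream_keeps_frame(upstream_coding_bases: int, downstream_cds_offset: int) -> bool:
    """Illustrative check, not SOAPfuse's implementation.

    upstream_coding_bases: coding bases kept in the upstream part, counted
        from the first base of the start codon up to the junction (assumed).
    downstream_cds_offset: coding bases of the downstream gene's original
        CDS that lie before the junction and are therefore lost (assumed).
    """
    # The first downstream base sits at coding position `upstream_coding_bases`
    # in the fusion and sat at `downstream_cds_offset` in the original CDS.
    # The codon phase is preserved only if both positions agree modulo 3.
    return upstream_coding_bases % 3 == downstream_cds_offset % 3


print(downstream_keeps_frame(300, 150))  # same phase
print(downstream_keeps_frame(301, 150))  # one extra upstream base shifts the frame
```

An upstream part that is one base longer or shorter flips the answer, which is why the length of the "insertion" matters.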

Note:
This judgement strategy was devised entirely by Wenlong Jia. It is an initial version and still needs further improvement to become robust.

Whether the downstream fusion partner is frame-shifted or not is judged in three steps:

1. Check whether the fusion transcript can generate a peptide chain

Here, I require that both the upstream fusion isoform and the downstream fusion isoform are mRNAs. Otherwise, the judgement result is left as 'NA', and the fusion peptide chain information is noted as 'both_must_be_mRNAs'.

Furthermore, not all mRNAs are of the protein_coding type; 'nonsense_mediated_decay' is one example. So, when the downstream fusion partner is not of the protein_coding type, the judgement result is also left as 'NA', and the fusion peptide chain information is noted as 'downstream_must_be_protein_coding'.
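The two checks above can be sketched as a small function. The function name, its parameters and the way the transcript type is passed in are my assumptions for illustration; only the 'NA' result and the two note strings come from the description above.

```python
from typing import Optional, Tuple


def check_translatable(up_is_mrna: bool, down_is_mrna: bool,
                       down_biotype: str) -> Optional[Tuple[str, str]]:
    """Return (judgement, note) when the fusion is skipped, else None.

    Hypothetical helper: the inputs are assumed to come from the
    gene annotation of the two fusion isoforms.
    """
    # Both isoforms must be mRNAs.
    if not (up_is_mrna and down_is_mrna):
        return ("NA", "both_must_be_mRNAs")
    # The downstream partner must additionally be protein_coding.
    if down_biotype != "protein_coding":
        return ("NA", "downstream_must_be_protein_coding")
    return None


print(check_translatable(True, True, "nonsense_mediated_decay"))
```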

Then, because the upstream isoform is an mRNA, it must have a start codon. There are two cases:

a) The upstream fusion part includes the intact start codon.
   The fusion peptide chain is then easy to predict: starting from the start codon, translate every
   three bases into one amino acid until the stop codon is reached.
b) The upstream fusion part lacks the intact start codon.
   In this case, the original start codon of the upstream gene cannot be used, so I wrote a small program
   to predict a new start codon from the whole fusion segment, based on the classical Kozak consensus sequence.

   # For the Kozak consensus sequence, see Wikipedia or Baidu Baike.

   If the Kozak annotation fails, no new start codon can be obtained, so the judgement result is
   noted as 'NA', and the fusion peptide chain information is noted as
   'down_fusion_part_original_peptide_chain:0-AA'.
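To give an idea of what a Kozak-based start codon search might look like, here is a minimal sketch. The exact rules of the small program mentioned in case b) are not described on this page, so the criterion below is my assumption: an ATG with a purine (A or G) at position -3 and a G at position +4, which are the two most conserved positions of the Kozak consensus. The function name is hypothetical.

```python
import re
from typing import Optional

# Simplified Kozak-like pattern (an assumption, not SOAPfuse's actual rule):
# purine at -3, any two bases, ATG, then G at +4.
KOZAK_LIKE = re.compile(r"[AG][ACGT]{2}ATGG")


def find_kozak_start(seq: str) -> Optional[int]:
    """Return the 0-based index of the A of the first Kozak-like ATG, or None."""
    match = KOZAK_LIKE.search(seq.upper())
    if match is None:
        return None  # corresponds to the "Kozak annotation fails" case
    return match.start() + 3  # skip the three bases in front of the ATG


start = find_kozak_start("TTTTGCCACCATGGCGTAA")
print(start)
```

A return value of None here would correspond to the 'NA' outcome described in case b).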

Either way, whether it is case a) or case b), the start codon must be known before the peptide chain of the fusion transcript can be predicted.

At this point, the peptide chain of the whole fusion transcript is known.
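Once the start codon position is known, the translation itself is straightforward. The sketch below assumes the standard genetic code and a DNA-alphabet sequence (T instead of U); the function name is hypothetical, and unknown codons (for example those containing N) are written as 'X', which is my choice and not something stated on this page.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_from(seq: str, start: int) -> str:
    """Translate seq from the given start codon index until the first stop codon."""
    seq = seq.upper()
    peptide = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for codons not in the table
        if aa == "*":  # stop codon reached
            break
        peptide.append(aa)
    return "".join(peptide)


print(translate_from("CCATGGCTAAATGACC", 2))  # ATG GCT AAA TGA
```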

2. Obtain the peptide chain of the downstream fusion part

From the downstream junction point, we can tell whether the downstream part includes part of its original peptide chain. If yes, the corresponding part of the peptide chain can be obtained as well. If no, the judgement result is noted as 'NA', and the fusion peptide chain information is noted as 'lacks_domains_of_down_gene'.

PS. If the peptide chain of the downstream fusion part exists but is shorter than 10 AA, no further prediction
is done and the result is simply noted as 'NA'.
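Step 2 could be sketched as follows. How SOAPfuse actually maps the junction point onto the original peptide chain is not described here, so the coordinate handling is my assumption: the retained peptide starts at the first complete codon after the junction. The function and parameter names are hypothetical; the 10 AA threshold and the note string come from the text above.

```python
from typing import Optional, Tuple

MIN_DOWN_PEPTIDE_AA = 10  # threshold given in the text


def downstream_peptide(original_peptide: str,
                       cds_bases_lost: int) -> Tuple[Optional[str], Optional[str]]:
    """Return (peptide, note); peptide is None when the judgement is 'NA'.

    cds_bases_lost: coding bases of the downstream gene located before the
        junction (assumed input; zero or negative if the junction lies
        in front of the CDS).
    """
    # Index of the first amino acid whose codon is fully retained (ceiling division).
    first_full_codon = -(-max(cds_bases_lost, 0) // 3)
    retained = original_peptide[first_full_codon:]
    if not retained:
        return None, "lacks_domains_of_down_gene"
    if len(retained) < MIN_DOWN_PEPTIDE_AA:
        return None, None  # the text only says the result is 'NA' here
    return retained, None


print(downstream_peptide("MAKLVQRSTPGHWEND", 4))
```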

3. Decide between 'inframe-shift' and 'frame-shift'

Knowing the peptide chain of the whole fusion segment and that of the downstream fusion part, we can compare these two strings to see whether the peptide chain of the downstream fusion part is retained in the peptide chain of the whole fusion segment.

If yes, the judgement is 'inframe-shift', and the fusion peptide information is noted as 'down_fusion_part_peptide_chain:xx-AA', where 'xx' is the length of the downstream fusion part peptide chain.

If no, the judgement is 'frame-shift', and the fusion peptide information is noted as 'down_fusion_part_original_peptide_chain:xx-AA{????}', where 'xx' is the length of the downstream fusion part peptide chain and '????' is the peptide chain string of the downstream fusion part itself. It is written out in full, so it may be a little long.
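The comparison in step 3 can be sketched as a plain string test. I assume here that "retained" means the downstream peptide chain occurs as a substring of the fusion peptide chain; the function name is hypothetical, while the two judgement labels and note formats are the ones given above.

```python
from typing import Tuple


def judge_frame(fusion_peptide: str, down_peptide: str) -> Tuple[str, str]:
    """Return (judgement, note) for a fusion, following step 3."""
    n = len(down_peptide)
    # Assumed meaning of "retained": the downstream chain appears unchanged
    # somewhere inside the whole fusion peptide chain.
    if down_peptide in fusion_peptide:
        return "inframe-shift", f"down_fusion_part_peptide_chain:{n}-AA"
    # Otherwise report the original downstream chain inside braces.
    return "frame-shift", f"down_fusion_part_original_peptide_chain:{n}-AA{{{down_peptide}}}"


print(judge_frame("MAAAKLVQRSTPGHWEND", "KLVQRSTPGHWEND"))
print(judge_frame("MAAAQQQ", "KLVQRSTPGHWEND"))
```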

Test

I have tested this strategy on several classical fusions, such as BCR-ABL1 in CML, EML4-ALK in lung cancer, TMPRSS2-ERG in prostate cancer, FGFR3-TACC3 in glioblastoma and bladder cancer, BCAS4-BCAS3 in breast cancer, and so on.

Because of the work I do, I have some cancer samples that harbour these classical fusions. All of these fusions were successfully annotated as 'inframe-shift' on my very first try. Bingo!

I also tried annotating other interesting fusions found in my research, all of which have the classical features of significant fusions. Happily, almost all of them were annotated as 'inframe-shift'. This helps me a lot.

Next work?
Upload the peptide chain reported by SOAPfuse to the iprscan website to check the domain regions.
Link: http://www.ebi.ac.uk/Tools/pfa/iprscan/

PS. The peptide chain of each inframe-shift case can be found in a single text file:
/out_directory/final_fusion_genes/sample-ID_or_patient-ID/analysis/For_peptides_analysis/sample-ID_or_patient-ID.trans.fusion.peptide.chain

Wenlong Jia
04-12-2013
