Hi, Andrew:
Thank you for developing deFuse. I am a bioinformatics student working
on fusion detection using RNA-seq. I built a very naive fusion simulator
and tried to estimate the sensitivity and FDR of existing methods.
However, when I compared the deFuse results to my simulated fusion breakpoints,
deFuse never found the breakpoint correctly. Even when the partner genes were
correct, the closest call was at least 30bp away from the true breakpoint.
The reason I call it naive is that I just pick two random exons, use
their exon boundaries as the breakpoint, and randomly generate
fusion-supporting reads (spanning and split) from the reference sequence.
I ran BLAT on those reads to check their reliability, and BLAT finds the
breakpoint correctly. Finally, I merged these simulated reads with all
properly paired aligned reads as background to build a dataset for running deFuse.
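For readers following along, here is a minimal Python sketch of the kind of simulation described above. It is not the simulator used in this thread: the function names, the 50bp read length, and the labelling rules are assumptions for illustration, and only the 160bp fragment length comes from later in the discussion.

```python
import random

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_fusion_pairs(left_exon, right_exon, read_len=50,
                          frag_len=160, n_pairs=10, seed=0):
    """Return (kind, read1, read2) tuples drawn from left_exon + right_exon.

    The breakpoint sits at the exon boundary. Every fragment contains at
    least one base on each side of it. A pair is labelled SPLITEND1 or
    SPLITEND2 when that read itself crosses the breakpoint, and SPAN when
    the breakpoint falls between the two reads.
    """
    rng = random.Random(seed)
    fusion = left_exon + right_exon
    bp = len(left_exon)  # index of the first base of the right partner
    lo = max(0, bp - frag_len + 1)
    hi = min(bp - 1, len(fusion) - frag_len)
    if lo > hi:
        raise ValueError("exons too short for this fragment length")
    pairs = []
    for _ in range(n_pairs):
        start = rng.randint(lo, hi)
        end = start + frag_len
        frag = fusion[start:end]
        read1 = frag[:read_len]
        read2 = revcomp(frag[-read_len:])
        if start < bp < start + read_len:
            kind = "SPLITEND1"
        elif end - read_len < bp < end:
            kind = "SPLITEND2"
        else:
            kind = "SPAN"
        pairs.append((kind, read1, read2))
    return pairs
```

A real simulator would also need base qualities, sequencing errors, and a fragment length distribution rather than a fixed length.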
So I would like to ask your opinion. Is this issue caused by some
important factor that deFuse considers but my simulator does not, so
that I need to add it to my simulator? Or is it just a small bug in deFuse? In
my opinion, if it is really a fusion, the detection algorithm should be
able to find it.
I can send you my simulated reads, but the file is 35M and too big to be
sent by email.
Thank you
Yuxiang Tan
In our simulations we do reasonably well, so I'd be interested to see your simulation and understand why deFuse is not working properly. If you can share it via Google Drive or Dropbox, I could take a look. You could also just send me the code that generates the simulated data; that might be better.
Thanks,
Andrew
Hi, Andrew:
Thank you so much for your reply.
My code is still in a test state and not packaged, so I don't think you will be able to run it yourself.
I can share the FASTQ files with you via Google Drive. The two FASTQ files with simulated and background reads are 4.7G each, and the two FASTQ files with only simulated reads are 33Mb each. Which email address should I share them with? Or should I just post the share link here?
Thank you
Yuxiang Tan
Hi, Andrew:
I shared the files with andrew.mcpherson@gmail.com. I am not sure whether that is the address you use. Please reply whether or not you can access them.
Thank you
Yuxiang
Please also send the sequence of one of the simulated fusions (not found by deFuse), along with the read IDs or sequences of some of the reads that were generated from that simulated fusion.
Hi, Andrew:
Thank you so much for your reply; it looks like you got my share.
In fact, none of the fusions I generated were found correctly by deFuse.
I name the simulated reads with all the information about the fusion.
For example:
@HWI_chrY_9707598_bp9707748_ENSG00000231874_ENST00000441642_1_1_+0_chr11_bp1256453_1256603_ENSG00000117983_ENST00000546052_22_1+_0_breaklen54_0_SPLITEND2
This is a split fusion read, split at end 2 (annotated at the end of the name). The breakpoints are at chrY 9707748 on ENSG00000231874 and at chr11 1256453 on ENSG00000117983.
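As a rough illustration, the fields explained above can be pulled out of such a read name with a few regular expressions. This is only a sketch: `parse_sim_read_name` is a hypothetical helper, and it deliberately ignores the other fields in the name, whose meaning is not described in this thread.

```python
import re


def parse_sim_read_name(name):
    """Extract the documented fields from a simulated read name.

    Only chromosomes, breakpoint positions, gene IDs and the trailing
    read-type tag are parsed; all other fields are left uninterpreted.
    """
    return {
        "chroms": re.findall(r"_(chr[0-9A-Za-z]+)_", name),
        "breakpoints": [int(p) for p in re.findall(r"_bp(\d+)", name)],
        "genes": re.findall(r"ENSG\d+", name),
        "read_type": name.rsplit("_", 1)[-1],
    }


example = ("@HWI_chrY_9707598_bp9707748_ENSG00000231874_ENST00000441642_1_1_+0_"
           "chr11_bp1256453_1256603_ENSG00000117983_ENST00000546052_22_1+_0_"
           "breaklen54_0_SPLITEND2")
info = parse_sim_read_name(example)
print(info["chroms"], info["breakpoints"], info["genes"], info["read_type"])
```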
You can pick any fusion from the simulation list and then grep for the ENSG ID pair to get all the supporting reads.
For example, for the ID above, you can run:
grep ENSG00000231874 filename | grep ENSG00000117983
Then you will have all the supporting reads I generated for this fusion gene pair.
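Note that grep on a FASTQ file prints only the matching header lines. If the full records are needed, something like the following Python sketch could be used instead. It assumes plain four-line FASTQ records (no wrapped sequence lines), and `reads_for_gene_pair` is a hypothetical helper name.

```python
import io
from itertools import islice


def reads_for_gene_pair(handle, gene_a, gene_b):
    """Yield full 4-line FASTQ records whose header names both genes."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            break  # end of file (or a truncated trailing record)
        if gene_a in record[0] and gene_b in record[0]:
            yield "".join(record)


# Tiny in-memory example; with real data, pass an open file handle instead.
demo = io.StringIO(
    "@HWI_x_ENSG00000231874_y_ENSG00000117983_SPAN\nACGT\n+\nIIII\n"
    "@HWI_x_ENSG00000101850_y_ENSG00000102349_SPAN\nTTTT\n+\nIIII\n"
)
hits = list(reads_for_gene_pair(demo, "ENSG00000231874", "ENSG00000117983"))
print(len(hits))
```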
Best
Yuxiang
Hi, Andrew:
Is there any good news from your side?
Thank you
Yuxiang
When I run this library I do get some matching results. For instance:
GATTCTGAATGACGGTGTCTGCAACACTGCAAAATACTTTTAAACGCTCAGTGCCATCTCTTATCTTCCCTCTAAAATAGAACTAGGGCAGAAATCCCATTTCCTCGGTGAATACCTCAGTCCTGCCGATCTCCGGATCACCAGATAAGCATCCACTGCATAGCAAAACAGCC|ACCAGGTATTCAATACTATTATGGCCCAATATTAGCACCCTATTTTCCTGATAACTGACCTGCTTGAATACCTGCATTGAGCCACCTTCTGAATTAAGTTGGACCTCCAAGTTGTTT
ENSG00000101850, ENSG00000102349
Seems to match:
HWI_chrX_9727284_bp9727434_ENSG00000101850_ENST00000480178_1_1_-0_chrX_bp56276737_56276887_ENSG00000102349_ENST00000468660_2_1+_0_breaklen30_1_SPLITEND2
Due to breakpoint homology there is some ambiguity in the breakpoint position, but the sequence is correct.
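To illustrate what breakpoint homology means here: if the bases of one partner next to the breakpoint are identical to the corresponding bases of the other partner, the breakpoint can be moved across them without changing the fused sequence, so several positions are equally valid. The sketch below is a simplified, same-strand model with hypothetical names; it is not how deFuse itself reports breakpoints.

```python
def breakpoint_shift_range(a, i, b, j):
    """Return (left, right): how far the breakpoint of a[:i] + b[j:] can
    move in each direction without changing the fused sequence.
    """
    right = 0
    while (i + right < len(a) and j + right < len(b)
           and a[i + right] == b[j + right]):
        right += 1
    left = 0
    while (i - left > 0 and j - left > 0
           and a[i - left - 1] == b[j - left - 1]):
        left += 1
    return left, right


a, b = "AAAGTCCC", "TTTGTGGG"
left, right = breakpoint_shift_range(a, 3, b, 3)
# Both partners read "GT" right after position 3, so the breakpoint can
# be placed up to 2 bases further right and still give the same fusion.
print(left, right)
```

With longer shared stretches, the reported and simulated coordinates can differ by the length of the homologous sequence even though the fusion sequence is identical.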
Last edit: Andrew 2014-03-27
Hi, Andrew:
Thank you. Could you please send me your result?
I am not sure what you mean by correct. Is that the deFuse result? Also, I was able to use deFuse to find one or two partners correctly, but the breakpoint location does not match what I generated. Did that happen for you?
If possible, could you reply directly to my email, yuxiang.tan@gmail.com? I think that will be more efficient.
Best
Yuxiang
Emailed you the result. Some of the breakpoints may not have exactly the correct position because of homologous breakpoint sequence.
Also, this simulation is not ideal for deFuse. The fragment length is 160bp, which is quite short; deFuse generally performs better with fragments of 250bp or greater.
Issue resolved. The fusion reads were simulated with a different fragment length from the concordant reads.
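A quick sanity check for this kind of mismatch is to compare the fragment length distribution of the simulated fusion pairs against that of the background concordant pairs before running the tool. The Python sketch below assumes the fragment lengths have already been collected (for example from the absolute TLEN values of aligned pairs); the function name and the example numbers are illustrative only.

```python
from statistics import mean, stdev


def compare_fragment_lengths(simulated, background):
    """Summarise two lists of fragment lengths and report how far the
    simulated mean is from the background mean, in background SDs.
    """
    bg_mean, bg_sd = mean(background), stdev(background)
    sim_mean, sim_sd = mean(simulated), stdev(simulated)
    return {
        "sim_mean": sim_mean,
        "sim_sd": sim_sd,
        "bg_mean": bg_mean,
        "bg_sd": bg_sd,
        "offset_in_bg_sd": (sim_mean - bg_mean) / bg_sd,
    }


# Illustrative numbers only, not taken from the shared dataset.
report = compare_fragment_lengths([150, 160, 170], [240, 250, 260])
print(report)
```

A large offset suggests the simulated reads do not look like they come from the same library as the background reads.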
Hi, Andrew:
I think I generated the simulation data with the same fragment size distribution. However, deFuse still gives me similarly poor results. Could you send me a simulation dataset you used so that I can get a better idea? Or what else can we do?
Thank you