
A problem running deFuse on my simulated data

2014-02-06
2014-06-07
  • yuxiang tan

    yuxiang tan - 2014-02-06

    Hi, Andrew:

    Thank you for developing deFuse. I am a bioinformatics student working
    on fusion detection using RNA-seq. I built a very naive fusion simulator
    and tried to estimate the sensitivity and FDR of existing methods.
    However, when I compared the deFuse results to my simulated fusion
    breakpoints, deFuse never found a breakpoint correctly: even when the
    partner genes were correct, the closest reported breakpoint was at least
    30 bp away from the true one.

    I call it naive because I simply pick two random exons, use their exon
    boundaries as the breakpoint, and randomly generate fusion-supporting
    reads (spanning and split) from the reference sequence. I aligned those
    reads with BLAT to check their reliability, and BLAT finds the breakpoint
    correctly. Finally, I merged these simulated reads with all properly
    paired aligned reads as background to build a dataset for deFuse.
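
    The simulation described above can be sketched as follows. This is not
    the poster's code: the function and parameter names (simulate_pairs,
    read_len, frag_mean, frag_sd) are hypothetical, and it assumes the 5'
    partner sequence ends exactly at its exon boundary and the 3' partner
    starts at its exon boundary, both in transcript orientation. Fragment
    lengths are drawn from a normal distribution, and a pair is labeled
    "split" if either read crosses the junction, otherwise "span".

```python
import random
from typing import List, NamedTuple

# Complement table for A/C/G/T/N; other characters are left unchanged.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


class SimPair(NamedTuple):
    read1: str       # forward read from the 5' end of the fragment
    read2: str       # reverse complement of the fragment's 3' end
    kind: str        # "split" if either read crosses the junction, else "span"
    frag_start: int  # 0-based start of the fragment on the fusion sequence
    frag_len: int


def simulate_pairs(five_prime: str, three_prime: str, n_pairs: int,
                   read_len: int = 50, frag_mean: float = 250.0,
                   frag_sd: float = 30.0, seed: int = 0,
                   max_tries: int = 100_000) -> List[SimPair]:
    """Sample read pairs from fragments that contain the fusion junction."""
    if not five_prime or not three_prime:
        raise ValueError("both partner sequences must be non-empty")
    fusion = five_prime + three_prime
    junction = len(five_prime)  # fusion[:junction] comes from the 5' partner
    rng = random.Random(seed)
    pairs: List[SimPair] = []
    for _ in range(max_tries):
        if len(pairs) == n_pairs:
            break
        frag_len = round(rng.gauss(frag_mean, frag_sd))
        if frag_len < read_len or frag_len > len(fusion):
            continue
        # Keep at least one base from each partner: start < junction < start + frag_len.
        lo = max(0, junction - frag_len + 1)
        hi = min(junction - 1, len(fusion) - frag_len)
        if lo > hi:
            continue
        start = rng.randint(lo, hi)
        end = start + frag_len
        frag = fusion[start:end]
        r1_crosses = start < junction < start + read_len
        r2_crosses = end - read_len < junction < end
        kind = "split" if (r1_crosses or r2_crosses) else "span"
        pairs.append(SimPair(frag[:read_len], revcomp(frag[-read_len:]),
                             kind, start, frag_len))
    if len(pairs) < n_pairs:
        raise RuntimeError("could not place enough fragments; check lengths")
    return pairs
```

    Because the fragment length distribution is an explicit parameter here,
    the same frag_mean and frag_sd can be reused for any background reads
    that are simulated alongside the fusion reads.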

    So I would like your opinion. Is this caused by important factors that
    deFuse considers but my simulator does not, which I would need to add to
    my simulator? Or is it a small bug in deFuse? In my opinion, if it is
    really a fusion, the detection algorithm should be able to find it.

    I can send you my simulated reads, but the files are 35 MB, too big to
    send by email.

    Thank you

    Yuxiang Tan

     
    • Andrew

      Andrew - 2014-02-13

      In our simulations we do reasonably well, so I'd be interested to see your simulation and understand why deFuse is not working properly. If you can share it by Google Drive or Dropbox, I could take a look. You could also just send me the code that generates the simulated data; that might be better.

      Thanks,
      Andrew

       
  • yuxiang tan

    yuxiang tan - 2014-02-14

    Hi, Andrew:

    Thank you so much for your reply.
    My code is still in test mode and is not a package, so I don't think you would be able to run it yourself.
    I can share the FASTQ files with you through Google Drive. The two FASTQ files with simulated plus background reads are 4.7 GB each, and the two FASTQ files with only simulated reads are 33 MB each. Which email address should I share them with? Or should I just post the share link here?

    Thank you

    Yuxiang Tan

     
  • yuxiang tan

    yuxiang tan - 2014-02-28

    Hi, Andrew:

    I shared the files with you at andrew.mcpherson@gmail.com. I am not sure whether that is the address you use. Please let me know whether or not you can access them.

    Thank you

    Yuxiang

     
    • Andrew

      Andrew - 2014-02-28

      Please also send the sequence of one of the simulated fusions (not found by deFuse), along with the read IDs or sequences of some of the reads generated from that simulated fusion.

       
  • yuxiang tan

    yuxiang tan - 2014-03-03

    Hi, Andrew:

    Thank you so much for your reply; it looks like you received the shared files.

    In fact, deFuse did not find any of the fusions I generated correctly, that is, with the breakpoints where I placed them.

    Each simulated read name encodes all the information about its fusion.
    For example:
    @HWI_chrY_9707598_bp9707748_ENSG00000231874_ENST00000441642_1_1_+0_chr11_bp1256453_1256603_ENSG00000117983_ENST00000546052_22_1+_0_breaklen54_0_SPLITEND2
    This is a split fusion read, split in read end 2 (SPLITEND2, annotated at the end of the name). The breakpoints are at chrY 9707748 in ENSG00000231874 and at chr11 1256453 in ENSG00000117983.

    You can pick any fusion in the simulation list and grep for its ENSG ID pair to get all of its supporting reads.
    For example, for the previous ID, you can:
    grep ENSG00000231874 filename | grep ENSG00000117983
    Then you will have all the supporting reads I generated for this fusion gene pair.
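
    The same lookup can be done for every gene pair at once by parsing the
    read names. This is a minimal sketch, assuming the naming scheme shown
    above: each partner block is "chr..._[coord_]bp<breakpoint>[_coord]_ENSG..._ENST...",
    the coordinate prefixed with "bp" is the breakpoint, and the last
    underscore-separated token is the read type. The other fields (exon
    numbers, strand tokens, breaklen) are not interpreted, and all function
    names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class Partner(NamedTuple):
    chrom: str
    breakpoint: int
    gene: str
    transcript: str


# One partner block, e.g. "chrY_9707598_bp9707748_ENSG..._ENST..." or
# "chr11_bp1256453_1256603_ENSG..._ENST...": the coordinate prefixed with
# "bp" is taken as the breakpoint; the other coordinate is ignored.
PARTNER_RE = re.compile(
    r"(chr[0-9XYM]+)_(?:\d+_)?bp(\d+)(?:_\d+)?_(ENSG\d+)_(ENST\d+)")


def parse_read_name(name: str) -> Tuple[Partner, Partner, str]:
    """Return (first partner, second partner, read type) from a simulated read name."""
    name = name.lstrip("@").split()[0]
    matches = PARTNER_RE.findall(name)
    if len(matches) != 2:
        raise ValueError(f"expected two fusion partners in {name!r}")
    first, second = (Partner(c, int(bp), g, t) for c, bp, g, t in matches)
    read_type = name.rsplit("_", 1)[-1]  # e.g. "SPLITEND2"
    return first, second, read_type


def fastq_read_names(path: str) -> Iterator[str]:
    """Yield the header line of every record in an uncompressed 4-line FASTQ file."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:
                yield line.rstrip("\n")


def group_by_gene_pair(names: Iterable[str]) -> Dict[Tuple[str, str], List[str]]:
    """Group simulated read names by (first gene, second gene)."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for name in names:
        try:
            first, second, _ = parse_read_name(name)
        except ValueError:
            continue  # background reads carry no fusion annotation
        groups[(first.gene, second.gene)].append(name)
    return dict(groups)
```

    For example, group_by_gene_pair(fastq_read_names("sim_1.fastq")) would
    return one list of read names per simulated fusion; the file name is a
    placeholder.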

    Best

    Yuxiang

     
  • yuxiang tan

    yuxiang tan - 2014-03-22

    Hi, Andrew:

    Is there any news on your side?

    Thank you

    Yuxiang

     
  • Andrew

    Andrew - 2014-03-27

    When I run this library I do get some matching results. For instance:

    GATTCTGAATGACGGTGTCTGCAACACTGCAAAATACTTTTAAACGCTCAGTGCCATCTCTTATCTTCCCTCTAAAATAGAACTAGGGCAGAAATCCCATTTCCTCGGTGAATACCTCAGTCCTGCCGATCTCCGGATCACCAGATAAGCATCCACTGCATAGCAAAACAGCC|ACCAGGTATTCAATACTATTATGGCCCAATATTAGCACCCTATTTTCCTGATAACTGACCTGCTTGAATACCTGCATTGAGCCACCTTCTGAATTAAGTTGGACCTCCAAGTTGTTT

    ENSG00000101850, ENSG00000102349

    Seems to match:
    HWI_chrX_9727284_bp9727434_ENSG00000101850_ENST00000480178_1_1_-0_chrX_bp56276737_56276887_ENSG00000102349_ENST00000468660_2_1+_0_breaklen30_1_SPLITEND2

    Due to breakpoint homology there is some ambiguity in the exact breakpoint position, but the fusion sequence is correct.
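
    Breakpoint homology means the junction can slide across a run of bases
    that is identical in both partners without changing the fusion sequence,
    so any tool has to report one position from that window. A minimal sketch
    of how the window can be computed, assuming 0-based half-open coordinates
    where the 5' partner keeps ref_a[:bp_a] and the 3' partner keeps
    ref_b[bp_b:], both in transcript orientation. The function names are
    hypothetical and this is not deFuse's implementation; it only explains
    offsets that fall inside the homology window.

```python
from typing import List, Tuple


def homology_window(ref_a: str, bp_a: int, ref_b: str, bp_b: int) -> Tuple[int, int]:
    """How far the junction can slide left and right without changing the fusion.

    The fusion sequence is assumed to be ref_a[:bp_a] + ref_b[bp_b:].
    """
    left = 0
    # Slide left while the last bases kept from A equal the bases just before bp_b in B.
    while left < min(bp_a, bp_b) and ref_a[bp_a - left - 1] == ref_b[bp_b - left - 1]:
        left += 1
    right = 0
    # Slide right while the first bases kept from B equal the bases just after bp_a in A.
    while (bp_a + right < len(ref_a) and bp_b + right < len(ref_b)
           and ref_a[bp_a + right] == ref_b[bp_b + right]):
        right += 1
    return left, right


def equivalent_breakpoints(ref_a: str, bp_a: int,
                           ref_b: str, bp_b: int) -> List[Tuple[int, int]]:
    """All (bp_a, bp_b) pairs that produce exactly the same fusion sequence."""
    left, right = homology_window(ref_a, bp_a, ref_b, bp_b)
    return [(bp_a + s, bp_b + s) for s in range(-left, right + 1)]
```

    With ref_a = "CCCCCAGTGGGG", ref_b = "TTTTTAGTCCCC" and both breakpoints
    at 7, the shared "AG" before and "T" after the junction give four
    equivalent breakpoint pairs, all producing "CCCCCAGTCCCC".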

     

    Last edit: Andrew 2014-03-27
  • yuxiang tan

    yuxiang tan - 2014-03-27

    Hi, Andrew:

    Thank you. Could you please send me your result?

    I am not sure what you mean by correct. Is that the deFuse result? I was also able to use deFuse to find one or both partners correctly, but the breakpoint locations did not match what I generated. Did that happen for you?

    If possible, could you reply directly to my email, yuxiang.tan@gmail.com? I think that would be more efficient.

    Best

    Yuxiang

     
  • Andrew

    Andrew - 2014-03-27

    Emailed you the result. Some of the breakpoints may not be at the exact position because of homologous sequence at the breakpoint.

    Also, this simulation is not ideal for deFuse. The fragment length is 160 bp, which is quite short; deFuse generally performs better with fragments of 250 bp or longer.

     
  • Andrew

    Andrew - 2014-03-28

    Issue resolved. The fusion reads were simulated with a different fragment length than the concordant reads.
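
    A quick way to check for this kind of mismatch before rerunning is to
    compare the fragment length distribution of the simulated fusion pairs
    with that of the background pairs. A minimal sketch, assuming the fusion
    fragment lengths come from the simulator's own record and the background
    lengths from the absolute TLEN of properly paired alignments in the SAM/BAM
    file. The function names and both thresholds (mean_tol, sd_ratio_max) are
    hypothetical, illustrative choices, not values used by deFuse.

```python
import statistics
from typing import NamedTuple, Sequence, Tuple


class FragStats(NamedTuple):
    n: int
    mean: float
    sd: float


def frag_stats(lengths: Sequence[int]) -> FragStats:
    """Sample size, mean and sample standard deviation (needs at least two values)."""
    return FragStats(len(lengths), statistics.fmean(lengths), statistics.stdev(lengths))


def compare_fragment_lengths(fusion_lengths: Sequence[int],
                             background_lengths: Sequence[int],
                             mean_tol: float = 0.5,
                             sd_ratio_max: float = 1.5) -> Tuple[FragStats, FragStats, bool]:
    """Flag a mismatch between simulated fusion fragments and background fragments.

    The thresholds are arbitrary illustrations: the means must differ by at most
    mean_tol background standard deviations, and the two standard deviations by
    at most a factor of sd_ratio_max.
    """
    fus = frag_stats(fusion_lengths)
    bg = frag_stats(background_lengths)
    if bg.sd == 0 or fus.sd == 0:
        raise ValueError("fragment lengths must vary to compare distributions")
    mean_ok = abs(fus.mean - bg.mean) <= mean_tol * bg.sd
    ratio = max(fus.sd, bg.sd) / min(fus.sd, bg.sd)
    return fus, bg, mean_ok and ratio <= sd_ratio_max
```

    A fusion set centred on 160 bp against a background centred on 250 bp
    would be flagged as inconsistent by this check.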

     
  • yuxiang tan

    yuxiang tan - 2014-06-07

    Hi, Andrew:

    I think I generated the simulation data with the same fragment size distribution. However, deFuse still gives me similarly poor results. Could you send me the simulation data you used so that I can get a better idea? Or is there anything else we can do?

    Thank you

     

