Some puzzling parameters in defuse

swordsman
2013-11-25
  • swordsman

    swordsman - 2013-11-25

    I am looking into some fusion genes that are consistently and incorrectly rejected. After replacing the reference and annotation data as described on the defuse wiki page, there are still some parameters I cannot figure out. The parameters are:
    // Filtering parameters
    max_dist_pos = 600
    num_dist_genes = 500
    split_min_anchor = 4 // how does this differ from split_count_threshold?
    splice_bias = 10

    // Position density when calculating covariance
    covariance_sampling_density = 0.01
    // Number of regions for each breakpoint sequence job in split
    regions_per_job = 20

    Thanks,
    Daqing

     

    Last edit: swordsman 2013-11-25
    • Andrew

      Andrew - 2013-11-25

      Here are the explanations of the parameters:

      split_min_anchor: the minimum number of nucleotides (nt) that must align to one side or the other of a breakpoint for the split read to be considered valid.
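
      As a rough illustration only (this is not defuse's code), the check can be sketched as below. It assumes the rule means that the shorter of the two flanks must be at least split_min_anchor nt long. The function name and the half-open coordinate convention are hypothetical.

```python
def split_read_is_anchored(read_start, read_end, breakpoint, split_min_anchor=4):
    """Return True if the read has enough aligned bases on both sides.

    Assumptions (not taken from defuse): the read covers the half-open
    interval [read_start, read_end) on the breakpoint sequence, and
    `breakpoint` is the index of the first base after the fusion boundary.
    """
    left_anchor = breakpoint - read_start   # bases aligned before the boundary
    right_anchor = read_end - breakpoint    # bases aligned after the boundary
    return min(left_anchor, right_anchor) >= split_min_anchor


# A 50 nt read with only 3 nt past the boundary fails the default threshold.
too_short = split_read_is_anchored(0, 50, 47)
# The same read with 4 nt past the boundary passes.
long_enough = split_read_is_anchored(0, 50, 46)
```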

      splice_bias: used when calculating a genomic position from a transcriptomic position. If the predicted fusion boundary is just past a splice junction, but the real boundary is exactly at the splice junction, the wrong genomic position will be reported. To avoid this, we subtract splice_bias from the transcript position, remap to the genome, and then add splice_bias to the remapped coordinate.
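
      A minimal sketch of the subtract, remap, add idea follows. It is not defuse's implementation: the exon list, the helper names, the plus-strand-only handling and the clamping at the transcript start are all assumptions made for illustration.

```python
def transcript_to_genome(exons, tpos):
    """Map a 0-based transcript offset to a genomic coordinate.

    `exons` is a list of inclusive (start, end) genomic intervals in
    transcript order on the plus strand (a simplifying assumption).
    """
    offset = tpos
    for start, end in exons:
        length = end - start + 1
        if offset < length:
            return start + offset
        offset -= length
    raise ValueError("transcript position is past the end of the transcript")


def remap_with_splice_bias(exons, tpos, splice_bias):
    """Step back by splice_bias, remap, then add splice_bias back."""
    shift = min(splice_bias, tpos)  # hypothetical clamp at the transcript start
    return transcript_to_genome(exons, tpos - shift) + shift


# Toy transcript: two 100 nt exons separated by an 800 nt intron.
EXONS = [(100, 199), (1000, 1099)]

# The true boundary is at the end of exon 1 (transcript offset 99), but the
# prediction overshoots to offset 102, which lies in exon 2.
naive = transcript_to_genome(EXONS, 102)
biased = remap_with_splice_bias(EXONS, 102, 10)
```

      In this toy case the naive mapping lands in the second exon, far from the true boundary at genomic position 199, while the biased mapping stays within a few nt of it. Positions that are not near a junction map to the same coordinate either way.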

      covariance_sampling_density: we attempt to account for non-independence of read fragmentation by calculating a covariance between the fragment lengths of reads that span a particular position. This parameter is the density of positions used when calculating that covariance.

      max_dist_pos, num_dist_genes and regions_per_job are deprecated; perhaps you have an outdated config file.

      If you have a public dataset with known fusions that defuse is not finding, please let me know and I will try the dataset myself.

      Also, there is another post that addresses false negatives; please see that post for some tips.

       
