I am looking at some fusion genes that are consistently rejected even though they should not be. After replacing the reference and annotation data as described on the defuse wiki page, there are still some parameters I cannot figure out. The parameters are:
// Filtering parameters
max_dist_pos = 600
num_dist_genes = 500
split_min_anchor = 4 // how does this differ from split_count_threshold?
splice_bias = 10
// Position density when calculating covariance
covariance_sampling_density = 0.01
// Number of regions for each breakpoint sequence job in split
regions_per_job = 20
Thanks,
Daqing
Last edit: swordsman 2013-11-25
Here are the explanations of the parameters:
split_min_anchor: minimum number of nt that must align to one side or the other of a breakpoint for the split read to be valid.
splice_bias: used to calculate a genomic position given a transcriptomic position. If the fusion boundary prediction is just past a splice junction, but in reality the fusion boundary is exactly at the splice junction, the wrong genomic position will be reported. To avoid this, we subtract splice_bias from the transcriptomic position, remap to the genome, and then add splice_bias to the remapped coordinate.
covariance_sampling_density: we attempt to account for the non-independence of read fragmentation by calculating a covariance between the fragment lengths of reads that span a particular position.
max_dist_pos, num_dist_genes and regions_per_job are deprecated; perhaps you have an outdated config file.
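To make split_min_anchor and splice_bias more concrete, here is a rough Python sketch. It is not taken from the defuse source; the function names, the half-open coordinate convention, and the toy two-exon transcript are all assumptions made for illustration, and only the forward strand is handled. It also reads "one side or the other" as meaning that each side of the breakpoint needs at least split_min_anchor aligned nt.

```python
def has_valid_anchor(read_start, read_end, breakpoint, min_anchor=4):
    """Check that a split read covers at least min_anchor nt on each side.

    Hypothetical helper. Coordinates are 0-based and half-open,
    [read_start, read_end), on the breakpoint sequence; `breakpoint`
    is the index of the first base after the fusion boundary.
    """
    left = breakpoint - read_start   # nt aligned before the boundary
    right = read_end - breakpoint    # nt aligned after the boundary
    return left >= min_anchor and right >= min_anchor


def transcript_to_genome(exons, tpos):
    """Map a transcript position to a genomic position (forward strand only).

    `exons` is a list of half-open (genomic_start, genomic_end) pairs in
    transcript order. This is a toy model, not the defuse implementation.
    """
    offset = 0  # transcript coordinate of the current exon's first base
    for g_start, g_end in exons:
        length = g_end - g_start
        if tpos < offset + length:
            return g_start + (tpos - offset)
        offset += length
    raise ValueError("transcript position lies outside the transcript")


def remap_with_splice_bias(exons, tpos, splice_bias=10):
    """Subtract splice_bias, remap to the genome, then add it back."""
    return transcript_to_genome(exons, tpos - splice_bias) + splice_bias


# Toy transcript: exon 1 covers genome [100, 200), exon 2 covers [500, 600).
exons = [(100, 200), (500, 600)]

# A boundary predicted 3 nt into exon 2 (transcript position 102).
plain = transcript_to_genome(exons, 102)      # lands inside exon 2
biased = remap_with_splice_bias(exons, 102)   # stays next to the end of exon 1
```

In this toy case the plain mapping reports a position inside the second exon, while the biased mapping reports a position just past the end of the first exon, which is the behaviour the splice_bias description is aiming for. For a position well inside an exon the two mappings agree.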
If you have a public dataset with known fusions that defuse is not finding, please let me know and I will try the dataset myself.
Also, there is another post that addresses false negatives; please see that post for some tips.