Hi Sitan, When counting breakpoint-supporting reads, we allow a [-1, +1] shift around the junction to alleviate alignment ambiguity. This may overestimate the number of split reads around breakpoints. However, estimating SV frequency from split reads may also underestimate it. To be comparable, I suggest you use (column 19 + column 29)/(column 16 + column 26). Columns 19 and 29 are the high-quality split read counts. These fields (column 11 to column 39) are: cluster_id, contig_id, contig_size, reads_used_for_assembly,...
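As a rough sketch, the suggested ratio can be computed per record with awk. This assumes the output columns are tab-separated and that header lines start with "#". The function name sv_freq and the file name in the usage comment are made up for illustration.

```shell
# Sketch: per-record ratio (col19 + col29) / (col16 + col26).
# Assumptions: tab-separated columns, header lines start with '#'.
sv_freq() {
    awk -F'\t' '
        /^#/ { next }                      # skip header lines
        {
            denom = $16 + $26
            if (denom > 0)
                printf "%s\t%s\t%.4f\n", $1, $2, ($19 + $29) / denom
            else
                printf "%s\t%s\tNA\n", $1, $2   # avoid division by zero
        }' "$@"
}

# Usage (file name is a placeholder):
# sv_freq novoBreak.pass.flt.vcf > sv_freq.tsv
```

Each output line holds the first two columns of the record followed by the ratio, or NA when the denominator is zero.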
It can be found at https://github.com/czc/nb_distribution
You should add the novoBreak_distribution directory to your PATH. The command should be "export PATH=$PATH:/data/tmp/huebnerj/novoBreak_distribution_v1.1.3rc".
If there is no normal control, a simulated normal bam file (no coverage needed) can be used to satisfy novoBreak's interface. But the results will then contain both somatic and germline events. You can use wgsim to simulate raw reads from the reference and align them with bwa to generate a bam file to use as the "normal".
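A minimal sketch of that wgsim + bwa route is below. It assumes wgsim, bwa and samtools (1.3 or later, for "sort -o") are on your PATH. The function name make_fake_normal, the output prefix, the read count and the read length are arbitrary choices for illustration, not novoBreak requirements.

```shell
# Sketch: build a placeholder "normal" BAM from reads simulated off the reference.
# Read count (-N) and read lengths (-1/-2) are arbitrary illustrative values.
make_fake_normal() {
    ref=$1              # reference FASTA
    out=${2:-normal}    # output prefix (hypothetical default)

    # Simulate paired-end reads; wgsim writes its variant log to stdout, discard it.
    wgsim -N 10000 -1 100 -2 100 "$ref" "${out}_1.fq" "${out}_2.fq" > /dev/null || return 1

    # Build the bwa index only if it is not there yet.
    [ -e "${ref}.bwt" ] || bwa index "$ref" || return 1

    # Align, coordinate-sort, and index the result.
    bwa mem "$ref" "${out}_1.fq" "${out}_2.fq" | samtools sort -o "${out}.bam" - || return 1
    samtools index "${out}.bam"
}

# Usage (paths are placeholders):
# make_fake_normal ref.fa normal
```

The resulting normal.bam can then be passed wherever novoBreak expects the normal bam. Keep in mind that the calls will include germline events.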
The source code of novoBreak is already on SourceForge. Simply run "git clone https://git.code.sf.net/p/novobreak/git novobreak-git && cd novobreak-git && make", then copy the resulting novoBreak binary over the "novoBreak" in novoBreak_distribution_v1.1.3rc; that should be fine. If a program in the novoBreak_distribution_v1.1.3rc directory (such as samtools) cannot be used because of your Linux environment, simply download its source, compile and install it, and copy it to the novoBreak_distribution_v1.1.3rc directory...
Hi Wenkai, You may try the built-in second filter, filter_sv2.pl. Or a tip you may...