I have used defuse but for validation it is also handy to view the aligned reads versus the fusion products (Like IGV). Is there some integrated way of doing this? if not please give me a short description what I should do to get this working.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
the simpelest way is probs a script like shown below. See the part following 'my $use=<<"END";' in the script for the manual.
this will create a DEFUSERESULTS.reference.fa to load as genome in IGV.
This will also create DEFUSERESULTS.bed for annotating the fusions and a DEFUSERESULTS.alignment.bam file containing the reads. Both to load as a file into IGV.
PS: the script is written to work in my case and does not check the commandline/files thoroughly. So i do not garantee this will work generally
#!/bin/perl -wusewarnings;usestrict;my$use=<<"END";use: $0 <DEFUSERESULTS> <READS1> <READS2> [OTHERREFFASTA]will create a valid breakpoint fasta and bedfile annotations and aligns the READS1 and READS2 using bowtie/samtools. Optionally other refs can be supplied (eg defuse.cdna.fa or hg19.fa)this will create a DEFUSERESULTS.reference.fa to **load as genome in IGV**. This will also create DEFUSERESULTS.bed for annotating the fusions and a DEFUSERESULTS.alignment.bam file containing the reads. **Both to load as a file into IGV**. examplesuse:$0 results.tsv reads1.fq reads2.fq or (should be more accurate):use:$0 results.tsv reads1.fq reads2.fq hg19.fa defuse.cdna.fawill create fusion locations and align reads to the fusion locations using bowtieENDwarn$use."\n";die"not enough commandline options"if(scalar(@ARGV)<3);#my $results = GetResult(@ARGV[0]);mkBreakFaBed($ARGV[0]);#best to include defuse.cdna.fa for multiple mappingmkAlignment(@ARGV);submkAlignment{my$results=shift@_;my$reads1=shift@_;my$reads2=shift@_;my@mfastas=@_;my$breakFa=$results.".fa";my$otherFa=join(' ',@mfastas);my$refFa=$results.".reference.fa";my$refIdx=$results.".index";{my$syscall="cat $breakFa $otherFa > $refFa";warn"#running prep reference:".$syscall."\n";system($syscall);}{my$syscall="bowtie-build $refFa $refIdx";warn"#running bowtie build:".$syscall."\n";system($syscall);}{#my $syscall="bowtie ebwt $_[0].index -1 <(gzip -dc $_[1]) -2 <(gzip -dc $_[2]) --best --strata -S |samtools view -Sbh | samtools sort ";my$syscall="bowtie $refIdx -1 $reads1 -2 $reads2 --threads 2 --all -m 10 --best --strata -S |samtools view -Sbh -F 4 - | samtools sort - $results.alignment";warn"#running bowtie/samtools alignment:".$syscall."\n";system($syscall);}{my$syscall="samtools index $results.alignment.bam";warn"running samtools index:".$syscall."\n";system($syscall);}}submkBreakFaBed{##cluster_id splitr_sequence splitr_count splitr_span_pvalue splitr_pos_pvalue splitr_min_pvalue adjacent altsplice break_adj_entropy1 break_adj_entropy2 break_adj_entropy_min breakpoint_homology breakseqs_estislands_percident cdna_breakseqs_percident deletion est_breakseqs_percident eversion exonboundaries expression1 expression2 gene1 gene2my@cluster_id=GetResult($_[0],'cluster_id');my@splitr_sequence=GetResult($_[0],'splitr_sequence');my@gene1=GetResult($_[0],'gene1');my@gene2=GetResult($_[0],'gene2');my@gene_name1=GetResult($_[0],'gene_name1');my@gene_name2=GetResult($_[0],'gene_name2');my$i=0;open(my$fa,'>',$_[0].".fa");open(my$bed,'>',$_[0].".bed");while($i<scalar(@cluster_id)){my$index=index($splitr_sequence[$i],"|");$splitr_sequence[$i]=~s/\|//g;my$ref=$gene_name1[$i]."_".$gene_name2[$i]."_".$cluster_id[$i];print$fa">".$ref."\n".$splitr_sequence[$i]."\n";print$bed"$ref\t0\t".($index)."\t".$gene_name1[$i]."\n";print$bed"$ref\t".($index)."\t".length($splitr_sequence[$i])."\t".$gene_name2[$i]."\n";$i++;}}subGetResult{my@Res;my$resultFile=$_[0];my$colname=$_[1];my$cluster_id=$_[2];openmy$in,'<',$resultFile;$_=<$in>;chomp;my@h=split"\t";my%hToI;{my$tmp=0;map{$hToI{$_}=$tmp;$tmp++;}(@h);}while(<$in>){chomp;my@row=split("\t");if(notdefined($hToI{$colname})){die"invalid colname:'$colname' is the file/header correct?header:".join("/",@h);}elsif(notdefined($hToI{'cluster_id'})){die"invalid 'cluster_id' is the file/header correct?";}elsif(defined($hToI{$colname})){push@Res,$row[$hToI{$colname}];}}return@Res;}
Last edit: terpstramm 2014-05-08
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I have used defuse but for validation it is also handy to view the aligned reads versus the fusion products (Like IGV). Is there some integrated way of doing this? if not please give me a short description what I should do to get this working.
the simpelest way is probs a script like shown below. See the part following 'my $use=<<"END";' in the script for the manual.
this will create a DEFUSERESULTS.reference.fa to load as genome in IGV.
This will also create DEFUSERESULTS.bed for annotating the fusions and a DEFUSERESULTS.alignment.bam file containing the reads. Both to load as a file into IGV.
PS: the script is written to work in my case and does not check the commandline/files thoroughly. So i do not garantee this will work generally
Last edit: terpstramm 2014-05-08