Menu

How to Visualise the Read aligning to the Fusion location

terpstramm
2014-04-04
2014-05-08
  • terpstramm

    terpstramm - 2014-04-04

    I have used defuse but for validation it is also handy to view the aligned reads versus the fusion products (Like IGV). Is there some integrated way of doing this? if not please give me a short description what I should do to get this working.

     
  • terpstramm

    terpstramm - 2014-05-08

    the simpelest way is probs a script like shown below. See the part following 'my $use=<<"END";' in the script for the manual.
    this will create a DEFUSERESULTS.reference.fa to load as genome in IGV.

    This will also create DEFUSERESULTS.bed for annotating the fusions and a DEFUSERESULTS.alignment.bam file containing the reads. Both to load as a file into IGV.

    PS: the script is written to work in my case and does not check the commandline/files thoroughly. So i do not garantee this will work generally

    #!/bin/perl -w
    use warnings;
    use strict;
    
    my $use=<<"END";
    use: $0 <DEFUSERESULTS> <READS1> <READS2> [OTHERREFFASTA]
    will create a valid breakpoint fasta and bedfile annotations and aligns the READS1 and READS2 using bowtie/samtools. Optionally other refs can be supplied (eg defuse.cdna.fa or hg19.fa)
    
    this will create a DEFUSERESULTS.reference.fa to **load as genome in IGV**. 
    
    This will also create DEFUSERESULTS.bed for annotating the fusions and a DEFUSERESULTS.alignment.bam file containing the reads. **Both to load as a file into IGV**.  
    
    
    examples
    use:$0 results.tsv reads1.fq reads2.fq 
    or (should be more accurate):
    use:$0 results.tsv reads1.fq reads2.fq hg19.fa defuse.cdna.fa
    
    will create fusion locations and align reads to the fusion locations using bowtie
    END
    
    warn $use."\n";
    die "not enough commandline options" if(scalar(@ARGV) < 3);
    #my $results = GetResult(@ARGV[0]);
    mkBreakFaBed($ARGV[0]);
    #best to include defuse.cdna.fa for multiple mapping
    mkAlignment(@ARGV);
    sub mkAlignment {
        my $results=shift @_;
        my $reads1=shift @_;
        my $reads2=shift @_;
        my @mfastas=@_;
    
        my $breakFa = $results.".fa";
        my $otherFa = join(' ',@mfastas);
        my $refFa=$results.".reference.fa";
        my $refIdx=$results.".index";
    
        {
            my $syscall="cat $breakFa $otherFa > $refFa";
            warn "#running prep reference:".$syscall."\n";
            system($syscall);
        }
        {
            my $syscall="bowtie-build $refFa $refIdx";
            warn "#running bowtie build:".$syscall."\n";
            system($syscall);
        }
        {
            #my $syscall="bowtie ebwt $_[0].index -1 <(gzip -dc $_[1])  -2 <(gzip -dc $_[2]) --best --strata -S |samtools view -Sbh | samtools sort ";
            my $syscall="bowtie $refIdx -1 $reads1 -2 $reads2 --threads 2 --all -m 10 --best --strata -S |samtools view -Sbh -F 4 - | samtools sort - $results.alignment";
    
            warn "#running bowtie/samtools alignment:".$syscall."\n";
            system($syscall);
        }
        {
            my $syscall="samtools index $results.alignment.bam";
            warn "running samtools index:".$syscall."\n";
            system($syscall);
        }
    
    }
    sub mkBreakFaBed {
        #
        #cluster_id splitr_sequence splitr_count    splitr_span_pvalue  splitr_pos_pvalue   splitr_min_pvalue   adjacent    altsplice   break_adj_entropy1  break_adj_entropy2  break_adj_entropy_min   breakpoint_homology breakseqs_estislands_percident  cdna_breakseqs_percident    deletion    est_breakseqs_percident eversion    exonboundaries  expression1 expression2 gene1   gene2
        my @cluster_id=GetResult($_[0],'cluster_id');
        my @splitr_sequence = GetResult($_[0],'splitr_sequence');
        my @gene1 = GetResult($_[0],'gene1');
        my @gene2 = GetResult($_[0],'gene2');
        my @gene_name1 = GetResult($_[0],'gene_name1');
        my @gene_name2 = GetResult($_[0],'gene_name2');
        my $i = 0;
        open(my $fa, '>',$_[0].".fa");
        open(my $bed, '>',$_[0].".bed");
        while($i < scalar( @cluster_id)){
            my $index = index($splitr_sequence[$i],"|");
            $splitr_sequence[$i] =~ s/\|//g;
            my $ref = $gene_name1[$i]."_".$gene_name2[$i]."_".$cluster_id[$i];
    
            print $fa ">".$ref."\n".$splitr_sequence[$i]."\n";
    
            print $bed "$ref\t0\t".($index)."\t".$gene_name1[$i]."\n";
            print $bed "$ref\t".($index)."\t".length($splitr_sequence[$i])."\t".$gene_name2[$i]."\n";
    
    
            $i++;
        }
    }
    
    sub GetResult {
        my @Res;
        my $resultFile = $_[0];
        my $colname  = $_[1];
        my $cluster_id = $_[2];
        open my $in,'<',$resultFile;
        $_=<$in>; 
        chomp;
        my @h = split "\t";
        my %hToI;
    
        {
            my $tmp=0;
            map{$hToI{$_}=$tmp; $tmp++;}(@h);
        }
        while(<$in>){
            chomp;
            my @row = split("\t");
            if( not defined($hToI{$colname}) ){
                die "invalid colname:'$colname' is the file/header correct?header:".join("/",@h);
            }elsif(not defined($hToI{'cluster_id'})){
                die "invalid 'cluster_id' is the file/header correct?";
            }elsif(defined($hToI{$colname})){
                push @Res, $row[$hToI{$colname}];
            }
        }
        return @Res;
    }
    
     

    Last edit: terpstramm 2014-05-08

Log in to post a comment.

MongoDB Logo MongoDB