Menu

accessing support information for false negative fusions

eearley
2013-11-06
2013-11-13
  • eearley

    eearley - 2013-11-06

    I'd like to troubleshoot false negative events. That is, I'm trying to figure out why a certain known fusion in my sequences was not supported. Is there a way for defuse to spit out raw support for all potential fusions? For example, can I see - for a particular gene/genes:

    • the amount of discordant reads support for all sites
    • of chimeric reads by site

    • general alignment coverage by position
    • etc.?

    Or are there other ways to diagnose false negatives?
    Thanks!

     
  • Andrew

    Andrew - 2013-11-12

    Given a gene pair, the easiest way to identify the number and alignment location of discordant reads between the genes is to search the clusters file. Unfortunately the file contains alignments to cdna and the genome, so you can't do something as simple as a grep. Ive attached a perl script to help, the usage should be reasonably intuitive, for example:

    cat clusters | perl select_genepair_clusters.pl Homo_sapiens.GRCh37.69.gtf ENSG00000141376 ENSG00000108510 1000

    will print all clusters with alignments between genes ENSG00000141376 and ENSG00000108510 (plus or minus 1000nt) where Homo_sapiens.GRCh37.69.gtf is the gtf file in the defuse dataset.

    The format is tab delimited with fields:
    cluster id
    cluster end (0 or 1)
    read id
    read end (0 or 1)
    reference name (cDNA or chromosome)
    strand
    alignment start
    alignment end

    If you suspect the fusion might produce a very low number of discordant reads (because of low expression or breakpoint near the start or end of the fusion gene) you might want to rerun the clustering with a lower threshold on number of reads. If you dont want to rerun defuse, just rerun the clustering:

    grep clustermatepairs log/defuse.log

    will produce a command for each chromosome pair. Rerun clustermatepairs with -m 1 for all clusters of size 1 or greater and the results will appear in a clusters*.tmp file.

     
  • Andrew

    Andrew - 2013-11-13

    An additional note, there is substantial heuristic filtering at the alignment stage in alignjob.pl. For instance, read pairs for which both ends align within the same ensembl gene are filtered. This has caused problems previously when erroneous genes in the ensembl database that are annotated to originate from large regions of a chromosome have resulted in filtering real gene fusion evidence, such as TMPRSS2-ERG getting filtered in ensembl version 67.

    If you are interested in trying defuse without this filtering, unfortunately the only option is quite manual, you will have to edit alignjob.pl such that filtered read id filenames are not pushed to @concordant_readids, and rerun defuse from scratch.

     

Log in to post a comment.

MongoDB Logo MongoDB