CADBURE Wiki

Tool for evaluating aligner performance on your RNA-Seq dataset.

Brought to you by: liangc_mu, rajkump-praveen

CADBURE Q&A

[1] What is CADBURE?

CADBURE is a bioinformatics tool for evaluating spliced aligner performance on user data. It compares two alignment results based on relative reliable mapping and advantageous mapping in predefined scenarios. It outputs the read name lists associated with each scenario, so the user can view the mappings in any BAM viewer.

[2] Why should I use CADBURE?

In an RNA-Seq study of differential expression, mapping the cDNA data set to the reference genome or transcriptome is an important step that will decide the outcome of your study. Plenty of spliced aligners are available (http://wwwdev.ebi.ac.uk/fg/hts_mappers/), and choosing the one that performs best on your data is not trivial. Moreover, an aligner's performance depends on the properties of the data. The solution is to evaluate spliced aligner performance on your own data before deciding on one. CADBURE is a simple, easy-to-use tool for comparing alignment results. It outputs discrete measures of specificity and accuracy for each compared aligner to help you decide. The CADBURE distribution also comes with an R script that runs bootstrap statistics on the difference in each measure (both specificity and accuracy) to assess statistical significance.

[3] How can I cite CADBURE?

If you use CADBURE, please cite it as follows:

Praveen Kumar Raj Kumar, Thanh V. Hoang, Michael L. Robinson, Panagiotis A. Tsonis, Chun Liang: CADBURE: A generic tool to evaluate the performance of spliced aligners on RNA-Seq data. Scientific Reports, 5:13443, DOI: 10.1038/srep13443.

[4] How do I install CADBURE?

CADBURE itself does not need installation as long as Perl is installed. However, you will need two Perl modules (Bio::DB::Sam and HTML::Table), both available from CPAN (http://www.cpan.org/).

[5] How do I execute CADBURE?

You can execute it with
./CADBURE
or
perl CADBURE
Executing it without any flags prints the usage.

[6] What are the inputs and outputs of CADBURE?

The input is two alignment results in BAM format. For faster processing, we recommend sorting the input by read name (see Samtools). The summary output, which shows the number of read mappings involved in each contrasting scenario, is presented in HTML format. The read name lists for the different scenarios are written as text files in separate folders. These lists are useful for viewing contrasting mapping scenarios in any BAM viewer.

[7] What is the input format for CADBURE?

Alignment results should be in BAM format. For faster processing, we recommend sorting the input by read name. See samtools (http://samtools.sourceforge.net/) for how to convert SAM format to BAM format.
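Before running a comparison, it can be handy to confirm that a BAM file declares itself as sorted by read name. The following is a minimal sketch, not part of CADBURE: the function name bam_sort_order is hypothetical, and it only reads the SO: tag of the @HD header line as laid out in the SAM/BAM specification. A header can be missing or wrong about the actual record order, so treat the result as a hint.

```python
import gzip
import struct


def bam_sort_order(path):
    """Return the SO: value from a BAM file's @HD header line, or None if absent.

    Hypothetical helper for illustration; it inspects the header only and
    does not verify that the records really are in that order.
    """
    # BAM is BGZF-compressed, i.e. a series of gzip members, so the standard
    # gzip module can decompress the start of the file.
    with gzip.open(path, "rb") as handle:
        if handle.read(4) != b"BAM\x01":
            raise ValueError(f"{path} does not look like a BAM file")
        # l_text: little-endian 32-bit length of the plain-text SAM header.
        (l_text,) = struct.unpack("<I", handle.read(4))
        header = handle.read(l_text).decode("utf-8", errors="replace")

    for line in header.split("\n"):
        if line.startswith("@HD"):
            # Fields after the record type are TAB-separated TAG:VALUE pairs.
            for field in line.rstrip("\r\x00").split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None
```

For a file sorted by read name, the function would be expected to return "queryname"; a coordinate-sorted file reports "coordinate". A call might look like bam_sort_order("aligner_A.bam"), where the file name is a placeholder.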

[8] Is CADBURE complicated to use?

CADBURE is one of the easier tools for evaluating spliced aligner performance on your data, as it can be executed in one line. It is simple to use because you do not need to rerun aligners, randomly sub-sample results, or simulate chromosomes. Since no chromosome simulation is needed, you can use any functional feature of an aligner when generating alignment results, such as annotation-guided mapping or SNP-tolerant mapping.

[9] How do I evaluate statistical significance using CADBURE?

The CADBURE distribution comes with an R script that runs bootstrap statistics on the output to assess statistical significance. The script takes as input the TP, FP and TN counts that CADBURE outputs for each aligner. Differences in both specificity and accuracy between aligners are assessed by building 95% bootstrap confidence intervals (10,000 bootstrap samples) for the true difference. The script outputs the 95% confidence interval. A result should be deemed significant at the 5% significance level if the associated 95% CI for the difference does not contain zero.
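To make the idea of a bootstrap confidence interval for a difference concrete, here is a minimal Python sketch. It is not the bundled R script and may differ from it in detail. It assumes the standard definition of specificity, TN / (TN + FP), resamples each aligner's reads independently, and uses a simple percentile interval; all function names and the example counts are hypothetical. Accuracy could be handled the same way by swapping in a different metric function.

```python
import random


def specificity(tn, fp):
    """Standard specificity, TN / (TN + FP); 0.0 when there are no negatives."""
    negatives = tn + fp
    return tn / negatives if negatives else 0.0


def resample_counts(tp, fp, tn, rng):
    """Draw one bootstrap replicate of the (TP, FP, TN) counts.

    Each of the n = TP + FP + TN reads is redrawn with replacement, with
    category probabilities proportional to the observed counts.
    """
    n = tp + fp + tn
    draws = rng.choices(("TP", "FP", "TN"), weights=(tp, fp, tn), k=n)
    return draws.count("TP"), draws.count("FP"), draws.count("TN")


def bootstrap_specificity_diff(counts_a, counts_b, n_boot=10_000, seed=0):
    """Percentile 95% CI for specificity(A) - specificity(B).

    counts_a and counts_b are (TP, FP, TN) tuples. Resampling the two
    aligners independently is an assumption of this sketch.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        _, fp_a, tn_a = resample_counts(*counts_a, rng)
        _, fp_b, tn_b = resample_counts(*counts_b, rng)
        diffs.append(specificity(tn_a, fp_a) - specificity(tn_b, fp_b))
    diffs.sort()
    # Cut 2.5% of the replicates from each tail of the sorted differences.
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Made-up counts for two aligners, for illustration only.
    low, high = bootstrap_specificity_diff((70, 5, 25), (65, 12, 23))
    print(f"95% CI for the specificity difference: [{low:.3f}, {high:.3f}]")
```

If the printed interval does not contain zero, the difference would be called significant at the 5% level, following the rule described above. Real read counts are far larger than in this toy example, so a pure-Python loop like this one is meant for illustration rather than production use.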

[10] Is there a limit on the size of the BAM file?

No. There is no limit.

I see a huge number for a particular scenario in an aligner. How can I see those mappings?
Find the corresponding “scenario_number” folder; inside it are files listing the reads involved in that scenario for your aligner. Depending on the scenario, there can be either one file or two files for the aligners; in either case, the file name should make this clear. Open your BAM file of interest in any BAM viewer, for example Tablet (http://ics.hutton.ac.uk/tablet/), and use a read name to search for the mapping (Figure 1).
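If a scenario contains many reads, searching for them one at a time in a viewer gets tedious. As an alternative, the sketch below pulls the matching records out of SAM text so they can be inspected together. It is not part of CADBURE, the function names are hypothetical, and it assumes each read name list file holds one read name per line, which this page does not specify.

```python
def load_read_names(path):
    """Read a list file into a set, assuming one read name per line."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def filter_sam_by_names(sam_lines, names):
    """Yield SAM header lines plus alignment lines whose read name is in names.

    sam_lines is any iterable of SAM text lines, such as an open file or
    sys.stdin. The read name (QNAME) is the first TAB-separated column.
    """
    for line in sam_lines:
        if line.startswith("@"):
            # Keep header lines so the output is still valid SAM.
            yield line
            continue
        qname = line.split("\t", 1)[0]
        if qname in names:
            yield line
```

One way to use it is to convert the BAM file to SAM text with samtools view -h, feed the lines to filter_sam_by_names together with the set returned by load_read_names, and write the result to a new, smaller SAM file for viewing.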

[11] Installation Problem

When I try to install “Bio::DB::Sam”, I get an error about “khash.h” not being found.
This probably means your system has an old version of the genome modelling tool Pindel (http://gmt.genome.wustl.edu/packages/pindel/). An easy fix is to get Bio-SamTools from https://github.com/gitpan/Bio-SamTools.

I get an error like "Unable to create Scenario_1".
Please make sure you have the proper permissions to create folders. If not, use “sudo” to execute CADBURE.

[12] Whom should I contact to report a problem?

Please contact the developer, Praveen [praveenk at pitt.edu], or the lab director, Chun Liang [liangc at miamioh.edu].