Download Latest Version REDO.tar.gz (29.3 MB)
Email in envelope

Get an email when there's a new version of REDO

Home
Name Modified Size InfoDownloads / Week
README.txt 2016-06-01 29.8 kB
REDO.tar.gz 2016-06-01 29.3 MB
Totals: 2 Items   29.4 MB 0
NAME
====

REDO - RNA Editing Detection in Organelle 

VERSION
=======

Version 1.0

COPYRIGHT
=========

Copyright 2016 - Wanfei Liu & Qiang Lin - Beijing Institute of Genomics, Chinese Academy of Sciences.

This tool is a free package; you can redistribute it and/or modify it freely.

INTRODUCTION
============

REDO is a comprehensive application tool for identifying RNA editing events in organelles based on variant call results (VCF files). It is a suite of Perl scripts and can work easily and directly in any operating system installed Perl and R Environment. The stringent rule depended filters and statistical filters are used in REDO for reducing false positive rate. It can provide detailed annotations, statistics and figures for RNA editing sites. REDO also can detect RNA editing events in multiple samples simultaneously and identify differential proportion of RNA editing events in different samples. Moreover, the genome variation can be easily removed by a subprogram fish.pl in our package.

REDO is available at https://sourceforge.net/projects/redo/.

REDO only uses the variant calling format (VCF) files (records for all sites), the genome sequence file (for annotation) and the gene annotation file (for annotation, feature table file, http://www.ncbi.nlm.nih.gov/projects/Sequin/table.html) as input, then the raw variants are filtered by rule-depended filters and statistics filters for reducing the false positive rate as following steps. 1. Quality control filter: the low quality sites are filtered according to the reads quality (DP=0 in GATK (reads with MQ=255 or with bad mates are filtered) and DP4=0 in samtools (number of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases)). 2. Depth filter: three measures are used in this filter, which are total reads depth (<4), alt reads depth (<3), and the total reads depth variation between alt site and ad-jacent sites without variant in specific window (total reads depth<0.2*average reads depth of adjacent sites). 3. Alt proportion filter: two measures are used in this filter, including alt proportion (<0.1) and the reduced alt proportion after removing possible sequencing error proportion obtained from the adjacent sites without variant in specific window (alt proportion ¨C error proportion<0.1). 4. Multiple alt filter: only the variant with one alt allele is kept for RNA editing detection. 5. Distance filter: the variant sites with short distance from each other (<3) are filtered due to the possible position obstruction for RNA editing. 6. Splice junction filter: variants within short splice anchor (<2) are removed. 7. Indel filter: the indel variants are removed in default (indels can be kept when assigning "-i yes" option). 8. Likelihood ratio test filter: a likelihood ratio (LLR) test is used for RNA editing sites (Chepelev, 2012; Sun, et al., 2016). LLR test is a probabilistic test incorporating error probability of bases (error probability is obtained using adjacent sites without variant in specific window) for detecting RNA editing sites (LLR<10). 9. Fisher exact test filter: we assessed the significance for a given RNA editing site (alt reads, ref reads) by comparing its expected levels (0, alt reads + ref reads) using the Fisher¡¯s exact test and the p-value of fisher exact test is used for filter (p-value>0.01). 10. The complicated model filter: based on the statistics results for ex-periment validated RNA editing sites and the attributes of codon table, a complicated filter model was built according to five characteristics of RNA editing sites, which are RNA editing types (C->T, A->G, T->C, etc), alt proportion, amino acids change, codon phase and hydropho-bic/hydrophilic change. 

PREREQUISITES
=============

Before running, the following software or program are required:
-	Perl Environment (perl v5.18.4 or later (it can be lower), tested for Windows and Linux)
-	Perl module: Getopt::Long; Text::NSP
-	R Environment (tested on R 3.1.2 or later (it can be lower) for Windows and Linux)
-	R module: graphics; grDevices

REDO have been tested on Windows and Linux.

INSTALLATION
============

This package doesn't need install. You can run it using absolute path after decompressed it. If you want to use it without absolute path, you should add the absolute path of REDO directory in PATH of your bash profile.

COMMAND LINE
============
	
Run REDO.pl without any parameter, it will print below usage on the screen.

*************
*1.0 version*
*************
************************************************************************
        Usage: REDO.pl -g genome_sequence -v variant_file -t tbl_file -o prefix_of_outfile -d reads_depth -c minimum_coverage_of_alt_allele -p alt_proportion -w window_size -s minimum_splice_anchor -a minimum_alt_distance -l llr_value -f fisher_pvalue -dv depth_variant -i identify_indel.
           -g: the absolute path of genome sequence.
           -v: the absolute path of variant file/files (for example, "-v test1.vcf -v test2.vcf -v test3.vcf ......").
           -t: the absolute path of tbl file (feature table file, http://www.ncbi.nlm.nih.gov/projects/Sequin/table.html).
           -o: the prefix of output file.
           -d: minimum reads depth (4).
           -c: minimum coverage of alternative (alt) allele (3).
           -p: minimum alt proportion (0.1).
           -w: minimum window size for calculating error rate and average depth (10).
           -s: minimum splice anchor size (2).
           -a: minimum alt distance (3).
           -l: minimum likelihood ratio (LLR) (10).
           -f: maximum p-value of fisher exact test (0.01).
          -dv: the depth variant (0.2).
           -i: identify indel (y or n, default n).
************************************************************************

Note: all files should be uncompressed files. For VCF file, it should include all positions with reads supports (samtools/bcftools default VCF output file or GATK output file with ¡°-ERC BP_RESOLUTION¡±). We can provide one VCF file or multiple VCF files simultaneously. tbl file is the standard feature table file in NCBI (http://www.ncbi.nlm.nih.gov/projects/Sequin/table.html), we can download it from each genome sequence record by click ¡°Send -> Complete Record -> File -> Format -> Feature Table -> Create File¡± easily in nucleotide database of NCBI. If you want run REDO multiple times with different parameters, please use ¡°-o¡± option to assign different prefix name for output files. For calculating the error rate and the average reads depth, a default 10bp size window is used. To filter false positive sites, we use 4 as the default minimum reads depth and 3 as the minimum alt allele reads coverage, because most of false positive variants identified by GATK and samtools come from the low depth and alt coverage records according to our previous study. RNA editing may need second structure or motif to recognize related enzymes and we didn¡¯t find the RNA editing sites with short distance from each other in several experiment data except for Arabidopsis (<3bp). Therefore, we use 3 as the minimum alt distance and 2 as the minimum splice anchor size. According to the test using experiment data, the likelihood ratio (LLR) should be equal or larger than 10 (default value). Moreover, according to the statistics results for experiment validated RNA editing sites and the attributes of codon table, we applied a complicated filter model. The filter model based on five characteristics of RNA editing sites, which are RNA editing types (C->T, A->G, T->C, etc), alt proportion, amino acids change, codon phase and hydrophobic/hydrophilic change. Using this filter model, we lost 0%~1.34% sensitivity, but increased 0.96%~10.45% precision rate and 0%~0.13% specificity percent. Even though, our sensitivity is comparable with REDItools. Furthermore, the chromosome ID in genome sequence file should be identical with vcf and tbl file. At present, we only identify C->T RNA editing sites with at least 0.50 editing proportion in tRNA and rRNA region.

INPUTFILES
==========

REDO need three input files:
-	VCF file (Variant Call Format, the result of GATK and samtools). REDO can accept one or multiple VCF files by option "-v".
-	Reference genome in Fasta format. Chromosome/region names must be equal to those in VCF file and tble file.
-	tbl file (feature table file, http://www.ncbi.nlm.nih.gov/projects/Sequin/table.html). It can be downloaded from NCBI directly.

Note: the chromosome ID in genome sequence file should be identical with vcf and tbl file.

OUTPUTFILES
===========

REDO mainly produces output files in simple tab-delimited table files.

1.	*_sample.out
The RNA editing result for each sample.
Main fields are:
-	Chr: chromosome name;
-	Strand: strand information ("+" or "-");
-	Name: gene name;
-	Genome_pos: nucleotide position in chromosome (1-based);
-	Gene_pos: nucleotide position in gene (1-based);
-	AA_pos: amino acid position in gene (1-based);
-	Phase: codon phase of editing site (1-based);
-	Ref->Alt: reference allele to editing allele;
-	RefCodon->AltCodon: reference codon to editing codon;
-	RefAA->AltAA: reference amino acid to editing amino acid;
-	AltRatio: editing reads proportion;
-	ErrorRatio: sequencing error rate obtained from the adjacent sites without variant in specific window (default window size is 10);
-	Depth: total reads number;
-	AverageDepth: average total reads number of adjacent sites without variant in specific window (default window size is 10);
-	RefRead: reference allele reads number;
-	AltRead: editing allele reads number;
-	AdjacentAlt: distance to adjacent candidate editing site;
-	LLR: a likelihood ratio (LLR) is a probabilistic test incorporating error probability of bases (error probability is obtained using adjacent sites without variant in specific window) for detecting RNA editing sites;
-	 Pvalue(fisher): p-value is obtained by assessing the significance for a given RNA editing site (alt reads, ref reads) by comparing its expected levels (0, alt reads + ref reads) using the Fisher¡¯s exact test;
-	RNAeditGC: GC content of RNA editing site in specific window (default 20bp);
-	GeneGC: gene GC content;
-	Exon_num: exon number of gene;
-	Exon_start: exon start/starts of gene (multiple exon starts are separated by comma) (1-based);	
-	Exon_end: exon end/ends of gene (multiple exon ends are separated by comma) (1-based);
-	Function: gene function.

2.	*.out
The integrated RNA editing result for all samples.
Main fields are:
-	Chr: chromosome name;
-	Strand: strand information;
-	Name: gene name;
-	Genome_pos: nucleotide position in chromosome (1-based);
-	Gene_pos: nucleotide position in gene (1-based);
-	AA_pos: amino acid position in gene (1-based);
-	Sample_num: total sample number of supporting RNA editing site;
-	Phase: codon phase of editing site (1-based);
-	Sample_name1: the detailed editing information in the sample, including Ref->Alt, RefCodon->AltCodon, RefAA->AltAA, AltRatio, ErrorRatio, Depth, AverageDepth, RefRead, AltRead, AdjacentAlt, LLR, Pvalue(fisher), RNAeditGC, and GeneGC separated by colon;
-	Sample_name2;
-	......
-	Exon_num: exon number of gene;
-	Exon_start: exon start/starts of gene (multiple exon starts are separated by comma) (1-based);	
-	Exon_end: exon end/ends of gene (multiple exon ends are separated by comma) (1-based);
-	Function: gene function.

3.	*.change_matrix
The statistics result file for RNA editing in each sample, including total number and percent for each RNA editing type.
Main fields are:
-	Prefix: prefix of sample name;
-	Column 2 to column 13 are 12 possible RNA editing types, the total RNA editing number and percent for each type are shown.

4.	*.stat
The statistics result file for RNA editing in each sample, including total editing number, indel editing number, editing number in different codon phases, silent number, non-silent number, and the detailed statistics for each RNA editing type.  
Main fields are:
-	Prefix: prefix of sample name;
-	Total: total RNA editing number;
-	Other: non-substitution (indel) RNA editing number;
-	Phase(1,2,3): total RNA editing number in phase 1, 2 and 3 of codons for substitution RNA editing;
-	Silent: silent RNA editing number for substitution RNA editing;
-	NonSilent: non-silent RNA editing number for substitution RNA editing;
-	A->C(silent): silent RNA editing number for A->C RNA editing;
-	A->C(non_silent): non-silent RNA editing number for A->C RNA editing;
-	A->C(1:hydrophobic2hydrophobic,2:hydrophilic2hydrophilic,3:hydrophobic2hydrophilic,4:hydrophilic2hydrophobic): the number of hydrophobic/hydrophilic changes due to A->C RNA editing in four types separated by comma; 
-	Column 10 to column 42 are RNA editing number in silent, non-silent and 4 types of hydrophobic/hydrophilic changes for other 11 RNA editing types.

5.	*_sample.R and *_sample.jpeg
*_sample.R is the R script for drawing the attributes of RNA editing in each sample while *_sample.jpeg is the figure produced by *_sample.R, which shown 16 subfigures to illustrate the RNA editing attributes.

6.	*_DPR.out
The pairwise sample comparison for differential proportion of RNA editing events.
Main fields are:
-	#sample name1 vs sample name2: the sample name of comparison;
-	Chr: chromosome name;
-	Position: nucleotide position in chromosome (1-based);
-	Proportion1: editing allele proportion in sample one;
-	Alt1: editing allele reads in sample one; 
-	Ref1: reference allele reads in sample one;
-	Proportion2: editing allele proportion in sample two;
-	Alt2: editing allele reads in sample two; 
-	Ref2: reference allele reads in sample two;
-	Fold(log2): log2 fold change comparing sample two to one;
-	Pvalue: p-value is obtained by assessing the significance for a RNA editing site in sample one (alt1 reads, ref1 reads) by comparing to sample two (alt2 reads, ref2 reads) using the Fisher¡¯s exact test.

7.	*_cluster.out, *_cluster.R and *_cluster.jpeg
*_cluster.out is a matrix for editing allele proportion of each RNA editing site in all samples. The columns are samples and the rows are RNA editing sites. *_cluster.R is the R script for doing the cluster analysis and *_cluster.jpeg is the cluster heatmap for RNA editing sites.

TEST & EXAMPLE
==============

REDO has been tested on coconut cp genome (KX028884) and related RNA-seq data (SRR1063404, SRR1063407, SRR1137438, SRR1173229, SRR1265939, SRR1273070, SRR1273180 and SRR606452, for comparing with REDItools) and Arabidopsis mt (NC_001284) and related RNA-seq data (SRR2079784, for comparing with REDItools). The cp RNA-seq data were mapped by GSNAP while mt RNA-seq data was mapped by BWA. All variants were identified by samtools. The small organelle genome, small VCF file and low memory request can make REDO run very fast in almost any computer (less than 2 minutes for 8 samples in coconut cp genome and less than 1 minute for 1 sample in Arabidopsis mt).

When you get the REDO package, decompress the directory, then you can test the program by following commands (just selected one command for each step).

1. Data pre-processing
1.1	Filter
java -jar trimmomatic-0.33.jar PE -threads 8 -trimlog SRR1063404.trimlog SRR1063404_1.fastq SRR1063404_2.fastq -baseout SRR1063404Filtered.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36&
1.2	Mapping
GSNAP:
gsnap -d coco_cp -a off -N 1 -A sam -t 8 --force-xs-dir --pairmax-rna=1000000 --split-output= SRR1063404_cp SRR1063404_1.paired.fq SRR1063404_2.paired.fq
BWA:
bwa mem -t 16 -M ath SRR2079784_1.paired.fastq.gz SRR2079784_2.paired.fastq.gz 1>SRR2079784.sam 2> SRR2079784.maperr &
1.3	Variant calling
Sam to bam: samtools view -b SRR2079784.sam -o SRR2079784.bam &
Sam sort: samtools sort -o SRR2079784.sorted.bam -O bam -T SRR2079784.temp SRR2079784.bam &
Sam mpileup: samtools mpileup -v -f arabidopsis_mt.fa SRR2079784.sorted.bam --output SRR2079784.sorted.vcf&
bcftools: nohup bcftools call -o SRR2079784.sorted.call.vcf -O v -c SRR2079784.sorted.vcf &

Note: we only used *.concordant_mult and *.concordant_uniq files as the final GSNAP mapping result while BWA default result was used as mapping result.

Table 1 The information of all test RNA-seq data
Species	Organelle	SRA accession No.	Length	Original fragments	High quality fragments	Percent	Mapping fragment	Percent
Coconut	cp	SRR1063404	202	36,009,632	32,555,041	90.41%	1,499,762	4.61%
		SRR1063407	202	35,467,948	32,141,745	90.62%	101,308	0.32%
		SRR1137438	152	50,839,994	42,267,444	83.14%	68,693	0.16%
		SRR1173229	202	119,333,177	113,394,045	95.02%	249,845	0.22%
		SRR1265939	202	51,540,183	48,892,847	94.86%	34,510	0.07%
		SRR1273070	337	40,564,276	37,752,443	93.07%	10,508	0.03%
		SRR1273180	252	60,030,680	54,291,251	90.44%	1,016,376	1.87%
		SRR606452	180	27,465,703	27,063,513	98.54%	548,364	2.03%
Arabidopsis	mt	SRR2079784	132	21,839,182	20,471,554	93.74%	1,796,028	8.77%

2. RNA editing identification:
2.1	REDItools
python REDItoolDenovo.py -i SRR2079784.sorted.bam -f arabidopsis_mt.fa -o SRR2079784 -c 4 >SRR2079784.nohup &
2.2	REDO
Coconut cp:
perl REDO.pl -v SRR1063404_cp.vcf -v SRR1063407_cp.vcf -v SRR1137438_cp.vcf -v SRR1173229_cp.vcf -v SRR1265939_cp.vcf -v SRR1273070_cp.vcf -v SRR1273180_cp.vcf -v SRR606452_cp.vcf -t cp.tbl -g cp.fsa -o coco_cp &
Arabidopsis mt:
perl REDO.pl -g mt.fa -v SRR2079784_mt.vcf -t mt.tbl -o ath_mt -a 0 -s 0&

Note: because Arabidopsis has a lot of RNA editing sites with short distance (adjacent RNA editing sites) or located in the junction points of transcripts in experiment data, we changed the default parameter -a and -s to 0 to recover these RNA editing sites.

Table 2 The information of RNA editing identification in test RNA-seq data
Species	Organelle	SRA accession No.	Samtools Variant	REDItools editing sites	Percent	REDO editing sites	Percent
Coconut	cp	SRR1063404	210	126	60.00%	96	45.71%
		SRR1063407	387	-	-	66	17.05%
		SRR1137438	197	-	-	51	25.89%
		SRR1173229	590	-	-	129	21.86%
		SRR1265939	477	-	-	49	10.27%
		SRR1273070	1426	375	26.30%	175	12.27%
		SRR1273180	229	-	-	88	38.43%
		SRR606452	247	-	-	68	27.53%
Arabidopsis	mt	SRR2079784	1219	553	45.37%	488	40.03%

3. RNA editing result:
3.1	Figures
	Figure 1 - The attributes of RNA editing sites in Arabidopsis mt.
	Figure 2 - The attributes of RNA editing sites in coconut SRR1063404.
	Figure 3 - The attributes of RNA editing sites in coconut SRR1273070.
	Figure 4 - The cluster of RNA editing sites in coconut cp based on 8 RNA-seq data according to the RNA editing proportions.
	Figure 5 ¨C The attributes comparison between true positive and false positive RNA editing sites in mt (not produced by this program).

3.2	Tables
	Table 3 - ath_mt_SRR2079784_mt.out (partial).
	Table 4 - coco_cp.out (partial).
	Table 5 - coco_cp.change_matrix.
	Table 6 - coco_cp.stat.
	Table 7 - coco_cp_DPR.out (partial).
	Table 8 - The RNA editing sites identification between REDO and REDItools in three analyzed RNA-seq datasets (without applying the complicated filter model).
	Table 9 - The RNA editing sites identification between REDO and REDItools in three analyzed RNA-seq datasets (with applying the complicated filter model).

3.2.1	Table 3 - ath_mt_SRR2079784_mt.out (partial).
#Chr	Strand	Name	Genome_pos	Gene_pos	AA_pos	Phase	Ref->Alt	RefCodon->AltCodon	RefAA->AltAA	AltRatio	ErrorRatio	Depth	AverageDepth	RefRead	AltRead	AdjacentAlt	LLR	Pvalue(fisher)	RNAeditGC	GeneGC	Exon_num	Exon_start	Exon_end	Function
ath	-	nad2	81279	19	7	1	C->T	CGG->TGG	R->W	0.18914845516202	0.000623221129939135	1327	1271.3	1076	251	9	1000	1.15601898625219e-81	55.00	39.93	5	81297,80132,333105,330306,328078,	81113,79740,332945,329735,327890,	NADH dehydrogenase subunit 2
ath	-	nad2	81270	28	10	1	C->T	CCA->TCA	P->S	0.552412645590682	0.000497481358414786	1202	1048.9	538	664	31	1000	1.19926472602608e-256	55.00	39.93	5	81297,80132,333105,330306,328078,	81113,79740,332945,329735,327890,	NADH dehydrogenase subunit 2
ath	-	nad2	81239	59	20	2	C->T	TCC->TTC	S->F	0.973550356052899	0.00103016776810268	983	930.3	26	957	30	1000	0	55.00	39.93	5	81297,80132,333105,330306,328078,	81113,79740,332945,329735,327890,	NADH dehydrogenase subunit 2
ath	-	nad2	81209	89	30	2	C->T	TCC->TTC	S->F	0.930120481927711	0.000477897934247639	415	417.6	29	386	1	1000	1.33197151054271e-203	45.00	39.93	5	81297,80132,333105,330306,328078,	81113,79740,332945,329735,327890,	NADH dehydrogenase subunit 2
......

3.2.2	Table 4 - coco_cp.out (partial).
#Chr	Strand	Name	Genome_pos	Gene_pos	AA_pos	Sample_num	Phase	SRR1063404_cp	SRR1063407_cp	SRR1137438_cp	SRR1173229_cp	SRR1265939_cp	SRR1273070_cp	SRR1273180_cp	SRR606452_cp	Exon_num	Exon_start	Exon_end	Function
cp	-	rpl23	2143	71	24	2	2	C->T:TCT->TTT:S->F:0.920792079207921:0:202:197.2:16:186:18:1000:4.31666359754032e-97:25.00:37.59						C->T:TCT->TTT:S->F:0.968599033816425:0.000216450216450216:414:451.9:13:401:18:1000:4.2221509942939e-224:25.00:37.59		1	2213,	1932,	ribosomal protein L23
cp	-	trnM-CAU	2399	49	?	1	?						C->T:?->?:?->?:0.75:0:12:12:3:9:27:28.7754103989239:0.000168259523489029:40.00:45.95			1	2447,	2374,	tRNA
cp	+	ycf2	2886	371	124	1	2						G->A:AGA->AAA:R->K:0.6:0.025:20:19.9:8:12:416:11.3538827969065:2.25475753840605e-05:45.00:37.66			1	2516,	9400,	hypothetical chloroplast RF21
cp	+	ycf2	2997	482	161	2	2				C->T:CCG->CTG:P->L:0.333333333333333:0:18:15:12:6:5:20.8308785164354:0.00953079178885632:50.00:37.66		C->T:CCG->CTG:P->L:0.75:0:12:13:3:9:5:28.7754103989239:0.000168259523489029:50.00:37.66			1	2516,	9400,	hypothetical chloroplast RF21
......

3.2.3	Table 5 - coco_cp.change_matrix (number and percent).
#Prefix	A->C(percent)	A->G(percent)	A->T(percent)	C->G(percent)	C->T(percent)	C->A(percent)	G->T(percent)	G->A(percent)	G->C(percent)	T->A(percent)	T->C(percent)	T->G(percent)
SRR1063404_cp	0(0)	0(0)	0(0)	0(0)	96(100)	0(0)	0(0)	0(0)	0(0)	0(0)	0(0)	0(0)
SRR1063407_cp	0(0)	0(0)	0(0)	1(1.52)	59(89.39)	0(0)	0(0)	5(7.58)	0(0)	0(0)	1(1.52)	0(0)
SRR1137438_cp	0(0)	0(0)	0(0)	0(0)	50(98.04)	0(0)	0(0)	0(0)	0(0)	0(0)	1(1.96)	0(0)
SRR1173229_cp	2(1.55)	5(3.88)	0(0)	2(1.55)	103(79.84)	0(0)	0(0)	10(7.75)	0(0)	1(0.78)	3(2.33)	2(1.55)
SRR1265939_cp	0(0)	0(0)	0(0)	0(0)	44(84.62)	2(3.85)	0(0)	2(3.85)	0(0)	0(0)	1(1.92)	0(0)
SRR1273070_cp	5(2.79)	9(5.03)	1(0.56)	1(0.56)	85(47.49)	3(1.68)	3(1.68)	43(24.02)	3(1.68)	2(1.12)	16(8.94)	4(2.23)
SRR1273180_cp	0(0)	0(0)	0(0)	0(0)	87(97.75)	0(0)	0(0)	1(1.12)	0(0)	0(0)	0(0)	0(0)
SRR606452_cp	0(0)	0(0)	1(1.47)	0(0)	65(95.59)	0(0)	0(0)	1(1.47)	0(0)	0(0)	0(0)	1(1.47)

3.2.4	Table 6 - coco_cp.stat #(1:hydrophobic2hydrophobic,2:hydrophilic2hydrophilic,3:hydrophobic2hydrophilic,4:hydrophilic2hydrophobic).
#Prefix	Total	Other	Phase(1,2,3)	Silent	NonSilent	A->C(silent)	A->C(non_silent)	A->C(1,2,3,4)	A->G(silent)	A->G(non_silent)	A->G(1,2,3,4)	A->T(silent)	A->T(non_silent)	A->T(1,2,3,4)	C->G(silent)	C->G(non_silent)	C->G(1,2,3,4)	C->T(silent)	C->T(non_silent)	C->T(1,2,3,4)	C->A(silent)	C->A(non_silent)	C->A(1,2,3,4)	G->T(silent)	G->T(non_silent)	G->T(1,2,3,4)	G->A(silent)	G->A(non_silent)	G->A(1,2,3,4)	G->C(silent)	G->C(non_silent)	G->C(1,2,3,4)	T->A(silent)	T->A(non_silent)	T->A(1,2,3,4)	T->C(silent)	T->C(non_silent)	T->C(1,2,3,4)	T->G(silent)	T->G(non_silent)	T->G(1,2,3,4)
SRR1063404_cp	96	0	15,73,9	9	87	0	0	0,0,0,0	0	0	0,0,0,0	0	0	0,0,0,0	0	0	0,0,0,0	9	87	15,10,1,61	0	0	0,0,0,0	0	0	0,0,0,0	0	0	0,0,0,0	0	0	0,0,0,0	0	0	0,0,0,0	0	0	0,0,0,0	0	0	0,0,0,0
SRR1063407_cp	66	0	14,47,7	7	59	0	0	0,0,0,0	0	0	0,0,0,0	0	0	0,0,0,0	0	1	1,0,0,0	6	53	9,7,1,36	0	0	0,0,0,0	0	0	0,0,0,0	0	5	3,1,1,0	0	0	0,0,0,0	0	0	0,0,0,0	1	0	0,0,0,0	0	0	0,0,0,0
SRR1137438_cp	51	0	10,37,5	4	47	0	0	0,0,0,0	0	0	0,0,0,0	0	0	0,0,0,0	0	0	0,0,0,0	3	47	9,6,1,31	0	0	0,0,0,0	0	0	0,0,0,0	0	0	0,0,0,0	0	0	0,0,0,0	0	0	0,0,0,0	1	0	0,0,0,0	0	0	0,0,0,0
SRR1173229_cp	129	1	36,82,32	20	108	1	1	1,0,0,0	2	3	2,0,0,1	0	0	0,0,0,0	1	1	1,0,0,0	14	89	18,11,3,57	0	0	0,0,0,0	0	0	0,0,0,0	0	10	4,1,5,0	0	0	0,0,0,0	0	1	1,0,0,0	2	1	1,0,0,0	0	2	0,0,2,0
SRR1265939_cp	52	3	12,37,8	4	45	0	0	0,0,0,0	0	0	0,0,0,0	0	0	0,0,0,0	0	0	0,0,0,0	3	41	9,6,0,26	0	2	0,1,1,0	0	0	0,0,0,0	0	2	1,0,1,0	0	0	0,0,0,0	0	0	0,0,0,0	1	0	0,0,0,0	0	0	0,0,0,0
SRR1273070_cp	179	4	64,87,72	49	126	3	2	0,1,0,1	2	7	1,2,0,4	1	0	0,0,0,0	0	1	1,0,0,0	23	62	15,13,4,30	0	3	0,2,1,0	2	1	1,0,0,0	10	33	9,14,10,0	0	3	2,0,0,1	1	1	1,0,0,0	7	9	2,0,6,1	0	4	2,0,2,0
SRR1273180_cp	89	1	15,69,4	5	83	0	0	0,0,0,0	0	0	0,0,0,0	0	0	0,0,0,0	0	0	0,0,0,0	5	82	14,10,1,57	0	0	0,0,0,0	0	0	0,0,0,0	0	1	0,0,1,0	0	0	0,0,0,0	0	0	0,0,0,0	0	0	0,0,0,0	0	0	0,0,0,0
SRR606452_cp	68	0	15,48,8	8	60	0	0	0,0,0,0	0	0	0,0,0,0	1	0	0,0,0,0	0	0	0,0,0,0	6	59	10,6,2,41	0	0	0,0,0,0	0	0	0,0,0,0	0	1	1,0,0,0	0	0	0,0,0,0	0	0	0,0,0,0	0	0	0,0,0,0	1	0	0,0,0,0

3.2.5	Table 7 - coco_cp_DPR.out (partial)
Chr	Position	Proportion1	Alt1	Ref1	Proportion2	Alt2	Ref2	Fold(log2)	Pvalue
#SRR1063404_cp vs SRR1063407_cp
cp	136348	0.5	6	6	0	0	0	NA	1
cp	143536	0.936986301369863	342	23	1	24	0	0.09	0.382540126150565
cp	76241	0.571428571428571	40	30	1	7	0	0.81	0.0386037932652878
cp	32743	0.466531440162272	230	263	0.8125	13	3	0.80	0.00902812345271057
......

3.2.6	Table 8 - The RNA editing sites identification between REDO and REDItools in three analyzed RNA-seq datasets (without applying complicated filter model).
Samples 	Software	Type	True	Total	True+	True-	False+	False-	Precision	Sensitivity	Specificity
Coconut cp1
(SRR1063404)	REDO	CTa	75	83	67	33349	16	8	80.72%	89.33%	99.95%
	REDItools			92	69	33340	23	6	75.00%	92.00%	99.93%
	Common			80	65	33352	15	10	81.25%	86.67%	99.96%
	REDO	Allb		84	67	33348	17	8	79.76%	89.33%	99.95%
	REDItools			98	69	33334	29	6	70.41%	92.00%	99.91%
	Common			84	67	33347	17	8	79.76%	89.33%	99.95%
Coconut cp2
(SRR1273070)	REDO	CT		54	23	33378	31	52	42.59%	30.67%	99.91%
	REDItools			57	22	33375	35	53	38.60%	29.33%	99.90%
	Common			49	20	33383	29	55	40.82%	26.67%	99.90%
	REDO	All		112	23	33320	89	52	20.54%	30.67%	99.73%
	REDItools			121	22	33311	99	53	18.18%	29.33%	99.70%
	Common			98	20	33334	78	55	20.41%	26.67%	99.74%
Arabidopsis mt
(SRR2079784)	REDO	CT	428c	413	358	29041	55	70	86.68%	83.64%	99.81%
	REDItools			430	362	29024	68	66	84.19%	84.58%	99.77%
	Common			409	356	29045	53	72	87.04%	83.18%	99.81%
	REDO	All		430	358	29024	72	70	83.26%	83.64%	99.75%
	REDItools			469	362	28985	86	66	80.80%	84.58%	99.70%
	Common			426	356	29028	70	72	83.57%	83.18%	99.76%

Note: REDItools was run with ¡°-c 4¡± option (total reads coverage>=4) and the result was filtered by p-value<=0.01. RED was run using ¡°-a 0, -s 0¡± option (minimum alt distance==0 and minimum junction anchor size==0) for mt data and default for cp data without applying the complicated filter model. a: CT means C->T type. b: All means all possible RNA editing types. c: There are 455 C->T RNA editing sites in Arabidopsis by cDNA clones sequencing while 428 C->T RNA editing sites are located in CDS region. We only used these 428 C->T RNA editing sites for mt RNA editing sites test.

3.2.7	Table 9 - The RNA editing sites identification between REDO and REDItools in three analyzed RNA-seq datasets (with applying complicated filter model).
Samples 	Software	Type	True	Total	True+	True-	False+	False-	Precision	Sensitivity	Specificity
Coconut cp1
(SRR1063404)	REDO	CTa	75	83	67	16	8	80.72%	89.33%	33349	99.95%
	REDItools			92	69	23	6	75.00%	92.00%	33340	99.93%
	Common			80	65	15	10	81.25%	86.67%	33352	99.96%
	REDO	Allb		83	67	16	8	80.72%	89.33%	33349	99.95%
	REDItools			98	69	29	6	70.41%	92.00%	33334	99.91%
	Common			80	65	15	10	81.25%	86.67%	33352	99.96%
Coconut cp2
(SRR1273070)	REDO	CT		44	22	22	53	50.00%	29.33%	33388	99.93%
	REDItools			57	22	35	53	38.60%	29.33%	33375	99.90%
	Common			39	19	20	56	48.72%	25.33%	29415	99.93%
	REDO	All		71	22	49	53	30.99%	29.33%	29383	99.83%
	REDItools			121	22	99	53	18.18%	29.33%	29333	99.66%
	Common			65	19	46	56	29.23%	25.33%	29389	99.84%
Arabidopsis mt
(SRR2079784)	REDO	CT	428c	406	356	50	72	87.68%	83.18%	29048	99.83%
	REDItools			430	362	68	66	84.19%	84.58%	29024	99.77%
	Common			402	354	48	74	88.06%	82.71%	29052	99.84%
	REDO	All		419	356	63	72	84.96%	83.18%	29035	99.78%
	REDItools			469	362	86	66	80.80%	84.58%	28985	99.70%
	Common			415	354	61	74	85.30%	82.71%	29039	99.79%

Note: REDItools was run with ¡°-c 4¡± option (total reads coverage>=4) and the result was filtered by p-value<=0.01. RED was run using ¡°-a 0, -s 0¡± option (minimum alt distance==0 and minimum junction anchor size==0) for mt data and default for cp data with applying the complicated filter model. a: CT means C->T type. b: All means all possible RNA editing types. c: There are 455 C->T RNA editing sites in Arabidopsis by cDNA clones sequencing while 428 C->T RNA editing sites are located in CDS region. We only used these 428 C->T RNA editing sites for mt RNA editing sites test.

SUBPROGRAM
==========

We also provide a subprogram named fish.pl for processing the REDO result tables easily and quickly. It can be used to find common and unique sites in two samples or remove genome variant in REDO result file by comparing with genome variant file. 

It can get unique bait/baits (one or multiple items in column) in bait file at first (such as chromosome, position, strand, and others), then these bait/baits are used to search the corresponding column in fish file in two measures, existing in bait file (de-fault) or not exiting in bait file ("-contrary" option).

CONTACT
=======

If you have any question or suggestion, please contact us.

Wanfei Liu & Qiang Lin
Email: <liuwf@big.ac.cn> & <linq@big.ac.cn>
Source: README.txt, updated 2016-06-01