Name                 Modified    Size
1OUTFILES            2018-01-02  508 Bytes
CoreAlyze-1.1.1.tar  2018-01-01  187.2 MB
0README              2018-01-01  1.7 kB
Totals: 3 items, 187.2 MB, 0 downloads / week
CoreAlyze

CoreAlyze analyzes sets of predicted proteins
from eukaryotic species by comparing them to a set of
248 widely conserved orthologs (PMID: 17332020). Up to 40 gene sets
can be compared with each other in a single run.

CoreAlyze is useful for evaluating the quality
of genome annotations and their underlying assemblies.
It is also suited to checking the outcome
of gene predictions from assembled RNA-seq transcripts.

The results are presented as a bar plot showing the number
of conserved orthologs and their fragmentation.

To test and run CoreAlyze:

In a Unix shell, change to the test directory:
cd test_CoreAlyze

To test:
source corealyze.cmd

corealyze.cmd contains the command:
perl ../CoreAlyze_03.pl listfile

listfile is a list of protein fasta files, each of which holds the predicted
set of proteins for one genome. Test fasta files are included.

CoreAlyze takes about ten minutes to run, because thousands of sequences are searched with BLAST.
For quicker tests, reduce the number of files in the list file.
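
A minimal sketch of shortening the list file for a quick test. It assumes the
list file holds one fasta file name per line, which this README does not state
explicitly. trim_listfile and small_listfile are hypothetical names, not part
of CoreAlyze.

```shell
# Hypothetical helper: copy the first N entries of a list file to a new file.
# Assumes one fasta file name per line.
# Usage: trim_listfile N input_listfile output_listfile
trim_listfile() {
    head -n "$1" "$2" > "$3"
}
```

For example, "trim_listfile 2 listfile small_listfile" followed by
"perl ../CoreAlyze_03.pl small_listfile" would run the test on the first two
files only.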

To run on your own fasta files:
Add your fasta files to the test_CoreAlyze directory.
Make a file containing the names of your fasta files, and give this file as the argument to CoreAlyze_03.pl:

perl ../CoreAlyze_03.pl your_listfile 
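
One way to build the list file is sketched here. It assumes that the fasta
files end in .fasta and that the list file holds one file name per line;
neither detail is stated in this README. make_listfile is a hypothetical
helper name.

```shell
# Hypothetical helper: write the names of all *.fasta files in a directory
# to a list file, one name per line (assumed format).
# Usage: make_listfile directory output_listfile
make_listfile() {
    dir=$1
    out=$2
    # The subshell keeps the caller's working directory unchanged.
    (
        cd "$dir" || exit 1
        for f in *.fasta; do
            # Skip the unexpanded pattern when no file matches.
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    ) > "$out"
}
```

From inside the test_CoreAlyze directory, "make_listfile . your_listfile"
would list every .fasta file found there. Edit the result by hand to drop any
files you do not want to compare.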

Some CEGs (core eukaryotic genes, the conserved orthologs used as the reference set)
are missing from some genera, so for comparison include a protein fasta
file from a trusted annotation of a species related to those you are evaluating.

CoreAlyze comes with a BLAST installation for convenience, but requires that
R is already installed on your computer.
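
A quick way to confirm that R is available before a run is sketched below.
It assumes R is found through the PATH; how CoreAlyze itself locates R is not
documented here. have_cmd is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the named command is found on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report a missing R installation without aborting the shell.
if ! have_cmd R; then
    echo "R was not found on the PATH; install it before running CoreAlyze" >&2
fi
```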

CoreAlyze has two parts: a Perl script for analysis, and an R script for plotting. The Perl
script calls BLAST and the R script.

If the list contains fasta files representing more than 40 genomes, the bar plot becomes overcrowded.
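
The 40-genome limit can be checked before a run with a sketch like the
following. It assumes one fasta file name per non-empty line of the list file;
check_listfile_size is a hypothetical helper, not part of CoreAlyze.

```shell
# Hypothetical helper: warn when a list file names more than 40 fasta files.
# Returns 0 if the list has 40 entries or fewer, 1 otherwise.
check_listfile_size() {
    # Count non-empty lines; "|| true" covers grep's non-zero exit on zero matches.
    n=$(grep -c . "$1" || true)
    if [ "$n" -gt 40 ]; then
        echo "warning: $n entries in $1; the bar plot is overcrowded above 40" >&2
        return 1
    fi
    return 0
}
```

For example, "check_listfile_size your_listfile && perl ../CoreAlyze_03.pl your_listfile"
would start the analysis only when the list is within the limit.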
Source: 0README, updated 2018-01-01