Name | Modified | Size | Downloads / Week |
---|---|---|---|
1OUTFILES | 2018-01-02 | 508 Bytes | |
CoreAlyze-1.1.1.tar | 2018-01-01 | 187.2 MB | |
0README | 2018-01-01 | 1.7 kB | |
Totals: 3 Items | 187.2 MB | 0 |
CoreAlyze

CoreAlyze analyzes sets of predicted proteins from eukaryotic species by comparing them to a set of 248 widely conserved orthologs (PMID: 17332020). It compares up to 40 gene sets with each other at a time. CoreAlyze is useful for evaluating the quality of genome annotations and their underlying assemblies. It is also useful for checking the outcome of gene predictions from assembled RNA-seq transcripts. The results are presented as a bar plot showing the number of conserved orthologs and their fragmentation.

To test and run CoreAlyze:

In a Unix shell:

    cd test_CoreAlyze

To test:

    source corealyze.cmd

corealyze.cmd contains:

    perl ../CoreAlyze_03.pl listfile

listfile is a list of protein FASTA files, each representing the predicted set of proteins in a genome. Test FASTA files are included. CoreAlyze takes about ten minutes to run, because thousands of sequences are searched with BLAST. For quicker tests, reduce the number of files in the list file.

To run on your own FASTA files:

Add your FASTA files to the test_CoreAlyze directory.
Make a file containing the names of your FASTA files.
Give this file as an argument to CoreAlyze_03.pl:

    perl ../CoreAlyze_03.pl your_listfile

CEGs (core eukaryotic genes, the conserved orthologs used for comparison) are missing from some genera, so for comparison include a protein FASTA file from a trusted annotation of a species related to those you are evaluating.

CoreAlyze comes with a BLAST installation for convenience and requires that R is installed on your computer. CoreAlyze has two parts: a Perl script for analysis and an R script for plotting. The Perl script calls BLAST and the R script. If the FASTA files represent more than 40 genomes, the bar plot becomes overcrowded.
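
The step of making a file that contains the names of your FASTA files could be scripted roughly as follows. This is only a sketch: the function name make_listfile is hypothetical, and it assumes that the list file holds one file name per line and that your files end in .fasta, neither of which the README confirms. It also prints a warning above 40 entries, the point at which the bar plot is said to become overcrowded.

```shell
#!/bin/sh
# Hypothetical helper, not part of CoreAlyze.
# Writes the names of the FASTA files found in a directory to a list file.
# Assumptions: one file name per line, and a ".fasta" extension.
make_listfile() {
    dir=$1   # directory holding the FASTA files
    out=$2   # list file to create

    # Write bare file names (no directory part), since the README says
    # to place the FASTA files in the directory CoreAlyze is run from.
    (
        cd "$dir" || exit 1
        for f in *.fasta; do
            # Skip the unexpanded pattern when no file matches.
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    ) > "$out"

    # Count the entries and warn above the 40-genome limit.
    n=$(wc -l < "$out" | tr -d ' ')
    if [ "$n" -gt 40 ]; then
        echo "warning: $n files listed; the bar plot becomes overcrowded beyond 40" >&2
    fi

    # Print the number of entries so the caller can check it.
    echo "$n"
}
```

Called as `make_listfile . your_listfile` from inside the test directory, this would produce a list file that can then be passed to CoreAlyze_03.pl as described above. Check the included listfile first to confirm that its layout matches the one-name-per-line assumption.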