SimplyTheBlast: a small Perl tool that builds gene presence/absence matrices over a set of FASTA-formatted genomes.
This code requires:
Bio::SeqIO;
Bio::Perl;
Bio::Tools::Run::StandAloneBlast;
Bio::Seq;
Bio::Tools::Blast;
Bio::DB::GenBank;
Bio::DB::WebDBSeqI;
and BLAST 2.2.28 (blastall and formatdb) installed and reachable from your command line.
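A quick way to confirm these prerequisites before running the tool is sketched below. It is only an illustration: the helper name check_prereqs is hypothetical, and it assumes the BLAST database formatter is the legacy formatdb executable that ships alongside blastall.

```shell
#!/bin/sh
# Hypothetical helper: prints one line per missing prerequisite and
# returns non-zero if anything is missing.
check_prereqs() {
    missing=0
    # BLAST executables (assumption: the formatter is `formatdb`).
    for tool in blastall formatdb; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing executable: $tool"
            missing=1
        fi
    done
    # Perl modules from the list above; `perl -MModule -e 1` exits
    # non-zero when the module cannot be loaded (or perl is absent).
    for mod in Bio::SeqIO Bio::Perl Bio::Tools::Run::StandAloneBlast \
               Bio::Seq Bio::Tools::Blast Bio::DB::GenBank Bio::DB::WebDBSeqI; do
        if ! perl -M"$mod" -e 1 >/dev/null 2>&1; then
            echo "missing Perl module: $mod"
            missing=1
        fi
    done
    return "$missing"
}

check_prereqs && echo "all prerequisites found"
```

If any line starting with "missing" is printed, install that component before running the scripts.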
Usage: perl SimplyTheBlast-Align.pl <FASTA-formatted seeds file> <path to genomes folder> <alignment length threshold in %> <alignment identity threshold in %>
OR
Usage: perl SimplyTheBlast-Evalue.pl <FASTA-formatted seeds file> <path to genomes folder> <E-value threshold>
Genome file names must end with .faa
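Because of this naming rule, it can help to check the genomes folder before a run. The sketch below is an assumption-laden illustration, not part of the tool: list_non_faa is a hypothetical helper, and the file names and threshold values in the commented example call are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print every regular file in a folder whose name
# does NOT end in .faa, so it can be renamed or moved before a run.
list_non_faa() {
    dir=$1
    for f in "$dir"/*; do
        # Skip sub-folders and the unexpanded glob of an empty folder.
        [ -f "$f" ] || continue
        case "$f" in
            *.faa) ;;            # name follows the required convention
            *) echo "$f" ;;      # name does not end in .faa
        esac
    done
}

# Example run (placeholder names and values, shown as comments only):
#   list_non_faa genomes
#   perl SimplyTheBlast-Align.pl seeds.fasta genomes 70 30
```

An empty result from list_non_faa means every file in the folder already ends in .faa.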
Output files:
TABULAR_FBH_OUTPUT.xls is an Excel-readable file with the identifiers of the best hits found
TABULAR_FBH_OUTPUT.csv is a file with the number of best hits found
query_n* files are FASTA-formatted files with the sequences of the best hits found
Bugs/comments:
marco.fondi@unifi.it