From: Don G. <gil...@bi...> - 2004-06-12 18:32:04
What BLAST or other protocols do we want for cross-species gene similarity computations, so that any model organism database would be happy using the results? Those of you who do, or have done, such computations: please comment on your criteria and methods for calculating gene similarities.

For euGenes.org's gene similarity computations, we've used just a simple protein-gene, species x species BLAST (see http://euGenes.org:7072/all/hgsummary.html). This is a computed gene homology or similarity based on the reference protein sequences identified by the source databases. All of these sequences are BLASTed, each sequence against all others, using the parameters given below (notably an E value at or below 1e-30). The sequences used in these calculations are in the "Reference proteins" data files in each organism's folder. The summary is derived from the "Homologous genes table" data files, also in each folder.

The counts in the summary table at the above URL are the number of available genes for an organism and the number of those genes that have one or more significant homologs in the other organisms. Percentages are the count of genes with any homolog (one or several) in another organism, divided by the total available genes in that organism, times 100.

For gene reports, we pick the most significant (lowest E value) match from each species, plus any other matches within 10% of that E value.

Similarity computations, using NCBI BLASTP 2.2.6 (csh):

  set orglist = (Fruitfly Human Mouse Mosquito Weed Worm Yeast Zebrafish Rat Rice E_coli Chimp)
  foreach org1 ($orglist)
    foreach org2 ($orglist)
      # database = org1 reference proteins, query = org2 reference proteins
      # -e 1e-30: E-value cutoff; -m 9: tabular output with comment lines;
      # -v 10 -b 10: at most 10 descriptions/alignments per query; -a 4: 4 CPUs
      blastall -v 10 -b 10 -m 9 -a 4 -p blastp -e 1e-30 \
        -d $org1/refprot.fasta -i $org2/refprot.fasta
    end
  end

-- Don
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gil...@in...--http://marmot.bio.indiana.edu/
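As a rough illustration of the gene-report rule and the summary percentage described above, here is a minimal Python sketch. It assumes the hits come from blastall -m 9 tabular output (12 tab-separated columns, E value in column 11, '#' comment lines), that the species of the database proteins is known for each BLAST run, and it reads "within 10% of that E value" as E <= 1.1 x best E; euGenes' own code may interpret this differently. The names parse_tabular, select_report_hits and percent_with_homolog are hypothetical. Self-hits from same-species runs are not filtered in select_report_hits; percent_with_homolog counts only hits to other species.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str     # query protein/gene id (column 1)
    subject: str   # database protein id (column 2)
    species: str   # species of the database proteins (known per BLAST run)
    evalue: float  # E value (column 11)


def parse_tabular(lines: Iterable[str], species: str) -> List[Hit]:
    """Parse blastall -m 9 output: tab-separated rows plus '#' comment lines."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        ev = fields[10]
        if ev.startswith("e"):  # defensive: tolerate 'e-180' style values
            ev = "1" + ev
        hits.append(Hit(fields[0], fields[1], species, float(ev)))
    return hits


def select_report_hits(
    hits: Iterable[Hit], tolerance: float = 0.10
) -> Dict[str, Dict[str, List[Hit]]]:
    """Per query and species: best (lowest E) hit plus hits with
    E <= best * (1 + tolerance), sorted by E value."""
    grouped: Dict[str, Dict[str, List[Hit]]] = defaultdict(lambda: defaultdict(list))
    for h in hits:
        grouped[h.query][h.species].append(h)
    report: Dict[str, Dict[str, List[Hit]]] = {}
    for query, by_species in grouped.items():
        report[query] = {}
        for species, sp_hits in by_species.items():
            best = min(h.evalue for h in sp_hits)
            cutoff = best * (1.0 + tolerance)
            kept = [h for h in sp_hits if h.evalue <= cutoff]
            report[query][species] = sorted(kept, key=lambda h: h.evalue)
    return report


def percent_with_homolog(
    all_genes: Iterable[str], hits: Iterable[Hit], own_species: str
) -> float:
    """Percent of an organism's genes with at least one hit in another species."""
    genes = set(all_genes)
    with_homolog = {
        h.query for h in hits if h.species != own_species and h.query in genes
    }
    return 100.0 * len(with_homolog) / len(genes)
```

With this reading, a best hit reported as E = 0.0 makes the cutoff 0, so only other 0.0 hits are kept; a tolerance on log10(E) would be another plausible interpretation of the 10% rule.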