Re: [Gusdev-gusdev] format of sequences for blast

Jessie-

Jessica Kissinger wrote:
>     If memory serves me correctly, once sequences were loaded into GUS, we 
> then retrieved them along with their GUS ID to submit for blast searches.

Yes, I think that's right.

>     When we retrieved the sequences, we created a custom format for the 
> header line, such that the blast results once generated for these 
> sequences could be easily parsed and loaded with the existing plug-in.
>
>     Can someone tell me what the format of the fasta header should be, 
> i.e., is it ">GUSID, External_NA _sequence Name" or the other way around, 
> and should there be any formatting, tabs, spaces, etc.?  If I remember 
> correctly, the blast results were loaded by GUSID not "name", but I 
> don't remember.

My recollection is that the defline started as you said, with ">GUSID ".
I don't believe that the format is crucial, because (again, from what
I remember) when you run the plugin to load the BLAST similarities you
supply it with a regular expression that it uses to pick the GUSID
(an na_sequence_id for most of the PlasmoDB searches) out of the defline.
So as long as the regex matches the defline format, you should be OK,
and I don't think that the plugin uses anything on the defline except
for the GUSID.
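
To make the regex idea concrete, here is a minimal Python sketch of
pulling a numeric ID out of a defline. It is only an illustration and
not the plugin's own code: the pattern, the function name
extract_gus_id, and the sample deflines are all assumptions of mine,
and the regex you actually pass to the plugin may need to look
different.

```python
import re

# Assumed defline layout: ">" followed by a numeric GUS ID, then either
# whitespace or the end of the line. This pattern is hypothetical.
DEFLINE_ID_RE = re.compile(r"^>(\d+)(?:\s|$)")


def extract_gus_id(defline):
    """Return the leading numeric ID from a defline, or None if it does not match."""
    match = DEFLINE_ID_RE.match(defline)
    return int(match.group(1)) if match else None


# Made-up deflines for illustration only.
deflines = [
    ">12345 example_sequence_name",   # ID first: matches
    ">67890",                         # ID alone: matches
    ">example_sequence_name 12345",   # name first: no match with this pattern
]
ids = [extract_gus_id(d) for d in deflines]
```

The last defline shows why the regex has to agree with the header
layout: if the name came first, this particular pattern would find no
ID, and you would need a different regex.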

Jonathan