From: Don G. <gil...@cr...> - 2007-10-05 19:45:24
Eric,

I'm not sure of my recollection here, but I think you may be making more work of this than needed by assuming this requirement isn't handled when the input analysis.gff contains only match_part entries. http://www.gmod.org/wiki/index.php/Chado_Best_Practices#Results_from_BLAST states that every hit, as well as every HSP, should have an entry in the feature table.

I think the output of bp_search2gff.pl with only 'match_part' is correctly loaded into Chado by the gmod_bulk_load_gff3.pl script, which creates all the needed features, analysisfeatures, etc. Adding the extra 'match' fields, and especially ID= and Parent= values, to these BLAST results before loading them into Chado introduces complexity in the feature creation step. When analysis results in GFF format have only Target= attributes, gmod_bulk_load_gff3 handles them specially to create the Chado parts needed for an analysis result (see lib/Bio/GMOD/DB/Adapter.pm).

For instance, when using tblastn of proteins to genome, one of the ID parts in this pattern "DDB0XXXXX.DDB00000902.blastn" is the genome backbone, and you can expect many separate BLAST hits for a protein to a genome, so you would have to add locations to the ID string to make it unique. Instead, the gmod_bulk_load_gff3 loader should generate unique IDs for you when your BLAST GFF contains no ID but a suitable Target field.

- Don
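If it helps, here is a minimal Python sketch of the kind of cleanup I mean: it drops ID= and Parent= from the attributes column of a GFF3 line and leaves Target= alone. The function name strip_id_and_parent and the sample coordinates and score are made up for illustration; they come from neither bp_search2gff.pl nor the loader, and I have not checked this against real output.

```python
def strip_id_and_parent(gff3_line: str) -> str:
    """Return a GFF3 line with ID= and Parent= attributes removed.

    Hypothetical helper for illustration only. Lines that do not have
    the nine tab-separated GFF3 columns (comments, directives, blank
    lines) are returned unchanged apart from the trailing newline.
    """
    line = gff3_line.rstrip("\n")
    cols = line.split("\t")
    if len(cols) != 9:
        return line
    # Column 9 holds semicolon-separated tag=value attributes.
    kept = [
        attr
        for attr in cols[8].split(";")
        if attr and not attr.startswith(("ID=", "Parent="))
    ]
    cols[8] = ";".join(kept)
    return "\t".join(cols)


if __name__ == "__main__":
    # Made-up example row: coordinates, score and source are placeholders.
    sample = (
        "DDB00000902\tblast\tmatch_part\t1000\t1300\t250\t+\t.\t"
        "ID=DDB0XXXXX.DDB00000902.blastn;Target=DDB0XXXXX 1 100"
    )
    print(strip_id_and_parent(sample))
```

The idea is only that the row reaches the loader with a Target field and no ID, so that ID generation is left to gmod_bulk_load_gff3 as described above.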