From: Scott C. <cai...@gm...> - 2007-10-05 19:21:11
Hi Eric,

I guess with name munging in blast to GFF converters, the question is,
where should it be done?  It seems like it should be added as an option
to bp_search2gff.pl.

Scott

On Fri, 2007-10-05 at 14:02 -0500, Eric Just wrote:
> On 10/5/07, Scott Cain <cai...@gm...> wrote:
> > On Fri, 2007-10-05 at 13:25 -0500, Eric Just wrote:
> > > Hi there,
> > >
> > > I'm continuing to have fun with the great GMOD documentation on the
> > > wiki.  Specifically the Load Blast Into Chado document:
> > >
> > > http://www.gmod.org/wiki/index.php/Load_BLAST_Into_Chado
> > >
> > > The best practices document here:
> > > http://www.gmod.org/wiki/index.php/Chado_Best_Practices#Results_from_BLAST
> > >
> > > states that every hit should have an entry in the feature table as
> > > well as every hsp.  The bp_search2gff.pl script, when run as
> > > documented does not generate a match feature in the gff file.  I got
> > > it to create a match feature by appending --match to the argument
> > > list.  This correctly generates a match hit, however there are a
> > > couple of issues:
> > >
> > >
> > > the gff3 looks like this:
> > > ##gff-version 3
> > > DDB00000902 blastn match_part 47684 47916 233 + 0 Parent=DDB0XXXXX;Target=DDB0XXXXX 153 385
> > > DDB00000902 blastn match_part 47448 47598 144 + 0 Parent=DDB0XXXXX;Target=DDB0XXXXX 1 152
> > > DDB00000902 blastn match 5006 55498 462 + . ID=DDB0XXXXX
> > >
> > > 1. The match feature is at the end of the gff file which is
> > > problematic since it serves as the parent feature for match_parts.
> > > Moving the match to the top of the file rectifies this
> >
> > Right, from the perspective of generating the GFF, putting it at the end
> > is much easier.  The gmod_gff3_preprocessor.pl script fixes this sort of
> > thing.
>
> Can't believe I forgot about the preprocessor.  Thanks, that worked.
>
> > > 2. The ID field of the match feature that is generated is the same as
> > > the ID of the query sequence from the BLAST report.  Since I am
> > > storing the query (an EST in this case) in the database, this creates
> > > confusion because after loading this file, I now have two features
> > > with the name and uniquename of DDB0XXXXX.  (Didn't think it would get
> > > away with it, but it loaded without complaining).
> >
> > The unique index on feature requires that uniquename, type_id and
> > organism_id be unique.  Since your target and match features had
> > different type_ids it was cool with that.  I agree that it is
> > potentially confusing and your suggestion below would be a good option.
>
> Shall I write this into the preprocessor?  Also might be worth writing
> the 's/Target=Sequence:/Target=/' fix into the preprocessor.  I can do
> that too.
>
> Eric
>
> > Thanks,
> > Scott
> >
> > >
> > > Perhaps the generated ID could be the ID of the query sequence, the ID
> > > of the hit feature, and the source field
> > >
> > > DDB0XXXXX.DDB00000902.blastn
> > >
> > > That way if multiple analyses are loaded, as long as they are loaded
> > > with different source fields, they will be distinguishable in the
> > > database and will have unique ID's.
> > >
> > > Just an idea.
> > >
> > > Thanks,
> > > Eric
> > >
> > > _______________________________________________
> > > Gmod-devel mailing list
> > > Gmo...@li...
> > > https://lists.sourceforge.net/lists/listinfo/gmod-devel
> > --
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.                                   ca...@cs...
> > GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> > Cold Spring Harbor Laboratory
> >
>
--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cai...@gm...
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
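
For illustration only, here is a rough sketch of the ID munging discussed in this thread, written as a standalone Python filter rather than as a change to bp_search2gff.pl or gmod_gff3_preprocessor.pl (neither script is known to offer this). The function name munge_line is hypothetical. It assumes tab-delimited GFF3 with nine columns, that each match_part has a single Parent equal to the query ID, and that the hit ID is the seqid column, as in Eric's example. It also applies the 's/Target=Sequence:/Target=/' fix mentioned above.

```python
import re


def munge_line(line):
    """Rewrite one GFF3 line so match IDs become query.hit.source.

    Hypothetical helper: assumes 9 tab-separated columns and a single
    Parent per match_part. Other lines are returned unchanged.
    """
    if line.startswith("#") or not line.strip():
        return line  # directives, comments, and blank lines pass through
    newline = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line  # not a well-formed feature line; leave it alone
    seqid, source, ftype = cols[0], cols[1], cols[2]
    # Same effect as the s/Target=Sequence:/Target=/ fix from the thread.
    attrs = cols[8].replace("Target=Sequence:", "Target=")
    suffix = "." + seqid + "." + source
    if ftype == "match":
        key = "ID"
    elif ftype == "match_part":
        key = "Parent"
    else:
        key = None
    if key is not None:
        # Append ".hit.source" to the ID (or Parent) attribute value.
        attrs = re.sub(
            r"(^|;)" + key + r"=([^;]+)",
            lambda m: m.group(1) + key + "=" + m.group(2) + suffix,
            attrs,
        )
    cols[8] = attrs
    return "\t".join(cols) + newline


if __name__ == "__main__":
    import sys

    # Filter usage: read GFF3 on stdin, write munged GFF3 on stdout.
    for gff_line in sys.stdin:
        sys.stdout.write(munge_line(gff_line))
```

Run over the example above, the match feature would get ID=DDB0XXXXX.DDB00000902.blastn and each match_part would point at that same value through Parent, so the match no longer shares a uniquename with the stored EST.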