From: Steve F. <sfi...@pc...> - 2004-10-08 18:44:16
Arnaud-

how about this for arg names:
  - queryIdColumnName
  - subjectIdColumnName

(and give a nice string in the h=> )

also, about the use statements.  those are wrong and naughty.  they are
presuming that the table names passed in as args are one or another of
the objects in the use statements.  we should remove them.

instead, we need to do the use at "runtime" not "compile time."  here, i
think (good luck!), is how:

  my $args = ??? # here construct the args needed to find the object in the db
  my $require_q = "{require $queryTable; $queryTable->new($args) }";
  my $queryobj = eval $require_q;

and the same for the subject

steve

Arnaud Kerhornou wrote:

> Ok,
>
> Steve Fischer wrote:
>
>> arnaud-
>>
>> ok, would you like to do the upgrade to LoadBlastSimFast?
>>
> So the plugin requires two extra parameters, '-queryIdAttr' and
> '-subjectIdAttr', is that right?
>
>> if i understand correctly this would avoid having to modify the
>> schema, is that right?
>
>
> I think so.
>
> Another thing, the plugin requires 'use' statements for loading the
> sequence objects we want to attach similarity data to.
> Could we somehow bypass this declaration, as in theory we would want to
> attach similarity data to any view on top of NASequenceImp or
> AASequenceImp? By instantiating AASequence or NASequence superclass
> objects and using the subclass_view attribute to assign the data to the
> correct view, would it be feasible this way?
>
>>
>> steve
>>
>> Arnaud Kerhornou wrote:
>>
>>>
>>> Steve Fischer wrote:
>>>
>>>> Arnaud-
>>>>
>>>> ok, i've looked at LoadBlastSimFast.pm.  I see the addition of
>>>> the logic to use name to get the query object (and i know that we
>>>> discussed this in mail back in august).
>>>>
>>>> I am having some second thoughts about that change as it stands.
>>>> The original intent of the plugin was that the sequences submitted
>>>> to the blast process have been extracted from the database and
>>>> therefore have the primary key in their definition line.
>>>>
>>>> I think I understand that it would be useful to be able to skip
>>>> that step, ie, blast sequences using their native identifiers, and
>>>> then have the plugin discover what their internal primary key
>>>> is.  That's what you want to do right?
>>>>
>>> that's right.
>>>
>>>> Does anybody know of any reason why that would not be ok?
>>>>
>>>> Assuming that nobody has any objections, maybe the best solution
>>>> would be to improve the plugin to take optional arguments that
>>>> specify the name of the query and/or subject identifier
>>>> attributes?  For example: -queryIdAttr source_id.  This would
>>>> give us full flexibility (and also avoid the slightly risky
>>>> assumption that a digits-only identifier must be the primary key)
>>>>
>>> that sounds sensible.
>>>
>>>> steve
>>>>
>>>> Arnaud Kerhornou wrote:
>>>>
>>>>>
>>>>> Steve Fischer wrote:
>>>>>
>>>>>> Arnaud-
>>>>>>
>>>>>> see below.
>>>>>>
>>>>>> steve
>>>>>>
>>>>>> Arnaud Kerhornou wrote:
>>>>>>
>>>>>>> Hi everyone
>>>>>>>
>>>>>>> To be able to reproduce the OrthoMCL method, I would like to
>>>>>>> raise two issues we've got:
>>>>>>>
>>>>>>> * The first issue relates to the view where the protein
>>>>>>> sequences are stored. I was thinking of using the
>>>>>>> TranslatedAASequence view as this one contains the translated
>>>>>>> sequences of our gene models. The problem I have is that it is
>>>>>>> missing a name attribute so I cannot match the blast output
>>>>>>> query and subject names with the data in GUS (I didn't want to
>>>>>>> use the TranslatedAASequence primary keys as the identifiers of
>>>>>>> my proteins of interest).
>>>>>>> Could we add a name attribute to this view?
>>>>>>
>>>>>> hmm.  not quite following.   what would this name be, where
>>>>>> would it be derived from?
>>>>>
>>>>> By default we assign the systematic id of the corresponding CDS to
>>>>> the protein name.
>>>>>
>>>>>> why not use source_id and/or secondary_identifier?
>>>>>
>>>>> We could do that, but in any case that would involve modifying the
>>>>> code of the BLAST output loading plugin (LoadBlastSimFast.pm) to
>>>>> get the sequence entries. At the moment the match is made on the
>>>>> primary key (which I want to avoid) or the name attribute. The
>>>>> source_id attribute would do instead of the name attribute. It
>>>>> must work for any blast (DNA vs DNA or protein vs protein) with
>>>>> the various potential GUS sequence objects we want to attach
>>>>> similarity data to. As far as I can see the source_id attribute is
>>>>> present in all of them (AASequenceImp and NASequenceImp tables).
>>>>>
>>>>>> or, presumably this translated sequence has a relationship back
>>>>>> to its na sequence (although i don't immediately see that in the
>>>>>> schema browser), so couldn't you get a name or source_id from there?
>>>>>>
>>>>> That would require a more sophisticated query to get the sequence
>>>>> entry.
>>>>>
>>>>>>>
>>>>>>> * The second issue relates to the BLAST output parsing, done by
>>>>>>> a module called BlastAnal.pm in the CBIL package. This module
>>>>>>> seems to parse BLAST output files with only one query sequence. I
>>>>>>> have more than one query sequence reported so I had to change
>>>>>>> the code of this module to allow more than one query sequence.
>>>>>>> Can my code be integrated into the CBIL package? Note that I didn't
>>>>>>> change the interface of this module so it doesn't affect the
>>>>>>> scripts that are using it, I'm thinking in particular of
>>>>>>> parseBlastFilesForSimilarity.pl
>>>>>>
>>>>>> this sounds ok.  how about we just take a quick look at this
>>>>>> together while you are visiting?  then we can fold it into the
>>>>>> code base.   do you want to send it by mail?
>>>>>>
>>>>> That's fine, the module is attached.
>>>>>
>>>>>>>
>>>>>>> cheers
>>>>>>> Arnaud
>
> _______________________________________________
> Gusdev-gusdev mailing list
> Gus...@li...
> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
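
A note on the "runtime" require suggested at the top of this thread: the following is only a minimal sketch of that idea, not the plugin's actual code. Demo::NASequence, new_table_object and the source_id value are made-up stand-ins so the example runs on its own; the real GUS object constructors may expect different arguments. One difference from the snippet in the mail: if $args is a reference, interpolating it into the eval string would embed its stringified form rather than the reference, so here only the require goes through the string eval and the constructor is called normally.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a GUS table class so the sketch is self-contained.
# A real plugin would find the class file on disk; marking the file in %INC
# makes the later require succeed without touching the filesystem.
BEGIN {
    package Demo::NASequence;
    $INC{'Demo/NASequence.pm'} = __FILE__;
    sub new {
        my ($class, $args) = @_;
        return bless { %$args }, $class;
    }
}

# Load a class whose name is only known at runtime, then instantiate it.
sub new_table_object {
    my ($table_class, $args) = @_;

    # Validate before the string eval: accept only a plain package name.
    die "bad class name '$table_class'\n"
        unless $table_class =~ /\A\w+(?:::\w+)*\z/;

    # "require $var" would treat the value as a file name, so build a
    # bareword require inside a string eval instead.
    eval "require $table_class; 1"
        or die "cannot load $table_class: $@";

    # $args is passed in an ordinary method call, so it stays a reference.
    return $table_class->new($args);
}

# Assumed usage: the class name would come from a plugin argument.
my $query_obj = new_table_object('Demo::NASequence', { source_id => 'example_id_1' });
print ref($query_obj), " ", $query_obj->{source_id}, "\n";
```

The same helper could be called a second time with the subject table name, which would make the compile-time use statements unnecessary.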