From: Chris S. <sto...@pc...> - 2007-01-12 22:24:18
Hi Dhivya,

I'm putting this back onto gusdev so that the answers may help others (or others can correct my answers). I should also warn you that, as PI of the project, I don't actually run any of the code, so this is a test of how well I understand what's going on. ;)

> sres.externaldatabaserelease.version for this instance of NRDB
> sres.externaldatabase.name for NRDB

To load NRDB or any other external "database" (really a data source), that source needs to be entered into SRes.ExternalDatabase, and its version (which can simply be the date you downloaded it) entered into SRes.ExternalDatabaseRelease. These rows can be entered manually into those tables. If this is a new version, NRDB should already be in ExternalDatabase. You simply need to enter a new row in ExternalDatabaseRelease for NRDB and pass whatever you put in its version field as externalDatabaseVersion (with the ExternalDatabase name as externalDatabaseName).

> pathname for the gi_taxid_prot.dmp file

I'm guessing that this is a pointer to the dump file that comes with NRDB, providing the taxon_id for each protein sequence, but that's just a guess.

Chris

On Jan 12, 2007, at 5:01 PM, Dhivya Aras wrote:

> Hi Chris,
>
> I'm trying to load a new NRDB version into GUS using the loadNRDB plugin. It requires several compulsory input parameters, and I don't understand three of them. The ga help describes them as:
>
> externalDatabaseVersion *string* (Required)
>     sres.externaldatabaserelease.version for this instance of NRDB
>
> gitax *file* (Required)
>     pathname for the gi_taxid_prot.dmp file
>
> and
>
> externalDatabaseName *string* (Required)
>     sres.externaldatabase.name for NRDB
>
> Could you point me to some docs or information about these arguments?
> thanks
> dhivya
>
> Chris Stoeckert <sto...@pc...> wrote:
> Hi Brian,
> Dhivya found the Djob plugin but did not find any documentation on how to run it. Can you point him to the appropriate place or person? Can this be added to the GUS svn somewhere?
> Thanks,
> Chris
>
> Chris Stoeckert, Ph.D.
> Research Professor, Dept. of Genetics
> 1415 Blockley Hall, Center for Bioinformatics
> 423 Guardian Dr., University of Pennsylvania
> Philadelphia, PA 19104
> Ph: 215-573-4409 FAX: 215-573-3111
> http://www.cbil.upenn.edu
>
> On Jan 11, 2007, at 8:15 AM, Brian Brunk wrote:
>
>> blastSimilarity does not appear to be in the GUS distribution, nor is it in CBIL/Bio. In my project_home it is in DJob. It seems to me that blastSimilarity should be in the GUS distribution that one can download from the gusdb.org site (or check out of the repository). I also have a script called parseBlastFilesForSimilarity.pl, which takes BLAST file names on stdin and is very useful for parsing BLAST files into the format to be loaded into the db; it could be included as well.
>>
>> -Brian
>>
>> On Jan 10, 2007, at 4:52 PM, Chris Stoeckert wrote:
>>
>>> Hi,
>>> Can anyone help with this question about the input file to InsertBlastSimilarities?
>>> Thanks,
>>> Chris
>>>
>>> Begin forwarded message:
>>>
>>>> From: Dhivya Aras <dhi...@ya...>
>>>> Date: January 10, 2007 4:25:25 PM EST
>>>> To: Chris Stoeckert <sto...@pc...>
>>>> Subject: Re: [GUSDEV] loading COG and blast annotation results into GUS
>>>>
>>>> Hi Chris,
>>>>
>>>> I understand that to load the BLAST data into the two tables, Similarity and SimilaritySpan, I need to use the GUS-supported plugin InsertBlastSimilarities. But this plugin asks for an input file 'generated by the blastSimilarity command (distributed with GUS in the CBIL/Bio component)'. Any idea where I can find this blastSimilarity utility?
>>>>
>>>> Thanks
>>>> dhivya
>>>>
>>>> Chris Stoeckert <sto...@pc...> wrote:
>>>> Again, answers in-line.
>>>> Chris
>>>>
>>>> On Jan 9, 2007, at 2:27 PM, Dhivya Aras wrote:
>>>>
>>>>> Hi Chris,
>>>>>
>>>>> I did look at the GUS schema browser; unfortunately, most tables don't have any documentation. I could only see the attributes in each table and maybe a short description of each attribute.
>>>>>
>>>>> But thanks to your reply, I now understand why there are two tables, Similarity and SimilaritySpan. The way I understand it, query_table_id points to the table that holds the query sequence, and query_id identifies the row in that table. So, for example, if query_table_id points to ExternalNASequence, I'm assuming query_id holds that table's primary key, na_sequence_id. Am I right in this assumption?
>>>>
>>>> Yes, that's right.
>>>>
>>>>> Basically, I have an existing GUS db with data, and I have some new blastp results of AA sequences against NRDB. Here's what I think needs to be done to put these new BLAST results into the GUS db. Please fill in the gaps, as I'm vague on some areas.
>>>>>
>>>>> 1. Store each HSP in the SimilaritySpan table. I've mapped all the BLAST fields to the table's fields; that's not a problem.
>>>> Yes.
>>>>
>>>>> 2. Since the query is an AA sequence, which table should query_table_id point to? TranslatedAASequence, with query_id pointing to aa_sequence_id?
>>>> Yes (assuming you are doing a blastp with a sequence from TranslatedAASequence; note that an AASequence could also come from other views of AASequence).
>>>>
>>>>> 3. Since the subject is from NRDB, I'm guessing that subject_table_id should point to ExternalAASequence, with subject_id pointing to aa_sequence_id.
>>>>
>>>> Yes (assuming you loaded NRDB into ExternalAASequence).
>>>>
>>>>> 4. I think these are the only tables I would be affecting by adding these new blastp results. Am I right?
>>>>
>>>> Yes (mostly). Using ga you'll also get audit tables populated, like AlgorithmInvocation.
>>>>
>>>>> I know I've asked quite a few questions, but I'm really not able to find much information on what the tables and fields mean and what they contain. So I'm hoping you can help me out.
>>>> No problem; we need to improve the docs.
>>>>
>>>>> thanks
>>>>> dhivya
>>>>>
>>>>> Chris Stoeckert <sto...@pc...> wrote:
>>>>>
>>>>> See answers in-line. Also, did you look at the documentation in the GUS schema browser? Many of the tables (though I know not all) have table and attribute descriptions. Were they too vague (i.e., do we need to improve them)?
>>>>>
>>>>> Chris
>>>>>
>>>>>> Thanks for replying. I have been working on putting my BLAST results into the Similarity and SimilaritySpan tables, but I have two questions about these tables. Maybe you could help me out here.
>>>>>>
>>>>>> 1. Similarity and SimilaritySpan have pretty much the same fields, except that SimilaritySpan is a child table of Similarity. So why do I even need the SimilaritySpan table?
>>>>> These tables have different purposes (and semantics). Think of Similarity as global (what's the overall similarity between two proteins) and SimilaritySpan as local (what are the individual HSPs).
>>>>>
>>>>>> 2. I couldn't find any fields in the Similarity table for storing the actual query and subject annotation. Most probably this can be done by referring to some other table with the annotation. But the only two fields referring to other tables are query_table_id and subject_table_id, which just refer to Core.TableInfo. I'm confused about these two fields and exactly how they can be used to refer to the query and subject annotation.
>>>>>
>>>>> The query and subject sequences are identified (as you may have guessed) with the soft links query_table_id/query_id and subject_table_id/subject_id, although these attributes can point to anything relevant. Our semantics are that they point to the entities (e.g., nucleic acid sequence, amino acid sequence, possibly DbRef), and annotation is associated with those entities.
>>>>>
>>>>>> Any help or suggestions would be helpful.
>>>>>>
>>>>>> Thanks
>>>>>> dhivya
>>>>>>
>>>>>> Chris Stoeckert <sto...@pc...> wrote:
>>>>>> Dear Dhivya,
>>>>>> Sorry for the long delay in replying.
>>>>>> You guessed correctly about Similarity and SimilaritySpan. These were designed to hold BLAST results (as well as results from other analyses).
>>>>>>
>>>>>> For ortholog tables, you might check the GUS schema browser (http://www.gusdb.org/SchemaBrowser/) and scroll down to the categories: Paralogs and Family; Sequence Ortholog, Paralog, Family AA Ortholog.
>>>>>>
>>>>>> Looking over old notes for OrthoMCL, it looks like DoTS.BestSimilarityPair is the table where we store summarized ortholog data for queries.
>>>>>>
>>>>>> Hope this helps,
>>>>>> Chris
>>>>>>
>>>>>> On Dec 16, 2006, at 3:38 PM, Dhivya Aras wrote:
>>>>>>
>>>>>> > Hi everyone,
>>>>>> >
>>>>>> > I would like to store COG annotation and BLAST results in GUS. I did find two tables named Similarity and SimilaritySpan in the DoTS schema; it looks like these can hold BLAST results, but I need to investigate more.
>>>>>> >
>>>>>> > As far as COG is concerned, I couldn't find any table supporting this data. I was told that OrthoMCL data is stored in DoTS.SequenceGroup and DoTS.SequenceSequenceGroup, but I'm not sure if that would best suit my needs. So, if anyone has used GUS for these purposes before or just has an idea, please let me know. I would really appreciate it.
>>>>>> >
>>>>>> > thanks
>>>>>> > dhivya arasappan
>>>>>> > _______________________________________________
>>>>>> > Gusdev-gusdev mailing list
>>>>>> > Gus...@li...
>>>>>> > https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
>>>
>>> _______________________________________________
>>> CBIL mailing list
>>> CB...@pc...
>>> https://mail.pcbi.upenn.edu/mailman/listinfo/cbil
>>
>> Brian P. Brunk, Ph.D.
>> ApiDB Senior Manager
>> 1424 Blockley Hall
>> Penn Center For Bioinformatics
>> University of Pennsylvania
>> Philadelphia PA 19104-6021
>> Tel: 215-573-3118
>> Fax: 215-573-3111
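Chris's description of registering NRDB as a data source can be sketched in code. This is a minimal Python sketch against an in-memory SQLite database, assuming heavily simplified, hypothetical versions of SRes.ExternalDatabase and SRes.ExternalDatabaseRelease (only ids, a name and a version; the real GUS tables carry more required columns, so this is not a GUS loading script). The helper name register_release is also hypothetical. It reuses the NRDB row if one already exists and adds one release row per version, using the download date as the version string, as Chris suggests.

```python
import sqlite3
from datetime import date

# Hypothetical, heavily simplified stand-ins for SRes.ExternalDatabase and
# SRes.ExternalDatabaseRelease. The real GUS tables have more (required)
# columns, so this schema is for illustration only.
NRDB_SCHEMA = """
CREATE TABLE ExternalDatabase (
    external_database_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE ExternalDatabaseRelease (
    external_database_release_id INTEGER PRIMARY KEY,
    external_database_id INTEGER NOT NULL REFERENCES ExternalDatabase,
    version TEXT NOT NULL,
    UNIQUE (external_database_id, version)
);
"""


def register_release(conn: sqlite3.Connection, name: str, version: str) -> int:
    """Make sure `name` is in ExternalDatabase, then return the id of its
    ExternalDatabaseRelease row for `version`, inserting either row if missing."""
    row = conn.execute(
        "SELECT external_database_id FROM ExternalDatabase WHERE name = ?",
        (name,),
    ).fetchone()
    if row is None:
        # First time this source is seen: add it to ExternalDatabase.
        db_id = conn.execute(
            "INSERT INTO ExternalDatabase (name) VALUES (?)", (name,)
        ).lastrowid
    else:
        # New version of an already-registered source.
        db_id = row[0]

    row = conn.execute(
        "SELECT external_database_release_id FROM ExternalDatabaseRelease"
        " WHERE external_database_id = ? AND version = ?",
        (db_id, version),
    ).fetchone()
    if row is not None:
        return row[0]  # this release is already registered
    return conn.execute(
        "INSERT INTO ExternalDatabaseRelease (external_database_id, version)"
        " VALUES (?, ?)",
        (db_id, version),
    ).lastrowid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(NRDB_SCHEMA)
    # The download date doubles as the version string.
    version = date(2007, 1, 12).isoformat()  # "2007-01-12"
    print("release id:", register_release(conn, "NRDB", version))
```

The name and version strings stored this way are the values that the plugin's externalDatabaseName and externalDatabaseVersion arguments refer to.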
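Chris is only guessing what the gitax file contains. If it is NCBI's gi_taxid_prot.dmp and that file is (as assumed here) a plain two-column, tab-separated list of protein gi and taxon id, reading it amounts to building a gi-to-taxon map. A minimal Python sketch under that assumption; the function name and the optional gi filter are hypothetical, and nothing here describes how loadNRDB itself reads the file.

```python
from typing import Dict, Iterable, Optional, Set


def load_gi_taxid(
    lines: Iterable[str], wanted_gis: Optional[Set[int]] = None
) -> Dict[int, int]:
    """Map protein gi -> taxon id from gi<TAB>taxid lines.

    Assumes a two-column, tab-separated gi_taxid_prot.dmp layout (an
    assumption, not confirmed in the thread). If `wanted_gis` is given,
    only those gis are kept.
    """
    mapping: Dict[int, int] = {}
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 tab-separated fields, got {len(fields)}"
            )
        gi, taxid = int(fields[0]), int(fields[1])
        if wanted_gis is None or gi in wanted_gis:
            mapping[gi] = taxid
    return mapping
```

With a real file, one would pass an open handle, e.g. `with open(path) as fh: load_gi_taxid(fh)`. Restricting the map to the gis actually present in the NRDB release keeps memory use down, since the full dump can be large.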
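To illustrate Chris's global/local distinction between Similarity and SimilaritySpan, here is a hedged Python sketch. It parses 12-column tabular BLAST output (the blastall -m 8 layout is an assumption here; InsertBlastSimilarities itself expects blastSimilarity output, which this does not produce) into per-HSP records, then collapses them into one summary per query/subject pair. The class, function and summary field names are illustrative only and are not the actual DoTS.Similarity or DoTS.SimilaritySpan columns.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Hsp:
    """One local alignment (HSP): the kind of hit a SimilaritySpan row holds."""
    query: str
    subject: str
    percent_identity: float
    length: int
    query_start: int
    query_end: int
    subject_start: int
    subject_end: int
    evalue: float
    bit_score: float


def parse_tabular(lines: Iterable[str]) -> List[Hsp]:
    """Parse 12-column tab-separated BLAST output (assumed blastall -m 8 layout:
    qid, sid, %id, length, mismatches, gap opens, qstart, qend, sstart, send,
    evalue, bit score). Blank lines and '#' comment lines are skipped."""
    hsps: List[Hsp] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        hsps.append(Hsp(f[0], f[1], float(f[2]), int(f[3]),
                        int(f[6]), int(f[7]), int(f[8]), int(f[9]),
                        float(f[10]), float(f[11])))
    return hsps


def summarize(hsps: Iterable[Hsp]) -> Dict[Tuple[str, str], dict]:
    """Collapse HSPs into one global record per (query, subject) pair, the role
    Similarity plays for its SimilaritySpan children. Field names are
    illustrative, not the real DoTS.Similarity columns."""
    groups: Dict[Tuple[str, str], List[Hsp]] = defaultdict(list)
    for h in hsps:
        groups[(h.query, h.subject)].append(h)

    summary: Dict[Tuple[str, str], dict] = {}
    for pair, spans in groups.items():
        summary[pair] = {
            "number_of_spans": len(spans),
            "best_bit_score": max(s.bit_score for s in spans),
            "min_evalue": min(s.evalue for s in spans),
            # Coordinates are normalised so that start <= end even if reversed.
            "min_query_start": min(min(s.query_start, s.query_end) for s in spans),
            "max_query_end": max(max(s.query_start, s.query_end) for s in spans),
            "spans": spans,  # the local, per-HSP records
        }
    return summary
```

Keeping the per-HSP records attached to each summary mirrors the parent/child relationship between the two tables: one global row per protein pair, with its local spans beneath it.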
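The soft-link semantics Chris confirms (query_table_id names a table via Core.TableInfo, query_id is that table's primary key, and likewise for the subject) can be shown with a small SQLite sketch. The tables are hypothetical, drastically simplified stand-ins; the primary_key_column lookup is an assumption about how to find the key column, and the helper names resolve and resolve_similarity are made up for illustration. This is not GUS's own code.

```python
import sqlite3
from typing import Optional, Tuple

# Hypothetical, drastically simplified tables. TableInfo maps a table_id to a
# table name and its primary-key column; the sequence tables keep only an id
# and a description.
LINK_SCHEMA = """
CREATE TABLE TableInfo (
    table_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    primary_key_column TEXT NOT NULL
);
CREATE TABLE TranslatedAASequence (aa_sequence_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE ExternalAASequence (aa_sequence_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE Similarity (
    similarity_id INTEGER PRIMARY KEY,
    query_table_id INTEGER, query_id INTEGER,
    subject_table_id INTEGER, subject_id INTEGER
);
"""


def resolve(conn: sqlite3.Connection, table_id: int,
            row_id: int) -> Optional[Tuple[str, tuple]]:
    """Follow a (table_id, row_id) soft link; return (table name, row) or None."""
    info = conn.execute(
        "SELECT name, primary_key_column FROM TableInfo WHERE table_id = ?",
        (table_id,),
    ).fetchone()
    if info is None:
        return None
    table, pk = info
    # Table/column names cannot be bound as SQL parameters; here they come
    # from TableInfo rather than from user input.
    row = conn.execute(f"SELECT * FROM {table} WHERE {pk} = ?", (row_id,)).fetchone()
    return (table, row) if row is not None else None


def resolve_similarity(conn: sqlite3.Connection, similarity_id: int):
    """Resolve both the query and the subject of one Similarity row."""
    row = conn.execute(
        "SELECT query_table_id, query_id, subject_table_id, subject_id"
        " FROM Similarity WHERE similarity_id = ?",
        (similarity_id,),
    ).fetchone()
    if row is None:
        raise KeyError(similarity_id)
    q_tab, q_id, s_tab, s_id = row
    return resolve(conn, q_tab, q_id), resolve(conn, s_tab, s_id)
```

For a blastp of a TranslatedAASequence against NRDB loaded into ExternalAASequence, resolve_similarity would return the query row from the first table and the subject row from the second; annotation is then found through those entities rather than stored on Similarity itself.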