From: Dhivya A. <dhi...@ya...> - 2007-01-15 19:52:38
Hi Chris,

Thanks for that information. I was able to get all the input parameters, and you were right about gi_taxid_prot.dmp: it is a download from NCBI, though not part of the nrdb bundle.

I didn't know who would be the right person for this question, so Chris, Brian, or anyone in the group, if you have any ideas, let me know. I have two GUS databases on my machine. I wanted to run the loadNRDB script on the backup version, but I couldn't find any input parameter where I can specify which database NRDB will be loaded into. How would I specify this? Does it have to do with the location/path where I'm running the script from?

thanks
dhivya

Chris Stoeckert <sto...@pc...> wrote:

Hi Dhivya,

I'm putting this back onto gusdev so that answers may help others (or others can correct my answers). I should also warn you that as PI of the project, I don't actually run any of the code, so this is a test of how well I understand what's going on. ;)

> sres.externaldatabaserelease.version for this instance of NRDB
> sres.externaldatabase.name for NRDB

To load nrdb or any other external "database" (really a data source), that source needs to be entered into externaldatabase and the version (which can simply be the date when you downloaded it) entered into externaldatabaserelease. These can be entered manually into those tables. If this is a new version, then NRDB should already be in externaldatabase. You simply need to enter a new row in externaldatabaserelease for NRDB and supply whatever you put in its version field.

> pathname for the gi_taxid_prot.dmp file

I'm guessing that this is a pointer to the dump file that comes with NRDB providing the taxon_id for each protein sequence, but that's just a guess.

Chris

On Jan 12, 2007, at 5:01 PM, Dhivya Aras wrote:

Hi Chris,

I'm trying to load a new NRDB version into GUS using the loadNRDB plugin. It requires several compulsory input parameters. I don't understand what three of them are.
The ga help describes them as:

externalDatabaseVersion *string* (Required)
    sres.externaldatabaserelease.version for this instance of NRDB
gitax *file* (Required)
    pathname for the gi_taxid_prot.dmp file
externalDatabaseName *string* (Required)
    sres.externaldatabase.name for NRDB

Could you point me to some docs or information about these arguments?

thanks
dhivya

Chris Stoeckert <sto...@pc...> wrote:

Hi Brian,

Dhivya found the DJob plugin but did not find any documentation on how to run it. Can you point him at the appropriate place or person? Can this be added to the GUS svn somewhere?

Thanks,
Chris

Chris Stoeckert, Ph.D.
Research Professor, Dept. of Genetics
1415 Blockley Hall, Center for Bioinformatics
423 Guardian Dr., University of Pennsylvania
Philadelphia, PA 19104
Ph: 215-573-4409 FAX: 215-573-3111
http://www.cbil.upenn.edu

On Jan 11, 2007, at 8:15 AM, Brian Brunk wrote:

blastSimilarity does not appear to be in the GUS distribution, nor is it in CBIL/Bio. In my project_home it is in DJob. It seems to me that blastSimilarity should be in the GUS distribution that one can download from the gusdb.org site (or check out of the repository). I also have a script called parseBlastFilesForSimilarity.pl that takes BLAST file names on stdin. It is very useful for parsing BLAST files into the format to be loaded into the db, and it could be included as well.

-Brian

On Jan 10, 2007, at 4:52 PM, Chris Stoeckert wrote:

Hi,

Can anyone help with this question about the input file to InsertBlastSimilarities?

Thanks,
Chris

Begin forwarded message:

From: Dhivya Aras <dhi...@ya...>
Date: January 10, 2007 4:25:25 PM EST
To: Chris Stoeckert <sto...@pc...>
Subject: Re: [GUSDEV] loading COG and blast annotation results into GUS

Hi Chris,

I understand that to load the blast data into the two tables, similarity and similarityspan, I need to use the GUS-supported plugin, InsertBlastSimilarities.
But this plugin asks for an input file 'generated by the blastSimilarity command (distributed with GUS in the CBIL/Bio component)'. Any idea where I can find this blastSimilarity utility?

Thanks
dhivya

Chris Stoeckert <sto...@pc...> wrote:

Again, answers in-line.
Chris

On Jan 9, 2007, at 2:27 PM, Dhivya Aras wrote:

> Hi Chris,
>
> I did look at the GUS schema browser. Unfortunately most tables don't
> have any documentation; I could just see the attributes in each table
> and maybe a small description of the attribute. But thanks to your
> reply, I do understand the necessity for the two tables, similarity
> and similarityspan, now. The way I understand it, the query_table_id
> points to the table that holds the query sequence data, and the
> query_id indicates the row in that table. So, for example, if the
> query_table_id points to ExternalNASequence, I'm assuming the query_id
> points to the primary key of that table, Na_Sequence_ID. Am I right in
> this assumption?

yes that's right.

> Basically, I have an existing GUS db with data and I have some new
> blastp results of AA sequences against NRDB. Here's what I think needs
> to be done to put these new blast results into the GUS db. Please fill
> in gaps as I'm vague on some areas.
>
> 1. Store each HSP in the similarityspan table. I've mapped all the
> blast fields to the table's fields; that's not a problem.

yes

> 2. Since the query is an AA sequence, which table should the
> query_table_id point to? TranslatedAASequence, with the query_id
> pointing to AA_Sequence_id?

yes (assuming you are doing a blastp with a sequence from TranslatedAASequence - note that an AASequence could also come from other views of AASequence).

> 3. Since the subject is from NRDB, I'm guessing that the
> subject_table_id should point to ExternalAASequence, with the
> subject_id pointing to AA_Sequence_ID.

yes (assuming you loaded nrdb into ExternalAASequence).

> 4. I think these are the only tables I would be affecting by adding
> these new blastp results. Am I right?

yes (mostly).
Using ga you'll also get audit tables populated, like algorithm invocation.

> I know I've asked quite a few questions, but I'm really not able to
> find much information on what the tables and fields mean and what
> they contain. So I'm hoping you can help me out.

No problem - we need to improve the docs.

> thanks
> dhivya

Chris Stoeckert <sto...@pc...> wrote:

See answers in-line. Also, did you look at the documentation in the GUS schema browser? The tables (I know many don't) actually have table and attribute descriptions. Were they too vague (i.e. do we need to improve them)?
Chris

> Thanks for replying. I have currently been working on putting my blast
> results in the similarity and similarityspan tables. But I have two
> questions about these tables. Maybe you could help me out here.
>
> 1. Similarity and SimilaritySpan have pretty much the same fields,
> except that SimilaritySpan is a child table of Similarity. So why do I
> even need the SimilaritySpan table?

These tables have different purposes (and semantics). Think of Similarity as global (what's the overall similarity between two proteins) and SimilaritySpan as local (what are the individual HSPs).

> 2. I couldn't find any fields in the Similarity table for storing the
> actual query and subject annotation. Most probably this can be done by
> referring to some other table with the annotation. But I find that the
> only two fields referring to other tables are query_table_id and
> subject_table_id, which just refer to core.TableInfo. I'm confused
> about these two fields and exactly how they can be used to refer to
> the query and subject annotation.

The query and subject sequences are identified (as you may have guessed) with the soft links query_table_id and subject_table_id, although these attributes can point to anything relevant. Our semantics are that they point to the entities (e.g., nucleic acid sequence, amino acid sequence, possibly dbref), and annotation is associated with those entities.

> Any help or suggestions would be helpful.
> Thanks
> dhivya

Chris Stoeckert <sto...@pc...> wrote:

Dear Dhivya,

Sorry for the long delay in replying. You guessed correctly about Similarity and SimilaritySpan. These were designed to hold BLAST results (as well as results from other analyses). For ortholog tables you might check the GUS schema browser (http://www.gusdb.org/SchemaBrowser/) and scroll down to the categories: Paralogs and Family; Sequence Ortholog, Paralog, Family AA Ortholog. Looking over old notes for OrthoMCL, it looks like DoTS.BestSimilarityPair is the table in which we store summarized ortholog info for queries.

Hope this helps,
Chris

On Dec 16, 2006, at 3:38 PM, Dhivya Aras wrote:

> Hi everyone,
>
> I would like to store COG annotation and blast results in GUS. I
> did find two tables named similarity and similarityspan in the dots
> schema. It looks like these can hold blast results, but I need to
> investigate more.
>
> As far as COG is concerned, I couldn't find any table supporting
> this data. I was told that OrthoMCL data is stored in
> dots.SequenceGroup and dots.SequenceSequenceGroup, but I'm not
> sure if that would best suit my needs. So, if anyone has used
> GUS for these purposes before or just has an idea, please let me
> know. I would really appreciate it.
>
> thanks
> dhivya arasappan
> __________________________________________________
> Do You Yahoo!?
> Tired of spam? Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com
>
> -------------------------------------------------------------------------
> Take Surveys. Earn Cash. Influence the Future of IT
> Join SourceForge.net's Techsay panel and you'll get the chance to
> share your
> opinions on IT & business topics through brief surveys - and earn cash
> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> _______________________________________________
> Gusdev-gusdev mailing list
> Gus...@li...
> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
_______________________________________________
CBIL mailing list
CB...@pc...
https://mail.pcbi.upenn.edu/mailman/listinfo/cbil

Brian P. Brunk, Ph.D.
ApiDB Senior Manager
1424 Blockley Hall
Penn Center For Bioinformatics
University of Pennsylvania
Philadelphia PA 19104-6021
Tel: 215-573-3118
Fax: 215-573-3111
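
For readers following the thread, the "soft link" idea discussed above (a query_table_id or subject_table_id naming a table, plus a query_id or subject_id naming a row in it) can be illustrated with a small sketch. This is not the real GUS schema or any GUS code: it uses an in-memory SQLite database, and the table names, column names (including pk_column), IDs, and helper functions are all hypothetical stand-ins chosen to mirror the discussion.

```python
import sqlite3


def build_demo_db():
    """Create a toy database; every table and column here is a simplified assumption."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Stand-in for core.TableInfo: one row per table that can be pointed at.
        CREATE TABLE table_info (
            table_id INTEGER PRIMARY KEY, name TEXT, pk_column TEXT);
        -- Stand-ins for two amino acid sequence tables.
        CREATE TABLE translated_aa_sequence (
            aa_sequence_id INTEGER PRIMARY KEY, description TEXT);
        CREATE TABLE external_aa_sequence (
            aa_sequence_id INTEGER PRIMARY KEY, description TEXT);
        -- Stand-in for Similarity: query and subject are soft links.
        CREATE TABLE similarity (
            similarity_id INTEGER PRIMARY KEY,
            query_table_id INTEGER, query_id INTEGER,
            subject_table_id INTEGER, subject_id INTEGER);

        INSERT INTO table_info VALUES (1, 'translated_aa_sequence', 'aa_sequence_id');
        INSERT INTO table_info VALUES (2, 'external_aa_sequence', 'aa_sequence_id');
        INSERT INTO translated_aa_sequence VALUES (10, 'my predicted protein');
        INSERT INTO external_aa_sequence VALUES (20, 'NRDB hit');
        INSERT INTO similarity VALUES (100, 1, 10, 2, 20);
        """
    )
    return conn


def resolve_soft_link(conn, table_id, row_id):
    """Follow a (table_id, row_id) pair to the row it designates."""
    info = conn.execute(
        "SELECT name, pk_column FROM table_info WHERE table_id = ?", (table_id,)
    ).fetchone()
    if info is None:
        raise LookupError(f"no table registered with table_id {table_id}")
    name, pk_column = info
    # The table and column names come from table_info, not from user input.
    hit = conn.execute(
        f"SELECT description FROM {name} WHERE {pk_column} = ?", (row_id,)
    ).fetchone()
    return name, (hit[0] if hit else None)


def describe_similarity(conn, similarity_id):
    """Resolve both ends of one similarity row."""
    q_table, q_id, s_table, s_id = conn.execute(
        "SELECT query_table_id, query_id, subject_table_id, subject_id "
        "FROM similarity WHERE similarity_id = ?",
        (similarity_id,),
    ).fetchone()
    return {
        "query": resolve_soft_link(conn, q_table, q_id),
        "subject": resolve_soft_link(conn, s_table, s_id),
    }


if __name__ == "__main__":
    print(describe_similarity(build_demo_db(), 100))
```

The point of the sketch is only that the similarity row itself stores no annotation: it names the two entities, and anything known about them is looked up from the tables those entities live in.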