Re: [GUSDEV] [CBIL] Fwd: loadNRDB script

Hi Chris,

Thanks for that information. I was able to get all the input parameters, and you were right about gi_taxid_prot.dmp: it is a download from NCBI, though not part of the NRDB bundle.

I didn't know who would be the right person for this question, so Chris, Brian, or anyone in the group: if you have any ideas, let me know.
I have two GUS databases on my machine. I wanted to run the loadNRDB script on the backup version, but I couldn't find any input parameter that specifies which database NRDB will be loaded into. How would I specify this? Does it have to do with the location/path where I'm running the script from?

thanks
dhivya

Chris Stoeckert <sto...@pc...> wrote:
Hi Dhivya,
I'm putting this back onto gusdev so that answers may help others (or others can correct my answers). I should also warn you that as PI of the project, I don't actually run any of the code, so this is a test of how well I understand what's going on. ;)
        sres.externaldatabaserelease.version for this instance of NRDB
        sres.externaldatabase.name for NRDB
To load NRDB or any other external "database" (really a data source), that source needs to be entered into externaldatabase, and the version (which can simply be the date you downloaded it) entered into externaldatabaserelease. These can be entered manually into those tables. If this is a new version, then NRDB should already be in externaldatabase. You simply need to enter a new row in externaldatabaserelease for NRDB and give the plugin whatever you put in the version field.
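
As a rough illustration of the manual entry described above (this is not GUS code), here is a minimal Python sketch that uses an in-memory SQLite database. The two tables are simplified stand-ins for sres.externaldatabase and sres.externaldatabaserelease: the id column names, the register_release helper, and the date strings used as versions are all assumptions made for the example, and the real tables carry more columns.

```python
import sqlite3

# Simplified stand-ins for sres.externaldatabase and
# sres.externaldatabaserelease. Column names are assumptions and the
# real GUS tables have additional columns.
SCHEMA = """
CREATE TABLE externaldatabase (
    external_database_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE externaldatabaserelease (
    external_database_release_id INTEGER PRIMARY KEY,
    external_database_id INTEGER NOT NULL
        REFERENCES externaldatabase (external_database_id),
    version TEXT NOT NULL,
    UNIQUE (external_database_id, version)
);
"""


def register_release(conn, db_name, version):
    """Ensure db_name exists, then ensure a release row for version.

    Returns the id of the release row, whether new or already present.
    (Hypothetical helper, written only for this sketch.)
    """
    cur = conn.cursor()
    # Reuse the data source row if it is already there (e.g. NRDB).
    cur.execute(
        "INSERT OR IGNORE INTO externaldatabase (name) VALUES (?)",
        (db_name,),
    )
    cur.execute(
        "SELECT external_database_id FROM externaldatabase WHERE name = ?",
        (db_name,),
    )
    db_id = cur.fetchone()[0]
    # A new download only needs a new release row for the same source.
    cur.execute(
        "INSERT OR IGNORE INTO externaldatabaserelease "
        "(external_database_id, version) VALUES (?, ?)",
        (db_id, version),
    )
    cur.execute(
        "SELECT external_database_release_id FROM externaldatabaserelease "
        "WHERE external_database_id = ? AND version = ?",
        (db_id, version),
    )
    release_id = cur.fetchone()[0]
    conn.commit()
    return release_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up version strings: the download date is used as the version.
first = register_release(conn, "NRDB", "2006-12-01")
second = register_release(conn, "NRDB", "2007-01-12")
```

The name and version strings stored here are what would then be passed to the plugin as externalDatabaseName and externalDatabaseVersion.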

        pathname for the gi_taxid_prot.dmp file
I'm guessing that this is a pointer to the dump file that comes with NRDB providing the taxon_id for each protein sequence but that's just a guess.

Chris

On Jan 12, 2007, at 5:01 PM, Dhivya Aras wrote:

Hi Chris,

I'm trying to load a new NRDB version into GUS using the loadNRDB plugin. It requires several compulsory input parameters, and I don't understand what three of them are. The ga help describes them as:

externalDatabaseVersion *string* (Required)

        sres.externaldatabaserelease.version for this instance of NRDB

gitax *file* (Required)

        pathname for the gi_taxid_prot.dmp file

and

externalDatabaseName *string* (Required)

        sres.externaldatabase.name for NRDB

Could you point me to some docs or information about these arguments?
thanks
dhivya

Chris Stoeckert <sto...@pc...> wrote:
Hi Brian,
Dhivya found the DJob plugin but did not find any documentation on how to run it. Can you point him at the appropriate place or person? Can this be added to the GUS svn somewhere?
Thanks,
Chris

 Chris Stoeckert, Ph.D.
Research Professor, Dept. of Genetics
1415 Blockley Hall, Center for Bioinformatics
423 Guardian Dr., University of Pennsylvania
Philadelphia, PA 19104
Ph: 215-573-4409 FAX: 215-573-3111
http://www.cbil.upenn.edu

On Jan 11, 2007, at 8:15 AM, Brian Brunk wrote:

blastSimilarity does not appear to be in the GUS distribution, nor is it in CBIL/Bio. In my project_home it is in DJob. It seems to me that blastSimilarity should be in the GUS distribution that one can download from the gusdb.org site (or check out of the repository). I also have a script called parseBlastFilesForSimilarity.pl that takes BLAST file names on stdin; it is very useful for parsing BLAST files into the format to be loaded into the db, and it could be included as well.

-Brian

On Jan 10, 2007, at 4:52 PM, Chris Stoeckert wrote:

Hi,
Can anyone help with this question about the input file to InsertBlastSimilarities?
Thanks,
Chris

Begin forwarded message:

From: Dhivya Aras <dhi...@ya...>
Date: January 10, 2007 4:25:25 PM EST
To: Chris Stoeckert <sto...@pc...>
Subject: Re: [GUSDEV] loading COG and blast annotation results into GUS

 Hi Chris,

  I understand that to load the blast data into the two tables, similarity and similarityspan, I need to use the GUS-supported plugin InsertBlastSimilarities. But this plugin asks for an input file 'generated by the blastSimilarity command (distributed with GUS in the CBIL/Bio component)'. Any idea where I can find this blastSimilarity utility?

  Thanks
  dhivya

Chris Stoeckert <sto...@pc...> wrote:
  Again, answers in-line.  Chris

    On Jan 9, 2007, at 2:27 PM, Dhivya Aras wrote:

  Hi Chris,

I did look at the GUS schema browser. Unfortunately, most tables don't have any documentation; I could just see the attributes in each table and maybe a short description of each attribute.

But thanks to your reply, I now understand the need for the two tables, similarity and similarityspan. The way I understand it, the query_table_id points to the table that holds the query sequence data, and the query_id indicates the row in that table. So, for example, if the query_table_id points to externalNASequence, I'm assuming the query_id points to the primary key of that table, Na_Sequence_ID. Am I right in this assumption?

yes that's right.
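
To make the soft-link idea confirmed above concrete, here is a small Python sketch with in-memory stand-ins. It is only an illustration under stated assumptions: the table ids, the rows, the source_id field, and the resolve_soft_link helper are hypothetical, and a real lookup would go through core.TableInfo in the database.

```python
# Hypothetical stand-in for core.TableInfo: table_id -> (table name,
# primary key column). The numeric ids are made up for this sketch.
TABLE_INFO = {
    1: ("ExternalNASequence", "na_sequence_id"),
    2: ("TranslatedAASequence", "aa_sequence_id"),
    3: ("ExternalAASequence", "aa_sequence_id"),
}

# Hypothetical table contents, keyed by primary key value.
TABLES = {
    "ExternalNASequence": {
        101: {"na_sequence_id": 101, "source_id": "example-na"},
    },
    "TranslatedAASequence": {
        7: {"aa_sequence_id": 7, "source_id": "example-query-protein"},
    },
    "ExternalAASequence": {
        9001: {"aa_sequence_id": 9001, "source_id": "example-nrdb-protein"},
    },
}


def resolve_soft_link(table_id, row_id):
    """Follow a (table_id, row_id) soft link to the row it identifies.

    table_id selects the table; row_id is a primary key value in it.
    """
    table_name, pk_column = TABLE_INFO[table_id]
    row = TABLES[table_name][row_id]
    # The row id in the soft link is the primary key of the target table.
    if row[pk_column] != row_id:
        raise ValueError("row id does not match the primary key value")
    return table_name, row


# query_table_id=1 and query_id=101 identify one ExternalNASequence row.
query_table, query_row = resolve_soft_link(1, 101)
```

The same pair of fields can therefore point at a nucleic acid or an amino acid sequence without a separate foreign key column per target table.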

  Basically, I have an existing GUS db with data, and I have some new blastp results of AA sequences against NRDB. Here's what I think needs to be done to put these new blast results into the GUS db. Please fill in the gaps, as I'm vague in some areas.

1. Store each HSP in the similarityspan table. I've mapped all the blast fields to the table's fields; that's not a problem.
yes

  2. Since the query is an AA sequence, which table should the query_table_id point to? TranslatedAASequence, with the query_id pointing to AA_Sequence_ID?
  yes (assuming you are doing a blastp with a sequence from TranslatedAASequence - note that AASequence could also come from other views of AASequence). 

  3. Since the subject is from NRDB, I'm guessing that the subject_table_id should point to externalAASequence, with the subject_id pointing to AA_Sequence_ID.

yes (assuming you loaded nrdb into ExternalAASequence).

  4. I think these are the only tables I would be affecting for adding these new blastP results. Am I right?  

  yes (mostly). Using ga you'll also get audit tables populated like algorithm invocation. 

  I know I've asked quite a few questions, but I'm really not able to find too much information on what the tables and fields mean and what they contain. So I'm hoping you can help me out.
No problem - we need to improve the docs.

  thanks
dhivya

Chris Stoeckert <sto...@pc...> wrote:    

See answers in-line. Also, did you look at the documentation in the GUS schema browser? Some tables (I know many don't) actually have table and attribute descriptions. Were they too vague (i.e., do we need to improve them)?

  Chris

      Thanks for replying. I have currently been working on putting my blast results in similarity and similarityspan tables. But, I have two questions about these tables. Maybe you could help me out here.

1. Similarity and SimilaritySpan have pretty much the same fields, except that SimilaritySpan is a child table of Similarity. So why do I even need the SimilaritySpan table?
  These tables have different purposes (and semantics). Think of Similarity as global (what's the overall similarity between two proteins) and SimilaritySpan as local (what are the individual HSPs).

  2. I couldn't find any fields in the Similarity table for storing the actual query and subject annotation. Most probably this can be done by referring to some other table with the annotation. But I find that the only two fields referring to other tables are query_table_id and subject_table_id, which just refer to core.TableInfo. I'm confused about these two fields: how exactly can they be used to refer to the query and subject annotation?

  The query and subject sequences are identified (as you may have guessed) with the soft links query_table_id and subject_table_id, although these attributes can point to anything relevant. Our semantics are that they point to the entities (e.g., nucleic acid sequence, amino acid sequence, possibly dbref), and annotation is associated with those entities.
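
A minimal Python sketch of the global/local distinction described in these answers, with one Similarity per query/subject pair and one SimilaritySpan per HSP. The field names, the summary methods, and the sample coordinates and scores are illustrative assumptions; they are not the actual columns or values stored by GUS.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SimilaritySpan:
    """Local view: one HSP between the query and the subject."""
    query_start: int
    query_end: int
    subject_start: int
    subject_end: int
    score: float


@dataclass
class Similarity:
    """Global view: the overall hit between one query and one subject.

    The table_id/id pairs are soft links to the sequence rows.
    """
    query_table_id: int
    query_id: int
    subject_table_id: int
    subject_id: int
    spans: List[SimilaritySpan] = field(default_factory=list)

    def number_of_spans(self) -> int:
        return len(self.spans)

    def best_span_score(self) -> float:
        return max(span.score for span in self.spans)

    def query_extent(self) -> Tuple[int, int]:
        """Smallest start and largest end covered on the query."""
        starts = [span.query_start for span in self.spans]
        ends = [span.query_end for span in self.spans]
        return min(starts), max(ends)


# Made-up example: one blastp hit made of two HSPs.
hit = Similarity(
    query_table_id=2, query_id=7,
    subject_table_id=3, subject_id=9001,
    spans=[
        SimilaritySpan(10, 120, 15, 125, score=180.0),
        SimilaritySpan(200, 260, 210, 270, score=75.5),
    ],
)
```

The parent row answers "how similar are these two proteins overall", while the child rows keep the coordinates of each aligned region.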

  Any help or suggestions would be helpful. 

Thanks
dhivya

Chris Stoeckert <sto...@pc...> wrote:  Dear Dhivya,
Sorry for the long delay in replying.
You guessed correctly about Similarity and SimilaritySpan. These were 
designed to hold BLAST results (as well as results from other analyses).

For ortholog tables you might check the GUS schema browser
(http://www.gusdb.org/SchemaBrowser/) and scroll down to the categories:
Paralogs and Family; Sequence Ortholog, Paralog, Family AA Ortholog.

Looking over old notes for OrthoMCL, it looks like 
DoTS.BestSimilarityPair is the table in which we store summarized 
ortholog data for queries.

Hope this helps,
Chris

On Dec 16, 2006, at 3:38 PM, Dhivya Aras wrote:

> Hi everyone,
>
> I would like to store COG annotation and blast results in GUS. I 
> did find two tables named similarity and similarityspan in the dots 
> schema. It looks like these can hold blast results, but I need to 
> investigate more.
>
> As far as COG is concerned, I couldn't find any table supporting 
> this data. I was told that OrthoMCL data is stored in 
> dots.SequenceGroup and dots.SequenceSequenceGroup, but I'm not 
> sure if that would best suit my needs. So, if anyone has used 
> GUS for these purposes before or just has an idea, please let me 
> know. I would really appreciate it.
>
> thanks
> dhivya arasappan
> _______________________________________________
> Gusdev-gusdev mailing list
> Gus...@li...
> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev

_______________________________________________
CBIL mailing list
CB...@pc...
https://mail.pcbi.upenn.edu/mailman/listinfo/cbil

 Brian P. Brunk, Ph.D.
ApiDB Senior Manager
1424 Blockley Hall
Penn Center For Bioinformatics
University of Pennsylvania
Philadelphia PA 19104-6021
Tel: 215-573-3118
Fax: 215-573-3111
