#426 Make an interactions file for BioGRID after each release

next-load
open
None
9
2015-03-26
2015-01-08
No

This should be automated somehow. All new interaction annotations will come from Canto, which may help, as those annotations always have dates attached.

Related

Chado: #452

Discussion

  • Kim Rutherford

    Kim Rutherford - 2015-02-10

    Add a query, run after each load, that reports duplicates for symmetrical interaction types, including GO binding annotations with IPI evidence and a "with" value.

     
  • Kim Rutherford

    Kim Rutherford - 2015-02-16

    As a first step I'm adding a check for duplicate interactions. I must be misunderstanding something, as there seem to be a lot of duplicates:
    https://www.dropbox.com/s/qz3n89tsiasoz9c/chado-load-warnings-2015-02-17.txt?dl=0

    In that file:
    "already exists: ..." is a straightforward duplicate.

    "already exists for symmetrical relation: ..." means the relation is symmetrical and there's a duplicate with the bait/prey (subject/object) swapped around.

    I'll check for the binding/with duplicates separately.
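    The two messages can be illustrated with a minimal Python sketch. It is not the PomBase loader code: it assumes each interaction is a (subject, object, type, publication) tuple, and the `SYMMETRICAL_TYPES` set and `find_duplicates` helper are hypothetical names chosen for illustration.

```python
from typing import Iterable

# Example only: which interaction types count as symmetrical is an assumption here.
SYMMETRICAL_TYPES = {"Positive Genetic"}

# (subject/bait, object/prey, interaction type, publication)
Interaction = tuple[str, str, str, str]


def find_duplicates(interactions: Iterable[Interaction]) -> list[str]:
    """Return one warning per duplicate, in the style of the two log messages."""
    seen: set[Interaction] = set()
    warnings: list[str] = []
    for subject, obj, itype, pub in interactions:
        key = (subject, obj, itype, pub)
        if key in seen:
            # Exact repeat of an earlier interaction from the same publication.
            warnings.append(f"already exists: {key}")
        elif itype in SYMMETRICAL_TYPES and (obj, subject, itype, pub) in seen:
            # Same interaction with bait and prey (subject and object) swapped.
            warnings.append(f"already exists for symmetrical relation: {key}")
        else:
            seen.add(key)
    return warnings
```

    Including the publication in the key means the same pair reported by two different papers is not flagged as a duplicate.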

    (I thought we had another ticket for this but I can't find it)

     
    • Valerie Wood

      Valerie Wood - 2015-02-17

      Something does seem a little odd here.

      PMID:22681890 is Colm Ryan's paper, and we didn't curate that one as far as I know. All of the interactions should be BioGRID-derived. That means either we are duplicating these annotations somehow, or they are providing them in duplicate.

      We can discuss later....

       
      • Valerie Wood

        Valerie Wood - 2015-02-17

        There are 4563 annotations in this file, but omitting PMID:22681890 reduces that to 499, so if we find out what is causing this one we'll be 90% of the way there...

         
      • Kim Rutherford

        Kim Rutherford - 2015-02-17

        PMID:22681890 is Colm Ryan's paper, and we didn't curate that one as far as I know

        Yep, all those come from BioGRID. It looks like they are providing duplicates. As a test I searched for SPBC317.01 in the BioGRID data file and got:
        https://www.dropbox.com/s/516xz0ul2vi29kc/mbx2_interactions.txt?dl=0

        In that example each Positive Genetic interaction is listed in both directions. The Negative Genetic interactions are in just one direction. Does that make sense?

        Here it is at the source:
        http://thebiogrid.org/276879/summary/schizosaccharomyces-pombe/mbx2.html

         
  • Kim Rutherford

    Kim Rutherford - 2015-02-17

    Also, after each load an interactions file is created that contains only the interactions added since the previous release:
    http://curation.pombase.org/dumps/latest_build/exports/pombase-interactions-since-v49-2015-02-02.gz

    This will help next time.
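    Because Canto annotations carry dates, a since-last-release file can in principle be produced with a date filter. A minimal Python sketch, assuming a hypothetical `Annotation` record with a `created` date and a known previous release date (2015-02-02 in the file name above); `added_since` is an illustrative name, not the actual export code:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Annotation:
    subject: str
    obj: str
    interaction_type: str
    created: date  # assumed: the date Canto attached to the annotation


def added_since(annotations: list[Annotation], previous_release: date) -> list[Annotation]:
    """Keep only annotations created strictly after the previous release date."""
    return [a for a in annotations if a.created > previous_release]
```

    Whether an annotation made on the release day itself counts as "since" is a choice; this sketch uses a strict comparison and so leaves it out.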

     
    • Valerie Wood

      Valerie Wood - 2015-02-17

      BioGRID often fix things; will corrections they make later get picked up this way?

       
      • Kim Rutherford

        Kim Rutherford - 2015-02-17

        BioGRID often fix things; will corrections they make later get picked up this way?

        The load script downloads the latest BioGRID release whenever there is a new version.
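        As a rough sketch of that "new version" check (assuming dotted numeric version strings; `parse_version` and `needs_download` are hypothetical helpers, not the actual load script), versions are best compared as integers rather than as strings, because as strings "3.2.99" sorts after "3.2.100":

```python
from typing import Optional


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as "3.2.121" into (3, 2, 121) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def needs_download(local_version: Optional[str], latest_version: str) -> bool:
    """True if nothing has been downloaded yet or a newer release is available."""
    if local_version is None:
        return True
    return parse_version(latest_version) > parse_version(local_version)
```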

         
  • Valerie Wood

    Valerie Wood - 2015-02-17

    E-mailed BioGRID for clarification....

     
  • Kim Rutherford

    Kim Rutherford - 2015-02-23

    I've added a new database check that looks at the interactions.

    http://curation.pombase.org/dumps/builds/pombase-build-2015-02-21-v2-l1/logs/log.2015-02-22-20-58-07.chado_checks

    Lines like "already exists: ..." are cases where an annotation is duplicated.

    Lines like "missing annotation for: ..." are for missing reciprocal annotations.

    The missing reciprocals can be ignored for now, as I'll be fixing the load code soon to automatically add the reciprocals.
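    A minimal Python sketch of the missing-reciprocal check, under similar assumptions (tuples of subject, object, type and publication; `SYMMETRICAL_TYPES` and `missing_reciprocals` are hypothetical names, not the real check):

```python
# Example only: which interaction types count as symmetrical is an assumption here.
SYMMETRICAL_TYPES = {"Positive Genetic"}

Interaction = tuple[str, str, str, str]  # (subject, object, interaction type, publication)


def missing_reciprocals(interactions: list[Interaction]) -> list[str]:
    """Report symmetrical interactions whose reversed (object, subject) copy is absent."""
    present = set(interactions)
    missing = {
        (obj, subject, itype, pub)
        for subject, obj, itype, pub in interactions
        if itype in SYMMETRICAL_TYPES and (obj, subject, itype, pub) not in present
    }
    return [f"missing annotation for: {m}" for m in sorted(missing)]
```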

     
  • Valerie Wood

    Valerie Wood - 2015-02-23

    So in both cases I presume we don't need to do anything, except to filter out the "already exists" ones when we submit to BioGRID?

    Although I am now confused, because you say "The missing reciprocals can be ignored for now as I'll be fixing the load code soon to automatically add the reciprocals".

    I had assumed that we were only adding the reciprocals for GO protein binding as Mark already does the inferences for BioGRID. I am happy for it to be done this way for both though if it is easier/more consistent.

    Am I correct that curators don't need to do anything here?

     
  • Kim Rutherford

    Kim Rutherford - 2015-02-23

    I think we need to look at the "already exists" ones because in those cases there are two identical annotations. Usually one is from Canto and one is from BioGRID.

    An example is for byr4:
    http://www.pombase.org/spombe/result/SPAC222.10c#interactionPhysical

    We have this annotation twice:
    forms complex with spg1 GTPase Spg1 Reconstituted Complex Furge KA et al. (1998)

    I could change the loader to ignore the Canto ones and keep the BioGRID ones in Chado. I think that's better than dropping the BioGRID annotations: if you make a new annotation in Canto that duplicates a BioGRID one, we don't want it in Chado, because we don't want to send it to BioGRID when we send them an update. Does that make sense?
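    The keep-BioGRID policy could look roughly like this Python sketch. It assumes each annotation records its source as "BioGRID" or "PomBase" (Canto); the `Annotation` class and `drop_canto_duplicates` function are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    subject: str
    obj: str
    interaction_type: str
    pmid: str
    source: str  # "BioGRID" or "PomBase" (Canto)


def drop_canto_duplicates(annotations: list[Annotation]) -> list[Annotation]:
    """Keep every BioGRID annotation; drop PomBase ones that duplicate a BioGRID one."""

    def key(a: Annotation) -> tuple[str, str, str, str]:
        return (a.subject, a.obj, a.interaction_type, a.pmid)

    biogrid_keys = {key(a) for a in annotations if a.source == "BioGRID"}
    return [
        a for a in annotations
        if a.source == "BioGRID" or key(a) not in biogrid_keys
    ]
```

    Canto annotations with no BioGRID counterpart pass through unchanged, so only the true duplicates are kept out of Chado.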

    I had assumed that we were only adding the reciprocals for GO protein binding as Mark already does the inferences for BioGRID. I am happy for it to be done this way for both though if it is easier/more consistent.

    Sorry, I should have made a comment about that. Mark and I had a chat and decided to put the reciprocal for the symmetrical interactions in Chado when loading. That will make it consistent with the GO protein binding case.

     
  • Valerie Wood

    Valerie Wood - 2015-02-23

    I'm a little confused still (but less so).

    There are 425 "already exists" annotations.

    I can see that we could easily make the opposite annotation for a symmetrical evidence code. However, a lot of these are for asymmetrical codes. Does this imply one of us has curated in the wrong direction?

    But I just checked, and in this session the annotations only seem to appear in the correct direction:
    http://curation.pombase.org/pombe/curs/4650423a1b7a3d16

     
  • Valerie Wood

    Valerie Wood - 2015-02-23

    Ah, I see: this is just a straight duplicate. Otherwise it would say "already exists for symmetrical relation".

    OK, this just means we curated it and BioGRID did too. That's a shame, but it won't happen so often with frequent updates and once BioGRID have access to the list of curated papers.

    We need to delete these, but I don't really want to lose the community attribution. I wonder if we could somehow merge them? That is, classify each one as a BioGRID annotation, but if it was also created by a member of the fission yeast community (essentially confirming the annotation), keep that curator's attribution within PomBase so that their name is still attached in their sessions? (There's no need to export the attribution.)

    ?

     
  • Kim Rutherford

    Kim Rutherford - 2015-02-23

    Ah, I see: this is just a straight duplicate. Otherwise it would say "already exists for symmetrical relation".

    Yep! Sorry I wasn't clear about that.

    I wonder if we could somehow merge?

    We can do that by keeping the annotation source as "BioGRID" (as opposed to "PomBase"), but add the Canto details (date, author and session ID). I've made a ticket about that:
    https://sourceforge.net/p/pombase/chado/452/
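    A rough Python sketch of that merge, assuming hypothetical `Annotation` fields for the Canto date, curator and session ID (how these are actually stored in Chado is what ticket #452 is about):

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    subject: str
    obj: str
    interaction_type: str
    pmid: str
    source: str  # "BioGRID" or "PomBase" (Canto)
    canto_date: Optional[str] = None  # hypothetical Canto detail fields
    canto_curator: Optional[str] = None
    canto_session: Optional[str] = None

    def key(self) -> tuple[str, str, str, str]:
        return (self.subject, self.obj, self.interaction_type, self.pmid)


def merge_with_biogrid(annotations: list[Annotation]) -> list[Annotation]:
    """Keep BioGRID as the source of a duplicated annotation, adding the Canto details.

    Assumes at most one BioGRID annotation per key; a later one would replace an earlier one.
    """
    biogrid = {a.key(): a for a in annotations if a.source == "BioGRID"}
    canto_only: list[Annotation] = []
    for a in annotations:
        if a.source == "BioGRID":
            continue
        match = biogrid.get(a.key())
        if match is None:
            # No BioGRID counterpart: keep it as a PomBase annotation.
            canto_only.append(a)
        else:
            # Duplicate: keep the BioGRID record but attach the community curation details.
            biogrid[a.key()] = replace(
                match,
                canto_date=a.canto_date,
                canto_curator=a.canto_curator,
                canto_session=a.canto_session,
            )
    return list(biogrid.values()) + canto_only
```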

     
  • Kim Rutherford

    Kim Rutherford - 2015-02-24

    The reciprocal annotations are now created automatically. I'm running a full load to test.
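    For illustration, automatic reciprocal creation might look like the following Python sketch. It is not the actual loader change; `SYMMETRICAL_TYPES`, the `inferred_reciprocal` flag and `add_reciprocals` are hypothetical. Flagging the generated copies keeps them distinguishable from curated annotations.

```python
from dataclasses import dataclass

# Example only: which interaction types count as symmetrical is an assumption here.
SYMMETRICAL_TYPES = {"Positive Genetic"}


@dataclass(frozen=True)
class Interaction:
    subject: str
    obj: str
    interaction_type: str
    pmid: str
    inferred_reciprocal: bool = False  # True for copies generated by the loader


def add_reciprocals(interactions: list[Interaction]) -> list[Interaction]:
    """Return the input plus a reversed copy of each symmetrical interaction that lacks one."""
    present = {(i.subject, i.obj, i.interaction_type, i.pmid) for i in interactions}
    result = list(interactions)
    for i in interactions:
        reverse = (i.obj, i.subject, i.interaction_type, i.pmid)
        if i.interaction_type in SYMMETRICAL_TYPES and reverse not in present:
            result.append(Interaction(*reverse, inferred_reciprocal=True))
            present.add(reverse)
    return result
```

    Running it twice adds nothing the second time, so it is safe to apply on every load.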

     
  • Kim Rutherford

    Kim Rutherford - 2015-03-22

    Have we heard back from BioGRID about how they handle symmetrical relations?

     
  • Valerie Wood

    Valerie Wood - 2015-03-23

    Do you mean asymmetric ones (i.e. Colm's paper)? I am still waiting for a reply on that.

    v

     
  • Kim Rutherford

    Kim Rutherford - 2015-03-23

    Do you mean asymmetric ones (i.e. Colm's paper)? I am still waiting for a reply on that.

    Then we should probably send them what we think is right, and they can let us know if it doesn't work for them.

     
  • Kim Rutherford

    Kim Rutherford - 2015-03-23

    This is what Val got from Jennifer Rust, which is probably enough to be getting on with:

    Hi Val,
    Yes, we are using a "spoke" model to enter information so we only report
    interactions in a single direction and more specific guidelines on how
    we define the bait and the hit in an interaction can be found here:
    Direction of interactions (Bait/Hit)
    http://wiki.thebiogrid.org/doku.php/curation_guide:direction_of_interactions.

    The annotations are not automatically reversed in our system. For
    instance if bait x:hit y are shown to interact by Affinity Capture
    Western and our curators capture that our system will not automatically
    add a bait y:hit x interaction by Affinity Capture Western. There will
    only be a single interaction for x and y that the user can see by
    querying either x or y.

    It sounds like there may be a unique issue going on for the specific
    paper you mentioned PMID:22681890. Usually on datasets that large we
    talk with the researcher directly and automatically upload the data they
    provide so hopefully the curator was in touch with Colm Ryan and can
    give me some more info on why the interactions were added this way. I
    will look into it and get back to you ASAP.
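    Under BioGRID's spoke model each interaction is sent once, in its curated direction, so an export from a Chado that also holds loader-generated reciprocals would need to skip those. A minimal Python sketch, assuming a hypothetical `inferred_reciprocal` flag on each row (not an existing PomBase field); `rows_for_biogrid` is likewise an illustrative name:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    subject: str  # bait
    obj: str  # hit
    interaction_type: str
    pmid: str
    inferred_reciprocal: bool = False  # hypothetical flag set by the loader on generated copies


def rows_for_biogrid(interactions: list[Interaction]) -> list[Interaction]:
    """Export each interaction once, in its curated direction (spoke model).

    Loader-generated reciprocals are dropped, and exact duplicates are sent only once.
    """
    seen: set[tuple[str, str, str, str]] = set()
    rows: list[Interaction] = []
    for i in interactions:
        key = (i.subject, i.obj, i.interaction_type, i.pmid)
        if i.inferred_reciprocal or key in seen:
            continue
        seen.add(key)
        rows.append(i)
    return rows
```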

     
  • Kim Rutherford

    Kim Rutherford - 2015-03-26

    We may need to change the file format. From Jennifer Rust:

    I am attaching a copy of the file we ask users to fill out when they
    submit interactions directly to us. This file is formatted for easy
    upload into our system and if your export script produced a file with
    this format it might streamline the process of upload so that we
    could eventually automate it. It is not much different from the files
    you have sent previously but there are some columns in this file that
    are not in the data files you sent. For example, there is a phenotype
    column that must be populated for genetic interactions (we currently
    use the YPO) although we are working to expand the ontologies we can
    use.

    The file she sent is now in Dropbox:

    Dropbox/pombase/Chado/interactions/BioGRID-data-submission-spreadsheet.xls
    https://www.dropbox.com/s/vs5fapbjndrnfov/BioGRID-data-submission-spreadsheet.xls?dl=0

     
  • Valerie Wood

    Valerie Wood - 2015-03-26

    We should go ahead with the next exchange using the format you are working on (unless switching to their format is very quick to implement).

    The phenotype column will necessarily be blank for the foreseeable future. They will need to fill this in...

    I envisage that once we have multigene phenotypes up and running, we can somehow add a step to collect the BioGRID evidence (if it cannot be inferred from the combination of allele type and phenotype term) and dump the GI input section to reduce duplication.

     
