#114 protein binding phenotype extension change

next-load
closed-fixed
None
9
2015-02-09
2012-09-17
No

We've had a Curator Discussion, and we've decided we ought to change how we do extensions with the protein binding phenotype terms. There's some background in a couple of tickets on the FYPO tracker:

https://sourceforge.net/tracker/index.php?func=detail&aid=3565516&group_id=65526&atid=2096431
https://sourceforge.net/tracker/index.php?func=detail&aid=3568427&group_id=65526&atid=2096431

Anyway, what we want you to do (pretty please with a chocolate :) ) is:

If any annotation to any of the terms in The List has an assayed_using(PomBase:geneB) extension, convert the extension to assayed_using(PomBase:geneA,PomBase:geneB), where geneA is the gene annotated to the term.

The List:
FYPO:0000702
FYPO:0000703
FYPO:0000704
FYPO:0000705
FYPO:0001275
FYPO:0001571
FYPO:0001645
FYPO:0002365
FYPO:0003206
FYPO:0003207
FYPO:0003215
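
A minimal sketch of the requested conversion, in Python. It assumes each annotation is available as a gene ID, a term ID and a single extension string; that representation and the function name `convert_extension` are hypothetical and are not taken from the actual loader.

```python
import re

# Term IDs copied from "The List" in this ticket.
PROTEIN_BINDING_TERMS = {
    "FYPO:0000702", "FYPO:0000703", "FYPO:0000704", "FYPO:0000705",
    "FYPO:0001275", "FYPO:0001571", "FYPO:0001645", "FYPO:0002365",
    "FYPO:0003206", "FYPO:0003207", "FYPO:0003215",
}

# Matches only the single-argument form, e.g. assayed_using(PomBase:SPBC83.18c).
SINGLE_ARG = re.compile(r"^assayed_using\((PomBase:[^,()]+)\)$")


def convert_extension(annotated_gene, term_id, extension):
    """Prepend the annotated gene (geneA) to a one-gene assayed_using extension.

    Anything that is not a listed term, or not a single-argument
    assayed_using extension, is returned unchanged.
    """
    if term_id not in PROTEIN_BINDING_TERMS:
        return extension
    match = SINGLE_ARG.match(extension)
    if match is None:
        return extension
    return f"assayed_using(PomBase:{annotated_gene},{match.group(1)})"
```

For example, `convert_extension("SPBC11C11.02", "FYPO:0000705", "assayed_using(PomBase:SPBC83.18c)")` gives `assayed_using(PomBase:SPBC11C11.02,PomBase:SPBC83.18c)`.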

Holler if you have problems or questions.

Thanks!!

m/v/a

Related

Curation tool: #345

Discussion

  • Kim Rutherford

    Kim Rutherford - 2012-09-19

    "assayed_using(PomBase:geneA,PomBase:geneB) where geneA is the gene annotated to the term"

    It seems strange to have geneA mentioned in the extension. Actually I think "assayed_using" is a strange relation to use in an annotation extension.

    The change shouldn't be a problem though.

     
  • Midori Harris

    Midori Harris - 2012-09-19

    Yeah, it is weird. The reason we're doing it is so we can use the protein binding phenotype terms for situations where a mutation affects the interaction between any two proteins, as in Antonia's examples:

    mutation in geneA increases protein A binding to protein B
    vs.
    mutation in geneA increases protein B binding to protein C

    We can only do that unambiguously by either (a) having separate ontology terms for "mucks up its own binding to something else" and "mucks up something else's binding to a third thing", or (b) adjusting our use of extensions to make all interaction participants explicit, even when one of them is the product of the gene being annotated.

    Note that this only affects protein binding for now, and may affect RNA binding if we add any phenotype terms, because those are the only interactions where everything doing the interacting is a gene product.

    Assayed_using is only used with phenotype annotations, where we needed a relation to say "this is the gene/protein that was actually tested". For example, if they looked at phosphorylation of Cdc2 and saw less of it, we could use "decreased protein phosphorylation" without an extension*, or with the extension assayed_using(Cdc2). With the extension, it means "decreased phosphorylation of Cdc2", which is way more specific than we want for an actual ontology term.

    *and it wouldn't be wrong, because the Ontologically Geeky interpretation is "decreased the phosphorylation of some unspecified protein(s)"

    m

     
  • Midori Harris

    Midori Harris - 2012-09-19

    p.s. the reason we haven't gone for separate terms is that we thought it would be more confusing for community users ...

     
  • Midori Harris

    Midori Harris - 2014-06-10

    updated ID list

     
  • Kim Rutherford

    Kim Rutherford - 2014-06-10

    After talking to Midori and Antonia, we decided to keep the current syntax:
    assayed_using(PomBase:geneA),assayed_using(PomBase:geneB)

    We will also add a post-load database check to make sure that any annotation using one of the terms above has two assayed_using() extension relations. If there are lots to fix, Kim will change the loader to add assayed_using(PomBase:geneA) automatically.
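
The check described above could be sketched roughly as follows. This is not the actual Chado check: it assumes annotations have already been pulled out of the database into plain dictionaries with a `term_id` and a list of `(relation, gene)` extension pairs, and the function name is made up for illustration.

```python
# Term IDs copied from "The List" in this ticket.
PROTEIN_BINDING_TERMS = {
    "FYPO:0000702", "FYPO:0000703", "FYPO:0000704", "FYPO:0000705",
    "FYPO:0001275", "FYPO:0001571", "FYPO:0001645", "FYPO:0002365",
    "FYPO:0003206", "FYPO:0003207", "FYPO:0003215",
}


def missing_assayed_using(annotations):
    """Return annotations to listed terms without exactly two assayed_using relations."""
    problems = []
    for annotation in annotations:
        if annotation["term_id"] not in PROTEIN_BINDING_TERMS:
            continue  # only the protein binding phenotype terms are checked
        count = sum(
            1 for relation, _gene in annotation["extensions"]
            if relation == "assayed_using"
        )
        if count != 2:
            problems.append(annotation)
    return problems
```

Because the two relations are counted rather than compared, an annotation whose two assayed_using extensions name the same gene passes this sketch.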

     
  • Kim Rutherford

    Kim Rutherford - 2014-06-11

    I've added a Chado check that reports all annotations that use one of those terms but don't have two assayed_using() extensions. I hope I've understood things correctly.

    The results of the checks go into the log file that ends in ".chado_checks", e.g.
    http://curation.pombase.org/dumps/latest_build/logs/log.2014-06-10-05-26-33.chado_checks

    There are 712 annotations in the results.

    It looks like a lot come from the PHAF files like:
    pombe-embl/external_data/phaf_files/chado_load/PMID_12615979_phaf.tsv

    but there are also over 200 from Canto.

    Here is what the query returns:
    https://www.dropbox.com/s/nxv9e6t9s20hlb7/binding_annotation_assayed_using_prob.txt
    (Dropbox/pombase/Chado/queries/binding_annotation_assayed_using_prob.txt)

    Currently it only reports the term (with extensions), the feature ID and the sessions (if any). I can add more information if you'd like.

    Quite a lot have no assayed_using() extension. Is that a problem?

     
  • Midori Harris

    Midori Harris - 2014-06-11

    Thanks! I'll fix the PHAF(s).

    Quite a lot have no assayed_using() extension. Is that a problem?

    Not a huge problem -- it means the annotation isn't as specific as ones with extensions, and it's sub-optimal, but it's not the kind of thing that needs to be flagged as an error.

     
  • Kim Rutherford

    Kim Rutherford - 2014-06-11

    I'll fix the PHAF(s).

    Cheers.

    Not a huge problem -- it means the annotation isn't as specific as ones with extensions, and it's sub-optimal, but it's not the kind of thing that needs to be flagged as an error.

    Great. So in those cases we just need to add the missing assayed_using(geneA) extension?

     
  • Midori Harris

    Midori Harris - 2014-06-11

    Not a huge problem [snip]

    Great. So in those cases we just need to add the missing assayed_using(geneA) extension?

    I think I would actually just leave them unchanged.

     
  • Valerie Wood

    Valerie Wood - 2014-06-11

    I think if we ever use "protein binding" phenotypes we should always capture what is being bound abnormally, otherwise the annotation isn't very informative. So I think that reporting warnings for these to be filled in would be good.

     
  • Midori Harris

    Midori Harris - 2014-06-11

    PHAF updated

     
  • Valerie Wood

    Valerie Wood - 2014-06-26

    Although maybe this was to do with
    https://sourceforge.net/p/pombase/chado/354/
    and they are no longer being rejected?

     
  • Valerie Wood

    Valerie Wood - 2014-06-26

    This is what I am worried about:
    http://curation.pombase.org/dumps/latest_build/logs/log.2014-06-26-01-59-13.chado_checks
    canto_annotations - SUCCESS
    missing_assayed_using - FAILURE: expected 0 but got 218
    SPAC1006.09:allele-6 <-> abolished protein binding 7415c70250f4c0b2
    SPBC4.04c:allele-3 <-> abolished protein binding 073951bf43ce4011
    SPAC1006.09:allele-7 <-> abolished protein binding 7415c70250f4c0b2
    SPAC1006.09:allele-8 <-> abolished protein binding 7415c70250f4c0b2
    SPBC211.04c:allele-3 <-> abolished protein binding 41faea428ef0af7e
    SPAC664.07c:allele-3 <-> abolished protein binding 422cbe6fb6bdb6bb
    SPBC1709.14:allele-2 <-> abolished protein binding 9276d468f83256bf
    SPAC1006.09:allele-6 <-> abolished protein binding 7415c70250f4c0b2
    SPBC4.04c:allele-3 <-> abolished protein binding 073951bf43ce4011
    SPAC1006.09:allele-7 <-> abolished protein binding 7415c70250f4c0b2
    SPAC1006.09:allele-8 <-> abolished protein binding 7415c70250f4c0b2
    SPBC211.04c:allele-3 <-> abolished protein binding 41faea428ef0af7e
    SPAC664.07c:allele-3 <-> abolished protein binding 422cbe6fb6bdb6bb
    SPBC1709.14:allele-2 <-> abolished protein binding 9276d468f83256bf
    SPAC1006.09:allele-6 <-> abolished protein binding 7415c70250f4c0b2
    SPBC4.04c:allele-3 <-> abolished protein binding 073951bf43ce4011
    SPAC1006.09:allele-7 <-> abolished protein binding 7415c70250f4c0b2
    SPAC1006.09:allele-8 <-> abolished protein binding 7415c70250f4c0b2
    SPBC211.04c:allele-3 <-> abolished protein binding 41faea428ef0af7e
    SPAC664.07c:allele-3 <-
    etc

    we seem to be losing all of the protein binding extensions....
    (it could be a different issue)

     
  • Kim Rutherford

    Kim Rutherford - 2014-06-26

    we seem to be losing all of the protein binding extensions....

    I don't think anything is being lost. These:

    SPBC211.04c:allele-3 <-> abolished protein binding 41faea428ef0af7e

    are warnings about annotations that should have an assayed_using, but don't.

     
  • Valerie Wood

    Valerie Wood - 2014-06-26

    Ah OK, we need to fix those...

     
  • Valerie Wood

    Valerie Wood - 2014-06-26

    What about these further down the file?

    SPBC106.10:allele-10 <-> abolished protein binding [assayed_using] SPBC106.10 60d1480f79c0c05b
    SPBC106.10:allele-11 <-> abolished protein binding [assayed_using] SPBC106.10 60d1480f79c0c05b
    SPBC106.10:allele-9 <-> abolished protein binding [assayed_using] SPBC106.10 60d1480f79c0c05b
    SPBC106.10:allele-8 <-> abolished protein binding [assayed_using] SPBC106.10 60d1480f79c0c05b
    SPBC106.10:allele-10 <-> abolished protein binding [assayed_using] SPBC106.10 60d1480f79c0c05b
    SPBC106.10:allele-11 <-> abolished protein binding [assayed_using] SPBC106.10 60d1480f79c0c05b
    SPBC106.10:allele-9 <-> abolished protein binding [assayed_using] SPBC106.10 60d1480f79c0c05b
    SPBC106.10:allele-8 <-> abolished protein binding [assayed_using] SPBC106.10 60d1480f79c0c05b
    SPBC106.10:allele-10 <-> abolished protein binding [assayed_using] SPBC106.10 60d1480f79c0c05b
    SPBC106.10:allele-11 <-> abolished protein binding [assayed_using] SPBC106.10 60d1480f79c0c05b
    SPBC106.10:allele-9 <-> abolished protein binding [assayed_using] SPBC106.10 60d1480f79c0c05b
    SPBC106.10:allele-8 <-> abolished protein binding [assayed_using] SPBC106.10 60d1480f79c0c05b
    SPBC106.10:allele-10 <-> abolished protein binding [assayed_using] SPBC106.10 60d1480f79c0c05b
    SPBC106.10:allele-11 <-> abolished protein binding [assayed_using] SPBC106.10 60d1480f79c0c05b
    SPBC106.10:allele-9 <-> abolished protein binding [assayed_using] SPBC106.10 60d1480f79c0c05b
    SPBC106.10:allele-8 <-> abolished protein binding [assayed_using] SPBC106.10 60d1480f79c0c05b
    SPAC17C9.10:allele-4 <-> abolished protein binding [has_substrate] SPAC23H3.13c 4c58cb4b9203cd31
    SPAC6B12.11:allele-4 <-> normal protein binding [assayed_using] SPBC11B10.09 8e48c14b6338b11
    SPAC6B12.11:allele-4 <-> abolished protein binding [assayed_using] SPAC23C4.18c 8e48c14b6338b11
    SPBC11C11.02:allele-2 <-> abolished protein binding [assayed_using] SPBC83.18c 83c8a0868d64f0a1
    SPBC2G2.14:allele-3 <-> abolished protein binding [assayed_using] SPBC12D12.01 5c30b3bf5c5da95c

     
  • Valerie Wood

    Valerie Wood - 2014-06-26

    the ones with no 'assayed using' are from a small number of sessions.

    073951bf43ce4011
    1ccb8cea48bf9230
    41faea428ef0af7e
    422cbe6fb6bdb6bb
    4acb2f022c79e2b7
    7415c70250f4c0b2
    9276d468f83256bf
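
A list like this can be pulled out of the check log with a few lines of Python. The sketch below assumes every relevant line has the shape `feature <-> term name [extensions] session`, as in the excerpts pasted above; lines without a session ID (for example from PHAF files) would be misread, since the last word of the term name would be taken as the session. The function name is hypothetical.

```python
from collections import Counter


def sessions_without_assayed_using(log_lines):
    """Count, per session ID, the logged annotations with no assayed_using extension."""
    counts = Counter()
    for line in log_lines:
        _feature, separator, rest = line.strip().partition(" <-> ")
        if not separator:
            continue  # header, status or truncated line
        parts = rest.rsplit(None, 1)
        if len(parts) != 2:
            continue  # nothing after the term name, so no session to report
        description, session = parts
        if "[assayed_using]" not in description:
            counts[session] += 1
    return counts
```

Calling `sorted(sessions_without_assayed_using(lines))` on the lines of a `.chado_checks` log then gives the distinct session IDs.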

     
  • Valerie Wood

    Valerie Wood - 2015-01-23

    We need to do some work here.
    Whenever I have done protein binding I have only reported the protein bound (I haven't yet curated a phenotype where disrupting protein A results in abolished/altered binding of B to C).

    This means that ALL of my existing annotations (and presumably others) will need to be retrofitted when this is in place. I think we should make a new ticket for this with exactly what needs to happen.

    Currently annotations of this type are not accepted:
    warning in 6b153a4a0b268a6c: duplicated extension: "[assayed_using] SPAC17G8.10c"
    warning in 6b153a4a0b268a6c: duplicated extension: "[assayed_using] SPAC17G8.10c"
    warning in 6b153a4a0b268a6c: duplicated extension: "[assayed_using] SPAC17G8.10c"
    warning in 6b153a4a0b268a6c: duplicated extension: "[assayed_using] SPAC17G8.10c"
    warning in 6b153a4a0b268a6c: duplicated extension: "[assayed_using] SPAC17G8.10c"

    val

     
  • Midori Harris

    Midori Harris - 2015-01-25

    I think I did curate a "mutant a alters b-c binding" one recently but of course I don't remember what gene or paper. It was more than 5 minutes ago ;)

    Anyway, protein binding phenotype annotations are now required to have two assayed_using extensions. The check for that has been running for a few months now, and the retrofitting is done (I remember updating quite a few extensions shortly after it went live). For example, it reported no problems for the v48 release:

    http://curation.pombase.org/dumps/releases/pombase-chado-v48-2015-01-10/logs/log.2015-01-11-07-05-17.chado_checks (includes "check that protein binding annotations have two assayed_using extensions - SUCCESS")

    The warnings that appear in the canto logs now are flagging annotations where both extensions use the same gene ID. But if a mutation alters one protein's binding to itself (dimerization etc.), that's exactly how we would capture it.

    As far as I can tell, all that needs to be done is to allow the duplicated extensions.
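
One way to allow this, sketched in Python: keep warning about duplicated extensions in general, but skip the warning when the duplicated relation is assayed_using on one of the protein binding phenotype terms. This is only an illustration. The function name and the `(relation, gene)` tuple representation are assumptions, and the warning text merely imitates the Canto log lines quoted earlier in this ticket.

```python
from collections import Counter

# Term IDs copied from "The List" in this ticket.
PROTEIN_BINDING_TERMS = {
    "FYPO:0000702", "FYPO:0000703", "FYPO:0000704", "FYPO:0000705",
    "FYPO:0001275", "FYPO:0001571", "FYPO:0001645", "FYPO:0002365",
    "FYPO:0003206", "FYPO:0003207", "FYPO:0003215",
}


def duplicate_extension_warnings(term_id, extensions):
    """Return warnings for duplicated extensions, allowing self-binding annotations."""
    warnings = []
    for (relation, gene), occurrences in Counter(extensions).items():
        if occurrences < 2:
            continue
        if relation == "assayed_using" and term_id in PROTEIN_BINDING_TERMS:
            # Same gene twice: the protein binds itself (dimerization etc.).
            continue
        warnings.append(f'duplicated extension: "[{relation}] {gene}"')
    return warnings
```

With this rule, two `("assayed_using", "SPAC17G8.10c")` extensions on a listed term produce no warning, while the same pair on any other term is still reported.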

     
  • Kim Rutherford

    Kim Rutherford - 2015-01-31

    In the latest Chado there are terms with names like:

    "abolished protein binding [assayed_using] SPCC1672.02c [assayed_using] SPCC1672.02c"

    (sap1+5 has that term)

    Does that make sense?

     
    • Midori Harris

      Midori Harris - 2015-01-31

      In principle it makes sense -- it means that the sap1-5 mutant causes 2 or more copies of the Sap1 protein not to bind to each other (they dimerize or multimerize in wild type but not the mutant). We would have to check the paper to make sure that's correct ... but that's true for every annotation anyway!

       
  • Kim Rutherford

    Kim Rutherford - 2015-02-03

    I'm closing this for now but I'll reopen if Mark reports problems.

     
  • Valerie Wood

    Valerie Wood - 2015-02-09
    • status: open --> closed-fixed
     