Summary:
Although it isn't ideal, we will probably go ahead and store GO protein binding annotations reciprocally in Canto. Even though the pair of annotations represents only a single interaction/experiment, the annotation itself is equally valid for both gene products. It's a little hacky from a database-storage perspective, but as a biological representation it is probably OK.
-Kim was worried that the annotation numbers would be overinflated, but as these are counted as 2 annotations in GO annotation counts, this is probably OK.
What needs to happen:
1. Logs need to report any GO protein binding annotations from the Artemis files which are not curated reciprocally (fix manually or automatically, depending on scale)
2. Canto needs to automatically generate a reciprocal annotation when a GO protein binding annotation is made, AND delete this reciprocal annotation when the GO protein binding annotation is deleted (I will open a Canto ticket for this). This will make annotation consistent and clear for users: if the reciprocal is created automatically, we won't get a mixed bag of interactions curated in both directions and interactions curated in only one.
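A minimal sketch of the two steps above, in Python rather than in Canto's or the loader's own code. The `Annotation` record, its field names, and all three helper functions are hypothetical and only illustrate the intended behaviour: report annotations whose reciprocal is missing, and create or delete both directions together.

```python
from typing import Iterable, List, NamedTuple, Set


class Annotation(NamedTuple):
    """Hypothetical, simplified protein binding annotation."""
    subject: str    # gene product being annotated
    term: str       # GO term ID, e.g. GO:0005515 (protein binding)
    evidence: str   # evidence code, e.g. IPI
    reference: str  # publication
    partner: str    # single ID from the "with" field


def reciprocal(a: Annotation) -> Annotation:
    """Swap subject and binding partner; everything else is unchanged."""
    return a._replace(subject=a.partner, partner=a.subject)


def missing_reciprocals(annotations: Iterable[Annotation]) -> List[Annotation]:
    """Step 1: reciprocal annotations that should exist but do not."""
    seen = set(annotations)
    return sorted({reciprocal(a) for a in seen} - seen)


def add_with_reciprocal(store: Set[Annotation], a: Annotation) -> None:
    """Step 2a: creating an annotation also creates its reciprocal."""
    store.add(a)
    store.add(reciprocal(a))


def delete_with_reciprocal(store: Set[Annotation], a: Annotation) -> None:
    """Step 2b: deleting an annotation also deletes its reciprocal."""
    store.discard(a)
    store.discard(reciprocal(a))
```

Using a set means a self-interaction (subject and partner identical) is stored once, since its reciprocal is the same record.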
1 & 2 above should ensure that:
1. GO GAF should automatically contain reciprocal protein binding annotation (required)
2. GO protein binding annotation will automatically display on gene pages (required); Mark does not need to do anything.
3. Mechanisms are in place to ensure that protein binding annotation is synchronized so that everything is consistent.
4. Querying against Chado should work OK with a single query: if you ask for all of the interactions for gene a, you do not also need to query for all genes x, y, z which interact with gene a but where "a" is not the subject of the annotation.
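To illustrate point 4, here is a small Python sketch (not a Chado query). Interactions are modelled as hypothetical `(subject, partner)` pairs; the function names are made up for this example.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (subject, partner), hypothetical simplification


def partners_by_subject(pairs: Iterable[Pair], gene: str) -> Set[str]:
    """Single lookup: only rows where `gene` is the annotation subject."""
    return {partner for subject, partner in pairs if subject == gene}


def partners_either_direction(pairs: Iterable[Pair], gene: str) -> Set[str]:
    """Two lookups: what is needed when reciprocals are not stored."""
    pairs = list(pairs)
    as_subject = {p for s, p in pairs if s == gene}
    as_partner = {s for s, p in pairs if p == gene}
    return as_subject | as_partner


# Curated in one direction only: "y" annotated with partner "a".
one_way = [("a", "x"), ("y", "a")]
# Same data with reciprocals stored.
both_ways = one_way + [("x", "a"), ("a", "y")]
```

With `one_way`, the single lookup for "a" misses "y"; with `both_ways`, the single lookup returns the same set as the two-direction lookup.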
Did I miss anything?
1. Anything else which needs to happen here?
2. Any unforeseen consequences?
Some previous discussion and rumination, in case we need to go back through this:
https://sourceforge.net/p/pombase/chado/327/
Just to check: in a case like this:
FT /GO="aspect=F; term=protein binding, bridging;
FT GOid=GO:0030674; evidence=IPI; db_xref=PMID:16252005;
FT with=PomBase:SPAC29B12.03|PomBase:SPAC17H9.10c;
FT date=20051102"
should we be creating 2 reciprocal annotations because the "with=" has two values?
What should the "with" be for the reciprocals?
Also, what does it mean to have two IDs in the with field? Does "with=X|Y" mean "binds X OR Y"?
Hmm, we need to check the meaning of the pipe in this field.
I only ever use more than one ID in the with field when I am using the term "protein binding, bridging" (GO:0030674), and in this case it means both at the same time. I think (but would need to check) that it is GO convention to use "|" here, which is different from extension field usage. I think there was talk about standardizing, but I can't remember if it was ever implemented.
For this reason, if I am making "AND" annotations, I use independent rows, so it's unambiguous.
We probably need to dig a little deeper here. If anyone can enlighten, good; if not, it can wait until Tues/Weds.
I'm going to put this aside for this release.
https://github.com/pombase/pombase-chado/commit/80ba0d2a09e7400f35a41e0ebcd57aec9bfe0ee9
I haven't done any more on this as I don't know what "with=X|Y" means. It makes a difference to how the reciprocal annotations look.
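To make the difference concrete, here is a rough Python sketch of two possible ways to build reciprocals from a pipe-separated with field. Both readings are assumptions, not a statement of what "|" actually means in GO. The subject ID `PomBase:geneS`, the function names, and the `joint` flag are hypothetical, as the subject of the example annotation is not shown above.

```python
from typing import List, Tuple


def split_with(with_field: str) -> List[str]:
    """Split a with field such as 'X|Y' into its individual IDs."""
    return [part.strip() for part in with_field.split("|") if part.strip()]


def candidate_reciprocals(
    subject: str, with_field: str, joint: bool = False
) -> List[Tuple[str, str]]:
    """Return (new_subject, new_with) pairs, one per ID in the with field.

    joint=False: each partner is treated independently, so each
                 reciprocal has only the original subject in its with.
    joint=True:  the partners are treated as bound at the same time, so
                 each reciprocal keeps the other partner(s) in its with.
    """
    partners = split_with(with_field)
    result = []
    for partner in partners:
        new_with = [subject]
        if joint:
            new_with += [other for other in partners if other != partner]
        result.append((partner, "|".join(new_with)))
    return result


example_with = "PomBase:SPAC29B12.03|PomBase:SPAC17H9.10c"
independent = candidate_reciprocals("PomBase:geneS", example_with)
together = candidate_reciprocals("PomBase:geneS", example_with, joint=True)
```

In the independent reading each reciprocal gets `with=PomBase:geneS`; in the joint reading the reciprocal for SPAC29B12.03 gets `with=PomBase:geneS|PomBase:SPAC17H9.10c`, and vice versa. Whether the reciprocals should use the same GO term is a separate question that this sketch does not address.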