Report a warning for residue=<number>.
For modifications these need to be fixed, i.e. residue=1 -> residue=M1
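A minimal sketch of the residue=1 -> residue=M1 fix, assuming the protein sequence for the gene is available as a plain string. The function name `add_residue_letter` is hypothetical and not part of any existing loader code:

```python
import re


def add_residue_letter(residue: str, protein_seq: str) -> str:
    """Turn a bare position such as '1' into 'M1' using the protein sequence.

    Values that are not a bare number are returned unchanged.
    """
    if not re.fullmatch(r"\d+", residue):
        return residue  # already has a letter, or is some other form
    pos = int(residue)
    if pos < 1 or pos > len(protein_seq):
        raise ValueError(f"position {pos} is outside the sequence")
    # Positions are 1-based, Python strings are 0-based.
    return f"{protein_seq[pos - 1]}{pos}"
```

For example, `add_residue_letter("1", "MSK")` gives `"M1"`, while `add_residue_letter("S17", "MSK")` leaves `"S17"` alone.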
For protein binding, we decided that we would use the normal/abnormal phenotype binding terms to delineate the region of binding (later might be able to do something graphically with these, in sequence context).
For now we can get rid of the legacy stuff.
So actually what we want to do is report ANY use of residue= that is not of the form residue=M1 with the modification ontology.
Antonia, Midori can you confirm if we have any other usage?
OK,
residue is only used with modification and sequence (currently protein sequence feature via PBO)
syntax should be:
  single residue: S17
  multiple residues: K133|K134
  residue range: K133-T134 (only used with sequence)
  multiple ranges: R179-N180|R231-D232 (only used with sequence)
  exception, RNA polymerase II CTD repeats: CTD,S2|CTD,S5
residue is disallowed for
molecular function
cellular_component
protein family or domain
(I don't think there are too many violations)
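The rules above could be checked with something like the sketch below. It is only an illustration under stated assumptions: the CV names are passed in as plain strings, residues are a one-letter amino acid code followed by a position, and the names `check_residue`, `MODIFICATION_RE` and `SEQUENCE_RE` are hypothetical. It uses pipe as the only separator, as written in this comment, and does not address the pipe/comma question raised later in the thread.

```python
import re

AA = "[ACDEFGHIKLMNPQRSTVWY]"      # assumption: standard one-letter codes
SINGLE = rf"{AA}\d+"               # e.g. S17
RANGE = rf"{SINGLE}-{SINGLE}"      # e.g. K133-T134
CTD = rf"CTD,{SINGLE}"             # e.g. CTD,S2 (RNA pol II CTD repeats)

# modification: single residues or CTD residues, pipe separated
MODIFICATION_RE = re.compile(rf"(?:{SINGLE}|{CTD})(?:\|(?:{SINGLE}|{CTD}))*")
# sequence: single residues or ranges, pipe separated
SEQUENCE_RE = re.compile(rf"(?:{RANGE}|{SINGLE})(?:\|(?:{RANGE}|{SINGLE}))*")


def check_residue(cv_name, value):
    """Return None if the residue= value is acceptable, else a warning string."""
    if cv_name == "modification":
        pattern = MODIFICATION_RE
    elif cv_name == "sequence":
        pattern = SEQUENCE_RE
    else:
        # molecular function, cellular_component, protein family or domain, ...
        return f"residue= is not allowed for {cv_name}"
    if pattern.fullmatch(value):
        return None
    return f"invalid residue= value for {cv_name}: {value!r}"
```

With this sketch, `check_residue("modification", "1")` and `check_residue("modification", "K133-T134")` both return a warning, while `check_residue("sequence", "R179-N180|R231-D232")` returns `None`.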
Last edit: Valerie Wood 2015-01-28
background on CTD notation
https://sourceforge.net/p/pombase/curation-tasks/82/
mis19 is a known violator.
Just to check, does "K133|K134" mean "the K at 133 or the K at 134 is modified"?
Is this a good summary?:
modification can have residue= like: "K133" or "K133|K134|..." or "CTD,S2" or "CTD,S2|CTD,S5"
sequence can have: "R179" or "R179-N180" or "R123|R231" or "R179-N180|R231-D232"
and there are no other CVs where it's allowed and no other residue= options?
Yes, but I think I got that wrong. I think we have usually used comma to separate residues?
I'm not sure whether it should be pipe or comma. (I probably wrote "|" here because this is our standard "or" separator for other data. I don't think we have done that, though.)
In [curation-tasks:#82] I suggested a different syntax for the CTD residues, precisely because we use comma to mean "and" in so many other places:
There are some pipes in the Artemis files, e.g. "residue=L55|L32" (rpl42) and odd ones with "region", e.g. myh1 has "residue=444|region". We can clean them up to whatever we decide to use, though, and I like the idea of pipe consistently meaning "or" and comma consistently meaning "and".
Related
Curation tasks: #82
Kim, do you want to report the violators in the logs (and maybe automagically fix the Artemis ones)? If there are not so many Canto violators we can fix them up; we can decide when we see.
Yep, no problem. Although I'm still not clear what's valid. Do we allow both pipe and comma? Or just one option?
I think that the plan is that residues should be 'pipe' separated for consistency.
OK.
Does K133|K135 mean K133 OR K135 is modified? Do you ever need to record that K133 AND K135 are modified? (Sorry if I missed that in the discussion)
good point, we might want to do that if both modifications are required for some process I guess. Wait for M/A to comment I think...
no change from my previous opinion ... yes, we should allow both pipe and comma, and use them for "or" and "and" respectively
So Kim, the problem now is that so far we have usually used commas to mean OR.
Has anyone used a comma with residues to mean "AND"?
If not, the remapping should be simple(ish).
If they have, I don't know how to proceed. Possible options:
i) assume all use of comma in Artemis means OR (all of my legacy stuff) and remap
ii) everyone checks and fixes any of their own previous use of commas in Canto sessions.
Would that work?
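Option (i) could be sketched roughly as below. This assumes every legacy comma means OR, except the comma inside the CTD repeat notation (e.g. CTD,S2), which is left alone. The function name `remap_legacy_commas` is hypothetical:

```python
import re

# A comma directly preceded by "CTD" is part of the CTD notation and is kept.
# Any other comma (with optional surrounding spaces) is treated as legacy OR.
LEGACY_COMMA = re.compile(r"\s*(?<!CTD),\s*")


def remap_legacy_commas(value):
    """Rewrite legacy comma-separated residues to pipe-separated (OR)."""
    return LEGACY_COMMA.sub("|", value)
```

So `remap_legacy_commas("K133, K134")` gives `"K133|K134"`, and `"CTD,S2,CTD,S5"` becomes `"CTD,S2|CTD,S5"`. Any value where a comma was really meant as AND would be remapped wrongly, which is why it would only be safe if nobody has used comma that way.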
There are only 13 uses of "|" with residue= in the Artemis flat files.
I will check these so that | = OR; where it does not mean OR, I will change it to a comma.
I'm a bit confused wrt this but:
-in an experiment where they have shown (in the same experiment) that geneX is phosphorylated on residue a and b simultaneously during response Y
-> residue=a, residue=b, added_during(Y)
-in an experiment where they have shown that geneX is phosphorylated on residue a, and in a separate experiment they show that geneX is phosphorylated on residue b. They don't say/show whether a and b are phosphorylated at the same time, or at different times.
-> residue=a | residue=b
is this the right way to do it?
yep, that looks correct
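The comma = AND, pipe = OR reading could be interpreted along these lines. This is a sketch only: the thread does not define what a value mixing both separators means, so the example rejects it, and it does not handle the CTD,S2 notation (whose comma would clash with AND). The function name `interpret_residues` is hypothetical:

```python
def interpret_residues(value):
    """Split a residue= value into ('and' | 'or' | 'single', [residues])."""
    has_pipe = "|" in value
    has_comma = "," in value
    if has_pipe and has_comma:
        # No precedence between ',' and '|' has been agreed in the thread.
        raise ValueError("mixed '|' and ',' is ambiguous")
    if has_comma:
        # Shown to be modified simultaneously in the same experiment.
        return ("and", [r.strip() for r in value.split(",")])
    if has_pipe:
        # Each residue shown separately; co-occurrence not demonstrated.
        return ("or", [r.strip() for r in value.split("|")])
    return ("single", [value.strip()])
```

For example, `interpret_residues("S10,S12")` returns `("and", ["S10", "S12"])` and `interpret_residues("K133|K134")` returns `("or", ["K133", "K134"])`.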