As per Skype call:
Number of annotations made in total / number of annotations per paper per year.
Also, it would be useful if we could get an overview of the phenotype annotation:
i) How many phenotype annotations have we made so far (and using how many papers)?
We may have further questions about phenotype data later.
Ok,
we are only really interested in the papers in the curation tool, because they have been fully curated.
Is it possible to do a "by year" query on the curation tool data only as the date is stored there?
Not to worry if it is difficult...only if it is easy. This would limit the stats to those publications which have a "curation tool session" but this is fine.
If we do this, we should exclude the following papers, which were done by "bulk import",
as this is really to evaluate the increase in curator workload from small-scale experiments rather than the total annotation number:
7098 | PMID:16823372 ORFeome
4829 | PMID:20473289 viability (deletion collection)
915 | PMID:19547744 phosphoproteome
545 | PMID:18257517 phosphoproteome
480 | PMID:20537132 deletion profiling (fitness)
404 | PMID:21511999 Rhind paper (not sure what curation types this was?)
295 | PMID:16303567 meiosis
261 | PMID:12529438 stress GO IEP
even though they may have a session in the curation tool.
(The rest were done in the curation tool or Artemis,
even this one: 362 | PMID:18684775, manually curated by Antonia!)
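For the record, the exclusion described above could be sketched roughly as follows. This is only an illustration, not the query that was actually run: the record shape (PMID, year, annotation count per approved session) and the function name are assumptions, and "year" stands for whatever date the curation tool stores. Only the excluded PMIDs come from the list above.

```python
from collections import defaultdict

# PMIDs listed above as "bulk import"; excluded even if they have a session.
# PMID:18684775 is deliberately NOT here, since it was manually curated.
BULK_IMPORT_PMIDS = {
    "PMID:16823372", "PMID:20473289", "PMID:19547744", "PMID:18257517",
    "PMID:20537132", "PMID:21511999", "PMID:16303567", "PMID:12529438",
}


def annotations_per_paper_by_year(sessions):
    """Return {year: (total_annotations, annotations_per_paper)}.

    `sessions` is a hypothetical iterable of (pmid, year, annotation_count)
    tuples, one per approved curation tool session.
    """
    totals = defaultdict(int)   # year -> total annotations
    papers = defaultdict(int)   # year -> number of papers counted
    for pmid, year, count in sessions:
        if pmid in BULK_IMPORT_PMIDS:
            continue  # skip bulk imports
        totals[year] += count
        papers[year] += 1
    return {
        year: (totals[year], totals[year] / papers[year])
        for year in sorted(totals)
    }
```

Keeping the exclusion list as a set of PMIDs in one place would make it easy to update if more bulk-import papers turn up.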
v
Here are the counts:
Dropbox/pombase/curation_tool/queries
One file has the counts of total annotations per year, the other has annotations per paper per year. The second is maybe what you were after?
These are only from the APPROVED sessions in the curation tool as of 1 hour ago.
Perfect! I plotted them on a second Y-axis. There seem to be a few outliers (I guess we haven't done enough papers yet), so I did a slightly unscientific average per 5 years...hope that's ok.
That's what I was going to suggest. I suspect n is only 1 or 2 for 1984!
1984 34.0
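One way the 5-year averaging could be done is sketched below. It is an assumption about the method, not a record of what was actually plotted: it pools all papers in each 5-year bin (total annotations divided by total papers), so a year with n = 1 does not dominate. The input shape, the function name, and the bin boundaries (bins start at years divisible by 5) are all hypothetical.

```python
from collections import defaultdict


def five_year_average(per_year, bin_size=5):
    """Return {bin_start_year: annotations_per_paper} pooled over each bin.

    `per_year` is a hypothetical dict mapping
    year -> (number_of_papers, total_annotations).
    """
    papers = defaultdict(int)
    annotations = defaultdict(int)
    for year, (n_papers, n_annotations) in per_year.items():
        start = year - year % bin_size  # e.g. 1984 falls in the 1980 bin
        papers[start] += n_papers
        annotations[start] += n_annotations
    # Pooled average: weights every paper equally, not every year.
    return {
        start: annotations[start] / papers[start]
        for start in sorted(papers)
    }
```

A plain mean of the yearly averages would be an alternative, but it gives a single-paper year the same weight as a year with many papers.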
We should repeat this in a few months, and I think it will be very useful for the grant...
v
I've made a note of the method so it will take 5 minutes next time (instead of 30 minutes).
Are there any more statistics that we need? If not, can this be closed?
I think we are done for now.
I will put it at lowest priority (1), and some time we can
document the queries etc.