Lookup service in Protein2GO and deleted accessions
Protein2GO change request / bug tracker
Brought to you by:
tonys-ebi
Hi,
The Lookup service in Protein2GO is very helpful and I use it a lot, but today it returned an accession that appears to have been deleted on 2011-12-14. I searched for 'rict-1', and the service returned 'Q9XV48'.
In this case, there are currently three TrEMBL IDs for rict-1; I will attach annotations to the longest isoform accession, G5EFK2, for now.
I know there's been a lot of recent discussion about adding more information to search results in Protein2GO (e.g., whether a protein is a member of the Reference Proteome), but I was wondering what the standard procedure is for when and how accessions are updated for the Lookup service.
Many thanks,
--Kimberly
Hi Kimberly,
At the moment, the task of keeping the dictionary that underpins the
lookup service up-to-date is a manual one.
We could put in place a periodic sanity check that went through the
dictionary and weeded out entries that referred to deleted or secondary
accessions, and that would address part of the problem. However, given
that the whole purpose of this function is to allow MODs to search for
entities using their nomenclature rather than UniProt's /
RNAcentral's, we're entirely reliant on those MODs to provide us with
the information that we can use to populate the dictionary.
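As a rough illustration of the periodic sanity check described above, here is a minimal Python sketch. It assumes (hypothetically) that the dictionary can be read as (symbol, accession) pairs and that a set of current primary UniProt accessions / RNAcentral ids is available from a recent release; the function name weed_out_stale_entries is made up for this example. Deleted and secondary accessions are both absent from that primary set, so both kinds of entry end up in the removed list for review.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical entry shape: (MOD symbol or synonym, UniProt accession / RNAcentral id)
Entry = Tuple[str, str]


def weed_out_stale_entries(
    entries: Iterable[Entry],
    current_primary_ids: Set[str],
) -> Tuple[List[Entry], List[Entry]]:
    """Split lookup entries into those still pointing at a current primary id
    and those pointing at a deleted or secondary id (assumed input sets)."""
    kept: List[Entry] = []
    removed: List[Entry] = []
    for symbol, accession in entries:
        if accession in current_primary_ids:
            kept.append((symbol, accession))
        else:
            # Not a current primary id: deleted or secondary, so flag it.
            removed.append((symbol, accession))
    return kept, removed


# Illustrative use with the accessions mentioned in this ticket.
kept, removed = weed_out_stale_entries(
    [("rict-1", "Q9XV48"), ("rict-1", "G5EFK2")],
    {"G5EFK2"},
)
print(removed)
```

A check like this only removes stale entries; replacing them with correct accessions would still depend on up-to-date mappings from the MODs.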
What I would like is for the MODs that use this function to get into
the habit of sending us, on a regular basis, a gp_information file that
provides all of the information that we need (MOD id, name, symbol,
synonyms and, most importantly, the corresponding UniProt accession /
RNAcentral id).
When I first implemented this function, I hacked together some
gp_information files based on the then-current gene_association and
gp2protein files on the GOC site, and, apart from some recent updates to
add in some SGD RNA identifiers, nothing has been changed since that
initial data was loaded.
So yes, this is something that needs to be looked at.
Cheers,
Tony
On 22/07/15 20:17, KM Van Auken wrote:
Related tickets: #11
Thanks, Tony.
I typically use the Lookup service when there are multiple accessions for a C. elegans protein and I'd like to identify the one that is included in the Reference Proteome set. Otherwise, the main Protein2GO search box usually works fine for me.
Since information about Reference Proteome membership is going to be added to the P2GO search results, we may no longer need to maintain the Lookup service for WB once that is in place.
I can't speak to how SGD and FB, the other Database options, use the Lookup service, or how the new P2GO search results would affect their usage, but WB, at least, may not need it in the future.
--Kimberly