
#28 GOA entry is incomplete

current
accepted
None
minor
2015-04-29
2015-04-24
No

Perhaps I don't understand what the http://identifiers.org/goa/ entry is for, but it seems incomplete.

The pattern

^([A-N,R-Z][0-9][A-Z][A-Z, 0-9][A-Z, 0-9][0-9])|([O,P,Q][0-9][A-Z, 0-9][A-Z, 0-9][A-Z, 0-9][0-9])

appears to be copied from the UniProt entry. But GOA also annotates complexes and RNAs, e.g. IntAct:EBI-9519039.

Technically it also annotates isoforms (most visible in the GPAD files)
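To make the gap concrete, here is a minimal sketch that checks a few identifiers against the pattern quoted above. It assumes the whole identifier has to match (how identifiers.org actually applies the pattern is not stated here), and `matches_current_pattern`, `P12345` and the isoform form `P12345-2` are illustrative names and values, not taken from the registry.

```python
import re

# The pattern currently stored for the goa entry, copied verbatim from above.
CURRENT_GOA_PATTERN = re.compile(
    r"^([A-N,R-Z][0-9][A-Z][A-Z, 0-9][A-Z, 0-9][0-9])"
    r"|([O,P,Q][0-9][A-Z, 0-9][A-Z, 0-9][A-Z, 0-9][0-9])"
)


def matches_current_pattern(identifier: str) -> bool:
    """Return True if the identifier matches the current goa pattern.

    Assumption: the whole identifier has to match, not just a prefix.
    """
    return CURRENT_GOA_PATTERN.fullmatch(identifier) is not None


# A 6-character accession, an isoform-style ID, and the IntAct complex ID.
for candidate in ["P12345", "P12345-2", "EBI-9519039"]:
    print(candidate, matches_current_pattern(candidate))
```

Under that assumption only the plain 6-character accession is accepted; the isoform-style and IntAct complex identifiers are rejected.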

I will get someone from GOA to comment and provide the full list of patterns.

But is it even correct to put the subject of the annotation here? GOA doesn't actually mint these IDs, it just annotates to them. Maybe that's fine? Are there guidelines for this style of annotation database in MIRIAM, i.e. databases that generate de-novo associations between objects in other databases? Should the core GO database be in here?

Note that GOA does mint IDs for its associations; these aren't very public yet, but they may be in future. Perhaps these should replace the UniProt protein pattern? I suppose it depends what the use case is for having an entry for goa in identifiers.org.

Discussion

  • Nick Juty

    Nick Juty - 2015-04-24

    Hi Chris,

    Thanks for the information. I just backtracked to look into the use case, and it turns out we were requested to add this collection as GOA was being cross-referenced for some entries in ENA. I think the IDs being minted by GOA, which aren't public yet, would probably be the correct pattern to use.

    Thanks for getting in touch,

    cheers

    Nick

     
  • Nick Juty

    Nick Juty - 2015-04-24
    • status: open --> accepted
    • assigned_to: Nick Juty
     
  • Tony Sawford

    Tony Sawford - 2015-04-27

    I'll be honest, I wasn't aware of this entry until today - I don't know where it came from, or who created it, but it doesn't look right.

    The objects that GOA supplies are annotations, which, as Chris rightly says, don't have publicly visible IDs. The pattern currently in the entry is for an old-style (6 character) canonical UniProt accession.

    I can provide the correct patterns for all of the entities for which we provide annotations, but this doesn't feel like the right thing to do, as these identifiers are not GOA's property.

     
  • Nick Juty

    Nick Juty - 2015-04-28

    Hi guys,

    I think as an interim measure, it would be a good idea to have all the correct patterns; there are clearly some that do not work (http://www.ebi.ac.uk/QuickGO/GProtein?ac=A0A0A7GK78). The original use case we were given was to link from ENA record input and get information on protein annotations. I could incorporate the other patterns and expand the definition we store to make it more complete.

    On judging the appropriateness of including GOA in the first place, I think it was appropriate since GOA does offer a different view of the data, being much more focused on the individual annotations and their evidence. But I do agree that maybe this is a bit more debatable. I guess the overriding factor was the utility this provided for the original requester.

    We can then revisit/review options when GOA actually mints its own identifiers for this data.

    cheers

    Nick

     
  • Tony Sawford

    Tony Sawford - 2015-04-28

    A0A0A7GK78 is a deleted UniProtKB accession (probably deleted in the recent cull of redundant bacterial proteomes), which is why you don't see any annotations for it in QuickGO.

    These are the patterns for the three types of identifier to which we support annotation:

    UniProtKB accessions
    ([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z]([0-9][A-Z][A-Z0-9]{2}){1,2}[0-9])((-[0-9]+)|:PRO_[0-9]{10}|:VAR_[0-9]{6}){0,1}

    RNAcentral taxon-specific identifiers
    URS[0-9A-F]{10}(_[0-9]+){0,1}

    IntAct complex portal identifiers
    EBI-[0-9]+

    Note that the current version of QuickGO only displays annotations to UniProt accessions; the forthcoming new version will display annotations to all three types of identifier, but the form of URLs will be changing; we'll be sending out emails about this, and the changes to the supported web services, in due course.
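As a rough illustration of how the three patterns listed above could be combined, the sketch below classifies an identifier by the first pattern it fully matches. Several things here are assumptions: the UniProtKB accession part is written out in full bracketed form for 6- and 10-character accessions, `{0,1}` is written as `?`, and the names `PATTERNS` and `classify` and the sample identifiers are illustrative only.

```python
import re
from typing import Optional

# Assumption: the UniProtKB accession part is written in full bracketed form,
# covering old-style 6-character and newer 10-character accessions, followed
# by an optional isoform (-N), chain (:PRO_) or variant (:VAR_) suffix.
PATTERNS = {
    "uniprotkb": re.compile(
        r"([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z]([0-9][A-Z][A-Z0-9]{2}){1,2}[0-9])"
        r"((-[0-9]+)|:PRO_[0-9]{10}|:VAR_[0-9]{6})?"
    ),
    "rnacentral": re.compile(r"URS[0-9A-F]{10}(_[0-9]+)?"),
    "intact_complex": re.compile(r"EBI-[0-9]+"),
}


def classify(identifier: str) -> Optional[str]:
    """Return the name of the first pattern that matches the whole identifier."""
    for name, pattern in PATTERNS.items():
        if pattern.fullmatch(identifier):
            return name
    return None


print(classify("A0A0A7GK78"))    # uniprotkb (10-character accession)
print(classify("EBI-9519039"))   # intact_complex
print(classify("GO:0008150"))    # None
```

Note that a syntactic match says nothing about whether the accession still exists: A0A0A7GK78 matches the pattern even though it has been deleted from UniProtKB.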

     
  • Chris Mungall

    Chris Mungall - 2015-04-28

    I think there is a danger in conflating what a database says about an ID with what kind of IDs a database creates. After all, GOA also provides information about GO IDs and about PubMed IDs.

    Similarly, the GO Consortium also provides information about genes in a variety of species, so should our pattern be expanded to include every MOD ID possible?

    Perhaps I'm just misunderstanding how identifiers.org is intended to be used.

     
  • Nick Juty

    Nick Juty - 2015-04-29

    Hi Chris,

    Just to clarify, one of our objectives is to enable users to link (resolve to a page, for example) to information they feel is important (about an entity or entities). If that information, let's use PubMed as an example, is available essentially unmodified in numerous places, we list the different resources (resolving locations) that are available, e.g. http://info.identifiers.org/pubmed/18957448.

    If the GO Consortium or GOA database provided the same information for PubMed as above, I would have no issue in listing those links as additional resources on our record for PubMed. The same is true for genes in various species. The only real proviso being that those links should be resolvable. Note: If the information is identical to something already available, it would be made available as a resource for an existing collection. If it were different, it would warrant being added as a new record in our registry.

    Finally, that (new) data is made available by one provider through identifiers generated by a different provider is not an isolated incident. I believe there are at least a handful of such cases in the registry, where information is made accessible through reuse of UniProt, PDB and other identifiers. I think this really just speaks to the prominence of those databases in the domain.

    Ideally, GOA would have their own identifiers, and we would use those. In this case, to address the needs of our users, we have had to come up with an interim measure which I think is quite reasonable and practical.

    We do of course value your opinion, and would be happy to listen to any other possible solutions you may have.

    cheers

    Nick

     
