Marcus - 2011-03-02

Thank you for your questions. Below I have given you a short summary answer followed by a lengthier explanation, which contains some background that you might find relevant.

Short answer:

ECO has undergone major revisions, many of which will take some time to grow accustomed to. We are trying to remedy numerous errors and inconsistencies within ECO, facilitate easier development, and encourage wider use. With that said, I expect that many GO users will not want to abandon the three-letter codes, and if that is the case, then they should continue to use them. It is neither the purview nor the goal of ECO developers to mandate how individual databases use ECO, and a group that uses ECO should do so as it deems appropriate.

ECO will maintain the three-letter GO codes as synonyms (explained further below). However, for logistical reasons, ECO will not be involved in the development or renaming of acronyms for GO (also explained below). If an acronym or other abbreviation is widely used as a synonym, or if it is suggested by a user who requests a term, then it will be incorporated.

For GO to effectively use ECO, some adjustments might be necessary. Just as some ECO terms contain GO three-letter codes as synonyms (and should contain dbxrefs, as described more below), it might be helpful if GO would include ECO IDs of the terms it uses on the GO Evidence site (

Longer answer:

You raise some good questions here. First let me point out that the implications of renaming the ECO terms were not overlooked previously. As stated in the prior proposal sent to various email lists, we removed "inferred" because of its essentially redundant nature; we also renamed many terms to reflect that they were evidence, rather than inference (this needs further refinement); and we renamed many terms as a result of their being moved to more appropriate places in the ontology. As a result of these and other modifications, the three-letter codes became somewhat outdated as acronyms.

Regarding the notion that three-letter codes are "far more immediately informative to viewers of individual annotation than ECO ids," I, of course, agree that three letters are more useful in some respects than a seven-digit autogenerated number. I also believe that this is particularly true for the experienced GO annotator, who is accustomed to a particular set of acronyms. However, I do not think that this is reason enough to base ontology development around three-letter codes. Following are several relevant issues that I think warrant consideration.

1. ECO is a growing ontology, and using three-letter codes to describe hundreds of evidence types would become very confusing to use and to develop. For example, consider the new evidence codes IBA, IBD, IMR, and IRD, requested by PAINT users. For ECO:0000214, IBD is an acronym for "inferred from biological aspect of descendant." The original suggested acronym was IDS for "inferred from descendant sequences." However, there was already a term with the acronym IDS, ECO:0000067 developmental similarity, used by PhenoScape. Designing term names to reflect three-letter codes is problematic by its very nature; it is much easier to develop and maintain ECO if the terms are named according to the concepts they represent, rather than shorthand acronyms that are used by a particular research domain. This is one reason why I try to give terms succinct names, so that acronyms are not needed.

2. ECO is undergoing a modification that will allow most terms to be expressed as cross products of evidence and assertion method. With a three-letter code such as ISO, how would one differentiate between sequence orthology evidence used in manual assertion and sequence orthology evidence used in automatic assertion? Should they be called SOM and SOA? Are three letters important for a particular reason? Might they be called ISOM and ISOA?

3. As with ISO, most of the three-letter codes contain "I", which stands for "inferred," a word that was removed from the term names (explained at top); thus, the "I" is now legacy.

4. Even before we started the major ECO revision (which continues), there was already some disconnect between ECO and GO three-letter codes. Consider, for example, ISS, which is a GO code, but which is actually a conflation of two ECO terms, "sequence similarity" and "structural similarity," the latter of which is a descendant of "phenotypic similarity."
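
To make points 1 and 2 concrete, here is a minimal Python sketch of the acronym collision and of the cross-product naming. It is only an illustration: the `AcronymRegistry` class, the `AcronymCollision` exception, and the "evidence used in method" label format are hypothetical and are not part of any ECO or GO tooling. The IDs and acronyms are the ones mentioned above.

```python
# Illustrative sketch only. AcronymRegistry and AcronymCollision are
# hypothetical names, not part of ECO or GO software.

class AcronymCollision(Exception):
    """Raised when a short code is already taken by a different term."""


class AcronymRegistry:
    def __init__(self):
        self._by_code = {}  # short code -> ECO term ID

    def register(self, code, term_id):
        existing = self._by_code.get(code)
        if existing is not None and existing != term_id:
            raise AcronymCollision(
                f"{code} already denotes {existing}; cannot reuse it for {term_id}"
            )
        self._by_code[code] = term_id


registry = AcronymRegistry()
registry.register("IDS", "ECO:0000067")  # developmental similarity (PhenoScape)

# Point 1: the originally suggested acronym for ECO:0000214 was also IDS.
try:
    registry.register("IDS", "ECO:0000214")
    collided = False
except AcronymCollision:
    collided = True

registry.register("IBD", "ECO:0000214")  # the acronym that was used instead

# Point 2: one evidence type crossed with two assertion methods yields two
# distinct terms, which a single code such as ISO cannot tell apart.
evidence_types = ["sequence orthology evidence"]
assertion_methods = ["manual assertion", "automatic assertion"]
cross_products = [
    f"{evidence} used in {method}"
    for evidence in evidence_types
    for method in assertion_methods
]
```

With only three letters, the space of memorable codes fills up quickly as the ontology grows, whereas descriptive term names compose naturally.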

Despite these and other issues, the three-letter GO codes are very popular with GO users, as you point out. As we continue to revise ECO with the goal of making it broadly usable, we are mindful of the needs of GO. All ECO terms in use by GO still refer to the three-letter GO codes in the form of exact synonyms (for example, ECO:0000002 "direct assay result" has both "inferred from direct assay [GO:IDA]" and "IDA [GO:IDA]"). New ECO terms requested by GO users continue to include three-letter codes as synonyms. For example, see the recently created terms ECO:0000214-16 and ECO:0000308, child terms of "phylogenetic evidence," all of which were given three-letter GO codes.
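
As a rough illustration of how a GO user could keep working with three-letter codes while ECO keeps them as exact synonyms, the following Python sketch resolves a synonym to its ECO ID. The dictionary layout and the `build_synonym_index` function are assumptions made for this example and do not reflect the actual ECO file format. The term and its two synonyms are the ones quoted above.

```python
# Illustrative sketch only. The data layout and function name are
# hypothetical; they are not the real ECO file format or API.

terms = {
    "ECO:0000002": {
        "name": "direct assay result",
        # (synonym text, source reference) pairs, all exact synonyms
        "exact_synonyms": [
            ("inferred from direct assay", "GO:IDA"),
            ("IDA", "GO:IDA"),
        ],
    },
}


def build_synonym_index(terms):
    """Map each exact synonym string to the ID of the term that carries it."""
    index = {}
    for term_id, term in terms.items():
        for text, _source in term["exact_synonyms"]:
            # Keep the first term seen for a synonym; a real tool would
            # need to decide how to report duplicates.
            index.setdefault(text, term_id)
    return index


index = build_synonym_index(terms)
print(index["IDA"])
```

The annotator keeps typing IDA, and the stable ECO ID is recovered by lookup rather than by encoding the acronym in the term name.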

Perhaps the technically correct way of dealing with the three-letter GO codes would be to reference them only as dbxrefs to the GO list of codes. This is how we are proceeding as we begin to map ECO to other ontologies, such as OBI, using dbxrefs.

Although it would be appropriate to add dbxrefs to the GO-related codes, I think the GO codes represent a special case because of their history and widespread use by GO. The three-letter GO codes are maintained as exact synonyms out of recognition that ECO was originally created to serve the annotation needs of GO, i.e., the three-letter codes were acronyms (or general approximations).