One thing you could do is look at UMLS and extract from it the sort of ontology you are interested in.  This would not be trivial, but it would mostly involve just a significant amount of work in identifying what you want and then transforming it into whatever DAG representation you desire.  There is a lot of good stuff in there, and as part of the process you could (if it would be of value to you) integrate information from some of the various hierarchies in the "vocabularies" that make up the Metathesaurus.  This could yield fairly large hierarchically organized ontologies involving such things as drugs, various coding schemes (ICD-9, Read codes, etc.), drug classes (pharmacological, therapeutic), medical conditions, etc.

This information is all in large relational tables that are quite well designed but involve several levels of abstraction.  I presume you would need to extract it into an efficient DAG representation for computational purposes.  If so, I would favor using something like BSDDB or Metakit for that, since you may well find that traversing such hierarchies via SQL queries results in performance problems.  But that depends on what you want to do.  Writing extraction utilities from the .nlm ASCII files to a DAG representation in something like Python (or -- ugh! -- Perl if you must) would not be that difficult.  Or you could use the UMLS scripts to load the tables into an Oracle or MySQL database and extract from there.
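To give a rough idea of what such an extraction utility might look like, here is a minimal Python sketch.  It assumes the relationships arrive as pipe-delimited rows with the child concept, the relationship label, and the parent concept in known columns, and that a label of "PAR" marks a parent link.  The column positions, the label, the function names, and the concept identifiers in the sample are all assumptions made for illustration, so check them against the actual UMLS file documentation before relying on them.

```python
from collections import defaultdict, deque


def build_dag(lines, child_col=0, rel_col=3, parent_col=4, parent_rel="PAR"):
    """Build {parent: set(children)} from pipe-delimited relationship rows.

    The column indices and the "PAR" label are assumptions about the
    input layout, not a statement of the real UMLS file format.
    """
    children = defaultdict(set)
    needed = max(child_col, rel_col, parent_col)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= needed:
            continue  # skip short or malformed rows
        if fields[rel_col] != parent_rel:
            continue  # keep only the hierarchical relationship
        child, parent = fields[child_col], fields[parent_col]
        if child != parent:  # ignore self-links
            children[parent].add(child)
    return children


def descendants(children, root):
    """Return every node reachable from root, breadth-first."""
    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:  # a DAG node may have several parents
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical rows: made-up identifiers, not real UMLS content.
SAMPLE_ROWS = [
    "C0000002|A2|AUI|PAR|C0000001|A1|AUI||",
    "C0000003|A3|AUI|PAR|C0000001|A1|AUI||",
    "C0000004|A4|AUI|PAR|C0000002|A2|AUI||",
    "C0000004|A4|AUI|PAR|C0000003|A3|AUI||",
    "C0000004|A4|AUI|SY|C0000005|A5|AUI||",
]

if __name__ == "__main__":
    dag = build_dag(SAMPLE_ROWS)
    print(sorted(descendants(dag, "C0000001")))
```

For a real extraction you would pass an open file object instead of SAMPLE_ROWS, and an in-memory dictionary like this could later be written out to whatever key-value store you prefer.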

Note that both GO and the NCBI Taxonomy are in the Metathesaurus, along with a host of other things (some of which require licenses for use).  Note also that when things go *into* the UMLS Metathesaurus, certain decisions are made by the UMLS curators about what is represented and how this is done.  So things may not appear in the UMLS version of a source exactly as you might expect.

Start at, and have a good time.  I know I'm having a good time with it.

Gary H. Merrill
Principal Scientist, New Applications
Biomedical Data Sciences
GlaxoSmithKline Inc.
(919) 483-8456

03-May-2005 23:13

Obo-discuss digest, Vol 1 #85 - 1 msg

Date: Mon, 02 May 2005 11:13:27 +0200
From: Silke Trissl <>
Subject: [Obo-discuss] Ontologies in Life Science


I am looking for large ontologies in Life Science.

I know the Gene Ontology and the NCBI Taxonomy. Are there any other
ontologies in the area of biology that are comparable in size or
complexity? I am looking for Ontologies in the form of DAGs with more
than 10 000 nodes to test different index structures on 'real-life' data.


                Silke Trissl

