One thing you could do is look at UMLS and
extract from it the sort of ontology you are interested in. This
would not be trivial, but it would mostly be a matter of identifying
what you want and then transforming it into whatever
DAG representation you desire. There is a lot of good stuff in there,
and as part of the process you could (if it would be of value to you) integrate
information from some of the various hierarchies in the "vocabularies"
that make up the Metathesaurus. This could yield fairly large hierarchically
organized ontologies involving such things as drugs, various coding schemes
(ICD-9, Read codes, etc.), drug classes (pharmacological, therapeutic),
medical conditions, etc.
This information is all in large relational
tables that are quite well designed but involve several levels of abstraction.
I presume you would need to extract it into an efficient DAG representation
for computational purposes. If so, I would favor using something
like BSDDB or Metakit for that, since you may well find that traversing
such hierarchies via SQL queries causes performance problems.
But that depends on what you want to do. Writing extraction
utilities from the .nlm ASCII files to a DAG representation in something
like Python (or -- ugh! -- Perl if you must) would not be that difficult.
Or you could use the UMLS scripts to dump the tables into an Oracle
or MySQL database and extract from there.
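To give a rough idea of what such an extraction utility might look like, here
is a minimal Python sketch. It assumes a pipe-delimited relationship file in
which one column holds a child identifier, one holds a relationship label, and
one holds the parent identifier. The column positions, the "PAR" label, and
all function names are illustrative assumptions, not the actual UMLS layout,
so check them against the documentation for the files you use. The standard
library's dbm module stands in here for BSDDB or Metakit as the on-disk
key-value store.

```python
import dbm
import json
from collections import defaultdict

# Hypothetical column positions in a pipe-delimited relationship file;
# verify these against the documentation for the file you actually use.
CHILD_COL, REL_COL, PARENT_COL = 0, 1, 2
PARENT_LABEL = "PAR"  # assumed label meaning "the second concept is the parent"


def read_edges(lines):
    """Yield (parent, child) pairs from pipe-delimited relationship rows."""
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= max(CHILD_COL, REL_COL, PARENT_COL):
            continue  # skip blank or short rows
        if fields[REL_COL] == PARENT_LABEL:
            yield fields[PARENT_COL], fields[CHILD_COL]


def build_dag(edges):
    """Collect edges into a parent -> sorted list of children mapping."""
    children = defaultdict(set)
    for parent, child in edges:
        if parent != child:  # drop self-loops
            children[parent].add(child)
    return {parent: sorted(kids) for parent, kids in children.items()}


def save_dag(dag, path):
    """Write each adjacency list to an on-disk key-value store."""
    with dbm.open(path, "c") as db:
        for parent, kids in dag.items():
            db[parent.encode("utf-8")] = json.dumps(kids).encode("utf-8")


def descendants(path, root):
    """Return every node reachable from root, reading child lists from disk."""
    seen = set()
    stack = [root]
    with dbm.open(path, "r") as db:
        while stack:
            key = stack.pop().encode("utf-8")
            kids = json.loads(db[key]) if key in db else []
            for kid in kids:
                if kid not in seen:
                    seen.add(kid)
                    stack.append(kid)
    return seen
```

You would call build_dag(read_edges(open(filename))) once, save the result
with save_dag, and then each traversal step becomes a single key lookup rather
than an SQL query. Nothing here checks that the extracted graph is actually
acyclic; the seen set only keeps the traversal from looping if it is not.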
Note that both GO and the NCBI Taxonomy are in the Metathesaurus,
along with a host of other things (some of which require licenses for use).
Note also that when things go *into* the UMLS Metathesaurus, certain
decisions are made by the UMLS curators about what is represented and how
this is done. So things may not appear in the UMLS version of a source
exactly as you might expect.
Start at http://www.nlm.nih.gov/research/umls/documentation.html,
and have a good time. I know I'm having a good time with it.
Gary H. Merrill
Principal Scientist, New Applications
Biomedical Data Sciences
Obo-discuss digest, Vol 1 #85 - 1 msg
Date: Mon, 02 May 2005 11:13:27 +0200
From: Silke Trissl <firstname.lastname@example.org>
Subject: [Obo-discuss] Ontologies in Life Science
I am looking for large ontologies in Life Science.
I know the Gene Ontology and the NCBI Taxonomy. Are there any other
ontologies in the area of biology that are comparable in size or
complexity? I am looking for Ontologies in the form of DAGs with more
than 10 000 nodes to test different index structures on 'real-life' data.