On Mar 28, 2012, at 1:05 PM, Scott Cain wrote:
> Hi Bob,
> So to get GAZ to load, you extended the name constraint so that it is
> on a combination of name and dbxref_id? This and extending the size
> of dbxref.accession are not unreasonable changes to Chado (every time
> I think we make something like accession big enough, somebody always
> comes along to prove me wrong :-)
> Given how long it takes to convert GAZ to chado-xml, it might be nice
> to make that available somewhere for other people to use (ooh, maybe
> the obo folks could be convinced to provide chado-xml for
> download--that would be awesome! Yes Chris I'm looking at you :-)
We used to have chado-xml for all the ontologies available here:
But this was switched off as not many people used it, and the xsltproc conversion to chadoxml was taking too long for some ontologies.
I would recommend someone write a chadoxml dump in Java as part of the owltools framework. This will be much faster, and will also allow you to translate the chado-subset of any OWL ontology without having to convert beforehand. I can't commit to doing this, but I can put some stub code in place and grant svn access if anyone wants to volunteer.
> On Wed, Mar 28, 2012 at 9:21 AM, Bob MacCallum
> <r.maccallum@...> wrote:
>> Just a bit of an update on this.
>> The GAZ maintainers have fixed the syntax errors and circular
>> references, and the "ontology" is now on bioportal (minus full tree
>> I forgot to mention that two hacks are needed to load the ontology into Chado:
>> 1. table cvterm constraint cvterm_c1: either remove constraint on name
>> or add an extra one on dbxref_id (we have gone for the latter, thanks
>> to Dave Emmert's suggestion). This is because there are several terms
>> with the same name, e.g. London. So far we have not identified any
>> downstream problems, but other sites should test before this goes into
>> the trunk.
>> 2. table dbxref column accession: extend to varchar(1024) to handle
>> some crazy long URLs purporting to be accessions.
>> Although it takes a while, conversion to Chado XML (4.2 days) and
>> loading with stag-storenode.pl (3.5h) seems to be working for us. The
>> latter can handle updates also.
>> I guess we should try cvtermpath at some point - I imagine this will
>> take a while...
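
[Editorial sketch] The first hack in the update above (widening the cvterm_c1 uniqueness constraint to include dbxref_id) can be illustrated with a toy table. This is only a sketch under assumptions: it uses an in-memory SQLite database rather than a real PostgreSQL Chado instance, the table is a cut-down stand-in for cvterm, and it assumes the stock constraint covers (name, cv_id, is_obsolete). The helper names build_cvterm and load_two_londons and the dbxref_id values are hypothetical. Check the actual constraint definition in your own schema version before changing anything.

```python
import sqlite3


def build_cvterm(extra_column: bool) -> sqlite3.Connection:
    """Create a toy cvterm table; optionally add dbxref_id to the unique key.

    Assumption: the stock constraint is UNIQUE (name, cv_id, is_obsolete).
    """
    key = "name, cv_id, is_obsolete" + (", dbxref_id" if extra_column else "")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cvterm ("
        " cvterm_id INTEGER PRIMARY KEY,"
        " cv_id INTEGER NOT NULL,"
        " name TEXT NOT NULL,"
        " dbxref_id INTEGER NOT NULL UNIQUE,"
        " is_obsolete INTEGER NOT NULL DEFAULT 0,"
        f" CONSTRAINT cvterm_c1 UNIQUE ({key}))"
    )
    return conn


def load_two_londons(conn: sqlite3.Connection) -> int:
    """Insert two distinct terms that share the name 'London'.

    Returns the number of rows that were actually stored.
    """
    rows = [(1, "London", 101), (1, "London", 102)]  # made-up ids
    for cv_id, name, dbxref_id in rows:
        try:
            conn.execute(
                "INSERT INTO cvterm (cv_id, name, dbxref_id) VALUES (?, ?, ?)",
                (cv_id, name, dbxref_id),
            )
        except sqlite3.IntegrityError:
            # Raised when the row collides with the uniqueness constraint.
            pass
    return conn.execute("SELECT COUNT(*) FROM cvterm").fetchone()[0]


strict = load_two_londons(build_cvterm(extra_column=False))
relaxed = load_two_londons(build_cvterm(extra_column=True))
print(strict, relaxed)
```

With the name-only key the second London is rejected; with dbxref_id in the key both rows load. The second hack (widening dbxref.accession to varchar(1024)) has no SQLite analogue because SQLite does not enforce declared text lengths, so it is not shown here.
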
>> On Thu, Dec 1, 2011 at 11:11 AM, Bob MacCallum
>> <r.maccallum@...> wrote:
>>> As you are probably all aware, GAZ is a big ontology.
>>> As far as we are aware, this is the only source for it:
>>> It contains a lot of formatting errors, which we once corrected
>>> manually then converted to Chado XML and loaded via stag-storenode.pl
>>> - that loading process took more than 5 days.
>>> Recently I've tried fixing as many errors as possible with Perl
>>> one-liners and making a few mods to Bio/OntologyIO/obo.pm
>>> so that it handles quoted commas better and skips dodgy xrefs - so
>>> that I can load it with gmod_load_cvterms.pl
>>> (which I believe is preferable to stag-storenode if we want to
>>> regularly re-load/update the ontology).
>>> Trouble is, once all the terms are loaded (several hours), the
>>> relationships are taking many many days (still not finished), while it
>>> outputs this kind of thing:
>>> Retrieved accession: 00307773
>>> Looking at relationship in file: 00307773-00022850
>>> Looking at relationship in file: 00307773-00022845
>>> Retrieved accession: 00284751
>>> Looking at relationship in file: 00284751-00283672
>>> Retrieved accession: 00255061
>>> Looking at relationship in file: 00255061-00007146
>>> Before we go looking at optimising the code or other tricks, I
>>> wondered if anyone else had solved this problem?
>>> Then, of course, it would be nice to have cvtermpath populated but we
>>> noticed circular references when we half-heartedly tried this last.
>>> As far as I can tell, Gramene seems to be the only site with GAZ on
>>> board. I'll make contact with them but I don't think they have Chado
>>> under the hood.
>>> many thanks,
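
[Editorial sketch] Since circular references were what blocked the cvtermpath attempt mentioned above, it may help to check the relationship graph for cycles before computing a transitive closure. The following is a minimal sketch, not part of any GMOD tool: find_cycle is a hypothetical helper, and it assumes the relationships have already been extracted as (child, parent) pairs of accessions.

```python
from collections import defaultdict


def find_cycle(edges):
    """Return one cycle as a list of term ids, or None if the graph is acyclic.

    edges: iterable of (child, parent) pairs.
    Uses an iterative depth-first search so deep ontologies do not hit
    Python's recursion limit.
    """
    graph = defaultdict(list)
    for child, parent in edges:
        graph[child].append(parent)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    state = defaultdict(int)

    for start in list(graph):
        if state[start] != WHITE:
            continue
        path = [start]
        stack = [iter(graph[start])]
        state[start] = GREY
        while stack:
            for nxt in stack[-1]:
                if state[nxt] == GREY:
                    # nxt is already on the current path: found a cycle.
                    return path[path.index(nxt):] + [nxt]
                if state[nxt] == WHITE:
                    state[nxt] = GREY
                    path.append(nxt)
                    stack.append(iter(graph[nxt]))
                    break
            else:
                # All parents of the current node are explored.
                state[path.pop()] = BLACK
                stack.pop()
    return None


# Pairs taken from the loader output quoted above (read as child-parent).
acyclic = find_cycle([
    ("00307773", "00022850"),
    ("00307773", "00022845"),
    ("00284751", "00283672"),
])

# Made-up terms forming a loop, for illustration only.
cyclic = find_cycle([("A", "B"), ("B", "C"), ("C", "A")])
print(acyclic, cyclic)
```

Any cycle reported this way would need to be fixed in the source ontology or excluded before the closure is built.
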
> Scott Cain, Ph. D. scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/) 216-392-3087
> Ontario Institute for Cancer Research