From: Christian W. <wi...@ka...> - 2004-12-10 12:22:00
"Conal Tuohy" <Con...@vu...> writes:

> Christian Wittern wrote:
>
>> We are using TEI with a homegrown ontology for the Tang Knowledgebase
>> Project here in Kyoto. We do quite similar things (e.g. linking from
>> <name> and other elements) and are also using TopicMap technology. We
>> are working with texts, rather than bibliographies, so the association
>> is in most cases not immediately derived from the context.
>
> Our TEI files are mostly "texts" too. But quite a few of them contain
> bibliographies at the back, or have bibliographic references inside them.
>
> What kind of associations are you harvesting?

This is quite experimental. At the moment, I am interested in things like co-occurrences of <name>s, of names and places, as well as some associations that can be derived implicitly from the context, for example the dates of entries (in the case of a history we are working on at the moment). We hope, of course, that some promising patterns will emerge.

>
>> At the moment, one of the headaches is a software environment for
>> maintaining this stuff and so we are at the beginning of developing
>> our own stuff with eXist and some Java libraries. I would be very
>> interested to hear what kind of s/w environment other projects are
>> using. We did try the OKS Suite from Ontopia, but it choked on some
>> of the larger topicmaps.
>
> We are using TM4J. To build our topic map we merge XTM topic maps
> which we have harvested from TEI files and MADS authorities, using
> XSLT pipelines running in Cocoon.

Interesting. I might have some more specific questions at some point, so this is good to know.

>
> What sort of size were your large topicmaps, Christian? In our
> development so far we've got up to about 70Mb of XTM.

That's quite sizeable, even given the verbosity of XTM. I do not remember exactly, but I think it was in the range of 50 MB.
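For illustration, the co-occurrence harvesting mentioned above could be sketched roughly as follows. This is a minimal Python sketch, not the project's actual code: it assumes that "co-occurrence" means two distinct <name> elements appearing in the same TEI <p>, and the function name `name_cooccurrences`, the `unit` parameter and the sample text are all hypothetical. It matches element local names, so it should work with both un-namespaced TEI P4 and namespaced TEI P5 input.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix so TEI with or without a namespace both match."""
    return tag.rsplit("}", 1)[-1]


def name_cooccurrences(tei_xml: str, unit: str = "p") -> Counter:
    """Count unordered pairs of distinct <name> strings found in the same unit element.

    Hypothetical helper: 'unit' is the local name of the element treated as the
    co-occurrence window (a paragraph by default).
    """
    root = ET.fromstring(tei_xml)
    pairs: Counter = Counter()
    for elem in root.iter():
        if local_name(elem.tag) != unit:
            continue
        # Collect the distinct, non-empty text of every <name> inside this unit.
        names = set()
        for child in elem.iter():
            if local_name(child.tag) == "name":
                text = "".join(child.itertext()).strip()
                if text:
                    names.add(text)
        # Sort so each pair is counted under one canonical (a, b) key.
        for a, b in combinations(sorted(names), 2):
            pairs[(a, b)] += 1
    return pairs


# Made-up sample text, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<p><name type="person">Li Bai</name> met <name type="person">Du Fu</name>
in <name type="place">Luoyang</name>.</p>
<p><name type="person">Li Bai</name> left <name type="place">Luoyang</name>.</p>
</body></text></TEI>"""

if __name__ == "__main__":
    for pair, count in name_cooccurrences(SAMPLE).most_common():
        print(pair, count)
```

On the sample, ("Li Bai", "Luoyang") is counted twice and the other two pairs once each. Counts like these could then be turned into candidate associations in a topic map, though which threshold makes a pattern "promising" is left open here.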
OKS (actually, the Omnigator) would just issue SQL statements to the backend to get some statistics for the welcome page, and the whole thing timed out after 60 minutes. Pretty depressing, actually.

> TM4J has a few
> different back-ends. At present we use the in-memory back-end which
> naturally has the best performance by far, but it requires a lot of
> memory of course. You can also use relational databases with it,
> though actually I've found them pretty slow.

So in-memory is the way to go? That must be quite substantial: given a size of 70 MB in the filesystem, the memory footprint must be at least 10 times that?!

All the best,

Christian

-- 
Christian Wittern
Institute for Research in Humanities, Kyoto University
47 Higashiogura-cho, Kitashirakawa, Sakyo-ku, Kyoto 606-8265, JAPAN