From: Andreas W. <And...@em...> - 2014-11-17 10:24:48
Hi Joe, hi all,

sorry for chiming in so late.

* Joe Wicentowski dixit [2014-11-02 19:23]:
> Peter Stadler and I were chatting about this list, the community of
> TEI and eXist users, and we thought this would be a good opportunity
> for everyone to check in and say a few words about how you are doing.
>
> What are you working on these days? How is it going? Lessons?
> Questions?

Ingo Caesar and I began almost a year ago to develop a database and application with eXist. Neither of us is really an IT person (Ingo is a librarian and I am a philosopher), and neither of us had any experience with XQuery/XSLT before, although we do have some affinity to IT/web technology and some meager previous experience with other systems. We had very nice help and support from different folks, but we were even more amazed at how the templating system and the XQuery libraries allow you to get really presentable results very easily and very quickly -- so I should first of all congratulate and thank you for that.

The project we're busy with is a collection of sources (16th- and 17th-century prints of legal and political thought), to be complemented by a dictionary of key terms. [1] It is scheduled to go online next year with a limited set of source texts.

I. Some of the maybe less obvious things we have had to deal with so far:

* Pre-rendering: generating fragments of html that are transparently loaded and reloaded in the webapp, since both on-the-fly transformation and complete loading of our large (e.g. 14 MB) source files would result in very bad performance. This requires some processing of the linking features. As part of this, we create an xml file containing an index of nodes (noting, for each node, the html fragment in which it ends up and its crumbtrail) and a standalone toc html for every work.

* Lemmatized search (i.e. you search for "vel" or "potestas" and get results with "sive" and "potestatem", respectively).
  We are just finishing this and it still has quite a few rough edges. We need to work on our dictionaries and adapt them to our texts, but the technical essentials are in place. (We use the Sphinx search engine [2] for this. It also seems faster than Lucene, but of course it requires more integration fine-tuning.)

* Responsive design, so that the webapp is easy to use on all sorts of devices. In addition to eXist's templating system, we make use of Bootstrap and jQuery UI for most of our UI implementation. A usability test/survey is scheduled for next year.

* Private URIs as per [3]. We use @ref values such as getty:7002722, cerl:cnp00396685, gnd:118622110, author:A0100 or facs:W0013-0019 to link some of our elements to authority databases, to our image server, or to other resources within our database. For some elements, the @ref attribute can contain several such values. So far, we use an xslt function to translate these private URIs to full weblinks in the html that we are generating.

II. Challenges. Things that keep popping up frequently, or that we will have to confront at some point:

a) AJAX. We use infiniteajaxscroll [4] for loading text fragments, but we would like to load other parts of the webpage in the background as well. We are going to start by looking into Peter's WeGA [5] again. I am of course open to suggestions and pointers.

b) XSLT. I have run into a few difficulties: the handling of processing instructions in XSLT is not as sophisticated as in XQuery (I have an attribute in my processing instructions; in XQuery I can, with some difficulty, access its value, but not so in XSLT), XSLT has few options to learn about its (pseudo-)filesystem environment, and I found no good way to profile my transformations. So far, I have found workarounds for those issues, but recently I started re-coding my XSLTs in XQuery. I think this may even have other advantages, but it felt like I was almost forced to go this route.
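To illustrate the kind of workaround I mean: since XPath offers no real accessor for pseudo-attributes inside a processing instruction, one has to parse the PI's string content by hand in XQuery. A minimal sketch -- the PI name "render" and the attribute name "target" are invented for illustration, not the names we actually use:

```xquery
(: Pseudo-attributes are just part of a PI's text content, so they
   have to be parsed out of the string by hand. :)
declare function local:pi-attribute($pi as processing-instruction()?,
                                    $name as xs:string) as xs:string? {
    let $text := fn:string($pi)
    return
        if (contains($text, $name || '="'))
        then substring-before(substring-after($text, $name || '="'), '"')
        else ()
};

let $doc := document { <p>some text<?render target="frag-0017"?></p> }
return local:pi-attribute($doc//processing-instruction('render'), 'target')
(: returns "frag-0017" :)
```

This is brittle, of course (it assumes double quotes and no escaping), which is part of why it feels like "some difficulty" rather than a proper solution.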
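To give an idea of what the translation of private URIs amounts to, here is a minimal sketch of such a resolver in XQuery (we actually do this in XSLT at the moment; the target URL patterns below, including the internal author:/facs: targets, are illustrative placeholders, not necessarily the ones we use):

```xquery
(: Resolve a private URI like "gnd:118622110" to a full weblink.
   All target URL patterns below are illustrative placeholders. :)
declare function local:resolve-ref($ref as xs:string) as xs:string? {
    let $prefix := substring-before($ref, ':')
    let $id     := substring-after($ref, ':')
    return
        switch ($prefix)
            case 'getty'  return 'http://vocab.getty.edu/tgn/' || $id
            case 'cerl'   return 'https://thesaurus.cerl.org/record/' || $id
            case 'gnd'    return 'http://d-nb.info/gnd/' || $id
            case 'author' return 'works.html?author=' || $id
            case 'facs'   return 'facsimiles/' || $id
            default       return ()
};

(: Some elements carry several whitespace-separated values in @ref: :)
for $v in tokenize('getty:7002722 gnd:118622110', '\s+')
return local:resolve-ref($v)
```

Keeping the prefix-to-URL mapping in one place like this is what makes the multi-valued @ref attributes manageable.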
Finally, I see some challenges of a more conceptual kind coming up, i.e. versioning and user management:

c) Versioning: CVS/Git integration is not implemented, but we can live with that for now. More critically, however, while we do have a rough sequence of states that our documents go through, up to the point where we publish them and acquire persistent identifiers, I cannot at all see how we are going to strike a balance between continuous updating and citability (is there an equivalent to, say, a "third, corrected and augmented edition" in web resources?).

d) User management: We will possibly at some point want to offer options for online collaboration, annotation and commenting of our sources. It is not at all clear to me how this process will be designed, let alone implemented.

Right now, our code is in desperate need of a cleanup, but we intend to open-source it and will probably put it online at GitHub when it is somewhat more presentable. If any of you have suggestions and/or pointers WRT the issues I described, I would be more than happy to learn about them. On the other hand, if anyone is interested in browsing our development instance, I would gladly give you the URL, although I would (for now) prefer to do so off-list.

I am looking forward to learning more about eXist, TEI and other projects. Thank you for soliciting, and thank you even more for putting up with such a long e-mail.

Cheers,
Andreas

[1] http://www.salamanca.adwmainz.de/en/description.html
[2] http://sphinxsearch.com/
[3] http://www.tei-c.org/release/doc/tei-p5-doc/en/html/SA.html#SAPU
[4] http://infiniteajaxscroll.com/
[5] https://github.com/Edirom/WeGA-WebApp

--
Dr. Andreas Wagner
Project "The School of Salamanca"
Academy of Sciences and Literature, Mainz
and Institute of Philosophy, Goethe University Frankfurt
http://salamanca.adwmainz.de

IGF HP 25 / R 2.455
Grüneburgplatz 1
60629 Frankfurt am Main
Tel. +49 (0)69/798-32774
Fax +49 (0)69/798-32794