From: Andreas W. <And...@em...> - 2014-11-17 10:24:48
Hi Joe, hi all,

sorry for chiming in so late.

* Joe Wicentowski dixit [2014-11-02 19:23]:
> Peter Stadler and I were chatting about this list, the community of
> TEI and eXist users, and we thought this would be a good opportunity
> for everyone to check in and say a few words about how you are doing.
>
> What are you working on these days? How is it going? Lessons?
> Questions?

Ingo Caesar and I began almost a year ago to develop a database and application with eXist. Neither of us is really an IT person (Ingo is a librarian and I am a philosopher), and neither of us had any experience with XQuery/XSLT before, although we do have some affinity to IT/web technology and some meager previous experience with other systems. We had very nice help and support from different folks, but we were even more amazed at how the templating system and the XQuery libraries allow you to get really presentable results very easily and very quickly -- so I should first of all congratulate and thank you for that.

The project we're busy with is a collection of sources (16th- and 17th-century prints of legal and political thought), to be complemented by a dictionary of key terms. [1] It is scheduled to go online next year with a limited set of source texts.

I. Some of the maybe less obvious things we have had to deal with so far:

* Pre-rendering: generating fragments of html that are transparently loaded and reloaded in the webapp, since both on-the-fly transformation and complete loading of our large (e.g. 14 MB) source files would result in very bad performance. This requires some processing of the linking features. As part of this, we create an xml file containing an index of nodes (noting, for each node, the html fragment in which it ends up and its crumbtrail) and a standalone toc html for every work.

* Lemmatized search (i.e. you search for "vel" or "potestas" and get results with "sive" and "potestatem", respectively).
  We are just finishing this and it still has quite a few rough edges. We need to work on our dictionaries and adapt them to our texts, but the technical essentials are in place. (We use the Sphinx search engine [2] for this. It also seems faster than Lucene, but of course it requires more integration fine-tuning.)

* Responsive design, so that the webapp is easy to use on all sorts of devices. In addition to eXist's templating system, we make use of Bootstrap and jQuery UI for most of our UI implementation. A usability test/survey is scheduled for next year.

* Private URIs as per [3]. We use @ref values such as getty:7002722, cerl:cnp00396685, gnd:118622110, author:A0100 or facs:W0013-0019 to link some of our elements to authority databases, to our image server, or to other resources within our database. For some elements, the @ref attribute can contain several such values. So far, we use an xslt function to translate these private URIs to full weblinks in the html that we are generating.

II. Challenges. Things that keep popping up frequently, or that we will have to confront at some point:

a) AJAX. We use infiniteajaxscroll [4] for loading text fragments, but we would like to load other parts of the webpage in the background as well. We are going to start by looking into Peter's WeGA [5] again. I am of course open to suggestions and pointers.

b) XSLT. I have run into a few difficulties: the handling of processing instructions in XSLT is not as sophisticated as in XQuery (I have an attribute in my processing instructions; in XQuery I can, with some difficulty, access its value, but not so in XSLT), XSLT has few options to learn about its (pseudo-)filesystem environment, and I found no good way to profile my transformations. So far, I have found workarounds for those issues, but recently I started re-coding my XSLTs in XQuery. I think this may even have other advantages, but it felt like I was almost forced to go this route.
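To illustrate the kind of workaround I mean: since XPath offers no real accessor for pseudo-attributes inside a processing instruction, one has to parse the PI's string content by hand in XQuery. A minimal sketch -- the PI name "render" and the attribute name "target" are invented for illustration, not the names we actually use:

```xquery
(: Pseudo-attributes are just part of a PI's text content, so they
   have to be parsed out of the string by hand. :)
declare function local:pi-attribute($pi as processing-instruction()?,
                                    $name as xs:string) as xs:string? {
    let $text := fn:string($pi)
    return
        if (contains($text, $name || '="'))
        then substring-before(substring-after($text, $name || '="'), '"')
        else ()
};

let $doc := document { <p>some text<?render target="frag-0017"?></p> }
return local:pi-attribute($doc//processing-instruction('render'), 'target')
(: returns "frag-0017" :)
```

This is brittle, of course (it assumes double quotes and no escaping), which is part of why it feels like "some difficulty" rather than a proper solution.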
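To give an idea of what the translation of private URIs amounts to, here is a minimal sketch of such a resolver in XQuery (we actually do this in XSLT at the moment; the target URL patterns below, including the internal author:/facs: targets, are illustrative placeholders, not necessarily the ones we use):

```xquery
(: Resolve a private URI like "gnd:118622110" to a full weblink.
   All target URL patterns below are illustrative placeholders. :)
declare function local:resolve-ref($ref as xs:string) as xs:string? {
    let $prefix := substring-before($ref, ':')
    let $id     := substring-after($ref, ':')
    return
        switch ($prefix)
            case 'getty'  return 'http://vocab.getty.edu/tgn/' || $id
            case 'cerl'   return 'https://thesaurus.cerl.org/record/' || $id
            case 'gnd'    return 'http://d-nb.info/gnd/' || $id
            case 'author' return 'works.html?author=' || $id
            case 'facs'   return 'facsimiles/' || $id
            default       return ()
};

(: Some elements carry several whitespace-separated values in @ref: :)
for $v in tokenize('getty:7002722 gnd:118622110', '\s+')
return local:resolve-ref($v)
```

Keeping the prefix-to-URL mapping in one place like this is what makes the multi-valued @ref attributes manageable.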
Finally, I see some challenges of a more conceptual kind coming up, i.e. versioning and user management:

c) Versioning: CVS/Git integration is not implemented, but we can live with that for now. More critically, however, while we do have a rough sequence of states that our documents go through, up to the point where we publish them and acquire persistent identifiers, I cannot at all see how we are going to strike a balance between continuous updating and citability (is there an equivalent to, say, a "third, corrected and augmented edition" in web resources?).

d) User management: We will possibly at some point want to offer options for online collaboration, annotation and commenting of our sources. It is not at all clear to me how this process will be designed, let alone implemented.

Right now, our code is in desperate need of a cleanup, but we intend to open-source it and will probably put it online at GitHub when it is somewhat more presentable. If any of you have suggestions and/or pointers WRT the issues I described, I would be more than happy to learn about them. On the other hand, if anyone is interested in browsing our development instance, I would gladly give you the URL, although I would (for now) prefer to do so off-list.

I am looking forward to learning more about eXist, TEI and other projects. Thank you for soliciting, and thank you even more for putting up with such a long e-mail.

Cheers,
Andreas

[1] http://www.salamanca.adwmainz.de/en/description.html
[2] http://sphinxsearch.com/
[3] http://www.tei-c.org/release/doc/tei-p5-doc/en/html/SA.html#SAPU
[4] http://infiniteajaxscroll.com/
[5] https://github.com/Edirom/WeGA-WebApp

--
Dr. Andreas Wagner
Project "The School of Salamanca"
Academy of Sciences and Literature, Mainz
and Institute of Philosophy, Goethe University Frankfurt
http://salamanca.adwmainz.de

IGF HP 25 / R 2.455
Grüneburgplatz 1
60629 Frankfurt am Main
Tel. +49 (0)69/798-32774
Fax +49 (0)69/798-32794