From: Martin M. <mar...@ma...> - 2010-01-29 20:29:15
I am trying to teach myself how to use eXist in order to build a site for lexical analysis and collaborative annotation that will begin with Shakespeare and may in time include several hundred 16th and 17th century plays. As I teach myself, I would like to write a tutorial that can help others like me build sites of moderate complexity. Wolfgang has been very generous with advice on this or that. Now I have spent several days with the documentation and have a lot of questions for which some people on this list may have advice.

I am not and will never be a programmer. I'm an English professor who has dabbled with Perl, Tcl, and Python. I did moderately complex work with Peter Robinson's Anastasia, which is an XML database of sorts and required mixing Tcl, sgrep, and HTML. I was able to do that largely because of very good, step-by-step documentation on doing simple routines within a clear explanatory framework.

Here are some propositions and questions. I'll be grateful for answers to any or all of them.

1. eXist is an XML database written in Java, but you can use it for sites of some complexity even if you don't know Java and have no intention of learning it. True or false?

2. The must-haves for building anything in eXist are
a) a good understanding of XML
b) a command of XQuery
c) some familiarity with formatting and styling HTML output
d) a basic familiarity with doing some stuff at the command line
Is that an exhaustive list of the must-haves?

3. JavaScript (any particular flavour?) is a good-to-have in many situations and may be a must-have for applications beyond some level of complexity, but it is a layer of competence you can add later. True or false?

Now for some other questions. The current documentation is lucid and comprehensive on many things, but there are some basic things where I simply cannot connect the dots and don't know where to look.

Example 1: What are the processing steps that transform a document from its source to its display on the user's screen? Take /webapp/index.xml. I can make sense of that document in its XML form, and I see, for instance, that it includes pointers to other documents that are meant to be shown together with it. But where is the script that is activated when I tell the browser to fetch http://localhost:8080/exist/index.xml?

Here are slightly different versions of the same question: What kinds of events, and in what sequence, are triggered by the browser command http://localhost:8080/exist/index.xml? What kinds of script do I need to write, and where do I put them, in order to enable the execution of that command? When the machine is told to do something with the file 'index.xml', how does it know where to look?

Example 2: What is a "collection"? How do you turn a set of XML files into a "collection"? What are the required or preferred locations for putting them? How do you index them? What does /db/system refer to?

There is quite a bit of documentation here, but it is very confusing if you enter the realm of XML databases as a novice. What I need to proceed is something very simple: a bare-bones structure of steps that take you from a set of files to an indexed collection with a minimal set of functionalities for text display and searching. On the other hand, this bare-bones structure should be robust and extensible, a simple doll that you can put different clothes on later. I define the minimal set of functionalities as the ability to chunk a document into its parts, a panel for navigating those chunks, and a basic lexical search feature. Users who have been taken through and mastered those steps will probably be able to add more complex features at their own pace. At least that has been my experience in the past.

Here are the questions in order:

1. I have a set of files. It could be the Hamlet and Macbeth files from the samples directory, or it could be some of the documentation files. I want to turn them into a collection.

2. A Macintosh-specific question: The eXist installer puts eXist in the Applications directory. But I may not want to keep data in that directory. Can I put my collections in some other directory? Or would it be simpler to move the entire eXist operation somewhere else?

3. Does it matter where my collection lives? To judge from the file structure of the eXist application, the answer seems to be 'no'. But is there a good practice to start with?

4. For the sake of argument, let's put my collection in /eXist/mycollection. What do I do next? What is the minimal configuration file I need to write in order to index the texts in my collection? Where do I put the configuration file? How do I execute it? Where will eXist store the indexes it creates? Do I have a choice? (I'd rather not.)

5. What is the simplest way of creating a minimal Web interface for my database? Can I rely on the XQueryServlet for quite a while, or need I learn about the REST server from the beginning?

Am I correct in assuming that the indexes are a black box? That is to say, XQueries (or other scripts) address the indexes, but in formulating them I need only look at the source documents.

Here is a final question. Because eXist has been around for a while, it is a project with a history. Some of that shows in the directory structure. There may be a little cruft here and there, and there may be a fair amount of stuff that is optional rather than essential. I gather from the documentation that the project has moved in a certain direction. "Writing Web applications using Xquery" seems the way to go, and there may be considerable interest in XRX web application architecture. Or so I would like to understand the documentation.
But if that is the case, would it be possible to create a distribution of eXist in which you throw away everything that doesn't relate to that development path? From the perspective of the learner that would be quite helpful. If in rummaging around the file system you can start from the assumption that it includes everything needed for your tasks, that's a great advantage. How about an "essential eXist" version, whether as a separate distribution or as a distribution in which the essential parts are in a directory of their own and you can throw away or ignore the rest until you're ready for it?

I am pretty confident that once I have learned how to build a fairly basic eXist database with a 'document' rather than 'data' orientation, I will be able to do a pretty good job of writing a tutorial that can help others. Almost a decade ago I wrote a "Very Gentle Introduction to the TEI" and was surprised by the many people who told me how useful it was. If I am told that I am the only person in the world who has these difficulties connecting the dots on some of the most basic eXist procedures, I'll be happy to bow out of this altogether. But I suspect I'm not the only one.

Martin Mueller