From: Oystein R. <oys...@ak...> - 2007-08-30 16:38:36
Here's a design problem of mine that I described in a posting some weeks ago. Unfortunately I got no replies, so I'll try to rephrase and elaborate. My original presentation might not have been clear enough. I hope I do better this time.

- Øystein -

...............

All,

I have a database in eXist that can be queried from a web interface, with the help of a Java servlet running under Tomcat. The exact configuration of my system might not be relevant to my questions, but I mention it anyway. What _is_ relevant, however, is that my database stores textual documents, and that I do full text search on these documents. Furthermore, and most important, I want to present the found documents with the search words highlighted. eXist supports such highlighting by putting <exist:match> tags around the matching words in the retrieved documents. It's just up to the application to convert these tags into something visible.

After a search is done, I want to present a result list showing the found documents. To simplify my case a bit, let's assume my documents are fairly short, say 200 words each, and that the result list shows each found document in full. (In a real full text document search system the result list would normally show each found document represented by some brief "surrogate" - a line of metadata and/or text _excerpts_. Think of a Google result list.)

Since a query might find many documents, I want to show long result lists one chunk at a time, e.g. 20 documents at a time. I assume there are two main approaches for dealing with such chunks:

(1) Each time the user/browser requests a new chunk, the original query is simply _rerun_, and the desired chunk is extracted from the whole result. The extraction can be done by the query itself.

(2) At the initial search the whole result is stored in some temporary location, e.g. in the session by use of the session:set-attribute() function.
When a chunk is requested, the chunk is somehow retrieved from, or with the help of, that stored result.

It seems to me that (2) is the "proper" way of doing it. Say I choose (2). But what exactly should I store?

Should I store a "fat" result list containing each document in full, complete with the match tags? That would make it easy to retrieve a chunk later. But the stored result list and the in-memory DOM tree might become very large.

Or should I store a "lean" version - a result list with just a unique reference to each found document? If I choose this alternative I lose the match information and must somehow recreate it when a chunk is retrieved. Let me inject here that I do full text search with the help of an XPath predicate, not a FLWOR "where" clause.

One way to recreate the match information of a "lean" chunk would be to retrieve the chunk with a new query that combines (a) a predicate tailored to retrieve exactly the desired chunk with (b) the predicate of the original query. The purpose of (a) would be to retrieve exactly the right documents of the chunk, in the correct order. (b) would serve to get the match information reapplied. Predicate (a) might be a rather clunky thing, with a long, explicit "or" expression mentioning each single reference - something like "[ref=id_m or ref=id_m+1 or ref=id_m+2 or ... or ref=id_m+19]".

Btw - if I store the (lean) result list in the session, I could perhaps store each chunk - a list of 20 references - in its own session attribute? To retrieve a chunk, my servlet would then first get the session attribute value for that chunk, i.e. a list of 20 references, then construct a long and explicit predicate from that list, etc.

There might also be a third result list alternative, somewhere between "lean" and "fat". Rather than throwing away the match information altogether, as in the "lean" version, I could store it in a compact form, e.g. as references to character positions.
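To make the "lean" and "compact" alternatives a bit more concrete, here is a rough sketch of the bookkeeping involved. It is written in Python purely for brevity (the servlet itself is Java), and every name in it (chunk_refs, build_ref_predicate, mark_matches, the element name ref, the <b> tags) is made up for illustration, as is the assumption that match positions would be kept as (start, end) character offsets. It does not talk to eXist at all; it only shows the list handling and string building.

```python
from typing import List, Sequence, Tuple

CHUNK_SIZE = 20  # documents per result page, as in the example in the post


def chunk_refs(refs: Sequence[str], size: int = CHUNK_SIZE) -> List[List[str]]:
    """Split a lean result list (document references only) into chunks."""
    return [list(refs[i:i + size]) for i in range(0, len(refs), size)]


def xquery_literal(value: str) -> str:
    """Quote a value as an XQuery string literal ('&' and '"' need escaping)."""
    escaped = value.replace("&", "&amp;").replace('"', '""')
    return '"' + escaped + '"'


def build_ref_predicate(chunk: Sequence[str], ref_name: str = "ref") -> str:
    """Predicate (a): an explicit 'or' over every reference in one chunk.

    ref_name is a hypothetical element holding the unique document reference.
    """
    if not chunk:
        raise ValueError("chunk must contain at least one reference")
    tests = [f"{ref_name} = {xquery_literal(r)}" for r in chunk]
    return "[" + " or ".join(tests) + "]"


def mark_matches(text: str, spans: Sequence[Tuple[int, int]],
                 open_tag: str = "<b>", close_tag: str = "</b>") -> str:
    """Compact alternative: re-insert highlighting from stored offsets.

    Assumes the (start, end) spans do not overlap; end is exclusive.
    """
    out, pos = [], 0
    for start, end in sorted(spans):
        out.append(text[pos:start])
        out.append(open_tag + text[start:end] + close_tag)
        pos = end
    out.append(text[pos:])
    return "".join(out)


if __name__ == "__main__":
    refs = [f"doc{i}" for i in range(45)]   # made-up lean result list
    chunks = chunk_refs(refs)               # chunks of 20, 20 and 5 references
    print(build_ref_predicate(chunks[2][:2]))
    print(mark_matches("the cat sat", [(4, 7)]))
```

Each element of chunks could be what goes into its own session attribute. Note that a predicate like this selects the right documents but by itself says nothing about their order; a path expression returns nodes in document order, so the chunk would presumably have to be re-sorted against the stored list of references.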
In this third, "compact" alternative each document in the result list is stored as a reference plus a list of match positions.

Comments? Suggestions?

- Øystein -

--
Øystein Reigem, The department of culture, language and information technology (Aksis), Allegt 27, N-5007 Bergen, Norway.
Tel: +47 55 58 32 42. Fax: +47 55 58 94 70. E-mail: <oys...@ak...>.
Home tel: +47 56 14 06 11. Mobile: +47 97 16 96 64. Home e-mail: <or...@br...>.
Aksis home page: <www.aksis.uib.no>.

_______________________________________________
Exist-open mailing list
Exi...@li...
https://lists.sourceforge.net/lists/listinfo/exist-open