From: David L. <dl...@ca...> - 2011-05-09 11:57:30
I'm excited about this interest in xmlsh + Exist. I took your 'challenge' and joined the mailing list.

Here's what I have in mind, but it would help to have my background on this. As always I *crave input*: even if you aren't volunteering to help with the actual coding, letting me know how you use a tool every day is valuable.

I am a *beginner* (not even a novice yet) with Exist. My history with Exist dates from when I was evaluating several XML DBs for my Day Job, about 2 years ago. I experimented with Oracle, DB2, MarkLogic, Exist, dbxml and probably a few others. For my test case, which was the first thing I tried with each DB, I attempted to upload about 5 GB of XML and images. (These come from a public clinical data source from the FDA.)

I admit my first experience with Exist was horrible. I was unable to upload the files with the GUI tools without hangs or errors. Once mostly successful, I tried the GUI samples and some queries. When they worked they took forever, and when they didn't I got Java errors on the server side. I knew from the great things I had heard about Exist that it was probably me, but first impressions lasted.

So I went on to MarkLogic as an experiment ... and loading that amount of data was very tedious as well ... the simple tools couldn't hack it, but eventually I found a way, and once the data was in the DB it worked flawlessly 'out of the box'. Again, first impressions lasted.

<Please excuse the description of MarkLogic here, but it's relevant to the discussion>

So I ended up doing a lot of work with MarkLogic and pretty much gave up on Exist. <duck>

Still, the tools for doing what I thought were 'simple things' with MarkLogic are not that simple. My first goal was to *reliably* upload 10's of GB of XML and binary data. Another was to do simple things like "dir", "put" and "get" and to invoke snippets of XML on the server. So I took the opportunity to write these tools.

My natural place, of course, was to integrate them into xmlsh as extension modules. One goal was to tie MarkLogic's binary XML structure directly into xmlsh's, so I could read an XML document or run an XQuery from ML and load the result into Saxon trees ... turns out this was a fool's errand. I ended up having to serialize & deserialize because of the lack of high-fidelity APIs on the returned data, as well as a mismatch in the data structures. But oh well.

I discovered that I only needed 4 basic tools to do pretty much anything:

A) "Put" - an efficient way to put raw data to the back end. In ML's case there is a specific API for this. For the xmlsh extension I make extensive use of multithreading, as well as a special case using MD5s for "rsync"-like behavior. It works really fast and efficiently.

B) "Get" - get files from the remote server. Here I still have a simplistic solution (1 file at a time), but a feature request is in to give it the full features of "put", in particular multithreading, directory support etc.

C) Invoke an ad-hoc query. That is, send a query to the server, invoke it ad hoc and return the results.

D) Invoke a remotely stored query.

<side note>
I've only been partially successful with C and D ... the MarkLogic API (XCC) has poor support for XDM. I cannot pass or retrieve anything but documents and some atomic values. No sequences, no nodes ... just full documents and atomic values! Ugh. XQuery allows you to pass a sequence to a function, but I cannot do so with ML's APIs ....
</side note>

For MarkLogic, these are the only building blocks necessary for building any kind of tool. All others can be built on these, mainly on ad-hoc queries. For example, the "dir" command is done via an ad-hoc XQuery. Thus all the other ML commands are actually simple xmlsh scripts which call into the 4 primitives above.

There was one more goal which I failed to achieve with ML directly, but did achieve in a round-about way: integrating xmlsh 'on the back end' ...

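As an aside on the "Put" primitive (A) above, here is a minimal sketch of the MD5-based "rsync"-like idea combined with multithreading. It is written in Python purely for illustration and is not the actual xmlsh or MarkLogic code. The names `sync_put`, `remote_digests` and `upload` are hypothetical: `remote_digests` stands in for whatever the server can report about what it already holds, and `upload` stands in for the real put call.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def md5_of(path):
    """Return the hex MD5 digest of a file, read in 1 MB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sync_put(root, remote_digests, upload, workers=4):
    """Upload only the files whose MD5 differs from the server's record.

    remote_digests: hypothetical dict of relative path -> MD5 hex string.
    upload: hypothetical callable(relative_path, data) doing the real put.
    Returns the sorted list of relative paths that were uploaded.
    """
    root = Path(root)

    def put_if_changed(path):
        rel = path.relative_to(root).as_posix()
        if remote_digests.get(rel) == md5_of(path):
            return None  # already on the server with the same content
        upload(rel, path.read_bytes())
        return rel

    # Walk the directory tree and hand each file to a worker thread.
    files = [p for p in root.rglob("*") if p.is_file()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(put_if_changed, files))
    return sorted(rel for rel in results if rel is not None)
```

The point of the digest comparison is that re-running the same upload after a partial failure only sends what is missing or changed, which matters when the data set is many GB.
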
In other words, I wanted to be able to call xmlsh scripts from ML. As you can see from this description, I've done this with Saxon XSLT, XPath and XQuery ... as extension functions.
http://www.xmlsh.org/XPathExtension

A quick example is calling xslt from XQuery within an XPath expression ...

    /foo/bar/xmlsh:eval('xslt -f file.xsl')/spam

This works in Saxon (with xmlsh). But since I can't get at the internals of MarkLogic, it can't be done in ML. For ML I instead use a round-about solution of the kind they use for similar things: an xmlsh bridge running in Tomcat.
http://www.xmlsh.org/EmbeddingServlet

<End of MarkLogic>

So why say all this? Because this is what I'd like to do with Exist. I'd like to go back to "Stage One" and improve my 'initial experiences'. Part of this is an easy and efficient way to get GBs of data into Exist, and to easily script access to Exist using my favorite tool (xmlsh). This would allow it to fit into my existing workflow. It also makes it easier for me to get on to the next part, which is figuring out why I was having so many problems in the first place. But to do that I need easy access to simple things like put/get/invoke/query/ls/rm/rmdir etc...

In the end, if I'm successful I may discover how to use Exist successfully where previously I failed ... (what a painful, backwards way to use a tool! oh well ..)

And finally, as a selfish goal, I would love more adoption of xmlsh, and providing better integration with more products is a possible step in that direction.

So what are my goals?

1) Implement 'front end' commands for Exist which run from xmlsh. These would allow me to do command-line things from within xmlsh on any desktop or server:
     put file/dir
     get file
     invoke ad-hoc
     invoke stored-query
     ls/dir
     rmdir/mkdir
     set attributes
     ...
   Basic things like that.

2) Experiment with direct integration into Exist, such as I have with Saxon. Ideally this would be some kind of extension function in Exist which can call xmlsh. I have no idea yet how to do this.

3) Integrate Exist's XML model efficiently with xmlsh's. Ideally (but probably not possible) I'd like both sides to be able to efficiently translate Exist's XML model into xmlsh's ... I suspect it's not possible, but it might be. This is a low-priority issue, mainly of intellectual curiosity. I've found (to my surprise) that serializing/deserializing XML to cross implementation boundaries can be as efficient as, or more efficient than, a kind of binary mapping or wrapping ... but you never know until you try it.

So where next??

This is where I could use advice first, and help if desired.

Advice: What is the "best" API to use from Java to call into Exist functionality? From here
http://exist.sourceforge.net/devguide_xmldb.html#N10254
it seems that "XML:DB" is the preferred API to use. Does this group agree? I don't really want to start down a blind path.

I'll stop now for fear of losing your patience.

-David

----------------------------------------
David A. Lee
dl...@ca...
http://www.xmlsh.org