Dana, Markos - Sounds good. I'll pull together some stuff today or tomorrow,
write a brief intro/orientation, then send it all along. Thanks. ... Gary
On Tue, Nov 10, 2009 at 7:37 PM, daniela florescu <dflorescu@...> wrote:
> yes, please use data+queries, and we'll do our best to optimize them,
> On Nov 10, 2009, at 3:02 PM, Markos Zacharioudakis wrote:
> Hi Gary,
> I would really like to look at your data and queries. Is it possible
> (without taking too much of your time) to separate just the xml docs and the
> related xqueries from your application and send them to us? That would be
> *From:* Gary Lewis <gary.m.lewis@...>
> *To:* daniela florescu <dflorescu@...>
> *Cc:* Markos Zacharioudakis <markos_za@...>;
> *Sent:* Fri, November 6, 2009 3:43:36 AM
> *Subject:* Re: [Zorba-users] example using indexes?
> Hi Markos - Thanks for the orientation to zorba's abstract store API. It
> sounds like something I'd best leave to you folks. Much appreciated.
> Hi Dana - I'd be happy to share this use case if you think it would be
> helpful to the zorba folks. I got involved with it because I thought I'd
> learn from it, and it hasn't disappointed me in that regard. Here's a brief
> description of the data that may help you decide if zorba would also find it
> a helpful use case.
> It's US Bureau of Census data from the Current Population Surveys (CPS) for
> Oct 2007 dealing with school enrollment. It comes as a flat text file.
> Roughly 150,000 rows. Each row about 1000 characters in a fixed-field format
> (no delimiters). I used awk (gawk actually) to extract only those variables
> of interest to me and to create a basic xml file (still flat). The data has
> some interesting hierarchical structure to it. Households composed of
> various kinds of family units composed of zero or more sub-families composed
> of individuals and their relationships (householder, spouse, children, etc).
> There are any number of interesting xml trees that could be built from the
> flat structure. I'm using xquery in 2 ways: 1) to transform the flat xml
> into xml trees that can be used for data analysis (eventually with R); and
> 2) to create frequency distributions to verify the unweighted counts and
> several detailed tables that the Bureau of Census provides in their
> technical documentation and web site.
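The pipeline described above (fixed-field rows, a flat record per person, nesting under households, then unweighted counts) can be sketched roughly as follows. This is only an illustration in Python, not Gary's awk or XQuery code: the column positions in `LAYOUT`, the element and attribute names, and the `SAMPLE` rows are all made up for the example, and the real CPS record layout comes from the Census technical documentation.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fixed-field layout: (name, start, end) as 0-based slice bounds.
# These positions are invented for illustration; they are NOT the CPS layout.
LAYOUT = [("household", 0, 5), ("line", 5, 7), ("relation", 7, 9), ("enrolled", 9, 10)]

def parse_row(row):
    """Slice one fixed-width row (no delimiters) into a dict of field values."""
    return {name: row[start:end].strip() for name, start, end in LAYOUT}

def build_tree(rows):
    """Nest flat person records under one <household> element per household id."""
    root = ET.Element("cps")
    households = {}
    for rec in map(parse_row, rows):
        hh = households.get(rec["household"])
        if hh is None:
            hh = ET.SubElement(root, "household", id=rec["household"])
            households[rec["household"]] = hh
        ET.SubElement(hh, "person", line=rec["line"],
                      relation=rec["relation"], enrolled=rec["enrolled"])
    return root

def frequency(root, attr):
    """Unweighted frequency distribution of one attribute over all persons."""
    return Counter(p.get(attr) for p in root.iter("person"))

# Four made-up rows: three people in household 00001, one in household 00002.
SAMPLE = ["0000101011", "0000102020", "0000103031", "0000201010"]

if __name__ == "__main__":
    tree = build_tree(SAMPLE)
    print(ET.tostring(tree, encoding="unicode"))
    print(frequency(tree, "enrolled"))
```

Family and sub-family levels would add further nesting keys in the same way; the sketch stops at households to stay short.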
> If this sounds interesting, let me know and we can move ahead. I'm happy to
> share any of the xquery programs I've written for this use case.
> On Thu, Nov 5, 2009 at 5:17 PM, daniela florescu <dflorescu@...> wrote:
>> it would be useful if you could show us your data and query patterns (not
>> sure if you are willing to disclose...).
>> If you do, we can profile them and see if we can do anything better.
>> It's always easier to optimize given a particular use case.
>> About persistence: we will have one, sooner rather than later.
>> In the meantime, did you try the Zorba extension that runs on the Amazon
>> provided by http://www.28msec.com ? (maybe this would do the job for you ?)
>> On Nov 5, 2009, at 1:38 PM, Gary Lewis wrote:
>> Hi Markos - Thanks very much for your reply. As you suggest, I'll wait on
>> the index testing until the next release. Also there's no need to pull
>> together any of the performance testing that you mention. I'm at the stage
>> where I'm still playing and learning.
>> Yes, I assumed that the indexes would be main-memory only. But I'm working
>> under such tight hardware constraints (ie, a server that's 8 years old; a
>> Pentium M chip; and only 1.2G memory) that any boost in performance would be
>> helpful. Of course, I could get a new server, but it rather helps me write
>> tighter code if I'm resource bound. It's a good way to learn if I always
>> need to think about response time.
>> I'd also like to try using zorba with Berkeley DB XML. I've used xqilla a
>> bit, but very much like zorba and would like to use zorba with a persistent
>> store like BDB XML. Unfortunately I've no idea how to begin. Do you or
>> others at zorba have instructions or guidelines or advice on how to proceed?
>> Thanks again. I appreciate it.
>> On Thu, Nov 5, 2009 at 3:32 PM, Markos Zacharioudakis <
>> markos_za@...> wrote:
>>> Hello Gary,
>>> The state of both the documentation and the implementation of indexes in
>>> 0.9.8 is rather poor. We are currently working on improving index support in
>>> zorba, so, I would like to ask you to be a little patient and wait until the
>>> next release (in about 2 months from now).
>>> One thing to keep in mind is that zorba does not come with a persistent
>>> storage system; it has a main-memory store only, which basically manages a
>>> set of xml trees. Indexes will also be main-memory only and will index data
>>> from the main-memory trees.
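To make the idea of a main-memory index over main-memory trees concrete, here is a minimal sketch in Python. It is not Zorba's implementation or its index syntax: the `build_index`, `scan`, and `probe` helpers and the sample document are hypothetical, and the index is simply a dictionary from an attribute value to the elements that carry it. Like the store described above, it lives only as long as the process does.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_index(root, tag, key_attr):
    """Map each value of key_attr to the in-memory elements that carry it."""
    index = defaultdict(list)
    for elem in root.iter(tag):
        key = elem.get(key_attr)
        if key is not None:
            index[key].append(elem)
    return index

def scan(root, tag, key_attr, value):
    """Answer the lookup without an index: visit every element in the tree."""
    return [e for e in root.iter(tag) if e.get(key_attr) == value]

def probe(index, value):
    """Answer the same lookup with one dictionary access."""
    return index.get(value, [])

# Made-up document, parsed into an in-memory tree.
DOC = ET.fromstring(
    "<people>"
    "<person id='1' state='MI'/>"
    "<person id='2' state='OH'/>"
    "<person id='3' state='MI'/>"
    "</people>"
)

# Built once per run; nothing is persisted, so it must be rebuilt next time.
state_index = build_index(DOC, "person", "state")
```

Both `scan(DOC, "person", "state", "MI")` and `probe(state_index, "MI")` return the same elements; the index trades memory and build time for avoiding a full traversal on each lookup.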
>>> Now, regarding performance, I am not aware of any public comparative
>>> studies of different xml systems. We have some internal numbers comparing
>>> zorba, xqilla, and saxon on the XMARK benchmark. If you want, we can prepare
>>> a report for you (we will need some time to freshen up and clean up the
>>> numbers). Let us know.
>>> *From:* Gary Lewis <gary.m.lewis@...>
>>> *To:* zorba-users@...
>>> *Sent:* Thu, November 5, 2009 4:57:21 AM
>>> *Subject:* [Zorba-users] example using indexes?
>>> Hi - Does anyone have an xquery example in which the Index feature gets
>>> used? The 0.9.8 documentation provides only modest help.
>>> I could always put the data into a database and improve query response
>>> time that way, but I was hoping to explore the new Create Index etc features
>>> in 0.9.8.
>>> ps. More generally ... does anyone know of a decent resource (article,
>>> web site, conference presentation) about xquery performance or, even better,
>>> about zorba xquery performance?
>> Zorba-users mailing list