From: Steve B. <Ste...@zv...> - 2002-12-03 05:57:39
Hi Pete,

After our conversation of a few days ago, I thought I'd do an experiment on TclDOM memory consumption.

[For the benefit of mailing list readers: Peter asked how much memory a DOM tree consumes compared to the size of the text document, i.e. if the original XML document is 1MB in size, how much memory does the DOM tree representation of that document take? DOM is notoriously memory-hungry, so this information is important when scoping apps that are going to be handling a lot of XML documents and possibly caching them in memory. We are building a Content Management System (CMS) that has these characteristics.]

I created a document, using a trivial Tcl script, that contained 1000 elements each containing 1000 random characters. Memory consumption for that document was ~1MB, as expected.

I then parsed the document 100 times, creating 100 identical DOM trees (btw, parsing the 1MB document 100 times was significantly faster than creating the one document in the first place!). Memory usage increased by 116MB, i.e. about 1.16MB per tree, indicating that the memory overhead for the libxml2 DOM tree is less than 20%. That seems pretty reasonable.

Next I wrote another script that created a TclDOM object handle for every element node, to see what the additional memory overhead is for TclDOM. The increase was 58MB across 100,000 element nodes, so that's ~580 bytes per node. Now, in a CMS it would be unlikely that you'd be handling individual nodes; the app would just be passing entire trees around the place. Thus per-node memory overhead is not of much concern.

Overall, my conclusion is that libxml2 seems to introduce minimal additional memory overhead. Of course, YMMV. Memory consumption will be on a per-node basis, so a document that contains few elements and text nodes that are large will have a lower overhead than

Leading on from this, I thought I'd experiment with building a CMS using the Tcl Web Server for handling HTTP requests. tclhttpd sure is good for quick prototyping!
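As a quick check on the memory figures quoted earlier in this message, here is a minimal sketch of the arithmetic. It is written in Python purely for illustration (the original measurements came from Tcl scripts that are not shown here), the variable names are made up for this sketch, and it assumes "MB" means 10^6 bytes.

```python
# Figures are taken from the message; treating 1 MB as 10**6 bytes is an assumption.
MB = 10**6

doc_size = 1 * MB            # size of the generated XML document
trees = 100                  # number of times the document was parsed
elements_per_doc = 1000      # element nodes in each document
libxml2_increase = 116 * MB  # memory growth after building 100 DOM trees
tcldom_increase = 58 * MB    # further growth after creating a handle per element

# Memory per DOM tree relative to the size of the source text.
per_tree = libxml2_increase / trees
overhead_ratio = per_tree / doc_size - 1

# Cost of one TclDOM object handle, averaged over every element node.
handles = trees * elements_per_doc
bytes_per_handle = tcldom_increase / handles

print(f"libxml2 DOM overhead: {overhead_ratio:.0%}")
print(f"TclDOM handle cost: {bytes_per_handle:.0f} bytes per node")
```

Under those assumptions the numbers agree with the figures above: roughly 16% overhead for the libxml2 tree and about 580 bytes per TclDOM handle.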
Starting on Friday, by Monday I had a Web server running on my TiBook that generates our website's HTML on the fly via XSLT. Source documents and compiled stylesheets are cached in memory. The server determines which source document and XSL stylesheet to use by consulting the XML Pipeline document for the website.

The tricky part is that the pipeline controller itself is also an XSL stylesheet! So, a request for an HTML document is handled by invoking the pipeline controller XSL stylesheet. The stylesheet works out which source document and stylesheet to use, as well as parameters to pass, and then invokes another transformation to do the work of generating the HTML. IOW, a document request causes at least one and possibly two transformations to be performed.

Performance seems OK but not super-fast. Caching seems to be effective in serving pages quickly. However, I think I'll have to instrument the code to allow profiling and work out where the slow spots are, and also use (or write) a client robot to time the requests.

I didn't use Apache for this prototype because getting mod_tcl built on my TiBook was not straightforward (due to the fact that I've got Tcl built as an OS X framework, etc, etc). At this stage I might try running Apache on our Linux server and getting mod_tcl+TclXSLT working there.

Cheers,
Steve

--
Steve Ball            | XSLT Standard Library | Training & Seminars
Zveno Pty Ltd         | Web Tcl Complete      | XML XSL Schemas
http://www.zveno.com/ | TclXML TclDOM         | Tcl, Web Development
Ste...@zv...          +-----------------------+---------------------
Ph. +61 2 6242 4099   | Mobile (0413) 594 462 | Fax +61 2 6242 4099