[Tclxml-devel] TclDOM Memory Consumption, SCMS


Hi Pete,

After our conversation of a few days ago, I thought I'd
do an experiment on TclDOM memory consumption.

[for the benefit of mailing list readers, Peter
queried how much memory a DOM tree consumes compared
to the size of the text document, i.e. if the original
XML document is 1MB in size, how much memory does the
DOM tree representation of that document take?
DOM is notoriously memory-hungry, so this information
is important when scoping apps that are going to be
handling a lot of XML documents and possibly caching them
in-memory.  We are building a Content Management System
(CMS) that has these characteristics.]

I created a document, using a trivial Tcl script, that
contained 1000 elements each containing 1000 random
characters.  Memory consumption for that document was
~1MB, as expected.  I then parsed the document 100
times, creating 100 identical DOM trees (btw, parsing
the 1MB document 100 times was significantly faster
than creating the one document in the first place!).
Memory usage increased by 116MB against 100MB of raw text,
indicating that the memory overhead for the libxml2 DOM tree
is less than 20% (about 16%).
That seems pretty reasonable.
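For anyone wanting to repeat the experiment, here is a rough Python stand-in for the test-document generator plus the overhead arithmetic. The original Tcl script isn't shown, so the element names (`doc`, `elem`), the ASCII-letter alphabet and the helper names are assumptions; the 116MB and 100 x 1MB figures are the ones reported above.

```python
import random
import string


def make_test_document(n_elements=1000, chars_per_element=1000, seed=None):
    """Return an XML string of n_elements children, each holding random letters.

    The element names and the character set are illustrative assumptions.
    """
    rng = random.Random(seed)
    # ASCII letters only, so no character needs XML escaping.
    alphabet = string.ascii_letters
    parts = ["<doc>"]
    for _ in range(n_elements):
        text = "".join(rng.choices(alphabet, k=chars_per_element))
        parts.append("<elem>" + text + "</elem>")
    parts.append("</doc>")
    return "".join(parts)


def overhead_fraction(memory_increase, document_size, copies):
    """Extra memory beyond the raw text, as a fraction of the raw text."""
    raw = document_size * copies
    return (memory_increase - raw) / raw


doc = make_test_document(seed=1)
print(len(doc))  # 1013011 characters: 1000 * (1000 + 13) + 11, i.e. ~1MB

# Reported figures: parsing the ~1MB document 100 times grew memory by 116MB.
print(f"{overhead_fraction(116, 1, 100):.0%}")  # 16%
```

The sketch only reproduces the input and the ratio; the memory growth itself still has to be measured against TclDOM/libxml2.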

Next I wrote another script that created a TclDOM object
handle for every element node, to see what the additional
memory overhead is for TclDOM.  The increase was 58MB,
spread over 100,000 element nodes (100 trees of 1000 elements),
so that's ~580 bytes per node.  Now, in a CMS it would be
unlikely that you'd be handling individual nodes;
the app would just be passing entire trees around the place.
Thus TclDOM's per-node overhead is not of much concern.
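The per-node figure is just the 58MB increase divided by the number of element nodes that were given handles. A minimal check, assuming decimal megabytes (with 2^20-byte megabytes the result is closer to 608 bytes); the function name is illustrative:

```python
def per_node_overhead(memory_increase_bytes, trees, nodes_per_tree):
    """Average extra bytes per node handle, spread over every wrapped node."""
    return memory_increase_bytes / (trees * nodes_per_tree)


MB = 1_000_000  # decimal megabytes, matching the rough figures in the text
print(per_node_overhead(58 * MB, trees=100, nodes_per_tree=1000))  # 580.0
```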

Overall, my conclusion is that libxml2 seems to introduce
minimal additional memory overhead.  Of course, YMMV.
The overhead is incurred on a per-node basis, so a
document with few elements and large text nodes will
have a lower overhead than one with many small nodes.
Leading on from this I thought I'd experiment with
building a CMS using the Tcl Web Server for handling
HTTP requests.  tclhttpd sure is good for quick
prototyping!  Starting on Friday, by Monday I had a
Web server running on my TiBook that generates our
website's HTML on-the-fly via XSLT.  Source documents
and compiled stylesheets are cached in-memory.
The server determines which source document and XSL
stylesheet to use by consulting the XML Pipeline
document for the website.  The tricky part is that
the pipeline controller itself is also an XSL stylesheet!
So, a request for an HTML document is handled by
invoking the pipeline controller XSL stylesheet.
The stylesheet works out which source document
and stylesheet to use, as well as parameters to pass,
and then invokes another transformation to do the
work of generating the HTML.  IOW, a document request
causes at least one and possibly two transformations
to be performed.
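The two-stage dispatch can be sketched roughly as follows. This is not the prototype's code: the pipeline document format is a hypothetical simplification, the controller is a plain Python lookup rather than an XSL stylesheet, and the parse/compile/transform steps are injected callables (for example, wrappers around an XSLT library) so the sketch runs with the standard library alone. It does show where the in-memory caches for source documents and compiled stylesheets sit.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified pipeline document: each <process> maps a request
# path to a source document, a stylesheet and stylesheet parameters.
PIPELINE_XML = """
<pipeline>
  <process path="/index.html" source="home.xml" stylesheet="page.xsl">
    <param name="section" value="home"/>
  </process>
  <process path="/news.html" source="news.xml" stylesheet="page.xsl">
    <param name="section" value="news"/>
  </process>
</pipeline>
"""


class PipelineServer:
    """Two-stage request handling: controller lookup, then the real transform.

    load_source, compile_stylesheet and transform are injected callables,
    standing in for whatever XML/XSLT library is actually used.
    """

    def __init__(self, pipeline_xml, load_source, compile_stylesheet, transform):
        self.pipeline = ET.fromstring(pipeline_xml)
        self.load_source = load_source
        self.compile_stylesheet = compile_stylesheet
        self.transform = transform
        self.source_cache = {}      # parsed source documents, keyed by file name
        self.stylesheet_cache = {}  # compiled stylesheets, keyed by file name

    def route(self, path):
        """Stage 1 (the 'controller'): pick source, stylesheet and parameters."""
        for proc in self.pipeline.iter("process"):
            if proc.get("path") == path:
                params = {p.get("name"): p.get("value") for p in proc.iter("param")}
                return proc.get("source"), proc.get("stylesheet"), params
        raise KeyError(path)

    def handle(self, path):
        """Stage 2: run the selected transformation on cached inputs."""
        source_name, stylesheet_name, params = self.route(path)
        if source_name not in self.source_cache:
            self.source_cache[source_name] = self.load_source(source_name)
        if stylesheet_name not in self.stylesheet_cache:
            self.stylesheet_cache[stylesheet_name] = self.compile_stylesheet(stylesheet_name)
        return self.transform(
            self.stylesheet_cache[stylesheet_name], self.source_cache[source_name], params
        )
```

With stub callables, repeated requests for `/index.html` compile `page.xsl` and load `home.xml` only once; later requests reuse the cached objects, and `/news.html` reuses the same compiled stylesheet.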

Performance seems OK but not super-fast.  Caching seems
to be effective in serving pages quickly.  However, I think
I'll have to instrument the code to allow profiling and
work out the slow spots, and also use (or write) a client
robot to time the requests.
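A client robot for timing requests need not be elaborate. The sketch below (Python standard library only; the function names are hypothetical) fetches a URL repeatedly and reports min/median/max latency. Requests are sequential, so it measures per-request latency, not throughput under concurrent load.

```python
import statistics
import time
import urllib.request


def time_requests(url, count=10, timeout=10.0):
    """Fetch url count times, sequentially; return elapsed seconds per request."""
    timings = []
    for _ in range(count):
        start = time.perf_counter()
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()  # include the body transfer in the measurement
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return min, median and max of a list of timings, in milliseconds."""
    ms = sorted(t * 1000.0 for t in timings)
    return {
        "min_ms": ms[0],
        "median_ms": statistics.median(ms),
        "max_ms": ms[-1],
    }
```

For example, `summarize(time_requests("http://localhost:8015/index.html", count=20))`, assuming the server listens on port 8015 and serves that path (adjust both to suit). Running it once with a cold cache and again with a warm one would show how much the in-memory caching contributes.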

I didn't use Apache for this prototype because getting
mod_tcl built on my TiBook was not straightforward
(because I've got Tcl built as an OS X
framework, etc., etc.).  At this stage I might try
running Apache up on our Linux server and getting
mod_tcl+TclXSLT working there.

Cheers,
Steve

-- 
Steve Ball            |   XSLT Standard Library   | Training & Seminars
Zveno Pty Ltd         |     Web Tcl Complete      |   XML XSL Schemas
http://www.zveno.com/ |      TclXML TclDOM        | Tcl, Web Development
Ste...@zv...  +---------------------------+---------------------
Ph. +61 2 6242 4099   |   Mobile (0413) 594 462   | Fax +61 2 6242 4099