From: SourceForge.net <no...@so...> - 2008-03-27 03:39:08
Feature Requests item #1661177, was opened at 2007-02-16 14:09
Message generated for change (Comment added) made by xmldoc
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=373750&aid=1661177&group_id=21935

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: XSL
Group: output: manpages
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: Scott Smedley (ssmedley)
Assigned to: Michael(tm) Smith (xmldoc)
Summary: man: reduce memory usage as much as possible

Initial Comment:
Hi,

I am getting HUGE amounts of memory usage when using xsltproc with the latest releases of docbook-xsl.

The input XML file is ~563Kb in size. Versions 1.72.0 & 1.70.1 of docbook-xsl cause xsltproc to consume MORE THAN 1.5 GIGABYTES of memory when outputting in manpage format. Compare this with version 1.68.1-1 of docbook-xsl (installed from an .rpm for my Fedora 4 system) which consumes ONLY 57 MEGABYTES of memory.

I'm happy to assist debugging this issue, but I'll need some guidance - email me if I can be of use.

The input file I used is here:

http://www.aao.gov.au/local/www/ss/tmp/fvwm.1.xml

  LaPSS>> xsltproc -version
  Using libxml 20619, libxslt 10114 and libexslt 812
  xsltproc was compiled against libxml 20619, libxslt 10114 and libexslt 812
  libxslt 10114 was compiled against libxml 20619
  libexslt 812 was compiled against libxml 20619

SCoTT. :)

----------------------------------------------------------------------

>Comment By: Michael(tm) Smith (xmldoc)
Date: 2008-03-27 12:39

Message:
Logged In: YES
user_id=118135
Originator: NO

Scott,

This has been open for a year now and I've not come up with any brilliant ideas for getting the memory size down, so I'm going to close this for now.
--Mike

----------------------------------------------------------------------

Comment By: Michael(tm) Smith (xmldoc)
Date: 2007-06-22 12:38

Message:
Logged In: YES
user_id=118135
Originator: NO

Moving to feature requests as this is not strictly a bug. I would like to try to get the RAM usage down more if I could, but have not had time to work on it. As I mention, I reckon I could probably only get it down to 100Mb at best for this particular test case (and that is thinking really optimistically).

----------------------------------------------------------------------

Comment By: Scott Smedley (ssmedley)
Date: 2007-03-14 22:31

Message:
Logged In: YES
user_id=370510
Originator: YES

Hi Mike,

> Actually, just running "xsltproc docbook-xsl-snapshot/manpages/docbook.xsl
> ./fvwm.1.xml"
> is all you need to do.

I thought so too until I ran:

  strace -etrace=open -o /tmp/o xsltproc docbook-xsl-snapshot/manpages/docbook.xsl ./fvwm.1.xml

& saw that xsltproc was reading files in /usr/share/sgml/docbook/.

I tried to remove the docbook-dtds package but ran into dependency hell. Instead, I just ran xsltproc again with --novalid & apart from a few (probably unimportant) parse errors it still seemed to work ok.

It's a bit strange to me why xsltproc should want to read files in /usr/share/sgml/docbook/ but it still took just as long & consumed roughly the same amount of memory so ...

In short, everything I said previously still stands.

Time for bed methinks.

Scott. :)

----------------------------------------------------------------------

Comment By: Michael(tm) Smith (xmldoc)
Date: 2007-03-14 22:13

Message:
Logged In: YES
user_id=118135
Originator: NO

Scott,

Actually, just running "xsltproc docbook-xsl-snapshot/manpages/docbook.xsl ./fvwm.1.xml" is all you need to do. I think you'll get the same results if you use the steps I suggested.
I just normally suggest those because using the remote URL and mapping it through catalogs is the best way to do it if you want to script it and want to switch between production and snapshot versions of the stylesheets. If you afterwards just run "docbook-xsl-snapshot/uninstall.sh --batch", it will revert all pointers to the snapshot and so get you back to however you had your environment set up before installing the snapshot.

--Mike

----------------------------------------------------------------------

Comment By: Scott Smedley (ssmedley)
Date: 2007-03-14 22:07

Message:
Logged In: YES
user_id=370510
Originator: YES

Argh! Wait. I didn't read your last post properly.

All I did was unzip the snapshot & run:

  xsltproc docbook-xsl-snapshot/manpages/docbook.xsl ./fvwm.1.xml

Let me install it as you described & retry.

Scott.

----------------------------------------------------------------------

Comment By: Scott Smedley (ssmedley)
Date: 2007-03-14 21:56

Message:
Logged In: YES
user_id=370510
Originator: YES

Hi Michael,

I just realised that the ~135Mb figure I gave you was on a hacked version of the 27-Feb-07 snapshot. Unhacked, the memory usage is ~190Mb.

So what did I hack? I commented out all the *ahem* crap in common/l10n.xml. i.e. I just kept the "en" stuff. That's all I changed.

I just tried the same trick on the latest snapshot (13-March-2007). It gives the same numbers. i.e. ~190Mb of memory, or ~135Mb when I get rid of the superfluous l10n stuff.

> So my question now is whether the performance you're getting with the
> snapshot is acceptable to you? I know ~135Mb is still a lot of memory
> for just transforming a 0.5Mb document.

Yes. I have to confess it still sounds rather excessive. I'm a developer of the FVWM Window Manager which consumes ~7Mb of RAM - so perhaps that has biased me! But, it would be unfair of me to criticise given that I don't understand the intricacies of what xsltproc is actually doing.
A little more information about my particular usage:

The fvwm.1.xml file we're talking about here is just a test case. (I simply ran ESR's doclifter on the fvwm.1 man page.) My actual input files total ~1.2Mb. So that's more than twice the size of the fvwm.1.xml file. When I run xsltproc with my hacked version of the stylesheets, it actually consumes ~230Mb of memory. (I haven't checked how much it would be unhacked.)

So, in short, I would _really_ like to obtain further memory reductions if it is at all possible.

> One thing to note is that the manpages stylesheet had to import
> the whole DocBook HTML stylesheet in order to run a transform.
> When I run a transform with the HTML stylesheet, it takes about
> 75Mb of RAM. When I run one with the manpages stylesheet,
> it takes about 122Mb. So the maximum amount of memory I'd ever
> be able to reduce it by is some portion of the 47Mb difference
> between those two. Assuming I could figure out how to reduce
> it by, say, half (I'm thinking optimistically), that would
> amount to a reduction of maybe 24Mb.
>
> So in your environment, I guess even if I made the needed
> changes, you'd still be using 100Mb or so (instead of 135Mb)
> to run the transform.
>
> But if that amount of reduction in the RAM usage is important
> to you (to be able to not have to use/maintain your
> post-processing script), I can spend some time to try it.

I would appreciate memory reductions of this size _enormously_!

> Also, can I ask how long the XSLT transformation takes to
> run in your environment?

Ah! This is something else I was going to ask you about.

> On my machine, it takes about 8 to 10 seconds

What spec is your machine? It takes ~22 seconds on my system that has plenty (1Gb) of RAM:

  LaPSS>> /bin/grep 'model name' /proc/cpuinfo
  model name : Intel(R) Pentium(R) M processor 1.73GHz

Do you have any suggestions for how I might be able to make it run faster?

Thanks again for all your help - it is muchly appreciated.

Scott.
:)

----------------------------------------------------------------------

Comment By: Michael(tm) Smith (xmldoc)
Date: 2007-03-14 20:41

Message:
Logged In: YES
user_id=118135
Originator: NO

Scott,

I think there's no need to try with the latest snapshot if you're already testing with a snapshot numbered 6657 or later -- because the latest relevant change I made to that code was at rev 6657 of the project svn repository.

So my question now is whether the performance you're getting with the snapshot is acceptable to you? I know ~135Mb is still a lot of memory for just transforming a 0.5Mb document. Is it enough that you can turn off the string-replace post-processing step you were using as an alternative? Or will you continue to use that?

There may be some other places I can mess with to get it down a bit more. I don't really have a clear idea at this point about how much of a difference those changes might make.

One thing to note is that the manpages stylesheet had to import the whole DocBook HTML stylesheet in order to run a transform. When I run a transform with the HTML stylesheet, it takes about 75Mb of RAM. When I run one with the manpages stylesheet, it takes about 122Mb. So the maximum amount of memory I'd ever be able to reduce it by is some portion of the 47Mb difference between those two. Assuming I could figure out how to reduce it by, say, half (I'm thinking optimistically), that would amount to a reduction of maybe 24Mb.

So in your environment, I guess even if I made the needed changes, you'd still be using 100Mb or so (instead of 135Mb) to run the transform.

But if that amount of reduction in the RAM usage is important to you (to be able to not have to use/maintain your post-processing script), I can spend some time to try it.

Also, can I ask how long the XSLT transformation takes to run in your environment? On my machine, it takes about 8 to 10 seconds -- which is in range for what I see with other source docs of similar size.
(Smaller docs can take just 2 seconds or so).

--Mike

----------------------------------------------------------------------

Comment By: Scott Smedley (ssmedley)
Date: 2007-03-14 20:06

Message:
Logged In: YES
user_id=370510
Originator: YES

Hi Michael,

With the snapshot I downloaded on 28-Feb-2007, xsltproc is consuming ~135Mb of memory. Would you still like me to try the latest snapshot?

Scott. :)

----------------------------------------------------------------------

Comment By: Michael(tm) Smith (xmldoc)
Date: 2007-03-14 19:40

Message:
Logged In: YES
user_id=118135
Originator: NO

Scott,

The latest snapshot contains a significant change I made to try to resolve this. If possible, can you please try running your source through the latest snapshot?

You can test with the snapshot by doing the following:

- Change to some directory where you have perms to write files, e.g. /opt/scratch, download the snapshot into it, unzip the snapshot, run the install.sh script, then run your transform.

For example,

  cd /opt/scratch
  wget http://docbook.sourceforge.net/snapshots/docbook-xsl-snapshot.zip
  unzip docbook-xsl-snapshot.zip
  ./docbook-xsl-snapshot/install.sh --batch
  . /opt/scratch/docbook-xsl-snapshot/.profile.incl

That will point your catalog system at the snapshot. Then you can run your transformation by doing this:

  xsltproc http://docbook.sourceforge.net/release/xsl/current/manpages/docbook.xsl fvwm.1.xml

... and that /should/ cause xsltproc to map/resolve the remote stylesheet URL to /opt/scratch/docbook-xsl-snapshot/manpages/docbook.xsl ...
in which case if you vim/less your ./fvwm.1 output file, you should see this near the top:

  Generator: DocBook XSL Stylesheets vsnapshot_6668

(or whatever the current snapshot build number is at the time you read this)

But if that doesn't work, then you can always just do:

  xsltproc /opt/scratch/docbook-xsl-snapshot/manpages/docbook.xsl fvwm.1.xml

Anyway, please try if/when you have time, and let me know if it works better (without eating up all your available RAM...)

--Mike

----------------------------------------------------------------------

Comment By: Michael(tm) Smith (xmldoc)
Date: 2007-02-25 12:53

Message:
Logged In: YES
user_id=118135
Originator: NO

Sorry, I didn't read your description... I'll just download the source from the URL you provided.

--Mike

----------------------------------------------------------------------

Comment By: Michael(tm) Smith (xmldoc)
Date: 2007-02-25 12:50

Message:
Logged In: YES
user_id=118135
Originator: NO

Scott,

If possible, can you please upload/attach your XML source file? I would like to test with it if I can.

--Mike

----------------------------------------------------------------------

Comment By: Scott Smedley (ssmedley)
Date: 2007-02-20 22:57

Message:
Logged In: YES
user_id=370510
Originator: YES

Hi Michael,

> What I /could/ potentially do is
> ship a perl or sed script in the distribution that you could optionally
> use instead (and provide a parameter in the stylesheets for easily disabling
> the XSLT-based string substitution).

Yes, that's effectively what I'm doing as a (temporary?) workaround. I hacked the stylesheets to bypass string substitution & wrote a simple perl script to do it afterwards. See:

http://www.aao.gov.au/local/www/ss/tmp/string.subst.pl

A parameter to turn it on/off would be better though.

> Let me know what you think of that idea.

It sounds like it is the best possible solution, given the constraints. Personally, I'd use it.
Given the amount of memory the XSLT string substitution consumes, I'm sure other users would prefer (require?) a post-processing option too.

Scott. :)

----------------------------------------------------------------------

Comment By: Michael(tm) Smith (xmldoc)
Date: 2007-02-20 19:42

Message:
Logged In: YES
user_id=118135
Originator: NO

It looks like support for the EXSLT str:replace function has recently been added to libxslt/xsltproc -

http://article.gmane.org/gmane.comp.gnome.lib.xslt/3267

But I have no idea how long it will be before a new libxslt/xsltproc release with that support will be available (nor how long it will be after that before a new Debian package for it is available).

So in the meantime, I'll look at my current string-substitution code and see if there's anything I can do to make it use less memory. I doubt that there is. I think it's just an inevitable side effect of the approach I'm using -- which requires reading the entire rendered output into memory multiple times and iterating over the entire contents each time to do the string replacements.

The unfortunate fact is that XSLT 1.0 is not really designed to do string replacements efficiently. It would be much more efficient to do post-processing using perl or sed or something. But the problem with that is that we have a requirement that the DocBook Project stylesheets have no dependencies other than an XSLT engine. So I can't introduce a required post-processing step using other tools.

What I /could/ potentially do is ship a perl or sed script in the distribution that you could optionally use instead (and provide a parameter in the stylesheets for easily disabling the XSLT-based string substitution).

Let me know what you think of that idea.

----------------------------------------------------------------------

Comment By: Michael(tm) Smith (xmldoc)
Date: 2007-02-17 19:47

Message:
Logged In: YES
user_id=118135
Originator: NO

Scott,

Thanks for the heads-up about this.
Unfortunately, I don't have any good idea how to fix it either. :(

What those <substitution> instances are doing is causing the XSLT processor to read in the entire contents of your output and to do string search and replacement on those contents (in order to deal with some characters that are treated as special characters by troff/groff and to do some cleanup that would otherwise be very difficult to do).

The problem is that there is no standard way to do that kind of string replacement in XSLT 1.0 and the way that I have the stylesheet doing it now is the only way I know how to do it -- though I recognize that it's very inefficient. It seems to cause the XSLT processor to use as much memory as it can get.

I will do some profiling to see if I can mitigate some of the memory issues. But going back to the way that the 1.68.1 stylesheet was doing it is not an option -- because it produced output bugs in many cases, and the current string-substitution mechanism is designed to fix those bugs.

Unfortunately, I think that the limitations of XSLT 1.0 make this a real PITA to deal with.

----------------------------------------------------------------------

Comment By: Scott Smedley (ssmedley)
Date: 2007-02-16 17:01

Message:
Logged In: YES
user_id=370510
Originator: YES

If I comment out the 28 <substitution ...> lines in manpages/param.xsl the memory usage reduces to ~80Mb. Obviously the output is incorrect, but it indicates the area that is using so much memory. I've no idea how to fix it.

Scott.

----------------------------------------------------------------------
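The post-processing approach discussed in the thread (doing the string substitution outside XSLT, after the transform has run) could look roughly like the sketch below. It is only an illustration, written in Python rather than the perl or sed mentioned above. The substitution pairs are hypothetical examples of groff escapes and are NOT the 28 entries from manpages/param.xsl, and the function names are made up for this sketch. It also assumes that no search string spans a line break, which is what allows the text to be handled one line at a time instead of being held in memory whole.

```python
# Hypothetical substitution table: (old, new) pairs applied in order.
# Illustrative only; these are not the entries shipped in manpages/param.xsl.
SUBSTITUTIONS = [
    ("\\", "\\e"),         # backslash -> groff printable backslash; done first
    ("\u2022", "\\(bu"),   # bullet character -> groff bullet escape
    ("\u00a9", "\\(co"),   # copyright sign -> groff copyright escape
]


def substitute(text, pairs=SUBSTITUTIONS):
    """Apply each (old, new) replacement to text, in table order."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text


def substitute_lines(lines, pairs=SUBSTITUTIONS):
    """Lazily yield each input line with the substitutions applied.

    Only one line is held at a time, so memory use does not grow
    with the size of the whole document.
    """
    for line in lines:
        yield substitute(line, pairs)
```

Used as a filter, something like `sys.stdout.writelines(substitute_lines(sys.stdin))` (after `import sys`) would process the output of a transform that had its XSLT-based substitution bypassed. The order of the table matters: the backslash rule has to run before any rule whose replacement text itself contains a backslash, otherwise those inserted backslashes would be escaped a second time.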