DITA Open Toolkit

Tracker: Bugs

5 java.lang.OutOfMemoryError for map with many references - ID: 1693223
Last Update: Comment added ( imagiczhang )

There seems to be a fundamental issue with the way the OT has been
designed. The OT launches a single Java process that handles everything.

Unfortunately, a 32-bit Windows machine can allocate at most 2 GB of
address space, by default, to any single process. When generating output
with the OT, the memory used by the Java process that handles generation
can easily grow beyond this size. This is particularly true of
memory-intensive output types such as CHM or PDF, though all output types
will suffer from this issue given a large enough map or set of topics.

The average user generating output from a map referencing a few hundred
topics may never hit this limit, particularly if those topics are "small".
However, it is not uncommon for documentation (for complex software) to
contain many hundreds or even thousands of topics, and if those topics are
on the "large" side (= a few full "pages" of content) the 2 GB limit is
easily reached. If those topics also reference images, memory usage may
increase further, reducing the number of topics needed to hit the limit.

It may be useful to see how quickly this can occur so I am attaching a set
of VBScript files that can be executed on a Windows box to generate a "test
suite" for testing the limits of the OT. Instructions:

1. Save attached zip file and unzip to an EMPTY (!) folder.
2. Run the two VBS files. One generates a single map file containing 2000
topic references. The second generates the 2000 topics that are referenced
by the map. See readme.txt for descriptions on how to modify the VBS to
generate different numbers of topic files and references in the map file.
3. Set ant_opts to use as much memory as your system can dish out:
-Xmx####m (possibly also -Xms####m).
4. Generate output using the map you generated in step 2.
5. Monitor Java process memory and CPU usage using Windows Task Manager or
another monitor.

To find the limit, modify the VBS files so that they generate a larger map
and more topics, larger topics, or both, then repeat steps 2 to 5. Keep
going until you hit the limit.
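
For readers who are not on Windows or do not have the attachment, the
generation in step 2 can be approximated with a short script. This is a
minimal sketch, not the attached VBScript: the file names, the function name
write_stress_suite and the filler text are assumptions made for
illustration. Only the DITA topic and map doctypes are standard.

```python
from pathlib import Path

XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>'
TOPIC_DOCTYPE = '<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">'
MAP_DOCTYPE = '<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">'


def write_stress_suite(out_dir, topic_count=2000, paragraphs_per_topic=20):
    """Write topic_count topics plus one map that references all of them.

    Hypothetical stand-in for the attached VBS files; returns the map path.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Plain filler text; raise paragraphs_per_topic to make topics "larger".
    filler = "Filler text used only to give the topic some size. " * 10
    body = "\n".join(f"    <p>{filler}</p>" for _ in range(paragraphs_per_topic))

    refs = []
    for i in range(1, topic_count + 1):
        name = f"topic_{i:05d}.dita"
        topic = (
            f"{XML_DECL}\n{TOPIC_DOCTYPE}\n"
            f'<topic id="topic_{i:05d}">\n'
            f"  <title>Stress topic {i}</title>\n"
            f"  <body>\n{body}\n  </body>\n"
            f"</topic>\n"
        )
        (out / name).write_text(topic, encoding="utf-8")
        refs.append(f'  <topicref href="{name}"/>')

    # One flat map with a topicref per generated topic.
    map_text = (
        f"{XML_DECL}\n{MAP_DOCTYPE}\n"
        "<map>\n"
        "  <title>Stress test map</title>\n"
        + "\n".join(refs)
        + "\n</map>\n"
    )
    map_path = out / "stress.ditamap"
    map_path.write_text(map_text, encoding="utf-8")
    return map_path
```

Calling write_stress_suite("stress", topic_count=4000) into an empty folder
would produce a larger suite for the next round of testing.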

To truly test this as described you would need a machine with (about) a
minimum of 3 GB of RAM -- 1 GB for Windows and other miscellaneous processes
and 2 GB that can be dedicated to generating output.
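
Given a memory budget like the one above, the ant_opts value from step 3
can be composed programmatically. The sketch below only builds the string
and an environment to pass to whatever command launches Ant; the helper
names are assumptions, and a 32-bit JVM may refuse a heap close to the full
2 GB, so the usable -Xmx value has to be found by trial.

```python
import os


def ant_opts(max_heap_mb, initial_heap_mb=None):
    """Return a JVM option string such as '-Xms256m -Xmx1024m'."""
    if max_heap_mb <= 0:
        raise ValueError("max_heap_mb must be positive")
    opts = []
    if initial_heap_mb is not None:
        # The initial heap cannot exceed the maximum heap.
        if not 0 < initial_heap_mb <= max_heap_mb:
            raise ValueError("initial_heap_mb must be in 1..max_heap_mb")
        opts.append(f"-Xms{initial_heap_mb}m")
    opts.append(f"-Xmx{max_heap_mb}m")
    return " ".join(opts)


def ant_environment(max_heap_mb, initial_heap_mb=None, base_env=None):
    """Copy the environment and set ANT_OPTS in the copy (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    env["ANT_OPTS"] = ant_opts(max_heap_mb, initial_heap_mb)
    return env
```

The returned dictionary can be handed to subprocess.run(..., env=...) when
starting the build, which avoids changing the setting system-wide between
test runs.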

We may wish to blame this on the OS; however, many users may not be able to
move to a 64-bit version of Windows (or another OS), so I believe it is
worth looking into ways to resolve this. Moving to 64-bit also assumes that
the 7152 GB per-process limit of a 64-bit Windows release is enough (perhaps
when the OT supports output to MPEG movies that limit will be hit ;-). But
seriously, making this type of assumption is not good programming:
remember Y2K.

It is not known whether this affects other operating systems. This bug as
submitted is specifically for Windows.

Links:
http://msdn.microsoft.com/msdnmag/issues/01/12/XPKernel/
http://forum.java.sun.com/thread.jspa?threadID=553939&messageID=2712071


Brainstorming:

Break up the generation steps (somehow) so that each one runs in a separate
Java process (perhaps using a .bat file or some other method).

In theory, this might actually improve performance -- stopping one process
after it has completed its OT generation step before launching the next one
could free up any memory allocated to the first Java process. Right now the
single process just continues to grow, only freeing up memory when it has
completed.
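
The idea can be sketched as a small driver that launches each step as its
own process and waits for it to exit before starting the next, so the OS
reclaims that step's memory in between. This is not how the OT is
structured today: the step names and the placeholder commands below are
assumptions, and real steps would be separate java invocations.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each (name, argv) step in its own process, one after another.

    Each child process exits before the next starts, so its memory is
    returned to the OS. Stops at the first step that fails.
    """
    completed = []
    for name, argv in steps:
        result = subprocess.run(argv)
        if result.returncode != 0:
            raise RuntimeError(
                f"step {name!r} failed with exit code {result.returncode}"
            )
        completed.append(name)
    return completed


# Placeholder commands standing in for real generation steps (hypothetical).
PLACEHOLDER_STEPS = [
    ("preprocess", [sys.executable, "-c", "pass"]),
    ("transform", [sys.executable, "-c", "pass"]),
]
```

Because every step has its own heap, the peak memory needed is that of the
largest single step rather than what one long-lived process has accumulated.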

Making this change might also help users of the toolkit who are not
worried about hitting any limit, by speeding things up. This assumes that
addressing and finding things in such a large block of memory wastes time;
testing would be needed to confirm that.

It might also reduce the number of people that need to fiddle with ant_opts
for allocating Java memory. This seems to be an issue that many users of
the OT run into (search a few forums and you'll find many questions about
this). It would be nice if this "just works" with the defaults for more
people and this might help.

Allowing each step in the generation process to complete might allow for
easier debugging (?)

I assume it would not be good to run any of the generation steps in
parallel as then one process would be "robbing" overall system memory from
other Java processes (or would it?)


Derek Read ( vancouverizer ) - 2007-04-02 20:56

5

Closed

None

Nobody/Anonymous

Java

None

Public


Comments ( 6 )

Date: 2007-11-30 06:33
Sender: imagiczhang (Project Admin)


The memory problem in the Java implementation of the XSLT processor is
documented. If we need to make it work, we can try a C++ implementation of
the XSLT processor.


Date: 2007-07-25 08:10
Sender: lzap


I tested the forking and it didn't help. It seems that memory leaks are not
the reason for the high memory requirements. But I have found a different
solution! Please read

http://tech.groups.yahoo.com/group/dita-users/message/6741


Date: 2007-07-24 14:12
Sender: rdanderson (Project Admin)


A comment related to this report and one possible fix was posted at
dita-users today:
http://tech.groups.yahoo.com/group/dita-users/message/6741


Date: 2007-07-20 02:27
Sender: imagiczhang (Project Admin)


I don't think it is a DITA issue. XSLT is not designed to process huge
files. What we can do is investigate how to prevent the issue and use
memory more efficiently.


Date: 2007-07-19 15:03
Sender: lzap


Hello, I am trying your scripts under JProfiler. Do you think it's a DITA
issue that can be fixed? IMHO the memory requirements are a bit high: we
have trouble processing much smaller XML files (less than 500 MB), and
the memory used by DITA goes over 2 GB.


Date: 2007-04-03 12:45
Sender: imagiczhang (Project Admin)


I agree that it is a good idea if we can fork a process during the
processing. Currently there is no data shared in memory between steps. So
there is no reason to put every step into one process.


Attached File ( 1 )

Filename: stressTest.zip
Description: VBS files for generating the test suite described

Changes ( 3 )

Field       Old Value               Date              By
status_id   Open                    2007-11-30 06:33  imagiczhang
close_date  -                       2007-11-30 06:33  imagiczhang
File Added  223450: stressTest.zip  2007-04-02 20:56  vancouverizer