Tracker: Bugs

5 OutOfMemoryError parsing a largish text chunk - ID: 1851842
Last Update: Comment added ( mnasato )

As reported by Mirko Nasato on the freemarker-user list:

Hi all,

I'm processing a 5.7M XML template through FreeMarker (version 2.3.11). With -Xmx128m I get an OutOfMemoryError:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2882)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
at java.lang.StringBuffer.append(StringBuffer.java:224)
at freemarker.core.FMParser.PCData(FMParser.java:2497)
at freemarker.core.FMParser.Content(FMParser.java:2570)
at freemarker.core.FMParser.OptionalBlock(FMParser.java:2784)
at freemarker.core.FMParser.Root(FMParser.java:2956)

with -Xmx192m it works.

The template does not actually contain any FreeMarker instructions (it's an ad hoc test to reproduce the problem). If I put a random expression like ${foo!} in the middle of the document then it works with less memory, -Xmx96m is enough.

So I guess the problem may be with big chunks of text without any expressions (TextBlock?). But the whole file being 5.7M, why does it need more than 128M to do the parsing?

Thanks

Mirko


Submitted: Attila Szegedi ( szegedia ) - 2007-12-16 05:20:05 PST
Priority: 5
Status: Open
Resolution: Fixed
Assigned To: Attila Szegedi
Category: Parsing
Group: None
Visibility: Public


Comments ( 7 )

Date: 2008-04-19 03:27:37 PDT
Sender: mnasato


Great, thanks!


Date: 2008-04-18 14:01:53 PDT
Sender: revusky (Project Admin)


Since my latest bit of hackery, this example works with 55M of heap. It
still seems outrageously profligate, I have to admit, but I guess this is
typical of Java apps. People running out of memory does not, as a practical
matter, seem like a very big problem out there, so with a better than 3x
improvement, I guess we could close this. The real solution, I guess, to
using very big template files would be to look into the memory mapped file
stuff in java.nio.*. Beyond a certain file size threshold (maybe
configurable) the Template.templateText buffer could become a memory mapped
file. The door is open to try that, but I don't have the feeling that it is
too worthwhile. The memory usage is hardly a real problem for anybody,
certainly not in the typical ways that people use the tool.
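
For illustration, here is a minimal sketch of what the java.nio memory mapping idea could look like. It is not FreeMarker code: the class MappedTemplateText and its methods are hypothetical names, and the sketch assumes a UTF-8 file smaller than 2 GB whose slices are requested on character boundaries.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of FreeMarker.
final class MappedTemplateText {

    /** Maps the whole file read-only. The mapping stays valid after the channel is closed. */
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    /**
     * Decodes byteLength bytes starting at byteOffset as UTF-8.
     * Assumption: the caller passes offsets that fall on character boundaries.
     */
    static String decode(ByteBuffer mapped, int byteOffset, int byteLength) {
        // duplicate() shares the content but has its own position and limit
        ByteBuffer view = mapped.duplicate();
        view.limit(byteOffset + byteLength);
        view.position(byteOffset);
        return StandardCharsets.UTF_8.decode(view).toString();
    }
}
```

With this shape the file content lives outside the Java heap, and only the slices that are actually requested get decoded into Strings. Whether that pays off depends on how often the template text is read back.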


Date: 2008-04-03 02:45:11 PDT
Sender: szegedia (Project Admin)


Jonathan provided a fix for this. It doesn't bring down the memory usage to
48M as your manual parser hack did, but it is completely self-contained in
the JavaCC file, and brings the memory usage down to 61M, which I think we
can live with. This will go in 2.3.13, but in case you're impatient, you
can grab the automatic build from here:

http://freemarker.org:8085/download/FM-BRANCH23/artifacts/build-60/Library/freemarker.jar



Date: 2007-12-28 01:54:04 PST
Sender: mnasato


Thanks for investigating, Attila.

Let me clarify that this IS a real world problem. I wouldn't be spending
time on it otherwise. ;-)

In my project the template is an arbitrary OpenDocument Text file; users
(read: not programmers) create their own templates and they may insert just
a few expressions at the beginning of a huge document, while the
application still needs to parse the whole file.

I think there is definitely a problem with parsing. The problem is not just
that 128m are not enough for that particular file, the point is that memory
usage grows exponentially with the text chunk size (because of the way
StringBuffer expands its capacity I think), rather than being kept under
control.
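
The buffer behaviour referred to here can be observed with a small experiment. The sketch below is an illustration only (BufferGrowthDemo is a made-up name, and it uses StringBuilder, which shares its growth logic with StringBuffer). It appends fixed-size pieces and records the largest combined size of the old and new backing arrays at any reallocation, since both are alive while Arrays.copyOf runs. It assumes pieceSize is greater than zero.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of FreeMarker.
final class BufferGrowthDemo {

    /**
     * Appends pieces until totalChars is reached and returns the largest
     * (old capacity + new capacity), in chars, seen at any reallocation.
     */
    static long peakTransientChars(int totalChars, int pieceSize) {
        char[] piece = new char[pieceSize];
        Arrays.fill(piece, 'x');
        StringBuilder buf = new StringBuilder();
        long peak = 0;
        while (buf.length() < totalChars) {
            int before = buf.capacity();
            buf.append(piece);
            int after = buf.capacity();
            if (after != before) {
                // the old and the new array coexist during the copy
                peak = Math.max(peak, (long) before + after);
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        int chars = 5_700_000; // roughly the size of the template in this report
        long peak = peakTransientChars(chars, 1024);
        System.out.printf("content: %d chars, peak while copying: %d chars (%.1fx)%n",
                chars, peak, (double) peak / chars);
    }
}
```

Each expansion roughly doubles the capacity, so a single large text chunk can briefly need a few times its own size in buffer space. That is only one plausible contributor to the heap usage reported here, not a full account of it.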

As shown, by manually patching FMParser.java the sample document can be
parsed with -Xmx48m rather than 192m.

I'll try to help further, but I'm not familiar with JavaCC so it could take
me a while.



Date: 2007-12-27 22:50:37 PST
Sender: szegedia (Project Admin)


I've dug around with a profiler, and found out that after parsing, the
template will actually take 25.5 MB. Of this, Template.lines will take up
14.26 MB, and rootElement and below will take 11.3 MB. Actually, the 11.3
MB part is okay, as Java stores each character in 2 bytes, so a 5.7 MB UTF-8
encoded file does expand to about 11.4 MB. I'm worried about "lines" though
-- it should be possible to implement this thing with a single String for
the source of the template with various String objects being its substrings
(and thus, using a single char[] underneath).
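
One way to picture the single-String idea is a line index that keeps the source once and stores only the offset where each line starts. The sketch below is hypothetical (LineIndex is not a FreeMarker class) and treats only '\n' as a line terminator. On the JVMs current at the time, String.substring shared the parent's char[]; later JDKs copy instead, so the sketch produces a line only when it is asked for.

```java
// Hypothetical helper, not part of FreeMarker.
final class LineIndex {
    private final String source;    // the template text, stored once
    private final int[] lineStarts; // offset of the first char of each line

    LineIndex(String source) {
        this.source = source;
        // First pass: count lines so the offsets fit in a plain int[].
        int count = 1;
        for (int i = 0; i < source.length(); i++) {
            if (source.charAt(i) == '\n') count++;
        }
        // Second pass: record where each line begins.
        lineStarts = new int[count];
        int next = 1;
        for (int i = 0; i < source.length(); i++) {
            if (source.charAt(i) == '\n') lineStarts[next++] = i + 1;
        }
    }

    int lineCount() {
        return lineStarts.length;
    }

    /** Returns the 1-based line without its trailing newline. */
    String line(int lineNumber) {
        int start = lineStarts[lineNumber - 1];
        int end = lineNumber < lineStarts.length
                ? lineStarts[lineNumber] - 1
                : source.length();
        return source.substring(start, end);
    }
}
```

The per-line overhead is then a single int, rather than a String object plus its own char[] for every line.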

As for high memory requirements during the parsing, I've captured a heap
snapshot when OOME was thrown with -Xmx64M, and it shows 31MB of char[]
objects (551554 instances), 19MB of freemarker.core.Token objects (504994
instances), and 13MB of java.lang.String objects (551530 instances). Mind
you, these are the *shallow* sizes of these objects. The parser indeed
creates a huge bunch of tokens in a linked list (in this case, half a
million before running out of memory), and each one of them has its own
String object, and its own char[] object.

I'm not yet sure what I can do to remedy this. It's worth exploring a bit
whether this is a real world problem -- I mean, how often do you get
templates this large? OTOH, I definitely feel that we should try to lower the
memory requirement for the already parsed template somehow.


Date: 2007-12-16 14:11:25 PST
Sender: nobody

Logged In: NO

As a proof of concept (FMParser.java being generated by JavaCC anyway) the
following patch will take the memory requirement down to 48m by splitting
the text chunk in smaller pieces if it exceeds 512k:

@@ -2501,6 +2501,9 @@
         } else {
             break label_17;
         }
+        if (buf.length() > 512 * 1024) {
+            break label_17;
+        }
     }
     if (stripText && contentNesting == 1)
         {if (true) return TextBlock.EMPTY_BLOCK;}
Incidentally, Velocity also seems to require 48m to parse the same
template, throwing OutOfMemoryError with -Xmx32m.
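
Outside the generated parser, the splitting idea in the patch can be sketched as follows. This is an illustration only: TextChunker is a made-up name, and the token source is simplified to an Iterable of Strings rather than the parser's real token stream. As in the patch, a block is cut once the buffer has exceeded the limit, so a block can overshoot the limit by up to one token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, loosely mirroring the patch above.
final class TextChunker {

    /** Concatenates tokens into blocks, cutting a block once it exceeds maxChars. */
    static List<String> chunk(Iterable<String> tokens, int maxChars) {
        List<String> blocks = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        for (String token : tokens) {
            buf.append(token);
            if (buf.length() > maxChars) {
                blocks.add(buf.toString());
                // start a fresh buffer so the large backing array can be collected
                buf = new StringBuilder();
            }
        }
        if (buf.length() > 0) {
            blocks.add(buf.toString());
        }
        return blocks;
    }
}
```

Calling TextChunker.chunk(tokens, 512 * 1024) matches the 512k threshold used in the patch. No single buffer then grows much beyond that size, which keeps each capacity expansion small.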


Date: 2007-12-16 06:08:21 PST
Sender: mnasato


The file I used in my test can be downloaded here:

http://docs.oasis-open.org/office/v1.1/OS/OpenDocument-v1.1.odt

Unzip the ODT (an OpenDocument Text file is basically a ZIP archive): the
"content.xml" is ~5.7M.


Attached File

No Files Currently Attached

Change ( 1 )

Field          Old Value  Date                     By
resolution_id  None       2008-04-03 02:45:11 PDT  szegedia