#195 OutOfMemoryError parsing a largish text chunk

Group: undecided
Status: closed-fixed
Labels: Parsing
Priority: 5
Updated: 2014-06-15
Created: 2007-12-16
Creator: Attila Szegedi
Private: No

As reported by Mirko Nasato on the freemarker-user list:

Hi all,

I'm processing a 5.7M XML template through FreeMarker (version 2.3.11). With -Xmx128m I get an OutOfMemoryError:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2882)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
at java.lang.StringBuffer.append(StringBuffer.java:224)
at freemarker.core.FMParser.PCData(FMParser.java:2497)
at freemarker.core.FMParser.Content(FMParser.java:2570)
at freemarker.core.FMParser.OptionalBlock(FMParser.java:2784)
at freemarker.core.FMParser.Root(FMParser.java:2956)

With -Xmx192m it works.

The template does not actually contain any FreeMarker instructions (it's an ad hoc test to reproduce the problem). If I put a random expression like ${foo!} in the middle of the document then it works with less memory; -Xmx96m is enough.

So I guess the problem may be with big chunks of text without any expressions (TextBlock?). But the whole file being 5.7M, why does it need more than 128M to do the parsing?

Thanks

Mirko

Discussion

  • Mirko Nasato
    2007-12-16

    The file I used in my test can be downloaded here:

    http://docs.oasis-open.org/office/v1.1/OS/OpenDocument-v1.1.odt

    Unzip the ODT (an OpenDocument Text file is basically a ZIP archive): the "content.xml" is ~5.7M.
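
    For convenience, a minimal Java snippet along these lines should do the extraction (file names taken from the URL above):

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;

    // Pulls content.xml out of the .odt, which is just a ZIP archive.
    public class ExtractContentXml {
        public static void main(String[] args) throws IOException {
            ZipFile odt = new ZipFile("OpenDocument-v1.1.odt");
            try {
                ZipEntry entry = odt.getEntry("content.xml");
                InputStream in = odt.getInputStream(entry);
                Files.copy(in, Paths.get("content.xml"));
            } finally {
                odt.close(); // also closes the entry's InputStream
            }
        }
    }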

     
  • Anonymous

    As a proof of concept (FMParser.java being generated by JavaCC anyway), the following patch takes the memory requirement down to 48M by splitting the text chunk into smaller pieces whenever it exceeds 512K:

    @@ -2501,6 +2501,9 @@
           } else {
             break label_17;
           }
    +      if (buf.length() > 512 * 1024) {
    +        break label_17;
    +      }
         }
         if (stripText && contentNesting == 1)
           {if (true) return TextBlock.EMPTY_BLOCK;}

    Incidentally, Velocity also seems to require 48M to parse the same template, throwing an OutOfMemoryError with -Xmx32m.
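
    Stated outside the generated parser, the idea is just bounded accumulation: emit a text block whenever the buffer reaches the cap, instead of letting one buffer grow without limit. A standalone Java sketch of the same idea (illustrative only, not FreeMarker's actual code):

    import java.io.IOException;
    import java.io.Reader;
    import java.util.ArrayList;
    import java.util.List;

    public class ChunkedText {
        static final int MAX_BLOCK = 512 * 1024; // the same 512K cap as the patch

        // Cuts the incoming raw text into blocks of at most MAX_BLOCK chars,
        // so no single buffer (and no single copy during growth) gets huge.
        static List<String> split(Reader in) throws IOException {
            List<String> blocks = new ArrayList<String>();
            StringBuilder buf = new StringBuilder(MAX_BLOCK);
            int c;
            while ((c = in.read()) != -1) {
                buf.append((char) c);
                if (buf.length() >= MAX_BLOCK) {
                    blocks.add(buf.toString()); // emit a block at the cap
                    buf.setLength(0);           // reuse the builder
                }
            }
            if (buf.length() > 0) blocks.add(buf.toString());
            return blocks;
        }
    }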

     
  • Attila Szegedi
    2007-12-28

    I've dug around with a profiler and found that after parsing, the template actually takes up 25.5 MB. Of this, Template.lines takes 14.26 MB, and rootElement and below take 11.3 MB. The 11.3 MB part is actually okay: Java stores each character in 2 bytes, so a 5.7 MB UTF-8 encoded file does expand to about 11.4 MB. I'm worried about "lines", though -- it should be possible to implement this with a single String holding the whole template source and the individual lines being its substrings (and thus sharing a single char[] underneath); see the sketch at the end of this comment.

    As for high memory requirements during the parsing, I've captured a heap snapshot when OOME was thrown with -Xmx64M, and it shows 31MB of char[] objects (551554 instances), 19MB of freemarker.core.Token objects (504994 instances), and 13MB of java.lang.String objects (551530 instances). Mind you, these are the *shallow* sizes of these objects. The parser indeed creates a huge bunch of tokens in a linked list (in this case, half a million before running out of memory), and each one of them has its own String object, and its own char[] object.
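
    For scale: 31 MB + 19 MB + 13 MB is about 63 MB of shallow size over roughly 505,000 tokens, i.e. about 130 bytes of parser bookkeeping per token -- simple arithmetic on the profiler numbers above, but it shows why half a million tokens hurt so much.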

    I'm not yet sure what I can do to remedy this. It's worth exploring whether this is a real-world problem -- I mean, how often do you get templates this large? OTOH, I definitely feel that we should try to lower the memory requirement of the already-parsed template somehow.
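
    A sketch of that single-String idea (class and member names are made up for illustration, not FreeMarker's actual API): keep the whole source once, plus an index of line-start offsets, and only materialize a line's String when it is asked for:

    import java.util.ArrayList;
    import java.util.List;

    public class LineIndex {
        private final String source;    // one String, one char[] underneath
        private final int[] lineStarts; // offset of the first char of each line

        public LineIndex(String source) {
            this.source = source;
            List<Integer> starts = new ArrayList<Integer>();
            starts.add(Integer.valueOf(0));
            for (int i = 0; i < source.length(); i++) {
                if (source.charAt(i) == '\n') {
                    starts.add(Integer.valueOf(i + 1));
                }
            }
            this.lineStarts = new int[starts.size()];
            for (int i = 0; i < lineStarts.length; i++) {
                lineStarts[i] = starts.get(i).intValue();
            }
        }

        // Materializes line n (0-based) on demand; nothing is stored per line.
        // Note: on the JVMs of this era substring() shared the parent's char[];
        // since Java 7u6 it copies, which is why only offsets are stored here.
        public String line(int n) {
            int end = (n + 1 < lineStarts.length) ? lineStarts[n + 1]
                                                  : source.length();
            return source.substring(lineStarts[n], end);
        }
    }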

     
  • Mirko Nasato
    2007-12-28

    Thanks for investigating, Attila.

    Let me clarify that this IS a real world problem. I wouldn't be spending time on it otherwise. ;-)

    In my project the template is an arbitrary OpenDocument Text file; users (read: not programmers) create their own templates and they may insert just a few expressions at the beginning of a huge document, while the application still needs to parse the whole file.

    I think there is definitely a problem with parsing. The problem is not just that 128m is not enough for that particular file; the point is that memory usage grows out of all proportion to the text chunk size (because of the way StringBuffer doubles its capacity when it expands, I think) rather than being kept under control. The small demo at the end of this comment shows the doubling.

    As shown above, by manually patching FMParser.java the sample document can be parsed with -Xmx48m rather than -Xmx192m.

    I'll try to help further, but I'm not familiar with JavaCC so it could take me a while.
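
    A tiny self-contained demo of that doubling (the exact growth rule depends on the JDK, but in this era it was newCapacity = (oldCapacity + 1) * 2):

    public class BufferGrowthDemo {
        public static void main(String[] args) {
            StringBuffer buf = new StringBuffer(); // default capacity: 16
            int last = buf.capacity();
            for (int i = 0; i < 6 * 1024 * 1024; i++) {
                buf.append('x');
                if (buf.capacity() != last) {
                    last = buf.capacity();
                    // During each expansion the old and the new char[] are
                    // both live, so the transient footprint is about 3x the
                    // content size (6x in bytes, at 2 bytes per char).
                    System.out.println("length=" + buf.length()
                            + " capacity=" + last);
                }
            }
        }
    }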

     
  • Attila Szegedi
    2008-04-03

    Jonathan provided a fix for this. It doesn't bring the memory usage down to 48M as your manual parser hack did, but it is completely self-contained in the JavaCC file and brings the usage down to 61M, which I think we can live with. This will go into 2.3.13, but in case you're impatient, you can grab the automatic build from here:

    http://freemarker.org:8085/download/FM-BRANCH23/artifacts/build-60/Library/freemarker.jar

     
  • Attila Szegedi
    2008-04-03

    • status: open --> open-fixed
     
  • user_id=33187

    Since my latest bit of hackery, this example works with 55M of heap. It still seems outrageously profligate, I have to admit, but I guess this is typical of Java apps. As a practical matter, people running out of memory doesn't seem like a very big problem out there, so with a better-than-3x improvement I guess we can close this. The real solution for very big template files, I guess, would be to look into the memory-mapped file support in java.nio.*: beyond a certain (maybe configurable) file size threshold, the Template.templateText buffer could become a memory-mapped file (roughly as sketched below). The door is open to try that, but I don't have the feeling it is worthwhile: the memory usage isn't a real problem for most people, certainly not in the typical ways the tool is used.
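
    For the record, roughly what that java.nio route could look like (MappedTemplateText is a made-up name, and this leaves the hard part -- decoding bytes to chars without pulling everything onto the heap -- unsolved):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MappedTemplateText {
        // Maps the template file into memory instead of reading it onto the
        // heap; the OS pages content in and out on demand, so only the pages
        // actually touched occupy physical memory. The mapping stays valid
        // after the channel is closed.
        public static MappedByteBuffer map(String path) throws IOException {
            RandomAccessFile file = new RandomAccessFile(path, "r");
            try {
                FileChannel channel = file.getChannel();
                return channel.map(FileChannel.MapMode.READ_ONLY, 0,
                        channel.size());
            } finally {
                file.close();
            }
        }
    }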

     
  • Mirko Nasato
    2008-04-19

    Great, thanks!

     
  • status: open-fixed --> closed-fixed
  • Group: --> undecided