Hi Andrea,


Thank you for the solution. We’ve increased the heap memory to 2 GB, and filter-media is working fine now.


Thanks and Regards,


SP Library


From: Andrea Schweer [mailto:schweer@waikato.ac.nz]
Sent: 04 February 2013 11:33
To: dspace-tech@lists.sourceforge.net
Cc: Melisa Ong
Subject: Re: [Dspace-tech] Media filter for powerpoint file - java.lang.OutOfMemoryError


Hi Melisa,

On 04/02/13 15:59, Melisa Ong wrote:

When we run the filter-media function, we keep running into this out-of-memory error for some PowerPoint files.

Has anyone encountered this error when running filter-media on PowerPoint files?

Have you tried increasing the heap memory for the media filter? Assuming that you're running DSpace 1.7 or higher, you can change this in the [dspace]/bin/dspace file:
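As a rough sketch of what raising the heap could look like: the snippet below assumes that the [dspace]/bin/dspace script passes a JAVA_OPTS variable on to the JVM and only falls back to its own default when that variable is unset. That behaviour is an assumption to verify against your own copy of the script; the 2048m value simply mirrors the 2 GB setting reported as working in this thread.

```shell
# Assumption: [dspace]/bin/dspace honours JAVA_OPTS from the environment.
# Check the script itself; if it hard-codes -Xmx, edit that line instead.
JAVA_OPTS="-Xmx2048m"
export JAVA_OPTS

# Then run the media filter as usual, replacing [dspace] with your
# installation directory:
#   [dspace]/bin/dspace filter-media
```

If the script in your version sets the heap size directly, changing the -Xmx value there should have the same effect.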

Also, filter-media tends to halt after the error is encountered. Is there a way to have the command detect (or report) the files that cause errors and then continue with the other files?

I don't think there is -- and especially not once the process has crashed due to lack of heap memory.

You could use the --skip option for items that you know are problematic, but even that doesn't sound doable for a large repository.

There's a somewhat ugly workaround though. The filter-media job will skip over items that it has already processed; it considers an item to have been processed when there is an appropriately named text file in the TEXT bundle. So you could just create an empty file with the appropriate name and put it into the item's TEXT bundle (that is, if your PPT file is called slides.ppt, call the empty file slides.ppt.txt). The filter-media task should then no longer try to extract text from that PowerPoint file.
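The naming rule for the placeholder can be sketched as below. The file name slides.ppt is the example from the paragraph above, and the scratch directory is only for illustration. Adding the resulting file to the item's TEXT bundle is a separate step that is not shown here.

```shell
# Example name from the text above; substitute the real bitstream name.
ppt_name="slides.ppt"

# The placeholder keeps the full original name and appends ".txt".
placeholder="${ppt_name}.txt"

# Create the empty placeholder in a scratch directory (illustrative only).
workdir="$(mktemp -d)"
: > "${workdir}/${placeholder}"
```

The file is deliberately zero bytes: it only has to exist under the expected name so that filter-media treats the PowerPoint file as already processed.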

My first step would be to increase the heap memory and see if that makes the media filter work for all existing items.


Dr Andrea Schweer
IRR Technical Specialist, ITS Information Systems
The University of Waikato, Hamilton, New Zealand