I'm getting an out-of-memory error when I use XpathExtract on a somewhat large file (9 MB). Also, I wrote a couple of new pipeline stages (within a new module) so that I can do OpenPGP encryption and decryption. The file mentioned above starts out as a PGP-encrypted file (366 KB), and when decrypted the result is a 9 MB file (in this particular case; the file sizes vary). In that case I am also getting an out-of-memory error. My machine has 1.5 GB of RAM.
I solved the decryption problem by changing my code a little. For the XpathExtract problem, I'm thinking of writing a stage that reads the document with a SAX parser and uses callbacks (extending DefaultHandler, etc.) to get the data I need for further processing.
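For what it's worth, a minimal sketch of that kind of handler is below. The element name OrderId and the class name are made up for illustration; the point is just that nothing larger than one element's text is ever held in memory at a time.

    import java.io.InputStream;
    import java.util.ArrayList;
    import java.util.List;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    // Collects the text of every <OrderId> element without building a DOM.
    public class OrderIdHandler extends DefaultHandler {
        private final List values = new ArrayList();
        private StringBuffer current; // non-null only inside a matching element

        public void startElement(String uri, String local, String qName, Attributes atts) {
            if ("OrderId".equals(qName)) {
                current = new StringBuffer();
            }
        }

        public void characters(char[] ch, int start, int length) {
            if (current != null) {
                current.append(ch, start, length);
            }
        }

        public void endElement(String uri, String local, String qName) {
            if ("OrderId".equals(qName)) {
                values.add(current.toString());
                current = null;
            }
        }

        public List getValues() {
            return values;
        }

        // Parse the stream and return the collected values.
        public static List extract(InputStream xml) throws Exception {
            OrderIdHandler handler = new OrderIdHandler();
            SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
            return handler.getValues();
        }
    }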
XSL, which I gather the XPath API uses under the covers, typically needs 8-10 times the memory to process an XML document of a given size; i.e., if the doc is 200 KB, you need 2 MB to XSL it.
Sherman
Another way is to increase the amount of memory available to Java via the BABELDOC_OPTS environment variable. So in a Unix (Bourne-style) shell you could:

BABELDOC_OPTS=-Xmx50m
export BABELDOC_OPTS

(Note the JVM wants an m suffix, as in -Xmx50m; it rejects "MB".)
Unfortunately, processing monstro XML documents is a memory-intensive job with Babeldoc, which converts the input document into a DOM object. If this is an issue, take a look at Cocoon, which propagates SAX events instead and is therefore more memory-efficient.
I have a 9 MB XML document that I am able to transform. I then need to send it to a flat-file mapper (which happens to be babelblaster). I wrote an interface to babelblaster from a pipeline stage, and I am able to take the input stream from the pipeline document and create a flat file via babelblaster. Babelblaster holds the new flat file in a BufferedOutputStream (which I have initialized on top of a ByteArrayOutputStream).
I need to replace the current document in the pipeline with this new flat-file document. It's at this point that I get an out-of-memory error.
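One JDK detail that may matter here (standard java.io behavior, nothing babelblaster-specific): ByteArrayOutputStream.toByteArray() returns a copy of its internal buffer, so the moment you materialize the flat file as a byte[], it exists in memory twice, on top of whatever the old pipeline document still holds. Roughly:

    import java.io.BufferedOutputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    public class BufferDrain {
        // Drain a buffered in-memory stream into a byte[]. toByteArray()
        // copies the internal buffer, so a 6.5 MB flat file means roughly
        // 13 MB live until the ByteArrayOutputStream itself becomes garbage.
        public static byte[] drain(BufferedOutputStream out, ByteArrayOutputStream buf)
                throws IOException {
            out.flush();              // push any buffered bytes down into 'buf'
            return buf.toByteArray(); // the copy; drop your reference to 'buf' after this
        }
    }

So releasing your references to the old pipeline document and to the output streams as early as possible, before the new document gets built, should lower the peak noticeably.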
Here are the steps of my process:
Stage 1: Enter with a 363 KB PGP-encrypted file.
Stage 2: Decrypt the file, which produces an 8.5 MB XML file.
Stage 3: Transform the XML to another XML file, which results in a 6.5 MB file.
Stage 4: Take the new XML and convert it to a flat file. After the conversion, make the new flat file the pipeline file. (I get the out-of-memory error here for larger files. Is this because I still have the large pipeline document around, as well as the large byte array output? See the sketch after this list.)
Stage 5: PGP-encrypt the flat file.
Stage 6: Write it somewhere.
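On the Stage 4 question: quite possibly, yes. At that moment the old document's DOM, the ByteArrayOutputStream's buffer, and the byte[] copy from toByteArray() can all be live at once. One way around it, assuming babelblaster just needs an OutputStream to write into (writeFlatFile() below is a stand-in for whatever the real mapping call is), is to spool the flat file to a temp file so it never sits in the heap next to the old document:

    import java.io.BufferedOutputStream;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;

    public class FlatFileSpool {
        // Write the flat file to disk instead of a ByteArrayOutputStream; the
        // temp file can then be streamed into the pipeline as the new document.
        public static File spool() throws IOException {
            File tmp = File.createTempFile("flatfile", ".dat");
            tmp.deleteOnExit();
            OutputStream out = new BufferedOutputStream(new FileOutputStream(tmp));
            try {
                writeFlatFile(out); // stand-in for the babelblaster call
            } finally {
                out.close();
            }
            return tmp;
        }

        // Placeholder so the sketch compiles; the real stage would let
        // babelblaster write its output to 'out' instead.
        private static void writeFlatFile(OutputStream out) throws IOException {
            out.write("flat file contents\n".getBytes());
        }
    }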