I think there is a streaming problem(?) related to how Saxon handles writing
to databases.
I am processing two huge files, one 2GB and another 30GB. Initially, I had
written Saxon/XSLT code that processes the source XML files and outputs the
results to csv files. This worked fine for both files, meaning that the
code was well written, using all the necessary streaming-related constructs
to enable processing of huge files. In both cases, I didn't even have to use
the -Xmx java switch to allocate more memory to the JVM.
I am now processing the same files but with the Saxon/XSLT code modified to
output to a MySQL database. I am always getting an "Exception in thread
"main" java.lang.OutOfMemoryError: Java heap space" error. This error occurs
when it is building a tree; specifically, the heap error occurs after this
statement has been displayed:
"Building tree for file:<file path> using class
net.sf.saxon.tree.TreeBuilder"
(I have set it to use linked instead of tiny trees, though neither of the
two options seems to make a difference)
The only way I could make it work was to use the -Xmx switch to allocate
a lot more memory to the JVM. For the 30GB file, I set it at a very large
value (30000M); it managed to build the tree and write out records, but it
was extremely slow.
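To confirm how much heap the JVM really gets with and without the -Xmx
switch, a small check like the one below can help. It is only a sketch: the
class name HeapCheck is made up for illustration and is not part of Saxon.

```java
// Illustrative helper (hypothetical name): reports the heap ceiling
// the JVM was started with.
final class HeapCheck {
    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

Running it once as "java HeapCheck" and once as "java -Xmx30000M HeapCheck"
shows whether the switch is actually taking effect on the machine in
question.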
Questions:
1. Does Saxon use different techniques to build the trees depending on what
type of output it has to generate?
2. Is streaming disabled when output is a database as opposed to csv files?
Why didn't I need to use the -Xmx switch when outputting to csv files?
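For comparison, this is roughly what I understand by streamed processing.
The sketch below does not use Saxon at all; it uses the JDK's own StAX API
to read one record at a time and hand records to a sink in batches, so
memory is bounded by the batch size rather than by the size of the file. The
element name "record", the RecordSink interface and the class name are all
assumptions made up for the example; a real sink might wrap a JDBC batch
insert.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StreamingSketch {
    /** Hypothetical sink; a real one might wrap a JDBC batch insert. */
    interface RecordSink {
        void writeBatch(List<String> batch);
    }

    /**
     * Reads <record> elements one at a time and passes them to the sink in
     * batches. No tree of the whole document is ever built.
     * Returns the number of records seen.
     */
    static int stream(Reader xml, int batchSize, RecordSink sink) {
        List<String> batch = new ArrayList<>();
        int count = 0;
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            try {
                while (r.hasNext()) {
                    if (r.next() == XMLStreamConstants.START_ELEMENT
                            && "record".equals(r.getLocalName())) {
                        // Assumes text-only content inside <record>.
                        batch.add(r.getElementText());
                        count++;
                        if (batch.size() == batchSize) {
                            sink.writeBatch(new ArrayList<>(batch));
                            batch.clear();
                        }
                    }
                }
                // Flush whatever is left over after the last full batch.
                if (!batch.isEmpty()) {
                    sink.writeBatch(new ArrayList<>(batch));
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML could not be streamed", e);
        }
        return count;
    }
}
```

If the database version of my stylesheet behaved like this, I would expect
it to run in roughly the same memory as the csv version, which is why the
"Building tree" message surprises me.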
Michael Kay wrote:
>
> How big is "huge"? For some people it's 100Mb, for others it's 100Gb.
>
> There are two facilities now in Saxon for handling files that are too
> large to fit in memory:
>
> (1) "Streaming of large documents" in XSLT - see
> http://www.saxonica.com/documentation/sourcedocs/serial.html
>
> (2) "Document projection" in XQuery - see
> http://www.saxonica.com/documentation/sourcedocs/projection.html
>
> In both cases the feature only works for certain kinds of processing, so
> it depends on the exact nature of the XML.
>
> But if "huge" means say 200Mb, then it might be that all you need to do is
> to set the right options to allocate memory to Java.
>
> I can advise you at the coding level if you're having difficulty using
> particular Saxon features, but if you need help at the design level, or if
> you need help with the implementation, then please contact me off-list to
> see if we can organize some consultancy arrangement.
>
> Regards,
>
> Michael Kay
> http://www.saxonica.com/
>
>
> _____
>
> From: saxon-help-bounces@... [mailto:saxon-help-bounces@...]
> On Behalf Of Julio de la Vega
> Sent: 17 December 2007 10:15
> To: saxon-help@...
> Subject: [saxon] Question about XSLT >> Huge Files
>
>
>
> Hi *,
>
>
>
> First of all, thank you for your time and help.
>
>
>
> I am working on a case where I have to transform an XML input into a more
> complex XML output. I have developed an XSLT that creates my XML output
> according to my requirements. My problems started when I began to run huge
> XML input files, because of memory problems.
>
> I have found information that makes me think that you can help me solve
> my issue.
>
>
>
> Could you please give me an overview of how I could solve this case?
>
>
>
> Please do not hesitate to contact me if you need more information
>
>
>
> Thanks again
>
>
>
> Best Regards
>
>
>
> Julio de la Vega
>
>
--
View this message in context: http://www.nabble.com/Question-about-XSLT-%3E%3E-Huge-Files-tp14373477p21559911.html
Sent from the saxon-help mailing list archive at Nabble.com.