For a file of 684 MB, I think you should look seriously at the options for processing it in streaming mode; see:
 
http://www.saxonica.com/documentation/sourcedocs/serial.html
 
Also, document projection might be of use:
 
http://www.saxonica.com/documentation/sourcedocs/projection.html
 
It might be possible to build a tree for this in memory, but it's pushing it - it depends how much memory you have available; the usual requirement is about 4x the raw document size. I haven't actually done much measurement of the memory requirements on the .NET side.
 
Stripping whitespace text nodes can often give you a useful improvement in both space and speed.
 
Michael Kay
http://www.saxonica.com/


From: Stephen Caffo [mailto:steve@mailbranch.com]
Sent: 04 December 2008 22:20
To: Mailing list for the SAXON XSLT and XQuery processor
Subject: Re: [saxon] Line numbering for XdmNode

Ok, I’ll try that.  Also, what is the proper way to load a large XML file for transformation?  I get a System.OutOfMemoryException using the code below.  The file is 684 MB.

 

// Configure the processor: untyped nodes, line numbering on.
Saxon.Api.Processor _SaxonProcessor = new Saxon.Api.Processor();
_SaxonProcessor.Implementation.setAllNodesUntyped(true);
_SaxonProcessor.Implementation.setLineNumbering(true);

// Build the whole source document as an in-memory tree.
Saxon.Api.DocumentBuilder _SaxonDoc = _SaxonProcessor.NewDocumentBuilder();
_SaxonDoc.BaseUri = new Uri("http://tempuri.org/");
_SaxonDoc.IsLineNumbering = true;

Saxon.Api.XdmNode saxonNode = _SaxonDoc.Build(XMLFile);

From: Michael Kay [mailto:mike@saxonica.com]
Sent: Wednesday, December 03, 2008 4:47 AM
To: 'Mailing list for the SAXON XSLT and XQuery processor'
Subject: Re: [saxon] Line numbering for XdmNode

 

Line numbers are only included in a document if it's built using a SAX parser (or a pull parser) that supplies line number information. They aren't available for documents created as the output of a transformation or query, or constructed from a DOM. In theory you could insert a step into the processing pipeline that computes and supplies line numbers, but it sounds as though there are easier ways of doing what you want. Have you considered <xsl:number/>?
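
As a rough sketch of the <xsl:number/> suggestion: with level="any" and no count attribute, the number is the position of the current element among all elements of the same name in the document, which gives an integer that is unique per element type within the file. The element name "item" and the attribute name "id" below are made up for illustration; substitute whatever the real stylesheet needs.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything else through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- "item" and "id" are hypothetical names, not taken from the thread. -->
  <xsl:template match="item">
    <xsl:copy>
      <xsl:attribute name="id">
        <!-- Counts preceding and ancestor elements with the same name
             as the context node, plus the node itself. -->
        <xsl:number level="any"/>
      </xsl:attribute>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the number is computed from the tree itself, this works on the result of an earlier transformation, where no line numbers are available.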

 

Michael Kay

http://www.saxonica.com/

 


From: Stephen Caffo [mailto:steve@mailbranch.com]
Sent: 02 December 2008 19:59
To: Mailing list for the SAXON XSLT and XQuery processor
Subject: [saxon] Line numbering for XdmNode

How can I use line numbering for an XdmNode?

 

I have something like the following:

 

 

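Saxon.Api.XdmNode saxonNode;
Saxon.Api.XdmDestination results = new Saxon.Api.XdmDestination();
saxonNode = _SaxonDoc.Build(XMLFile);

// First transformation writes its result tree to the XdmDestination.
transformer.Run(results);

// Second transformation takes that result tree as its input.
transformer2.InitialContextNode = results.XdmNode;
results.Reset();
transformer2.Run(results);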

Except I need line numbering on for the second transformation.  Do I have to pass my results.XdmNode back to the DocumentBuilder, and if so, what’s the best way to do that?

 

A little context: I’m using the Saxon line-number function to generate “unique” ids for elements.  I need unique integers (unique only for that element type within that file) and figured that was the best way to do it.

 

Steve