In this post I will highlight VTD-XML v2.13's whitespace handling capability along with some examples in Java.
Native to non-extractive parsing, VTD-XML's handling of XML tokens and elements frequently revolves around the concept of byte segments. Once an XML document is parsed into VTD tokens, the byte segment enveloping the entire content of any token or element can be thought of as a pair of descriptors (i.e. offset and length) pointing into the original document.
For a large class of XML content extraction and modification operations, non-extractive parsing allows applications to circumvent the tedious, cycle-wasting tasks of de-serializing and re-serializing the byte content of elements, and thereby helps achieve the maximum possible performance.
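To make the descriptor idea concrete, here is a minimal, library-free sketch. It assumes the layout VTD-XML uses for the long returned by getElementFragment(), with the offset in the lower 32 bits and the length in the upper 32 bits. The class and helper names (SegmentSketch, pack, offsetOf, lengthOf, text) are made up for illustration and are not part of the VTD-XML API.

```java
import java.nio.charset.StandardCharsets;

final class SegmentSketch {
    // Pack an (offset, length) pair into one long:
    // length in the upper 32 bits, offset in the lower 32 bits (assumed layout).
    static long pack(int offset, int length) {
        return ((long) length << 32) | (offset & 0xFFFFFFFFL);
    }

    static int offsetOf(long segment) {
        return (int) segment;
    }

    static int lengthOf(long segment) {
        return (int) (segment >>> 32);
    }

    // Decode only the bytes covered by the segment; the rest of the
    // document is never touched or re-serialized.
    static String text(byte[] doc, long segment) {
        return new String(doc, offsetOf(segment), lengthOf(segment), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] doc = "<root><name>suresh</name></root>".getBytes(StandardCharsets.UTF_8);
        long name = pack(6, 19); // "<name>suresh</name>" starts at byte 6 and is 19 bytes long
        System.out.println(text(doc, name));
    }
}
```

The point of the sketch is that a fragment is nothing more than two integers over the untouched document bytes, so removing or copying it never requires rebuilding the element.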
Version 2.12 of VTD-XML introduces two new methods that help either trim or expand the surrounding white spaces of byte segments denoted by 64-bit integers.
It is worth noting that both methods are greedy: they will trim or expand as much white space as they possibly can. Furthermore, the effect of one call often negates the other.
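The greedy behavior can be modeled in a few lines of plain Java. This is only a sketch under stated assumptions, not VTD-XML's actual implementation: it treats space, tab, CR and LF as white space, uses the same offset-in-lower-bits, length-in-upper-bits packing, and the names GreedyWhitespaceSketch, expand and trim are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class GreedyWhitespaceSketch {
    static boolean isWs(byte b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r';
    }

    // Length in the upper 32 bits, offset in the lower 32 bits (assumed layout).
    static long pack(int offset, int length) {
        return ((long) length << 32) | (offset & 0xFFFFFFFFL);
    }

    // Grow the segment in both directions for as long as the
    // neighboring bytes are white space.
    static long expand(byte[] doc, long seg) {
        int start = (int) seg;
        int end = start + (int) (seg >>> 32); // exclusive end
        while (start > 0 && isWs(doc[start - 1])) start--;
        while (end < doc.length && isWs(doc[end])) end++;
        return pack(start, end - start);
    }

    // Shrink the segment from both ends for as long as its
    // outermost bytes are white space.
    static long trim(byte[] doc, long seg) {
        int start = (int) seg;
        int end = start + (int) (seg >>> 32); // exclusive end
        while (start < end && isWs(doc[start])) start++;
        while (end > start && isWs(doc[end - 1])) end--;
        return pack(start, end - start);
    }

    public static void main(String[] args) {
        byte[] doc = "<a>  <b/>\n </a>".getBytes(StandardCharsets.UTF_8);
        long b = pack(5, 4);            // the "<b/>" element
        long expanded = expand(doc, b); // now also covers the 2 leading and 2 trailing white space bytes
        System.out.println((int) expanded + ", " + (int) (expanded >>> 32));
        System.out.println(trim(doc, expanded) == b); // trimming undoes the expansion here
    }
}
```

In this model, trimming an expanded segment returns the original one whenever the original neither starts nor ends with white space, which is the sense in which one call negates the other.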
Three static constants and two more methods are added to the VTDNav class in 2.13.
Those constants are:
The two new methods are:
Suppose you want to remove some element fragments from the master XML document, but you want the remaining XML text to retain the original format, or to make slight, fine-grained changes to it (e.g. paragraph separation, indentation). You can also extract a segment of XML bytes without losing its surrounding formatting line breaks or tabs.
Consider the following example taken from a Q/A thread on the StackOverflow web site
(http://stackoverflow.com/questions/36972163/vtd-xmlremoving-the-spaces-after-removing-the-element):

<root>
    <name>suresh</name>
    <address>Address</address>
</root>
With 2.12, if you remove the "<name>suresh</name>" fragment using the following code, you will end up with an XML document that has lost the line break and indentation in front of the "address" element:

import com.ximpleware.*;
import java.io.*;

public class testExpandSpace {
    public static void main(String[] args) throws VTDException, IOException {
        VTDGen vg = new VTDGen();
        AutoPilot ap = new AutoPilot();
        XMLModifier xm = new XMLModifier();
        // Parse without namespace awareness; bail out if parsing fails
        if (!vg.parseFile("d://xml//testSuresh.xml", false))
            return;
        VTDNav vn = vg.getNav();
        ap.bind(vn);
        xm.bind(vn);
        ap.selectXPath("//name");
        int index = -1;
        while ((index = ap.evalXPath()) != -1) {
            System.out.println(" ===> " + vn.toString(index) + "===>");
            long elementFragment = vn.getElementFragment();
            // Greedily expand the fragment over leading AND trailing white space, then remove it
            xm.remove(vn.expandWhiteSpaces(elementFragment));
        }
        xm.output("d://xml//test1111.xml");
    }
}

The output is:

<root><address>Address</address>
</root>
With 2.13, you can expand the "name" fragment to take in only its trailing white space before removing it, thereby maintaining the desirable output indentation:
import com.ximpleware.*;
import java.io.*;

public class testExpandSpace {
    public static void main(String[] args) throws VTDException, IOException {
        VTDGen vg = new VTDGen();
        AutoPilot ap = new AutoPilot();
        XMLModifier xm = new XMLModifier();
        // Parse without namespace awareness; bail out if parsing fails
        if (!vg.parseFile("d://xml//testSuresh.xml", false))
            return;
        VTDNav vn = vg.getNav();
        ap.bind(vn);
        xm.bind(vn);
        ap.selectXPath("//name");
        int index = -1;
        while ((index = ap.evalXPath()) != -1) {
            System.out.println(" ===> " + vn.toString(index) + "===>");
            long elementFragment = vn.getElementFragment();
            // Expand the fragment over its trailing white space only, then remove it
            xm.remove(vn.expandWhiteSpaces(elementFragment, VTDNav.WS_TRAILING));
        }
        xm.output("d://xml//test1111.xml");
    }
}
Below is the output with the desirable format.

<root>
    <address>Address</address>
</root>
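To see why the two variants produce different documents, the removal can be reproduced without the library. The sketch below is a simplified model under stated assumptions: it works on a String rather than on raw bytes, assumes a four-space indent in the input, and the Side enum and removeExpanded helper are hypothetical stand-ins, not VTD-XML names.

```java
final class SideExpansionSketch {
    // Hypothetical stand-in for the library's white space side selection.
    enum Side { LEADING, TRAILING, BOTH }

    // Remove the first occurrence of `fragment` from `doc`, after greedily
    // widening the removed range over white space on the chosen side(s).
    static String removeExpanded(String doc, String fragment, Side side) {
        int start = doc.indexOf(fragment);
        if (start < 0) return doc; // nothing to remove
        int end = start + fragment.length();
        if (side != Side.TRAILING) {
            while (start > 0 && Character.isWhitespace(doc.charAt(start - 1))) start--;
        }
        if (side != Side.LEADING) {
            while (end < doc.length() && Character.isWhitespace(doc.charAt(end))) end++;
        }
        return doc.substring(0, start) + doc.substring(end);
    }

    public static void main(String[] args) {
        String doc = "<root>\n    <name>suresh</name>\n    <address>Address</address>\n</root>";
        String name = "<name>suresh</name>";
        // Both sides: the line break and indent before <name> are swallowed too.
        System.out.println(removeExpanded(doc, name, Side.BOTH));
        System.out.println("----");
        // Trailing only: the indent before <name> stays and now precedes <address>.
        System.out.println(removeExpanded(doc, name, Side.TRAILING));
    }
}
```

Expanding on the trailing side only works here because the white space that follows "name" is exactly one line break plus one indent, so the indent that preceded the removed element is left behind for its next sibling.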