In this post I will highlight VTD-XML v2.13's whitespace handling capability along with some examples in Java.
Native to non-extractive parsing, VTD-XML's handling of XML tokens and elements frequently revolves around the concept of byte segments. Once an XML document is parsed into VTD tokens, the byte segment enveloping the entire content of any token or element can be visualized as a pair of descriptors (i.e. offset and length) pointing into the original document.... read more
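To make the offset/length idea concrete, here is a minimal, stdlib-only sketch. The `ByteSegment` class and its methods are hypothetical illustrations, not VTD-XML's API (VTDNav exposes the descriptors through calls such as `getTokenOffset()` and `getTokenLength()`). The point it shows: a token is only a pair of integers, comparisons can run directly against the original bytes, and a `String` is created only when you explicitly ask for one.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of a non-extractive token: an (offset, length) pair
// pointing into the original document bytes. Not part of VTD-XML.
public class ByteSegment {
    final int offset;
    final int length;

    ByteSegment(int offset, int length) {
        this.offset = offset;
        this.length = length;
    }

    // Compares the segment with an ASCII string directly against the document bytes,
    // without allocating a String for the token. Only valid for ASCII text.
    boolean matches(byte[] doc, String ascii) {
        if (length != ascii.length()) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (doc[offset + i] != (byte) ascii.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    // Extraction happens only on demand, by decoding the segment (assumed UTF-8 here).
    String materialize(byte[] doc) {
        return new String(doc, offset, length, StandardCharsets.UTF_8);
    }
}
```

For `<ct:ID>i7</ct:ID>`, the text token would be described as `new ByteSegment(7, 2)`; nothing is copied until `materialize()` is called.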
In case you weren't aware, VTD-XML 2.11 and 2.12 are also available from the Maven repository, at http://mvnrepository.com/artifact/com.ximpleware/vtd-xml
Unlike SourceForge or GitHub, the Maven repository hosts only release snapshots of VTD-XML, and only for the Java platform. SourceForge's CVS gives you the complete history of the entire source base, along with the most up-to-the-minute updates, for the whole range of supported platforms.... read more
VTDGen has two main methods in version 2.12 that you can call to parse XML documents.
The first one is parse(), which accepts a boolean indicating whether the parsing operation is namespace aware. It throws a variety of exceptions corresponding to various parsing errors, such as encoding errors, invalid entity references, or namespace qualification errors. You need to catch those exceptions in your code and obtain the detailed diagnostic message about the nature of the error. parse() always works in conjunction with one of two setDoc() methods, which accept either a byte array containing the entire input XML, or a byte array plus a pair of integers (offset and length) delimiting the segment of the array that contains the XML document. The maximum file size is 2 GB without namespace awareness, and 1 GB with it. Also remember that you need to read the file content into memory yourself; the whole parsing routine takes about six to ten lines of code.... read more
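Since parse() expects you to load the document yourself, a helper for that step might look like the sketch below. It uses only the JDK; the `DocLoader` class, its constants and the size check are hypothetical, with the limits taken from this post (treating "GB" as 2^30 bytes is an assumption, and a Java `byte[]` cannot exceed `Integer.MAX_VALUE` elements anyway).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: reads a whole XML file into memory before it is handed to setDoc().
public class DocLoader {
    // Roughly 2 GB: a Java byte[] cannot hold more than Integer.MAX_VALUE elements.
    static final long MAX_WITHOUT_NS = Integer.MAX_VALUE;
    // 1 GB limit for namespace-aware parsing, as quoted in the post (assumed to mean 2^30 bytes).
    static final long MAX_WITH_NS = 1L << 30;

    // Fails early with a clear message instead of letting the parser run out of room.
    static byte[] load(Path file, boolean namespaceAware) throws IOException {
        long size = Files.size(file);
        long limit = namespaceAware ? MAX_WITH_NS : MAX_WITHOUT_NS;
        if (size > limit) {
            throw new IOException(file + " is " + size + " bytes, over the " + limit + "-byte limit");
        }
        return Files.readAllBytes(file);
    }
}
```

With VTD-XML on the classpath, the returned array would then go to `setDoc()` on a `VTDGen` instance, followed by `parse(true)` or `parse(false)` with the same namespace setting used for the size check.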
In this post I am going to show you how to effectively remove comments from an XML document using the combination of XMLModifier and XPath. The input XML document looks like the following.
<clients>
<!-- some other code here -->
<function>
</function>
<function>
</function>
<function>
<name>data_values</name>
<variables>
<variable><!-- some other code here -->
<name>temp</name>
<!-- some other code here --> <type>double</type>
</variable>
</variables><!-- some other code here -->
<block><!-- some other code here -->
<opster>temp = 1</opster>
</block>
</function>
</clients>
... [read more](/p/vtd-xml/news/2016/05/how-to-remove-comment-nodes-from-an-xml-document/)
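The post's actual code, using XMLModifier with XPath, is behind the link. As a rough, JDK-only sketch of the idea underneath, assuming (as I understand it) that XMLModifier records deletions as byte segments of the original document and writes out everything else untouched, the hypothetical `CommentStripper` below splices comments out of the raw bytes. It swaps XPath selection for a naive scan for `<!--` and `-->`, so it is not XMLModifier's API and does not handle those delimiters appearing inside CDATA sections or attribute values.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: delete comments by recording {offset, length} segments, then copying the rest.
public class CommentStripper {
    // Naive scan for "<!-- ... -->"; returns sorted, non-overlapping {offset, length} pairs.
    static List<int[]> findComments(byte[] doc) {
        List<int[]> segments = new ArrayList<>();
        int i = 0;
        while (i < doc.length) {
            int start = indexOf(doc, "<!--", i);
            if (start < 0) break;
            int end = indexOf(doc, "-->", start + 4);
            if (end < 0) break; // unterminated comment: leave the rest as-is
            segments.add(new int[] {start, end + 3 - start});
            i = end + 3;
        }
        return segments;
    }

    // Byte-level search for an ASCII pattern, starting at 'from'.
    static int indexOf(byte[] doc, String pattern, int from) {
        byte[] p = pattern.getBytes(StandardCharsets.US_ASCII);
        outer:
        for (int i = from; i <= doc.length - p.length; i++) {
            for (int j = 0; j < p.length; j++) {
                if (doc[i + j] != p[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    // Copies every byte not covered by a deleted segment.
    static byte[] splice(byte[] doc, List<int[]> deletions) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(doc.length);
        int pos = 0;
        for (int[] seg : deletions) {
            out.write(doc, pos, seg[0] - pos);
            pos = seg[0] + seg[1];
        }
        out.write(doc, pos, doc.length - pos);
        return out.toByteArray();
    }
}
```

Note that whitespace surrounding a removed comment (for example, the space before `<type>` in the sample) is left in place.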
This post shows how to use the new methods in VTD-XML 2.12 to remove leading and trailing whitespace from the text nodes of an XML document.
Below is the input document. As you can see, the text nodes of the ID elements have long trailing whitespace.
<?xml version="1.0"?>
<ns:myOrder xmlns:ns="http://w3schools.com/BusinessDocument" xmlns:ct="http://something.com/CommonTypes">
<MessageHeader>
<ct:ID>i7 </ct:ID>
<ct:ID>i7 </ct:ID>
<ct:ID>i7 </ct:ID>
<ct:ID>i7 </ct:ID>
<ct:Name> Company Name </ct:Name>
</MessageHeader>
</ns:myOrder>
... [read more](/p/vtd-xml/news/2016/02/whitespace-trimming-in-212/)
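The 2.12 methods themselves are described behind the link. In a non-extractive model, trimming does not need to copy any text: it only narrows the offset/length pair. The JDK-only sketch below shows that idea; `SegmentTrim` is a hypothetical name, not VTD-XML's API, and it treats the four XML whitespace characters (space, tab, CR, LF) as trimmable.

```java
// Hypothetical sketch: trim a text token by adjusting its (offset, length) descriptor only.
public class SegmentTrim {
    // The XML 1.0 whitespace characters: space, tab, line feed, carriage return.
    static boolean isXmlWhitespace(byte b) {
        return b == 0x20 || b == 0x09 || b == 0x0A || b == 0x0D;
    }

    // Returns {newOffset, newLength}; the document bytes are never modified or copied.
    static int[] trim(byte[] doc, int offset, int length) {
        int start = offset;
        int end = offset + length; // exclusive
        while (start < end && isXmlWhitespace(doc[start])) start++;
        while (end > start && isXmlWhitespace(doc[end - 1])) end--;
        return new int[] {start, end - start};
    }
}
```

Applied to the text of `<ct:Name> Company Name </ct:Name>`, the descriptor shrinks to cover just `Company Name`; an all-whitespace node ends up with length 0.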
VTD-XML 2.12 is released. To download the latest version, go to
https://sourceforge.net/projects/vtd-xml/files/vtd-xml/ximpleware_2.12/
I was asked on Stack Overflow about the options available for improving XML processing performance with VTD-XML. Below is my answer, which I think is worth sharing with readers of this blog.
There are usually the following ways to optimize performance with VTD-XML:
The full VTD-XML source repository is now available on GitHub (http://github.com/jzhang2004/vtd-xml). The complete commit log is available, and in the near future all commits and check-ins should be in sync with CVS on SourceForge.
I recently came across an interesting paper by some researchers in Portugal, titled "PERFORMANCE ANALYSIS OF JAVA APIS FOR XML PROCESSING." In this paper, various XML processing APIs are thoroughly benchmarked and compared, including several flavors of DOM, SAX, PULL, JDOM and VTD-XML. Below is the abstract of the paper.
**ABSTRACT**
Over time, XML markup language has acquired a considerable importance in applications development, standards definition and in the representation of large volumes of data, such as databases. Today, processing XML documents in a short period of time is a critical activity in a large range of applications, which imposes choosing the most appropriate mechanism to parse XML documents quickly and efficiently. When using a programming language for XML processing, such as Java, it becomes necessary to use effective mechanisms, e.g. APIs, which allow reading and processing of large documents in appropriated manners. This paper presents a performance study of the main existing Java APIs that deal with XML documents, in order to identify the most suitable one for processing large XML files.... read more
Version 2.11, simultaneously available in C, Java, C++, and C#, is the latest release of VTD-XML. So what is new? The short answer: (1) It is more standards-compliant, conforming strictly to the XPath 1.0 spec's notion of node(). (2) It introduces major performance improvements for XPath expressions involving a simple position index. (3) It introduces major performance improvements for XPath expressions containing complex predicates that involve absolute location path expressions. (4) It also contains various bug fixes for issues reported by VTD-XML users.... read more
VTD-XML 2.11 is now released. I will add a blog post about the major features and improvements in this release soon. The release can be downloaded from
http://sourceforge.net/projects/vtd-xml/files/vtd-xml/ximpleware_2.11/
VTD-XML 2.10 is now released under Java, C#, C and C++. It can be downloaded at
https://sourceforge.net/projects/vtd-xml/files/vtd-xml/ximpleware_2.10/. This release includes a number of new features and enhancements.
There are two ways to read all the attribute values of an element node.
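The first is to evaluate the XPath expression @*, as in the example below:
// requires: import com.ximpleware.*; vn is a VTDNav positioned on the element of interest
// the enclosing method must handle XPathParseException, XPathEvalException and NavException
AutoPilot ap = new AutoPilot(vn);
ap.selectXPath("@*");
int i = -1;
while ((i = ap.evalXPath()) != -1) {
    // i is the index of the attribute name token; i + 1 is the index of its value token
    System.out.println(vn.toString(i) + " = " + vn.toString(i + 1));
}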
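The second is lighter weight: it uses AutoPilot's selectAttr() and iterateAttr() directly.
// requires: import com.ximpleware.*; vn is a VTDNav positioned on the element of interest
// the enclosing method must handle VTD-XML's checked exceptions (e.g. NavException)
AutoPilot ap = new AutoPilot(vn);
ap.selectAttr("*");
int i = -1;
while ((i = ap.iterateAttr()) != -1) {
    // i is the index of the attribute name token; i + 1 is the index of its value token
    System.out.println(vn.toString(i) + " = " + vn.toString(i + 1));
}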
... [read more](/p/vtd-xml/news/2011/01/how-to-read-all-attributes-of-an-element-in-vtd-xml/)
I came across a recent blog post in which the author benchmarks XPath evaluation with VTD-XML on a 20 MB file and compares it with JAXP. The result is a convincing 60x speedup. Surprised? Don't be. The fact is that DOM and JAXP simply have too many inherent issues (performance, memory usage, etc.). Below is the link to that blog post:
http://fahdshariff.blogspot.com/2010/08/faster-xpaths-with-vtd-xml.html ... read more
VTD-XML 2.9, the next generation XML Processing API for SOA and Cloud computing, has been released. Please visit https://sourceforge.net/projects/vtd-xml/files/ to download the latest version.
* Strict Conformance
#VTD-XML now fully conforms to XML namespace 1.0 spec
* Performance Improvement
#Significantly improved parsing performance for small XML files
* Expand Core VTD-XML API
#Adds getPrefixString(), and toNormalizedString2()
* Cutting/Splitting
#Adds getSiblingElementFragment()
* A number of bug fixes and code enhancement including:
#Fixes a bug for reading very large XML documents on some platforms
#Fixes a bug in parsing processing instructions
#Fixes a bug in outputAndReparse()
This is a simple app that shuffles elements in an XML file. It uses XPath to address individual elements, then rearranges and recombines the fragments. Those fragments are identified by their offsets and lengths, both of which are obtained by calling VTDNav's getElementFragment().
Simply put, without XPath support it would be very tedious, almost impossible, to rearrange XML element fragments.... read more
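Here is a JDK-only sketch of the recombination step. As I understand VTD-XML's documentation, `getElementFragment()` returns both values packed into one `long`, with the offset in the lower 32 bits and the length in the upper 32 bits; `pack()` below mimics that layout so the sketch runs without the library. `FragmentShuffle` and its methods are hypothetical, and in the real app the fragments would come from XPath matches rather than hand-built values.

```java
import java.io.ByteArrayOutputStream;
import java.util.Random;

// Hypothetical sketch: reorder element fragments by copying byte ranges of the original document.
public class FragmentShuffle {
    // Same layout as getElementFragment() (assumed): length in the high 32 bits, offset in the low 32.
    static long pack(int offset, int length) {
        return ((long) length << 32) | (offset & 0xFFFFFFFFL);
    }

    static int offsetOf(long fragment) {
        return (int) fragment;
    }

    static int lengthOf(long fragment) {
        return (int) (fragment >>> 32);
    }

    // In-place Fisher-Yates shuffle of the fragment descriptors.
    static void shuffle(long[] fragments, Random rnd) {
        for (int i = fragments.length - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            long tmp = fragments[i];
            fragments[i] = fragments[j];
            fragments[j] = tmp;
        }
    }

    // Writes the fragments in the given order, copying bytes straight from the original document.
    static byte[] recombine(byte[] doc, long[] fragments) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long f : fragments) {
            out.write(doc, offsetOf(f), lengthOf(f));
        }
        return out.toByteArray();
    }
}
```

Only the descriptors are shuffled; the element bytes are never parsed or re-serialized, which is what keeps this kind of rearrangement cheap. A complete app would also write the parent's start and end tags around the recombined fragments.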
VTD-XML 2.6 is released. It contains some of the latest bug fixes reported by users. Upgrade is recommended for all users.
* Added separate generation and loading of VTD indexes (see http://vtd-xml.sf.net/persistence.html for further info)
* Integrated extended VTD, supporting documents up to 256 GB (Java only).
* Added duplicateNav() for replicating multiple VTDNav instances that share the XML, VTD and LC buffers (available in Java and C#).
* Various bug fixes and enhancements.
The Java version of extended VTD-XML is released and available for download. This version supports maximum file sizes of 256 GB and memory-mapping capabilities. The updated documentation is also available for download. In short, you can run full XPath queries on documents that are bigger than the memory available on your machine.
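The memory-mapping part can be illustrated with plain Java NIO. The sketch below is not extended VTD-XML's own API; it only shows the underlying JDK mechanism of mapping a window of a file and reading a byte segment from it without loading the whole file. `MappedSegment` is a hypothetical name. One detail it exposes: a single `MappedByteBuffer` cannot exceed `Integer.MAX_VALUE` bytes, so documents in the hundreds of gigabytes have to be covered by several mapped windows.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: read one byte segment of a (possibly huge) file through a memory mapping.
public class MappedSegment {
    // Maps only [offset, offset + length) of the file; the OS pages it in on demand.
    // 'offset' is a long, so the segment may sit far beyond the 2 GB mark.
    static byte[] read(Path file, long offset, int length) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer window = ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
            byte[] out = new byte[length];
            window.get(out);
            return out;
        }
    }
}
```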
VTD-XML, the next-generation, document-centric XML processing model, is now released. Please visit http://sourceforge.net/project/showfiles.php?group_id=110612
VTD-XML 2.3 is now released. Below is a list of new features and enhancements in this version.
* VTDException is now introduced as the root class for all other VTD-XML's exception classes (per suggestion of Max Rahder).
* Transcoding capability is now added for inter-document cut and paste. You can cut a chunk of bytes from a UTF-8 encoded document and paste it into a UTF-16 encoded document, and the output document is still well-formed.... read more
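Why transcoding matters for cut and paste: copying raw UTF-8 bytes into a UTF-16 document would corrupt it, because the same characters have different byte representations. The JDK-only sketch below (hypothetical `Transcode` class, not VTD-XML's API) decodes the cut segment with the source encoding and re-encodes it with the target one. It uses UTF-16LE so that no byte-order mark is inserted mid-document, and it assumes the segment starts and ends on character boundaries, as a token or element fragment does.

```java
import java.nio.charset.Charset;

// Hypothetical sketch: re-encode a byte segment cut from one document for pasting into another.
public class Transcode {
    // Decodes [offset, offset + length) with 'from', then encodes the characters with 'to'.
    static byte[] transcode(byte[] src, int offset, int length, Charset from, Charset to) {
        return new String(src, offset, length, from).getBytes(to);
    }
}
```

For example, `<p>café</p>` occupies 12 bytes in UTF-8 (the `é` takes two) but 22 bytes in UTF-16LE, so a raw byte copy could never line up with the target encoding.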
New Article "Manipulate XML Content the Ximple Way" published on DevX