VTD-XML: The Future of XML Processing / Blog: Recent posts

White Space Handling in 2.13

In this post I will highlight VTD-XML v2.13's whitespace handling capability along with some examples in Java.

Quick Review

Native to non-extractive parsing, VTD-XML's handling of XML tokens and elements frequently revolves around the concept of byte segments. Once an XML document is parsed into VTD tokens, the byte segment enveloping the entire content of any token or element can be visualized as a pair of descriptors (i.e. offset and length) pointing into the original document.... read more
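To make the offset/length idea concrete, here is a minimal sketch in plain Java. It does not use the VTD-XML API: the class `SegmentSketch`, its helper methods, and the choice to pack the pair into a single `long` are all assumptions made for illustration. It shows how white space can be trimmed from a segment by adjusting the two descriptors only, without copying or modifying the document bytes. The white space check works byte by byte, so it assumes an ASCII-compatible encoding such as UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of a byte segment described by (offset, length).
class SegmentSketch {
    // Pack the pair into one long: length in the upper 32 bits, offset in the lower 32 bits.
    // This layout is a convention chosen for this sketch.
    static long pack(int offset, int length) {
        return ((long) length << 32) | (offset & 0xFFFFFFFFL);
    }

    static int offsetOf(long seg) { return (int) seg; }

    static int lengthOf(long seg) { return (int) (seg >>> 32); }

    // The four XML white space characters, checked on single bytes.
    static boolean isXmlSpace(byte b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r';
    }

    // Return a narrower segment that excludes leading and trailing white space.
    // Only the descriptors change; the document bytes are never copied or edited.
    static long trim(byte[] doc, long seg) {
        int start = offsetOf(seg);
        int end = start + lengthOf(seg); // exclusive
        while (start < end && isXmlSpace(doc[start])) start++;
        while (end > start && isXmlSpace(doc[end - 1])) end--;
        return pack(start, end - start);
    }

    // Materialize the segment as a string (this is the only place bytes are copied).
    static String text(byte[] doc, long seg) {
        return new String(doc, offsetOf(seg), lengthOf(seg), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] doc = "<ID>i7   </ID>".getBytes(StandardCharsets.UTF_8);
        long raw = pack(4, 5);          // covers "i7" plus three trailing spaces
        long trimmed = trim(doc, raw);
        System.out.println("[" + text(doc, raw) + "]");     // prints "[i7   ]"
        System.out.println("[" + text(doc, trimmed) + "]"); // prints "[i7]"
    }
}
```

Because a trimmed segment is just another offset/length pair, it can be passed around as cheaply as the original one.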

Posted by SourceForge Robot 2016-06-27

Maven Repository

In case you are not aware, VTD-XML 2.11 and 2.12 are also available on Maven Repository. They are available at http://mvnrepository.com/artifact/com.ximpleware/vtd-xml

Unlike SourceForge or GitHub, the Maven repository hosts only snapshots of VTD-XML taken at each release, and only for the Java platform. With SourceForge's CVS you get the complete history of the entire source base, as well as up-to-the-minute updates, for the whole range of supported platforms.... read more

Posted by SourceForge Robot 2016-06-02

ParseFile vs Parse: A Quick Comparison

VTDGen has two main methods in version 2.12 that you can call to parse XML documents.

The first one is parse(), which accepts a boolean indicating whether the parsing operation is namespace-aware. It throws a variety of exceptions corresponding to various parsing errors, such as encoding errors, invalid entity references, or namespace qualification errors. You need to catch those exceptions in your code to obtain the detailed diagnostic message about the nature of the error. parse() always works in conjunction with one of a pair of setDoc() methods, which accept either a byte array containing the entire input XML, or a byte array and a pair of integers delimiting the segment of the byte array that contains the XML document. The maximum file size is 2 GB without namespace awareness, and 1 GB with it. Also remember that you need to read the file content into memory yourself, and the whole parsing routine takes about six to ten lines of code.... read more

Posted by SourceForge Robot 2016-06-02

How to remove comment nodes from an XML document?

In this post I am going to show you how to effectively remove comments from an XML document using the combination of XMLModifier and XPath. The input XML document looks like the following.

<clients>
<!-- some other code here -->

<function>
</function>

<function>
</function>

<function>
<name>data_values</name>
<variables>
<variable><!-- some other code here -->
<name>temp</name>
<!-- some other code here --> <type>double</type>
</variable>
</variables><!-- some other code here -->
<block><!-- some other code here -->
<opster>temp = 1</opster>
</block>
</function>
</clients>
... [read more](/p/vtd-xml/blog/2016/05/how-to-remove-comment-nodes-from-an-xml-document/)
Posted by SourceForge Robot 2016-05-02

Whitespace Trimming in 2.12

This post shows an example of using VTD-XML 2.12's latest methods to remove the leading and trailing white space of the text nodes in an XML document.

This is the input document. As you can easily see, the text nodes of the ID elements have long runs of trailing white space.

<?xml version="1.0"?>
<ns:myOrder xmlns:ns="http://w3schools.com/BusinessDocument" xmlns:ct="http://something.com/CommonTypes">
    <MessageHeader>
        <ct:ID>i7         </ct:ID>
        <ct:ID>i7         </ct:ID>
        <ct:ID>i7         </ct:ID>
        <ct:ID>i7         </ct:ID>
        <ct:Name> Company Name    </ct:Name>
    </MessageHeader>
</ns:myOrder>

... [read more](/p/vtd-xml/blog/2016/02/whitespace-trimming-in-212/)
Posted by SourceForge Robot 2016-02-05

VTD-XML 2.12 Released

VTD-XML 2.12 is released. To download the latest version, go to

https://sourceforge.net/projects/vtd-xml/files/vtd-xml/ximpleware_2.12/


Posted by SourceForge Robot 2015-12-21

The many ways that vtd-xml can help you optimize the performance of your applications

I was asked on Stack Overflow about the options VTD-XML offers for improving XML processing performance. Below is my answer, which I think is worth sharing with readers of this blog.

There are usually the following ways to optimize performance with VTD-XML:

  1. White space option: You can ask VTDGen to ignore or retain trivial white space characters. By default, VTDGen throws away those trivial white spaces. The difference is mainly in memory usage.
  2. Buffer reuse: You can ask VTDGen to reuse VTD buffers for the next parsing task. Otherwise, by default, VTDGen allocates a new buffer for each parsing run. This optimization is most useful if you are processing similarly sized XML files, so that the VTD buffer page size remains unchanged across consecutive parsing runs.
  3. Adjust LC level: By default, it is 3, but you can set it to 5. When your XML documents are deeply nested, setting the LC level to 5 results in better XPath performance, at the cost of a very slight increase in memory usage and parsing time.
  4. Reuse XPath: Compiling/selecting an XPath expression is a relatively slow operation, especially when you run the expression over many small files. The key is to move every AutoPilot.selectXPath() call out of loops and reuse the compiled expression by calling ap.resetXPath().
  5. Use VTD+XML indexing: Instead of parsing XML files at the time of the processing request, you can pre-index your XML into the VTD+XML format and dump it to disk. When the processing request commences, simply load the VTD+XML file into memory and voila, parsing is no longer needed!
  6. The overwrite feature, a.k.a. data templating: Because VTD-XML retains XML in memory as is, you can create a template XML file (pre-indexed in VTD+XML) whose value fields are left blank and let your app fill in the blanks, thus creating XML data that never needs to be parsed.... read more
Posted by SourceForge Robot 2015-10-12

VTD-XML Repository Available on GitHub

VTD-XML's full source repository is now available on GitHub (http://github.com/jzhang2004/vtd-xml). Every commit log is available, and in the near future all commits and check-ins should be in sync with CVS on SourceForge.


Posted by SourceForge Robot 2015-10-11

An interesting paper on vtd-xml performance vs other XML parsers

I recently came across an interesting paper by some researchers in Portugal. The topic of the paper is "PERFORMANCE ANALYSIS OF JAVA APIS FOR XML PROCESSING." In this paper, various XML processing APIs are thoroughly benchmarked and compared. Those APIs include various flavors of DOM, SAX, PULL, JDOM and VTD-XML. Below is the abstract of the paper.

**ABSTRACT**

Over time, XML markup language has acquired a considerable importance in applications development, standards definition and in the representation of large volumes of data, such as databases. Today, processing XML documents in a short period of time is a critical activity in a large range of applications, which imposes choosing the most appropriate mechanism to parse XML documents quickly and efficiently. When using a programming language for XML processing, such as Java, it becomes necessary to use effective mechanisms, e.g. APIs, which allow reading and processing of large documents in appropriated manners. This paper presents a performance study of the main existing Java APIs that deal with XML documents, in order to identify the most suitable one for processing large XML files.... read more

Posted by SourceForge Robot 2015-09-30

What is New in 2.11

Version 2.11, simultaneously available in C, Java, C++, and C#, is the latest release of VTD-XML. So what is new? The short answer: (1) It is more standards-compliant, conforming strictly to the XPath 1.0 spec's notion of node(). (2) It introduces a major performance improvement for XPath expressions involving a simple position index. (3) It introduces a major performance improvement for XPath expressions containing complex predicates that involve absolute location path expressions. (4) It also contains various fixes for bugs reported by VTD-XML users.... read more

Posted by SourceForge Robot 2012-10-15

VTD-XML 2.11 Released

VTD-XML 2.11 is now released. I will add a blog post about the major features and improvements in this release soon. The release can be downloaded from

http://sourceforge.net/projects/vtd-xml/files/vtd-xml/ximpleware_2.11/


Posted by SourceForge Robot 2012-09-26

VTD-XML 2.10 Released

VTD-XML 2.10 is now released for Java, C#, C and C++. It can be downloaded at
https://sourceforge.net/projects/vtd-xml/files/vtd-xml/ximpleware_2.10/. This release includes a number of new features and enhancements.

  • The core API of VTD-XML has been expanded. Users can now perform cut/paste/insert on an empty element.
  • This release also adds support for a deeper location cache for parsing and indexing. This feature is useful for tuning application performance when processing various XML documents.
  • The Java version also adds support for processing zip and gzip files. Direct processing of HTTP URL based XML is enhanced.
  • The extended Java version now supports the ISO-8859-10~16 encodings.
  • A full-featured C++ port is released.
  • The C version of VTD-XML now makes use of thread-local storage to address the thread-safety issue in multi-threaded applications.... read more
Posted by SourceForge Robot 2011-03-01

How to Read All Attributes of an Element in VTD-XML?

There are two ways to read all the attribute values of an element node.

The first one is to use the XPath expression @*, as in the example below

// vn is a VTDNav positioned at the element of interest
AutoPilot ap = new AutoPilot(vn);
ap.selectXPath("@*");
int i = -1;
while ((i = ap.evalXPath()) != -1) {
    // i is the index of the attribute name, i+1 is the index of the attribute value
}

The second is lighter weight: directly use AutoPilot's selectAttr() and iterateAttr()

// vn is a VTDNav positioned at the element of interest
AutoPilot ap = new AutoPilot(vn);
ap.selectAttr("*");
int i = -1;
while ((i = ap.iterateAttr()) != -1) {
    // i is the index of the attribute name, i+1 is the index of the attribute value
}
... [read more](/p/vtd-xml/blog/2011/01/how-to-read-all-attributes-of-an-element-in-vtd-xml/)
Posted by SourceForge Robot 2011-01-10

60x? That sounds just right.

I came across a recent blog post in which the author benchmarks the performance of evaluating XPath with VTD-XML on a 20 MB document and compares it to JAXP. The result is a convincing 60x. Surprised? Don't be. The fact is that DOM and JAXP just have too many inherent issues (performance, memory usage, etc.). Below is the link to that blog post

http://fahdshariff.blogspot.com/2010/08/faster-xpaths-with-vtd-xml.html... read more

Posted by SourceForge Robot 2010-08-27

VTD-XML 2.9 Released

VTD-XML 2.9, the next generation XML Processing API for SOA and Cloud computing, has been released. Please visit https://sourceforge.net/projects/vtd-xml/files/ to download the latest version.

  • Strict Conformance
    • VTD-XML now fully conforms to XML namespace 1.0 spec
  • Performance Improvement
    • Significantly improved parsing performance for small XML files
  • Expand Core VTD-XML API
    • Adds getPrefixString() and toNormalizedString2()
  • Cutting/Splitting
    • Adds getSiblingElementFragment()
  • A number of bug fixes and code enhancements including:
    • Fixes a bug for reading very large XML documents on some platforms
    • Fixes a bug in parsing processing instructions
    • Fixes a bug in outputAndReparse()


Posted by SourceForge Robot 2010-08-11

Shuffle XML Elements with XPath and VTD-XML

This is a simple app that shuffles elements in an XML file. It uses XPath to address individual elements, then re-arranges and re-combines the fragments. Those fragments are identified by their offsets and lengths, both of which are obtained by calling VTDNav's getElementFragment().

Why not use SAX or StAX?
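The re-combination step can be sketched in plain Java without the VTD-XML API. In the sketch below, the class `FragmentShuffleSketch` and its methods are hypothetical, and the toy `findElements()` helper locates fragments by simple string search on a flat, ASCII-only document; in the actual app the offsets and lengths would come from getElementFragment() after an XPath evaluation. The point is only to show that, once each fragment is an offset/length pair, re-ordering amounts to copying byte ranges in a new order while leaving everything between them untouched.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration: re-order element fragments given as {offset, length} pairs.
class FragmentShuffleSketch {
    // Toy stand-in for XPath plus getElementFragment(): finds non-nested <name>...</name>
    // elements by string search. Assumes an ASCII document so char index equals byte index.
    static List<int[]> findElements(String xml, String name) {
        String open = "<" + name + ">";
        String close = "</" + name + ">";
        List<int[]> found = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = xml.indexOf(open, from);
            if (start < 0) break;
            int end = xml.indexOf(close, start);
            if (end < 0) break;
            end += close.length();
            found.add(new int[] {start, end - start});
            from = end;
        }
        return found;
    }

    // slots: the fragments in document order; order: the same fragments in the desired order.
    // Bytes outside the fragments (prefix, separators, suffix) are copied unchanged.
    static byte[] reorder(byte[] doc, List<int[]> slots, List<int[]> order) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(doc.length);
        int cursor = 0;
        for (int i = 0; i < slots.size(); i++) {
            int[] slot = slots.get(i);
            out.write(doc, cursor, slot[0] - cursor); // bytes before this slot
            int[] frag = order.get(i);
            out.write(doc, frag[0], frag[1]);         // fragment chosen for this slot
            cursor = slot[0] + slot[1];
        }
        out.write(doc, cursor, doc.length - cursor);  // bytes after the last slot
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String xml = "<list><item>a</item> <item>b</item> <item>c</item></list>";
        byte[] doc = xml.getBytes(StandardCharsets.US_ASCII);
        List<int[]> slots = findElements(xml, "item");
        List<int[]> order = new ArrayList<>(slots);
        Collections.reverse(order); // use Collections.shuffle(order) for a random order
        System.out.println(new String(reorder(doc, slots, order), StandardCharsets.US_ASCII));
        // prints "<list><item>c</item> <item>b</item> <item>a</item></list>"
    }
}
```

The demo reverses the fragments so that the output is deterministic; swapping in Collections.shuffle() gives the random re-arrangement the post describes.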

Simply speaking, the lack of XPath support makes it very tedious, almost impossible, to re-arrange XML element fragments.... read more

Posted by SourceForge Robot 2010-05-25