
How to remove comment nodes from an XML document?

In this post I am going to show you how to remove comments from an XML document efficiently by combining VTD-XML's XMLModifier with XPath. The input XML document looks like the following.

<clients>
<!-- some other code here -->

<function>
</function>

<function>
</function>

<function>
<name>data_values</name>
<variables>
<variable><!-- some other code here -->
<name>temp</name>
<!-- some other code here --> <type>double</type>
</variable>
</variables><!-- some other code here -->
<block><!-- some other code here -->
<opster>temp = 1</opster>
</block>
</function>
</clients>

The code that performs the task is listed below. The key is the XPath expression "//comment()", which selects all the comment nodes in the document. After binding the VTDNav object to the XMLModifier object, you can simply call the "remove()" method for each match. It removes not only the content of the comment but also the surrounding delimiters (i.e. "<!--" and "-->").

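import com.ximpleware.*;
import java.io.*;

public class removeNodesDemo {

    public static void main(String[] args) throws VTDException, IOException {
        VTDGen vg = new VTDGen();
        // Parse the input file without namespace awareness; stop if parsing fails
        if (!vg.parseFile("d:\\xml\\input2.xml", false))
            return;
        VTDNav vn = vg.getNav();
        AutoPilot ap = new AutoPilot(vn);
        XMLModifier xm = new XMLModifier(vn); // bind the VTDNav to the modifier
        ap.selectXPath("//comment()");        // select every comment node

        // Each evalXPath() call moves the cursor to the next matching node
        while (ap.evalXPath() != -1) {
            xm.remove(); // remove the comment at the cursor
        }
        // Write the modified document to a new file
        xm.output("d:\\xml\\output2.xml");
    }
}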

The output XML is

<clients>

<function>
</function>

<function>
</function>

<function>
<name>data_values</name>
<variables>
<variable>
<name>temp</name>
<type>double</type>
</variable>
</variables>
<block>
<opster>temp = 1</opster>
</block>
</function>
</clients>

You might ask: what if I want to remove an attribute node, a text node, a CDATA node, an element node, or a processing instruction node?

XMLModifier's remove() method has the following effect on each type of node:

  • On a non-CDATA text node, it simply removes the text.
  • On a CDATA text node, it removes the text content and the surrounding delimiters.
  • On an element node, it removes the entire element fragment.
  • On an attribute node, it removes the attribute name-value pair in its entirety.
  • On a processing instruction node, it removes both the content and the surrounding delimiters.

For example, to remove all processing instruction nodes, just replace the XPath expression above with "//processing-instruction()".
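If you want to check what an expression such as "//comment()" or "//processing-instruction()" selects before running it through XMLModifier, the same select-then-remove pattern can be tried with nothing but the JDK. Note that the sketch below is not the VTD-XML approach from this post: it uses the JDK's built-in DOM and XPath classes instead, and the names DomNodeRemover and removeNodes are made up for illustration. Treat it as a rough cross-check under those assumptions rather than a replacement for the code above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of VTD-XML): JDK DOM + XPath only.
public class DomNodeRemover {

    // Removes every node selected by the XPath expression and
    // returns the re-serialized document as a string.
    public static String removeNodes(String xml, String xpath) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);

        for (int i = 0; i < hits.getLength(); i++) {
            Node n = hits.item(i);
            if (n.getNodeType() == Node.ATTRIBUTE_NODE) {
                // Attributes have no parent node; detach via the owner element
                Attr a = (Attr) n;
                if (a.getOwnerElement() != null) {
                    a.getOwnerElement().removeAttributeNode(a);
                }
            } else if (n.getParentNode() != null) {
                n.getParentNode().removeChild(n);
            }
        }

        // Serialize the modified DOM tree without an XML declaration
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<clients><!-- note --><function id=\"1\"/><?target data?></clients>";
        System.out.println(removeNodes(xml, "//comment()"));
        System.out.println(removeNodes(xml, "//processing-instruction()"));
    }
}
```

One difference worth keeping in mind: this DOM version builds a tree and re-serializes the whole document, so whitespace and formatting details may not match byte for byte what XMLModifier writes.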



Posted by SourceForge Robot 2016-05-02
