[Jtidy-user] [Help] getTextContent() always returning null

The following forum message was posted by Anonymous at http://sourceforge.net/projects/jtidy/forums/forum/41437/topic/3683463:

Every time I call getTextContent() on an org.w3c.dom.Node object, it returns null. According to the documentation, getTextContent() only returns null when the Node is of type DOCUMENT_NODE, DOCUMENT_TYPE_NODE, or NOTATION_NODE. This is odd, because I get null on virtually every DOM node, including elements.

To illustrate my issue, I've written a small test case. I've used the following HTML code:

[code]<!DOCTYPE html>
<html>
<head>
<title>jwz</title>
</head>
<body>
<p>text<b>b<i>i<u>u</u>i</i>b<br>b</b>text</p>
</body>
</html>[/code]

Using the following Java code, I've tried to get the textContent of the <body> element:

[code]// Load test.html.
InputStream in = new FileInputStream("test.html");
OutputStream out = null;

// Parse test.html into a DOM tree.
Tidy tidy = new Tidy();
Document doc = tidy.parseDOM(in, out);

// Print <body>'s text content.
org.w3c.dom.Node body = doc.getElementsByTagName("body").item(0);
Element bodyElement = (Element) body;
String bodyTextContent = bodyElement.getTextContent();
System.out.print("<body> TextContent:\n" + bodyTextContent);[/code]

However, the result is:

[code]<body> TextContent:
null[/code]
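
For comparison, here is a minimal sketch of what I would expect getTextContent() to return for an element, gathered by hand with DOM Level 1 calls only (getNodeType, getNodeValue, getFirstChild, getNextSibling). The class and method names (TextContentSketch, textOf, parseXml) are made up for illustration. The demo parses an XML-ified copy of the markup with the JDK's built-in parser rather than JTidy, so whether JTidy's nodes behave the same way under this traversal is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class TextContentSketch {

    // Hypothetical helper: concatenates the values of all text and CDATA
    // nodes below the given node, in document order. Comments and
    // processing instructions are skipped.
    static String textOf(Node node) {
        StringBuilder sb = new StringBuilder();
        collect(node, sb);
        return sb.toString();
    }

    private static void collect(Node node, StringBuilder sb) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            String value = node.getNodeValue();
            if (value != null) {
                sb.append(value);
            }
            return;
        }
        // Recurse into children; leaf nodes without children add nothing.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, sb);
        }
    }

    // Demo-only helper: parses well-formed XML with the JDK parser (not JTidy).
    static Document parseXml(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse demo markup", e);
        }
    }

    public static void main(String[] args) {
        // Same markup as the test case, with <br> closed so it is well-formed XML.
        Document doc = parseXml(
                "<html><head><title>jwz</title></head><body>"
                + "<p>text<b>b<i>i<u>u</u>i</i>b<br/>b</b>text</p>"
                + "</body></html>");
        Node body = doc.getElementsByTagName("body").item(0);
        System.out.println(textOf(body)); // prints "textbiuibbtext"
    }
}
```

With the JDK parser, body.getTextContent() gives the same string, which is the result I was expecting from the JTidy DOM as well.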

Did I do something wrong here? Or is this not supposed to happen?

Thanks in advance!