From: SourceForge.net <no...@so...> - 2010-04-21 11:01:32
The following forum message was posted by Anonymous at http://sourceforge.net/projects/jtidy/forums/forum/41437/topic/3683463:

Every time I call getTextContent() on an org.w3c.dom.Node object, it returns null. The documentation says getTextContent() returns null only when the node is of type DOCUMENT_NODE, DOCUMENT_TYPE_NODE, or NOTATION_NODE. That is odd, because here it returns null for virtually every DOM node.

To illustrate the issue, I've written a small test case. I used the following HTML:

[code]<!DOCTYPE html>
<html>
  <head>
    <title>jwz</title>
  </head>
  <body>
    <p>text<b>b<i>i<u>u</u>i</i>b<br>b</b>text</p>
  </body>
</html>[/code]

Using the following Java code, I tried to get the textContent of the <body> element:

[code]import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.tidy.Tidy;

public class TidyTextContentTest {
    public static void main(String[] args) throws Exception {
        // Load test.html.
        InputStream in = new FileInputStream("test.html");
        OutputStream out = null;

        // Parse test.html into a DOM tree.
        Tidy tidy = new Tidy();
        Document doc = tidy.parseDOM(in, out);

        // Print <body>'s text content. org.w3c.dom.Node is fully qualified
        // to avoid a clash with JTidy's own org.w3c.tidy.Node class.
        org.w3c.dom.Node body = doc.getElementsByTagName("body").item(0);
        Element bodyElement = (Element) body;
        String bodyTextContent = bodyElement.getTextContent();
        System.out.print("<body> TextContent:\n" + bodyTextContent);
    }
}[/code]

However, the result is:

[code]<body> TextContent:
null[/code]

Did I do something wrong here, or is this not supposed to happen? Thanks in advance!
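A possible workaround, sketched under assumptions: one plausible explanation (not confirmed in this thread) is that the DOM adapter returned by JTidy does not implement the DOM Level 3 method getTextContent() and simply returns null. If so, the same text can be gathered by walking the tree with DOM Level 1 calls only (getChildNodes(), getNodeType(), getNodeValue()). The helper collectText is hypothetical, not part of JTidy or the DOM API. For element nodes it mirrors what the DOM spec defines for textContent: it concatenates descendant text and CDATA, and skips comments and processing instructions. To keep the demo runnable without JTidy on the classpath, it parses a well-formed XHTML version of the test page with the JDK's built-in DocumentBuilderFactory. With JTidy you would instead pass the node from doc.getElementsByTagName("body").item(0).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class TextContentDemo {

    // Hypothetical helper (not part of JTidy or the DOM API): approximates
    // Node.getTextContent() for element nodes using only DOM Level 1 calls.
    public static String collectText(Node node) {
        StringBuilder sb = new StringBuilder();
        appendText(node, sb);
        return sb.toString();
    }

    private static void appendText(Node node, StringBuilder sb) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            // Leaf text: append its character data.
            sb.append(node.getNodeValue());
            return;
        }
        if (type == Node.COMMENT_NODE || type == Node.PROCESSING_INSTRUCTION_NODE) {
            // Skipped, as getTextContent() does for element content.
            return;
        }
        // Elements and other containers: recurse into children in document order.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            appendText(children.item(i), sb);
        }
    }

    // Demo only: parses well-formed XHTML with the JDK's built-in parser,
    // standing in for JTidy's parseDOM().
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><p>text<b>b<i>i<u>u</u>i</i>b<br/>b</b>text</p></body></html>");
        Node body = doc.getElementsByTagName("body").item(0);
        // Prints: <body> text: textbiuibbtext
        System.out.println("<body> text: " + collectText(body));
    }
}
```

The JDK's parser does implement getTextContent(), so on this tree the helper's result can be compared against it directly. On a tree produced by JTidy from the original test.html, any whitespace text nodes between tags would also end up in the result, just as they would with a working getTextContent().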