Hi,
I need to read arbitrary HTML (HTML 4 Transitional, XHTML 1.0 Strict, ...), extract the body as a fragment, and output it again in another standard (XHTML).
Reading the file is simple enough:
Parser p = new Parser(resource);
NodeFilter f = new NodeClassFilter(BodyTag.class);
NodeList listOfBodies = p.extractAllNodesThatMatch(f);
Node firstBody = listOfBodies.elementAt(0);
NodeList bodyChildren = firstBody.getChildren();
System.out.println(bodyChildren.toHtml());
From this, how can I output either valid HTML 4.0 or valid XHTML 1.0 code?
Best regards
Oliver