[Htmlparser-user] Extract HTML Body and output as (X)HTML standards
From: Oliver S. <oli...@gm...> - 2010-07-05 16:31:02
Hi,

I need to read arbitrary HTML (HTML 4 Transitional, XHTML 1.0 Strict, ...), extract the body as a fragment, and output it again in another standard (e.g. XHTML).

Reading the file is simple enough:

    import org.htmlparser.Node;
    import org.htmlparser.NodeFilter;
    import org.htmlparser.Parser;
    import org.htmlparser.filters.NodeClassFilter;
    import org.htmlparser.tags.BodyTag;
    import org.htmlparser.util.NodeList;

    Parser p = new Parser(resource);  // resource: a URL or file name
    NodeFilter f = new NodeClassFilter(BodyTag.class);
    NodeList listOfBodies = p.extractAllNodesThatMatch(f);
    Node firstBody = listOfBodies.elementAt(0);
    NodeList bodyChildren = firstBody.getChildren();
    System.out.println(bodyChildren.toHtml());

From this, how can I output either valid HTML 4.0 code or valid XHTML 1.0 code?

Best regards
Oliver
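One possible approach is to walk the node tree yourself instead of calling toHtml(), and write each tag according to the XHTML 1.0 syntax rules: lowercase element and attribute names, every attribute value quoted, minimized attributes expanded (checked becomes checked="checked"), and EMPTY elements such as br closed as "<br />" (the space before "/>" follows the compatibility advice in XHTML 1.0 Appendix C). The class below is a standalone sketch of that serialization step only; the name XhtmlTagWriter and its methods are hypothetical, not part of HTMLParser. It assumes you pull the tag name and attributes out of each HTMLParser Tag yourself (check the HTMLParser javadoc for the exact accessors). It also assumes the text and attribute values are already entity-decoded: raw source text that still contains "&amp;" would be escaped a second time.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: writes tags and text in XHTML 1.0 syntax.
public class XhtmlTagWriter {
    // HTML 4.01 elements declared EMPTY (incl. Transitional/Frameset ones).
    // XHTML requires them to be closed; they are written as "<br />".
    private static final Set<String> EMPTY_ELEMENTS = Set.of(
        "area", "base", "basefont", "br", "col", "frame", "hr",
        "img", "input", "isindex", "link", "meta", "param");

    // Escapes decoded text or an attribute value for XML output.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Writes a start tag: lowercase names, quoted values, and a null value
    // (a minimized attribute like CHECKED) expanded to checked="checked".
    // EMPTY elements are self-closed here.
    public static String startTag(String name, Map<String, String> attrs) {
        String tag = name.toLowerCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder("<").append(tag);
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            String attr = e.getKey().toLowerCase(Locale.ROOT);
            String value = e.getValue() == null ? attr : e.getValue();
            sb.append(' ').append(attr).append("=\"")
              .append(escape(value)).append('"');
        }
        sb.append(EMPTY_ELEMENTS.contains(tag) ? " />" : ">");
        return sb.toString();
    }

    // EMPTY elements were already closed by startTag, so emit nothing for them.
    public static String endTag(String name) {
        String tag = name.toLowerCase(Locale.ROOT);
        return EMPTY_ELEMENTS.contains(tag) ? "" : "</" + tag + ">";
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("TYPE", "checkbox");
        attrs.put("CHECKED", null);
        System.out.println(startTag("INPUT", attrs));
        System.out.println(startTag("P", Map.of()) + escape("a < b & c") + endTag("P"));
    }
}
```

Running main prints `<input type="checkbox" checked="checked" />` and `<p>a &lt; b &amp; c</p>`. For HTML 4 output, the same walk would write ">" instead of " />" for EMPTY elements and could leave their end tags out. Note that this only covers the syntactic differences between HTML 4 and XHTML 1.0. It does not check that elements are nested the way the target DTD allows.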