Re: [Htmlparser-user] Help on extracting clean body content from web page
From: James M. <jam...@a-...> - 2007-11-16 00:16:35
For more clarification, here is what I tried:
Parser lParser = new Parser();
try {
    lParser.setInputHTML(pHTML); // as instructed in the JavaDocs
} catch (ParserException e) {
    mLogger.info("getContent():: Caught ParserException...");
}

NodeList lDocumentNodeList;
NodeList lNodes;
try {
    // I want to start with the entire document
    lDocumentNodeList = lParser.parse(null);
    // I want the BODY tag
    lNodes = lDocumentNodeList.extractAllNodesThatMatch(new TagNameFilter("BODY"));
    // Using Log4J, I see that the size returned is 0 when it should be 1.
    mLogger.info("lNodes.size() = " + lNodes.size());
    // None of the following code executes because size = 0.
    if (lNodes.size() > 0) {
        // I'm not sure if I'm doing this right or not, but until the
        // NodeList problem is resolved I can't troubleshoot it.
        String lText = lNodes.toString();
        String lasString = lNodes.asString();
        mLogger.info("lText = " + lText);
        mLogger.info("lasString = " + lasString);
    }
} catch (ParserException e) {
    mLogger.info("ResponseParser:: Parsing exception caught.");
}
Thanks again for your help.
--
James Mortensen
A-CTI Development Team