I wrote the code below to get all the locations, as one of the users in a previous thread tried to do. I don't know why my code is not working; can anyone tell me the reason? I am very new to parsing. My code is not working with this link:
http://cke.know-where.com/hardees/cgi/selection?mapid=US&lang=en&design=default&addr=&city=&region=&zip=19362&phone=
Thanks in advance
-Jeff
import org.htmlparser.Parser;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;

public class SampleParsing
{
    public static void parseData(String web_url) throws Exception
    {
        try
        {
            System.out.println(web_url);
            // Parse the page at the given URL (the original code passed
            // an undefined variable, htmlFileToParse, here)
            Parser parser = new Parser(web_url);
            // Collect every <table> node carrying valign="top"
            NodeList td_list = parser.parse(new AndFilter(
                    new TagNameFilter("table"),
                    new HasAttributeFilter("valign", "top")));
            System.out.println(td_list.size());
            for (int i = 0; i < td_list.size(); i++)
            {
                // Dump the plain-text contents of each matching table
                String s = td_list.elementAt(i).toPlainTextString();
                System.out.println(s);
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }

    public static void main(String args[]) throws Exception
    {
        String weburl = null;
        try
        {
            weburl = "http://cke.know-where.com/hardees/cgi/selection?mapid=US&lang=en&design=default&addr=&city=&region=&zip=19362&phone=";
            parseData(weburl);
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}
You got snared by the same bug I just encountered tonight. I've reported it as:
http://sourceforge.net/tracker/index.php?func=detail&aid=1925846&group_id=24399&atid=381399
The URL you mention has this apparently harmless HTML comment at the top of the page:
<!--------------------------------------------->
It appears that if the parser encounters a THIRD dash before the closing ">", it gets confused and thinks the comment keeps going, I'm not sure until where. As a result, the parser absorbs large amounts of the page's HTML into the comment.
If you want to test your code, make a copy of the web page with these problematic comments stripped out.
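In case it helps, here is a minimal sketch of that workaround: fetch the page yourself and collapse every comment, including the dash-heavy ones, into a harmless placeholder before handing the text to the parser. The class name and the regex are mine, not part of htmlparser, and the regex blanks out every comment body, which should be fine for making a test copy:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class CommentStripper
{
    // Collapse every HTML comment, including dash-heavy ones like
    // <!----------->, into a harmless <!-- --> placeholder
    public static String stripComments(String html)
    {
        // (?s) lets "." cross line breaks; the reluctant .*? stops at
        // the first run of dashes followed by ">"
        return html.replaceAll("(?s)<!--+.*?--+>", "<!-- -->");
    }

    public static void main(String[] args) throws Exception
    {
        URL url = new URL("http://cke.know-where.com/hardees/cgi/selection?mapid=US&lang=en&design=default&addr=&city=&region=&zip=19362&phone=");
        StringBuilder page = new StringBuilder();
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
        String line;
        while ((line = in.readLine()) != null)
        {
            page.append(line).append('\n');
        }
        in.close();
        // Write the cleaned copy to stdout; redirect it to a file and
        // point SampleParsing at that saved copy instead of the live URL
        System.out.print(stripComments(page.toString()));
    }
}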
I also think this bug might be related:
http://sourceforge.net/tracker/index.php?func=detail&aid=1845913&group_id=24399&atid=381399
These two bugs put a real crimp in using this htmlparser on random real-world web pages, because nowadays a lot of JavaScript gets embedded in pages for tracking and the like.
Give your code another try. It might actually work if it weren't for the bug(s).
Latest info! I got a response to my bug report.
The "problem" (or not) is that the default value of:
Lexer.STRICT_REMARKS is true, which causes the "misbehavior" that I believe you and are observing.
To get your code to do the "right thing" (or at least what I believe is the right thing), you need to add this to your program:
Lexer.STRICT_REMARKS = false;
If you add this to your code, I believe your program then has a good shot at working.
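To make that concrete, here is how the change might slot into the main method of the code above. Lexer.STRICT_REMARKS is the field named in the bug response; the import path is the one htmlparser uses for its lexer, but this is my sketch, so double-check it against your version of the library:

import org.htmlparser.lexer.Lexer;   // add alongside the other imports

public static void main(String args[]) throws Exception
{
    // Turn off strict comment (remark) parsing before doing any work,
    // so the long dashed comment ends where a browser would end it
    Lexer.STRICT_REMARKS = false;
    parseData("http://cke.know-where.com/hardees/cgi/selection?mapid=US&lang=en&design=default&addr=&city=&region=&zip=19362&phone=");
}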