
Parsing HTML Problem

Help
Pandu
2007-07-31
2013-04-27
  • Pandu

    Pandu - 2007-07-31

    Hi everybody,

    I am very new to HTML parsing.
    I have to parse an HTML file that contains a table with TR and TD tags. Each TD may or may not contain data. For example, my HTML looks like this:

    <HTML>
    <HEAD>
    <TITLE>no title</TITLE>
    </HEAD>
    <BODY>
       <TABLE>
                 <TR>
                       <TD>Id</TD>
                       <TD>Name</TD>
                       <TD>Age</TD>
                       <TD>Sex</TD>
                       <TD>Salary</TD>
                 </TR>
                 <TR>
                       <TD>1</TD>
                       <TD>one</TD>
                       <TD>20</TD>
                       <TD>Male</TD>
                       <TD>2000</TD>
                 </TR>
                 <TR>
                       <TD>2</TD>
                       <TD>Two</TD>
                       <TD>21</TD>
                       <TD>Female</TD>
                       <TD></TD>
                 </TR>
                 <TR>
                       <TD>3</TD>
                       <TD>Three</TD>
                       <TD>22</TD>
                       <TD>Male</TD>
                       <TD></TD>
                 </TR>
                 <TR>
                       <TD>4</TD>
                       <TD>Four</TD>
                       <TD>23</TD>
                       <TD></TD>
                       <TD></TD>

                 </TR>
                 <TR>
                       <TD>5</TD>
                       <TD>Five</TD>
                       <TD>24</TD>
                       <TD></TD>
                       <TD>30000</TD>

                 </TR>
       </TABLE>
    </BODY>
    </HTML>

    Now I have to go through all the TRs, get the text in the TDs, and build an object from each row. For example, the object looks like this: (1 one twenty male 2000). I have managed to write code to parse the HTML, but it's not giving exactly the result I wanted. I am posting my code as well.

                Parser parser = new Parser("D:/java/HtmlParse/Example.html");
                NodeList tablesList = parser.parse(new TagNameFilter("table"));
                NodeList tr_tagsList = tablesList.extractAllNodesThatMatch(new TagNameFilter("TR"), true);
                NodeList td_tagsList = tr_tagsList.extractAllNodesThatMatch(new TagNameFilter("TD"), true);
                System.out.println("found " + tr_tagsList.size() + " tr tags");
                System.out.println("found " + td_tagsList.size() + " td tags");


    Please suggest any changes if needed. Any help is appreciated. Thanks in advance.

     
    • Derrick Oswald

      Derrick Oswald - 2007-07-31

      How is it "not giving the exact result as i wanted"?
      What are you getting that isn't right?

      If you want each row individually you will need to extract the TD tags only from each node:

      for (int i = 0; i < tr_tagsList.size(); i++) {
          Node tr = tr_tagsList.elementAt(i);
          NodeList tds = tr.getChildren().extractAllNodesThatMatch(new TagNameFilter("TD"), true);
      }
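
To illustrate why the TD extraction has to happen per row, here is a library-free sketch in which plain lists stand in for the parser's node lists. The class name `RowGrouping` and the method `flatten` are hypothetical and not part of the HTML Parser API; the cell values come from the first two data rows of the sample table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowGrouping {
    // Collects the cells of every row into one list. This mirrors what
    // happens when TD tags are extracted from the whole TR list at once:
    // the cells are all there, but the row boundaries are gone.
    public static List<String> flatten(List<List<String>> rows) {
        List<String> all = new ArrayList<>();
        for (List<String> row : rows) {
            all.addAll(row);
        }
        return all;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("1", "one", "20", "Male", "2000"),
            Arrays.asList("2", "Two", "21", "Female", ""));

        // One flat list: no way to tell where a row ends.
        System.out.println("flat cells: " + flatten(rows).size());

        // Per-row processing: each inner list can become one object.
        for (int i = 0; i < rows.size(); i++) {
            System.out.println("row " + i + ": " + rows.get(i));
        }
    }
}
```

Working row by row keeps each group of cells together, so a row can be mapped to one object before moving on to the next.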

       
    • Pandu

      Pandu - 2007-08-01

      Thanks for the help, Derrick.
      You understood my intention. I want each row to be extracted as you said.

       
    • Pandu

      Pandu - 2007-08-01

      The problem got solved. But now I am stuck on a new problem: the TDs that have no value don't give me anything, not even a null or a space. It seems they are objects of some kind. I trimmed the value but still get nothing. Can you suggest anything for this? My code is like this:

      for (int i = 0; i < tr_tagsList.size(); i++)
      {
          NodeList nlist = tr_tagsList.elementAt(i).getChildren().extractAllNodesThatMatch(new TagNameFilter("TD"), true);
          for (int j = 0; j < nlist.size(); j++)
          {
              String str = nlist.elementAt(j).toPlainTextString().toString();
              str.trim();
              if (str == null)
              {
                  System.out.println("Hi");
              }
              if (str != null)
              {
                  System.out.println("Hello");
              }
          }
      }
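
Two things in the loop above are worth checking. Java strings are immutable, so `trim()` returns a new string and its result has to be assigned. And an empty cell most likely comes back as an empty string rather than `null`, which would explain why the null check never fires; this is an assumption based on the behaviour described in the post, not on a confirmed reading of the parser's source. A minimal sketch of a check that handles both cases follows; the class `CellText` and the method `normalize` are hypothetical helpers, not part of the HTML Parser library.

```java
public class CellText {
    // Returns the trimmed cell text, or null when the cell is missing,
    // empty, or contains only whitespace.
    public static String normalize(String raw) {
        if (raw == null) {
            return null;
        }
        // trim() does not modify raw; it returns a new String.
        String trimmed = raw.trim();
        return trimmed.isEmpty() ? null : trimmed;
    }

    public static void main(String[] args) {
        // Cell texts of the last row in the sample table (the Sex cell is empty).
        String[] cells = {"5", "Five", "24", "", "30000"};
        for (String cell : cells) {
            String value = normalize(cell);
            System.out.println(value == null ? "(empty)" : value);
        }
    }
}
```

In the original loop, the text returned for each TD could be passed through a helper like this before the null test, so that empty cells are reported explicitly.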

       
