Madhu - 2006-04-04

Hi everybody. My program transfers data from an HTML file into an Excel sheet.
The following example reads the data from an HTML page, but the td cells also contain span tags, font tags and comments.

Font tags hold part of the cell data, like this:
ZD8230/<font   class="font20">$$50</font>       

Some columns contain span tags:
Vaio   RS421<span style='mso-spacerun:yes'> </span>

Some cells contain comments like this:
<!--[if gte vml 1]><v:shape id="_x0000_s1138" type="#_x0000_t75" alt=""
 style='position:absolute;margin-left:26.25pt;margin-top:6pt;width:24pt;height:24pt;z-index:47'>
  <x:ClientData ObjectType="Pict"> <x:SizeWithCells/> </x:ClientData>
</v:shape><![endif]--><![if !vml]><span style='mso-ignore:vglayout;
 position:absolute;z-index:47;margin-left:35px;margin-top:8px;width:32px;height:32px'><img
 width=32 height=32 src="price_files/image016.gif" v:shapes="_x0000_s1138"></span><![endif]><span
 style='mso-ignore:vglayout2'> <table cellpadding=0 cellspacing=0> <tr> <td height=17
 class=xl43 align=right width=37 style='height:12.75pt;
 border-top:none;border-left:none;width:28pt' x:num>2</td> </tr> </table> </span>

                               
The problems are:

How do I remove the comments from the tds?
How do I remove the span tags?
How do I read the inner tag data, i.e. the font data should be appended to the column data?
{<td>ZD8230/<font   class="font20">$$50</font></td>}       
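
For illustration, here is a minimal sketch of the clean-up being asked about. It does not use HTMLParser at all: it assumes the cell content is already available as a raw string (such as the value returned by getStringText() in the program in this post) and strips comments and tags with java.util.regex. The class name CellText and the method clean are hypothetical names chosen for this example. Regex stripping is fragile on arbitrary HTML (for instance, an attribute value containing ">" would break it), so treat this as a starting point rather than a complete solution.

```java
import java.util.regex.Pattern;

// Hypothetical helper: reduces the raw markup inside a <td> to plain text.
final class CellText {
    // <!-- ... --> comments, including Office conditional comments.
    // DOTALL lets a comment span several lines.
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    // Any remaining tag: <span ...>, </font>, <![if !vml]>, <![endif]>, and so on.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // Runs of whitespace left behind once the markup is gone.
    private static final Pattern SPACE = Pattern.compile("\\s+");

    static String clean(String innerHtml) {
        String s = COMMENT.matcher(innerHtml).replaceAll("");
        s = TAG.matcher(s).replaceAll("");   // keeps the text between tags
        s = s.replace("&nbsp;", " ");        // treat non-breaking spaces as blanks
        return SPACE.matcher(s).replaceAll(" ").trim();
    }
}
```

With this sketch, the font example keeps its inner text appended to the column data, the span example loses only the tag, and a cell that held nothing but "&nbsp;" becomes an empty string.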

If the problem is not clear, just run this program.
The packages used are:
jxl for writing into the Excel sheet.
HTMLParser version 1.6.

Please help me, it is urgent.
Please excuse any mistakes in my English.

    import java.io.File;

    import jxl.Workbook;
    import jxl.write.Label;
    import jxl.write.WritableSheet;
    import jxl.write.WritableWorkbook;

    import org.htmlparser.NodeFilter;
    import org.htmlparser.beans.FilterBean;
    import org.htmlparser.filters.TagNameFilter;
    import org.htmlparser.tags.TableColumn;
    import org.htmlparser.tags.TableRow;
    import org.htmlparser.util.NodeList;

    public class sent
    {
        public static void main(String args[]) throws Exception
        {
            WritableWorkbook workbook = Workbook.createWorkbook(new File("sent.xls"));

            // Collect every <tr> in the page.
            FilterBean fb = new FilterBean();
            fb.setURL("c:/html/price.htm");
            fb.setFilters(new NodeFilter[] { new TagNameFilter("TR") });
            NodeList list = fb.getNodes();

            // Header row: the column indices match c below (k=1 -> 2, k=4 -> 8, k=5 -> 10).
            WritableSheet sheet = workbook.createSheet("First Sheet", 0);
            sheet.addCell(new Label(2, 0, "MODEL"));
            sheet.addCell(new Label(8, 0, "PRICE"));
            sheet.addCell(new Label(10, 0, "QUANTITY"));

            int r = 0; // current output row in the sheet
            for (int i = 0; i < list.size(); i++)
            {
                TableRow tr = (TableRow) list.elementAt(i);
                // Only rows with exactly 8 cells are treated as data rows.
                if (tr.getColumnCount() == 8)
                {
                    r++;
                    TableColumn[] tc = tr.getColumns();
                    int c = 0; // output column, advanced by 2 for every source cell
                    for (int k = 0; k < tc.length; k++)
                    {
                        if (k == 1 || k == 4 || k == 5)
                        {
                            // getStringText() returns the raw markup inside the <td>,
                            // so span/font tags and comments are still present here.
                            String val = tc[k].getStringText();
                            if (!val.equals("Model") && !val.equals("&nbsp;")
                                    && !val.equals("Qty") && !val.equals("Price"))
                            {
                                sheet.addCell(new Label(c, r, val));
                            }
                        }
                        c += 2;
                    }
                }
            }
            workbook.write();
            workbook.close();
        }
    }

Finally:
Is checking tdLength==8 the right way to pick out the rows? If not, please suggest a better solution.
Also, how do I remove empty tags like
<td></td>
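
As a sketch of one possible alternative to the fixed cell count, a row could be accepted based on what its cells contain. The example below is plain Java and assumes the cell texts of one row have already been extracted into a list of strings. RowFilter, isDataRow and dropEmpty are hypothetical names, and the positions 1, 4 and 5 are taken from the program in this post. Note that dropping empty cells shifts the positions of the remaining ones, so fixed indices only make sense on the unfiltered list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: decides which rows to export, given their cell texts.
final class RowFilter {
    // Cell positions assumed from the program in this post.
    static final int MODEL = 1;
    static final int PRICE = 4;
    static final int QTY = 5;

    // A row counts as data when it reaches the quantity column and its
    // model cell is neither blank nor the "Model" header text.
    static boolean isDataRow(List<String> cells) {
        if (cells.size() <= QTY) {
            return false;
        }
        String model = cells.get(MODEL).trim();
        return !model.isEmpty() && !model.equalsIgnoreCase("Model");
    }

    // Returns a copy of the row without blank cells such as <td></td>.
    static List<String> dropEmpty(List<String> cells) {
        List<String> kept = new ArrayList<>();
        for (String cell : cells) {
            if (!cell.trim().isEmpty()) {
                kept.add(cell);
            }
        }
        return kept;
    }
}
```

Whether a content-based check is more reliable than tdLength==8 depends on the page layout, which is not fully shown here.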

Bye.
Regards,
Madhu