Hi everybody. My program transfers data from an HTML file into an Excel sheet.
The following example reads data from an HTML page, but the td cells contain span tags, font tags and comments.
Some of the data is inside font tags, for example:
ZD8230/<font class="font20">$$50</font>
Some columns contain span tags:
Vaio RS421<span style='mso-spacerun:yes'> </span>
And there are comments like this:
<!--[if gte vml 1]><v:shape id="_x0000_s1138" type="#_x0000_t75" alt=""
style='position:absolute;margin-left:26.25pt; margin-top:6pt;width:24pt;height:24pt;z-index:47'>
<x:ClientData ObjectType="Pict"> <x:SizeWithCells/> </x:ClientData>
</v:shape><![endif]--><![if !vml]><span style='mso-ignore:vglayout;
position:absolute;z-index:47;margin-left:35px;margin-top:8px;width:32px; height:32px'><img
width=32 height=32 src="price_files/image016.gif" v:shapes="_x0000_s1138"></span><![endif]><span
style='mso-ignore:vglayout2'> <table cellpadding=0 cellspacing=0> <tr> <td height=17
class=xl43 align=right width=37 style='height:12.75pt;
border-top:none;border-left:none;width:28pt' x:num>2</td> </tr> </table> </span>
The problems are:
How do I remove the comments from the td cells?
How do I remove the span tags?
How do I read the data inside nested tags, i.e. append the font text to the column data? For example:
<td>ZD8230/<font class="font20">$$50</font></td>
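One possible way to handle all three questions is sketched below. It is an assumption, not something HTMLParser does for you: take the raw HTML of each cell (for example what `getStringText()` returns) and clean it with regular expressions. First drop `<!-- ... -->` comments, which also covers the `<!--[if gte vml 1]> ... <![endif]-->` blocks. Next drop the `<![if !vml]>` / `<![endif]>` markers. Then strip every remaining tag (span, font, img) while keeping the text between tags, so the font text ends up appended to the cell text. `CellCleaner` and `cleanCell` are hypothetical names. Regular expressions are only a rough substitute for a real HTML parser, and this sketch assumes attribute values never contain `>`.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: turns the inner HTML of a table cell into plain cell text. */
final class CellCleaner {

    // <!-- ... --> comments, including Excel's <!--[if gte vml 1]> ... <![endif]--> blocks.
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    // Downlevel conditional markers such as <![if !vml]> and <![endif]>.
    private static final Pattern CONDITIONAL = Pattern.compile("<!\\[[^>]*>");
    // Any remaining tag (<span ...>, </font>, <img ...>); the text between tags is kept.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    // Runs of whitespace, including line breaks inside the cell.
    private static final Pattern SPACES = Pattern.compile("\\s+");

    private CellCleaner() {}

    static String cleanCell(String html) {
        String text = COMMENT.matcher(html).replaceAll("");
        text = CONDITIONAL.matcher(text).replaceAll("");
        text = TAG.matcher(text).replaceAll("");
        // Treat non-breaking spaces (entity or character) as ordinary spaces.
        text = text.replace("&nbsp;", " ").replace('\u00a0', ' ');
        // Collapse whitespace and trim, so a cell holding only a spacer span becomes "".
        return SPACES.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(cleanCell("ZD8230/<font class=\"font20\">$$50</font>"));
        System.out.println(cleanCell("Vaio RS421<span style='mso-spacerun:yes'> </span>"));
    }
}
```

Running `main` prints `ZD8230/$$50` and then `Vaio RS421`. Inside the loop of the program below, the call would wrap the cell text, e.g. `new Label(c, r, CellCleaner.cleanCell(tc[k].getStringText()))`.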
If you don't understand the problem, just run the program below.
Packages used:
jxl for writing into the Excel sheet.
HTMLParser version 1.6.
Please help me, it's urgent.
Excuse any mistakes in my English.
import org.htmlparser.NodeFilter;
import org.htmlparser.beans.FilterBean;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.tags.TableColumn;
import org.htmlparser.tags.TableRow;
import org.htmlparser.util.NodeList;
import jxl.Workbook;
import jxl.write.Label;
import jxl.write.WritableSheet;
import jxl.write.WritableWorkbook;
import java.io.File;

public class sent
{
    public static void main(String[] args) throws Exception
    {
        WritableWorkbook workbook = Workbook.createWorkbook(new File("sent.xls"));

        // Collect every <tr> element in the page.
        FilterBean fb = new FilterBean();
        fb.setURL("c:/html/price.htm");
        fb.setFilters(new NodeFilter[] { new TagNameFilter("TR") });
        NodeList list = fb.getNodes();

        WritableSheet sheet = workbook.createSheet("First Sheet", 0);
        sheet.addCell(new Label(2, 0, "MODEL"));
        sheet.addCell(new Label(8, 0, "PRICE"));
        sheet.addCell(new Label(10, 0, "QUANTITY"));

        int r = 0;
        for (int i = 0; i < list.size(); i++)
        {
            TableRow tr = (TableRow) list.elementAt(i);
            // Only rows with exactly 8 cells are treated as data rows.
            if (tr.getColumnCount() == 8)
            {
                r++;
                TableColumn[] tc = tr.getColumns();
                int c = 0;
                for (int k = 0; k < tc.length; k++)
                {
                    // HTML columns 1, 4 and 5 end up in sheet columns 2, 8 and 10.
                    if (k == 1 || k == 4 || k == 5)
                    {
                        // getStringText() returns the raw HTML between <td> and </td>,
                        // so nested font/span tags and comments are still included.
                        String val = tc[k].getStringText();
                        // Skip the header labels and blank cells.
                        if (!val.equals("Model") && !val.equals(" ")
                                && !val.equals("Qty") && !val.equals("Price"))
                        {
                            sheet.addCell(new Label(c, r, val));
                        }
                    }
                    c += 2;
                }
            }
        }
        workbook.write();
        workbook.close();
    }
}
Finally:
Is checking tdLength==8 the right way to pick out the data rows? If not, please suggest a better approach.
How do I remove empty cells like
<td></td>?
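Below is a sketch of an alternative to `tdLength==8`. It assumes the page has a single header row containing the labels `Model`, `Price` and `Qty`, the same labels the program above filters out. It also assumes data cells line up with the header cells, with no `colspan`. It works on plain Java lists of cell text that have already been cleaned of tags and comments, for example built from `tc[k].getStringText()` inside the loop. The sketch finds the header row and remembers which column index holds each label. It then keeps only rows where at least one wanted cell is non-empty, which also drops rows made of empty `<td></td>` cells. `ColumnPicker`, `findHeader` and `extract` are hypothetical names, and the sample rows in `main` are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: select columns by header text instead of by a fixed column count. */
final class ColumnPicker {

    private ColumnPicker() {}

    /** Returns label -> column index for the first row that contains every label. */
    static Map<String, Integer> findHeader(List<List<String>> rows, List<String> labels) {
        for (List<String> row : rows) {
            Map<String, Integer> index = new HashMap<>();
            for (int k = 0; k < row.size(); k++) {
                if (labels.contains(row.get(k))) {
                    index.putIfAbsent(row.get(k), k);
                }
            }
            if (index.size() == labels.size()) {
                return index;
            }
        }
        return new HashMap<>(); // no header row found
    }

    /** Extracts the labelled columns from every data row, skipping rows with no data. */
    static List<List<String>> extract(List<List<String>> rows, List<String> labels) {
        Map<String, Integer> index = findHeader(rows, labels);
        List<List<String>> result = new ArrayList<>();
        if (index.isEmpty()) {
            return result;
        }
        for (List<String> row : rows) {
            List<String> picked = new ArrayList<>();
            boolean hasData = false;
            for (String label : labels) {
                int k = index.get(label);
                // Short rows simply yield an empty cell instead of an exception.
                String cell = k < row.size() ? row.get(k) : "";
                picked.add(cell);
                hasData |= !cell.isEmpty();
            }
            // Skip the header row itself and rows whose wanted cells are all empty.
            if (hasData && !picked.equals(labels)) {
                result.add(picked);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("", "Model", "", "", "Price", "Qty"),
                Arrays.asList("", "ZD8230/$$50", "", "", "1200", "2"),
                Arrays.asList("", "", "", "", "", ""));
        System.out.println(extract(rows, Arrays.asList("Model", "Price", "Qty")));
    }
}
```

Running `main` prints `[[ZD8230/$$50, 1200, 2]]`. Because the column positions come from the header row, the selection keeps working if the page gains or loses a column, as long as the header text stays the same.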
Bye.
Regards,
Madhu