htmlparser-user Mailing List for HTML Parser (Page 16)
Brought to you by: derrickoswald
From: Roger V. <rog...@go...> - 2009-07-15 06:42:19

> My gut reaction, without even looking into it in detail because it is a
> javascript problem, is to tell you to set
> org.htmlparser.scanners.ScriptScanner.STRICT = false and try it again.

Derrick,

I've had a dig around in the Javadocs but I can't work out how to do this. Could you give me a pointer on how to access the ScriptScanner before I start parsing? I need to get this working because, as well as HtmlParser not recognising the tag correctly when it's part of a very large script block, a </script> tag is being erroneously inserted when I convert back to HTML with NodeList.toHtml().

Regards

_______________________________________________
Htmlparser-user mailing list
Htmlparser-user@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-user

From: Derrick O. <der...@gm...> - 2009-07-07 15:12:36

My gut reaction, without even looking into it in detail because it is a javascript problem, is to tell you to set org.htmlparser.scanners.ScriptScanner.STRICT = false and try it again.

From: Roger V. <rog...@go...> - 2009-07-07 12:11:48

> If you want the A tags inside the BODY tag it would be:
>
>     NodeFilter filter = new AndFilter(new TagNameFilter("A"),
>         new HasParentFilter(new TagNameFilter("BODY"),true));

Thanks Derrick, that worked perfectly. I've now got another problem that I think might be a bug. With the testcase (I'm not making this up - I've actually got to work with this sort of stuff!)

    String testHtml = "<html><head><script><a href=JAVASCRIPT:openProc('\" + parent.contents.procUID[i] + \"','main')>"
        +"</script><body><table><tr><td><img src=/666.jpg\"></td></tr><tr><td>"
        +"document.write(\"<a href=JAVASCRIPT:openProc('\" + parent.contents.procUID[i] + \"','main')>\" + parent.contents.procDisplay[i] + \"</a>\"</a></td></tr></table></body></html>";

    Parser parser = new Parser(testHtml);
    NodeList originalPage = parser.parse(null);
    NodeFilter filter = new AndFilter(new TagNameFilter("a"),
        new HasParentFilter(new TagNameFilter("body"),true));
    NodeList extract = originalPage.extractAllNodesThatMatch(filter, true);

This picks up the second JAVASCRIPT LinkTag - the one outside the <head> tag, but inside the document.write(). When I try to evaluate LinkTag.getLinkTag() against this, HtmlParser reports the text as JAVASCRIPT:openProc('" which is not correct. Any ideas?

Regards

From: Derrick O. <der...@gm...> - 2009-07-06 10:01:40

I think the TagNameFilter is case sensitive, so it should be:

    NodeFilter filter = new AndFilter(new TagNameFilter("BODY"),
        new HasChildFilter(new TagNameFilter("A"),true));

But the filter you've constructed would find the BODY tag:

    keep: tag named BODY and has a child named A

If you want the A tags inside the BODY tag it would be:

    NodeFilter filter = new AndFilter(new TagNameFilter("A"),
        new HasParentFilter(new TagNameFilter("BODY"),true));

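The difference between the two filters in this reply can be sketched with a toy tree. Everything below (Node, tagName, hasChild, hasParent, extract, sampleTree) is a hypothetical stand-in written for illustration, not the org.htmlparser API; it assumes that both filters search recursively, which is how I read the `true` argument in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model of the filter semantics discussed in the thread.
// These names are illustrative only; they are NOT org.htmlparser classes.
class FilterSketch {

    // Minimal tag node: a name, a parent link and a list of children.
    static final class Node {
        final String name;
        Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        // Attaches a child and returns this node so calls can be chained.
        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }
    }

    // Matches a node by tag name (this toy version ignores case).
    static Predicate<Node> tagName(String name) {
        return n -> n.name.equalsIgnoreCase(name);
    }

    // Matches a node if any descendant satisfies the inner predicate.
    static Predicate<Node> hasChild(Predicate<Node> inner) {
        return n -> n.children.stream()
                .anyMatch(c -> inner.test(c) || hasChild(inner).test(c));
    }

    // Matches a node if any ancestor satisfies the inner predicate.
    static Predicate<Node> hasParent(Predicate<Node> inner) {
        return n -> {
            for (Node p = n.parent; p != null; p = p.parent) {
                if (inner.test(p)) {
                    return true;
                }
            }
            return false;
        };
    }

    // Walks the whole tree and collects the names of matching nodes.
    static List<String> extract(Node root, Predicate<Node> filter) {
        List<String> out = new ArrayList<>();
        collect(root, filter, out);
        return out;
    }

    private static void collect(Node n, Predicate<Node> f, List<String> out) {
        if (f.test(n)) {
            out.add(n.name);
        }
        for (Node c : n.children) {
            collect(c, f, out);
        }
    }

    // HTML > (HEAD, BODY > TABLE > TR > TD > A)
    static Node sampleTree() {
        Node td = new Node("TD").add(new Node("A"));
        Node body = new Node("BODY").add(new Node("TABLE").add(new Node("TR").add(td)));
        return new Node("HTML").add(new Node("HEAD")).add(body);
    }

    public static void main(String[] args) {
        Node root = sampleTree();
        // "tag named BODY and has a child named A" keeps the BODY node itself.
        System.out.println(extract(root, tagName("BODY").and(hasChild(tagName("A")))));
        // "tag named A and has a parent named BODY" keeps the A nodes.
        System.out.println(extract(root, tagName("A").and(hasParent(tagName("BODY")))));
    }
}
```

In this sketch the first filter selects the container and the second selects the links, which is the distinction the reply draws between HasChildFilter and HasParentFilter.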
From: Roger V. <rog...@go...> - 2009-07-06 08:33:47

Hi

I'm probably doing something stupid here, but I can't get the HasChildFilter to work properly. I am trying to get all the <a> tags that occur inside the <body> tag so I can re-write them. I don't want the javascript generated tags that occur inside the <head> tag. My test case is below.

    String testHtml = "<html><head><script><a href=JAVASCRIPT:openProc('\" + parent.contents.procUID[i] + \"','main')>"
        +"</script><body><table><tr><td>Cell Content</td></tr><tr><td>"
        +"<a target=\"main\" href=\"findXml.jsp?XMLFile=G455051\">Control Mechanism</a></td></tr></table></body></html>";

    Parser parser = new Parser(testHtml);
    NodeList originalPage = parser.parse(null);
    NodeFilter filter = new AndFilter(new TagNameFilter("body"),
        new HasChildFilter(new TagNameFilter("a"),true));
    NodeList extract = originalPage.extractAllNodesThatMatch(filter, true);

This fails to find any of the <a> tags - extract.size() is zero. Can someone point out what I'm doing wrong please.

Regards

From: Chris P. <kc...@gm...> - 2009-06-20 09:44:36

Hi Derrick,

Thanks. I'll test that out. Since I am reading a site, I wonder if they put it in for things like HTML parser.

Chris

From: Derrick O. <der...@gm...> - 2009-06-20 08:53:10
|
OK, the other possibility is that the utf-9 is specified in the HTTP header - which you don't see unless you add a ConnectionMonitor and look at the header with something like org.htmlparser.http.HttpHeader. Try: parser.getConnectionManager ().setMonitor (parser); |
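To illustrate the header case: the charset a server announces lives in the Content-Type response header, not in the page source, so "view source" will not show it. Below is a minimal sketch, using only the JDK rather than HTML Parser's own ConnectionMonitor or HttpHeader classes, of pulling the charset parameter out of such a header value. The class and method names (ContentTypeCharset, charsetOf) are made up for this illustration and are not part of any library.

```java
import java.util.Locale;

// Hypothetical helper, not part of HTML Parser: extracts the charset
// parameter from a Content-Type header value such as
// "text/html; charset=utf-9".
final class ContentTypeCharset {

    // Returns the charset parameter as written by the server,
    // or null if the header is missing or has no charset parameter.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Parameter names are case-insensitive.
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=utf-9")); // utf-9
        System.out.println(charsetOf("text/html"));                // null
    }
}
```

With a live connection, the header value could come from java.net.URLConnection.getContentType(); comparing its charset with the one in the page's META tag shows which of the two is wrong.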
From: Chris P. <cp...@kc...> - 2009-06-20 06:34:15
|
Hello! If you view source, it says UTF-8. Is that still possible if the actual content says UTF-8? Chris |
From: Derrick O. <der...@gm...> - 2009-06-20 04:55:03
|
The charset specified by the HTTP server in the header or by the HTML page itself is not found in the list of character sets known to the Java Virtual Machine. It's likely the server got hacked and somebody changed the "utf-8" in the HTML source to "utf-9" because that doesn't exist, see: http://en.wikipedia.org/wiki/UTF-9_and_UTF-18 You can inform the site that it has an error. |
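The lookup described here can be reproduced with the JDK alone. This is a minimal sketch, not HTML Parser's actual code: it asks the JVM whether it knows a charset name and falls back to ISO-8859-1 when it does not, which mirrors the behaviour the warning message reports. The names CharsetLookup and resolveOrDefault are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HTML Parser: resolves a charset name
// against the charsets known to the running JVM.
final class CharsetLookup {

    // Returns the named charset if the JVM supports it,
    // otherwise ISO-8859-1 as a fallback.
    static Charset resolveOrDefault(String name) {
        try {
            if (name != null && Charset.isSupported(name)) {
                return Charset.forName(name);
            }
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are illegal in a
            // charset name; treat it the same as an unknown charset.
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        System.out.println(resolveOrDefault("utf-8")); // UTF-8
        System.out.println(resolveOrDefault("utf-9")); // ISO-8859-1
    }
}
```

Because "utf-9" is a syntactically legal but unregistered name, the JVM simply reports it as unsupported rather than throwing, so the fallback branch is taken.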
From: Chris P. <cp...@kc...> - 2009-06-19 23:20:42
|
Hello! I have been using HTML Parser for a long time, and all of a sudden one of the applications I use it in reported this:

unable to determine cannonical charset name for utf-9 - using ISO-8859-1

I am a little dumb about these sorts of things; is there something I could look for in the source page that could be causing this? Chris |
From: Derrick O. <der...@gm...> - 2009-06-12 19:04:35
|
It may depend on what the server replies in the HTTP header and not what is in the actual text of the response. Sorry. It may be just a misconfigured server. Try debugging the return packet (the parser implements ConnectionMonitor) and see what the HTTP header has in it. |
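Some background on why a header/page mismatch can surface as an exception: as far as I understand it, the parser starts decoding with the encoding it got from the header (or the ISO-8859-1 default) and only later meets the META tag. Switching is harmless if the bytes read so far decode to the same characters under both encodings, and a problem otherwise. The sketch below shows that idea with the JDK only; it is an assumption-laden illustration, not HTML Parser's actual check, and EncodingSwitchCheck and safeToSwitch are invented names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not HTML Parser code: decides whether bytes
// that were already decoded under one charset would have produced the
// same text under another.
final class EncodingSwitchCheck {

    // True if the already-consumed bytes decode identically under both
    // charsets, so changing encoding mid-stream would not alter them.
    static boolean safeToSwitch(byte[] alreadyRead, Charset oldCs, Charset newCs) {
        String before = new String(alreadyRead, oldCs);
        String after = new String(alreadyRead, newCs);
        return before.equals(after);
    }

    public static void main(String[] args) {
        // Pure ASCII prefix: identical in ISO-8859-1 and UTF-8.
        byte[] ascii = "<html><head>".getBytes(StandardCharsets.UTF_8);
        // A Polish word with a non-ASCII letter, encoded as UTF-8.
        byte[] polish = "\u015Brednie".getBytes(StandardCharsets.UTF_8);

        System.out.println(safeToSwitch(ascii,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));  // true
        System.out.println(safeToSwitch(polish,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));  // false
    }
}
```

This is also why calling Parser.setEncoding() up front avoids the problem: nothing has been decoded under the wrong charset yet.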
From: Derrick O. <der...@gm...> - 2009-06-12 19:01:07
|
Since StringBean is a NodeVisitor, you can pass it to the list you already have and then read the collected text from it with getStrings ():

list.visitAllNodesWith (myStringbean); |
From: Frank L. <Fra...@20...> - 2009-06-12 18:07:27
|
Hello, parsing the page http://www.nbp.pl/Kursy/KursyA.html leads to an EncodingChangeException. The htmlparser uses the default encoding ISO-8859-1 and does not recognize the charset UTF-8 given in the head of this html-page. Setting the encoding to UTF-8 using Parser.setEncoding() works, but shouldn't this work without? Regards Frank Langelage |
From: Kasper S. <kas...@ya...> - 2009-06-12 17:37:35
|
Hello HTML Parser users!

I am trying to use the StringBean class. It works very well when I give it a URL, but I would like to use it when I already have all the content (i.e., I have the HTML as a String). Is there any way to do this? I've tried using the parser and getting the node list:

    Parser parser = new Parser();
    parser.setInputHTML("<html><body>test</body></html>");
    NodeList list = parser.parse(null);

But then I don't know what to do next, or whether what I'm doing makes sense.

Thanks
Kasper |
From: 邓作霖 <pse...@pe...> - 2009-05-25 09:11:45
|
Hi Derrick, Sorry for the late reply. I ran this solution in my Groovy code successfully. Thank you very much. -- ┌===天津飞马软件系统有限公司===┐ │ 软件开发部落 邓作霖 │ │ TEL:022-23971918 │ │ FAX:022-23963169 │ └==================┘ |
From: Derrick O. <der...@gm...> - 2009-05-23 15:17:40
|
Parse the page into a list, then operate on the list, and finally output the list with toHtml:

    NodeList list = parser.parse (null); // or whatever
    // ... do things to your list of nodes, like filter, replace etc.
    // ... most operations on the parser also operate on a NodeList
    String text = list.toHtml ();
    System.out.println (text);
|
From: 邓作霖 <pse...@pe...> - 2009-05-22 02:56:26
|
Hello everyone, I am new to HTMLParser. I can now locate a tag with Filters and change its attributes with setAttribute(), but how do I apply those changes to the HTML source, or get the whole changed HTML source back? Thank you very much for your help! -- ┌===天津飞马软件系统有限公司===┐ │ 软件开发部落 邓作霖 │ │ TEL:022-23971918 │ │ FAX:022-23963169 │ └==================┘ |
From: Joshua K. <jo...@in...> - 2009-05-14 04:29:25
|
Have you considered using any of the Visitors in the htmlparser? --jk -- best regards, jk Industrial Logic, Inc. Joshua Kerievsky Founder, Extreme Programmer & Coach http://industriallogic.com 866-540-8336 (toll free) 510-540-8336 (phone) Berkeley, California Learn Code Smells, Refactoring and TDD at http://industriallogic.com/elearning |
From: 林森 <sc...@gm...> - 2009-05-14 03:58:45
|
Hello,

I am using htmlparser to extract text from a web page (http://news.sina.com.cn/c/2009-05-13/024915613519s.shtml). I have written a function "extractText" to do this. The function processes nodes recursively and is designed not to visit LinkTag, ScriptTag, RemarkNode, StyleTag etc., so it returns when it encounters one of these nodes. But the result is not satisfactory: some script code still shows up in the output, as follows:

'; }else if(Id==1){ if(GetObj("hotwords_link").innerHTML == ""){ GetObj("hotwords").style.display = "none"; }else{ GetObj("hotwords").style.display = "block"; } GetObj("pbg").innerHTML = ''; } } }

I tried to work out which node contains that code and printed that node's siblings. The result is:

------------------Previous sibling begin----------------------:
/a
script type="text/javascript"
-------------------------Previous sibling end---------------------.
--------------------Next sibling begin:-----------------------
a href="http://www.google.cn/webhp?client=aff-sina&ie=gb&oe=utf8&hl=zh-CN&channel=contentlogo" target="_blank" style="text-decoration:none;"
'; } } }
/script
script type="text/javascript"
script type="text/javascript"
table cellspacing="0" width="589"
热搜代码
style type="text/css"
div id="hotwords" style="height:20px; overflow:hidden; margin:10px 0 0 0; display:none;"
----------------------------Next sibling end.---------------------------------------------

I guess HtmlParser makes a mistake when it encounters a ' character. How can I solve this problem? I really want to exclude all of the script code.

Thanks. |