htmlparser-user Mailing List for HTML Parser (Page 17)
Brought to you by:
derrickoswald
From: Athar S. S. <ath...@gm...> - 2009-04-19 02:45:06
|
This is perhaps the only SourceForge mailing list with spam. You guys must be doing something right to get the clandestine world of spammers interested in you (the clandestine world of spammers and the search engine companies like Google). -- Shiraz 500 Riverside Drive #425 New York, NY 10027 (703) 879-8342 (skype prefer) (571) 276 2404 (cell) (212) 316 8630 (landline) (extn 8630) |
From: Athar S. S. <ath...@gm...> - 2009-04-19 02:22:33
|
I just found a problem with SiteCapturer. Line 686 says:

    if (isToBeCaptured (image))

When you step inside isToBeCaptured, the images are not recognized as being part of the website. The code here is the culprit:

    return (
        link.toLowerCase ().contains (getSource ().toLowerCase ())
        && (-1 == link.indexOf ("?"))
        && (-1 == link.indexOf ("#")));

Here the source is http://www1.cs.columbia.edu/~shiraz/psetv001.htm and the link may be http://www1.cs.columbia.edu/~shiraz/psetv001_files/image001.jpg, so the above keeps returning false!

On Sat, Apr 18, 2009 at 9:22 PM, Athar Shiraz Siddiqui <ath...@gm...> wrote:
> Good evening everyone,
>
> I am trying to simply screen scrape or download html and images of a
> webpage. I cannot do it however with sitecapturer.

-- Shiraz |
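The behaviour described above can be reproduced without the library. The sketch below copies the quoted condition into a standalone method and compares it with one possible alternative that matches on the directory part of the source URL. The class name `CaptureCheck` and the method `prefixCheck` are hypothetical and only illustrate the idea; they are not SiteCapturer's code or an official fix, and `prefixCheck` assumes the source URL has a path component.

```java
class CaptureCheck {
    // The condition as quoted in the message: the link must contain
    // the complete source URL, file name included.
    static boolean originalCheck(String source, String link) {
        return link.toLowerCase().contains(source.toLowerCase())
            && link.indexOf('?') == -1
            && link.indexOf('#') == -1;
    }

    // Hypothetical alternative: compare against the directory that
    // holds the source page instead of the page URL itself.
    static boolean prefixCheck(String source, String link) {
        String base = source.substring(0, source.lastIndexOf('/') + 1);
        return link.toLowerCase().startsWith(base.toLowerCase())
            && link.indexOf('?') == -1
            && link.indexOf('#') == -1;
    }

    public static void main(String[] args) {
        String source = "http://www1.cs.columbia.edu/~shiraz/psetv001.htm";
        String image = "http://www1.cs.columbia.edu/~shiraz/psetv001_files/image001.jpg";
        System.out.println(originalCheck(source, image)); // false
        System.out.println(prefixCheck(source, image));   // true
    }
}
```

The image URL never contains "psetv001.htm", which is why the quoted check rejects it. Setting the source to the directory URL rather than the page might have a similar effect, but that is untested here.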
From: Athar S. S. <ath...@gm...> - 2009-04-19 01:23:10
|
Good evening everyone,

I am trying to simply screen scrape or download the HTML and images of a webpage. However, I cannot do it with SiteCapturer. Could someone indicate why I cannot find the images when I run the following code:

    worker = new SiteCapturer ();
    worker.setSource ("http://www1.cs.columbia.edu/~shiraz/psetv001.htm");
    worker.setTarget ("C:\\Temp\\set\\download1");
    worker.setCaptureResources (true);
    worker.capture ();
    System.out.println("Done!");

I stepped through the code and there don't seem to be any images in the mImages array.

I would merely like to get the text of a website and a handle on the accompanying images. Thanks!

-- Shiraz |
From: Athar S. S. <ath...@gm...> - 2009-04-19 00:59:27
|
I am using the following snippet with SiteCapturer to save images, but it won't save any images. What is going on?

    worker = new SiteCapturer ();
    worker.setSource ("http://www1.cs.columbia.edu/~shiraz/psetv001.htm");
    worker.setTarget ("C:\\Temp\\set\\download1");
    worker.setCaptureResources (true);
    worker.capture ();
    System.out.println("Done!");
    System.exit (0);

-- Shiraz |
From: Snir K. <sk...@gm...> - 2009-03-31 07:48:00
|
Hi all, I'm trying to leverage HTMLParser to extract proximity/layout properties (as one would be able to do through using the DOM and offsetWidth/offsetHeight recursively on parents to a given element). Is this something I can accomplish with the API, and if so, how? Thanks for all the help. Cheers, Nick |
From: Pony N. <nth...@gm...> - 2009-03-31 07:21:44
|
- Pony Onthusitse Nthatsi +267 71467530 |
From: Aravind R P. <Ara...@in...> - 2009-03-25 03:10:33
|
Didn't understand :)

-----Original Message-----
From: alaeddine [mailto:ala...@sa...]
Sent: Tuesday, March 24, 2009 8:22 PM
To: htm...@li...
Subject: [Htmlparser-user] extract table

> Hi
>
> when I test the following code
>
>     Parser parser = new Parser(url);
>     NodeList nl = parser.parse(null);
>     for (NodeIterator iterator = nl.elements(); iterator.hasMoreNodes();) {
>         Node node = iterator.nextNode();
>         if (node instanceof Tag) {
>             Tag tag = (Tag) node;
>
> I usually get a result that falls outside the test 'if (node instanceof Tag) {',
> so how can I move on to the next node and test whether the name of the tag
> is body or not?
>
> Thank you for your help |
From: y. <qik...@16...> - 2009-03-25 02:26:28
|
Why not use org.htmlparser.filters.NodeClassFilter?

2009-03-25 yangqike

From: alaeddine
Sent: 2009-03-24 19:51:20
To: htm...@li...
Cc:
Subject: [Htmlparser-user] Help me

> Hi
> I would to extract a table from a html url and i cant make a filter
> please help me to do this
> Thank you for your help |
From: alaeddine <ala...@sa...> - 2009-03-24 14:51:56
|
Hi

when I test the following code

    Parser parser = new Parser(url);
    NodeList nl = parser.parse(null);
    for (NodeIterator iterator = nl.elements(); iterator.hasMoreNodes();) {
        Node node = iterator.nextNode();
        if (node instanceof Tag) {
            Tag tag = (Tag) node;

I usually get a result that falls outside the test 'if (node instanceof Tag) {', so how can I move on to the next node and test whether the name of the tag is body or not?

Thank you for your help |
From: Aravind R P. <Ara...@in...> - 2009-03-24 11:55:16
|
Hi

    Parser parser = new Parser(url);
    NodeList nl = parser.parse(null);

This will give you the first set of all nodes, like every node that is inside the <html> tag.

    for (NodeIterator iterator = nl.elements(); iterator.hasMoreNodes();) {
        Node node = iterator.nextNode();
        if (node instanceof Tag) {
            Tag tag = (Tag) node;

This way you get every node and cast it to a Tag; from that you can get the tag name and compare it to "BODY". Once the body tag is obtained, take its children and repeat the same process in a for loop until you get the tag name "TABLE".

You have to iterate through every tag, there is no other way. Try using recursion.

From: alaeddine [mailto:ala...@sa...]
Sent: Tuesday, March 24, 2009 5:02 PM
To: htm...@li...
Subject: [Htmlparser-user] Help me

> Hi
> I would to extract a table from a html url and i cant make a filter
> please help me to do this
> Thank you for your help

**************** CAUTION - Disclaimer *****************
This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely for the use of the addressee(s). If you are not the intended recipient, please notify the sender by e-mail and delete the original message. Further, you are not to copy, disclose, or distribute this e-mail or its contents to any other person and any such actions are unlawful. This e-mail may contain viruses. Infosys has taken every reasonable precaution to minimize this risk, but is not liable for any damage you may sustain as a result of any virus in this e-mail. You should carry out your own virus checks before opening the e-mail or attachment. Infosys reserves the right to monitor and review the content of all messages sent to or from this e-mail address. Messages sent to or from this e-mail address may be stored on the Infosys e-mail system.
***INFOSYS******** End of Disclaimer ********INFOSYS*** |
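The recursion suggested above can be sketched independently of the library. In the example below, `SimpleNode` is a hypothetical stand-in for a parsed tag (just a name plus children) so that the code runs without HTMLParser on the classpath; with the real library the same walk would be applied to its node and child-list types, which are not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TagSearch {
    // Hypothetical stand-in for a parsed tag: a name plus child nodes.
    static final class SimpleNode {
        final String name;
        final List<SimpleNode> children;

        SimpleNode(String name, SimpleNode... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // Depth-first walk: record a match, then descend into every child.
    static void collect(SimpleNode node, String wanted, List<SimpleNode> found) {
        if (node.name.equalsIgnoreCase(wanted)) {
            found.add(node);
        }
        for (SimpleNode child : node.children) {
            collect(child, wanted, found);
        }
    }

    // Convenience wrapper returning all nodes with the given name.
    static List<SimpleNode> findAll(SimpleNode root, String wanted) {
        List<SimpleNode> found = new ArrayList<>();
        collect(root, wanted, found);
        return found;
    }

    public static void main(String[] args) {
        SimpleNode page = new SimpleNode("HTML",
            new SimpleNode("HEAD"),
            new SimpleNode("BODY",
                new SimpleNode("DIV", new SimpleNode("TABLE")),
                new SimpleNode("TABLE")));
        System.out.println(findAll(page, "TABLE").size()); // prints 2
    }
}
```

Because the walk descends into every child, it finds tables nested at any depth and does not need a special case for the body tag.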
From: alaeddine <ala...@sa...> - 2009-03-24 11:49:41
|
Hi, I would like to extract a table from an HTML URL and I can't make a filter. Please help me to do this. Thank you for your help. |
From: yangqike <qik...@16...> - 2009-03-17 08:14:16
|
org.htmlparser.tags.HeadTag

2009-03-17 yangqike

From: Aravind R Pillai
Sent: 2009-03-17 14:35:24
To: htm...@li...
Cc:
Subject: [Htmlparser-user] Regarding the Head Tag

> Hi
> Am pretty new to Html Parser and needs help in extracting and editing a
> particular set of tags in the html. I was going through the tutorial and
> I found this bit of code.
>
>     Head head = heads.elementAt (0);
>
> I can't find the "Head" class. Can anyone please help me. The e.g.: is listed in
> http://htmlparser.sourceforge.net/javadoc/index.html in parse method.
> Any help is greatly appreciated.
> Regards, Aravind R Pillai |
From: Aravind R P. <Ara...@in...> - 2009-03-17 06:31:04
|
Hi

I am pretty new to HTML Parser and need help extracting and editing a particular set of tags in the HTML. I was going through the tutorial and I found this bit of code:

    Head head = heads.elementAt (0);

I can't find the "Head" class. Can anyone please help me? The example is listed at http://htmlparser.sourceforge.net/javadoc/index.html, in the parse method.

Any help is greatly appreciated.

Regards, Aravind R Pillai |
From: <qik...@16...> - 2009-02-22 12:06:58
|
In CompositeTagScanner.java, line 125, please comment out

    else if (isTagToBeEndedFor (ret, next)) // check DTD

so that it becomes

    //else if (isTagToBeEndedFor (ret, next)) // check DTD

For the other tags, you can also comment out the balancing code. For the style tag, in StyleScanner.java, comment out the following code:

    /*
    // build new end tag if required
    if (null == node)
    {
        attribute = new Attribute ("/style", null);
        vector = new Vector ();
        vector.addElement (attribute);
        node = lexer.getNodeFactory ().createTagNode (
            lexer.getPage (), position, position, vector);
    }
    */

======= 2009-02-22 17:09 11:09:25 Roy Michael, you wrote in your message: [Htmlparser-user] Avoid tag balancing =======

> I am using version 1.6, and I am wondering if there is a way to avoid /
> disable the tag balancing operation when using the parser. I still need the
> parser functionality over the lexer, but I want to avoid the tag balancing
> operations, even if the document is malformed.
> Roy. |
From: Roy M. <roy...@gm...> - 2009-02-22 09:09:47
|
I am using version 1.6, and I am wondering if there is a way to avoid / disable the tag balancing operation when using the parser. I still need the parser functionality over the lexer, but I want to avoid the tag balancing operations, even if the document is malformed. Roy. |
From: Pony N. <nth...@gm...> - 2009-02-16 06:48:09
|
-- Pony Onthusitse Nthatsi +267 71467530 |
From: Pony N. <nth...@gm...> - 2009-02-16 06:46:45
|
-- Pony Onthusitse Nthatsi +267 71467530 |
From: yangqike <qik...@16...> - 2009-02-12 03:17:45
|
Case 1: do you mean you want to replace all the LinkTag nodes with some other tag (MyTag)?
Case 2: or do you just want to find all the LinkTag nodes and then modify the href property?

Solution for case 1: use the NodeClassFilter, then copy all the properties to your MyTag (remember to remove the LinkTag and add your tag).
Solution for case 2: use the NodeClassFilter, then process each node in the returned NodeList.

Is my understanding right? If I am wrong, please ignore the above.

2009-02-12 yangqike

From: Randy Paries
Sent: 2009-02-11 23:36:24
To: htmlparser-user
Cc:
Subject: [Htmlparser-user] replace links

> Not sure if anyone even responds to this list anymore
> i am trying to figure out a way of replacing all PDF links in a html doc
> i was thinking if using htmlparser
> i can use the parser to find all the LinkTag objects, but not sure how to
> or if it is possible to replace nodes as well as find them
> Thanks Randy |
From: Randy P. <rtp...@gm...> - 2009-02-11 15:35:25
|
Not sure if anyone even responds to this list anymore.

I am trying to figure out a way of replacing all PDF links in an HTML doc, and I was thinking of using htmlparser. I can use the parser to find all the LinkTag objects, but I am not sure how to replace nodes as well as find them, or whether that is possible.

Thanks, Randy |
From: Kadir V. <abd...@ba...> - 2009-02-05 15:13:22
|
How can I get the text inside the font tag? For example: <font id="bla">requested text</font> Thanks |
From: Kadir V. <abd...@ba...> - 2009-02-05 14:14:07
|
How can I get the text inside the font tag? For example: <font id="bla">requested text</font> Thanks |
From: Peter A. D. <pet...@gm...> - 2009-01-22 02:41:15
|
I'm using htmlparser very successfully for specific tag extraction, but am having trouble trying to implement a plain text export for a "word count" function. I have spent half of today in the JavaDoc and experimenting, trying to get only the "printable" words on a page.

I cannot get the JavaScript to be excluded, although I'm able to exclude the script tags themselves (the script body still prints) using the NotFilter class combined with a ScriptTag filter.

Am I not going about this correctly? Maybe a better question is how I should be going about trying to do this. I can think of complicated ways I could use brute force to make this work, but it seems as if there is a simple and elegant solution I am missing.

Thank you for any help,
-Pete |
From: Guo Y. <bli...@gm...> - 2009-01-18 08:20:01
|
Dear all,

I hope someone can help me. When I use HTML Parser to parse webpages and grab certain texts, I notice that some texts shown in IE cannot be found in the HTML source. I think they are generated dynamically by JavaScript. So, is there a way to get the whole page with all the texts that have been generated by JavaScript?

Thank you for your patience.

-- Yang Guo |
From: Thushara W. <th...@gm...> - 2009-01-15 01:23:41
|
Can HTMLParser follow redirects set by this type of meta tag:

    <meta http-equiv="refresh" content="0;url=http://www.myblog.net/thatpage/" />

It seems that HTMLParser and HttpURLConnection follow the standard HTTP redirect, but not this meta refresh form of redirect.

Thanks, thushara |
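If the redirect has to be followed by hand, one possible approach is to read the content attribute of the meta tag and pull the target URL out of it, then fetch that URL as a new page. The sketch below covers only the string handling; the class `MetaRefresh` and method `targetOf` are hypothetical names, and how the attribute value is obtained from the parsed page is left out. It assumes the common "delay;url=target" form with a whole-number delay.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaRefresh {
    // Matches "<delay>; url=<target>" as found in the content attribute.
    private static final Pattern CONTENT = Pattern.compile(
        "^\\s*\\d+\\s*;\\s*url\\s*=\\s*(.+?)\\s*$", Pattern.CASE_INSENSITIVE);

    // Returns the redirect target, or null when the value carries no URL.
    static String targetOf(String content) {
        Matcher m = CONTENT.matcher(content);
        if (!m.matches()) {
            return null;
        }
        String url = m.group(1);
        // Strip a matching pair of quotes around the URL, if present.
        if (url.length() >= 2
                && (url.charAt(0) == '\'' || url.charAt(0) == '"')
                && url.charAt(url.length() - 1) == url.charAt(0)) {
            url = url.substring(1, url.length() - 1);
        }
        return url;
    }

    public static void main(String[] args) {
        System.out.println(targetOf("0;url=http://www.myblog.net/thatpage/"));
    }
}
```

A value such as "5", which only reloads the same page, yields null, so the caller can tell a plain refresh from a redirect.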
From: Ian M. <ia...@ia...> - 2009-01-14 13:12:18
|
You can set the User-Agent property either at execution time (with a -D parameter) or using System.setProperty (the property name is http.agent). I think that should work; look around on the net for examples.

However, if your demands are anything more complex than setting a User-Agent, I'd recommend looking into the Apache HttpClient (http://hc.apache.org/httpclient-3.x/), as it's a much more fully featured library for HTTP requests.

Ian

2008/12/18 abdulkadir vardar <abd...@ya...>:
> Hello,
> Is there a way to set user-agent property ? |
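A minimal sketch of the System.setProperty route mentioned above. The class name `UserAgentSetup` and the agent string are made up for illustration. As far as I know, the JDK reads http.agent once, when its HTTP support is first used, so the property should be set before any connection is opened.

```java
class UserAgentSetup {
    // Sets the JVM-wide default User-Agent used by java.net HTTP connections.
    static void useAgent(String agent) {
        System.setProperty("http.agent", agent);
    }

    public static void main(String[] args) {
        // Hypothetical agent string; call this before opening any connection.
        useAgent("MyCrawler/1.0");
        System.out.println(System.getProperty("http.agent"));
    }
}
```

The command-line equivalent would be to start the JVM with -Dhttp.agent=MyCrawler/1.0.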