htmlparser-user Mailing List for HTML Parser (Page 87)
Brought to you by: derrickoswald
From: Navid H.L. <na...@ya...> - 2002-12-27 23:34:17
Hi Somik,

I still can't do it. I installed Ant on my Windows XP machine. Then, in Eclipse, I logged in to the CVS Repositories view; by expanding HEAD and then htmlparser, I could see another set of folders and also build.xml 1.24. I right-clicked on the XML file, but "Run Ant" did not appear! Maybe my Eclipse configuration is wrong, or I am on the wrong path. I also tried Run -> External Tools -> Configure, gave the tool the name ANT and the Tool Location [antfolder]\bootstrap\bin\ant in the configuration window, but I still got an error when running External Tools -> ANT. It looks kind of complicated now, but I will learn it anyway. Any suggestion will be appreciated.

Thanks for your help,
Navid

--- Somik Raha <so...@ya...> wrote:
> Hi Navid,
>
> > [3] go to the source directory where you see
> > build.xml - and type ant in the command line.
> >
> > last step does not make sense to me, If I run ANT on
> > command line, and CVS is running on Eclipse, what is
> > the connection between these two? I think I should
> > copy or download something or build.xml from CVS.
>
> CVS is not running on Eclipse. Eclipse supports CVS through a plugin. There
> is no connection b/w Ant and CVS. Ant is a build mechanism (have you read
> the docs on http://jakarta.apache.org/ant/ ?)
>
> Eclipse also supports ant. Simply right-click on build.xml, and choose Run
> Ant. That should create the release in the distribution directory. The check
> under your eclipse workspace, and navigate to the directory (or if its
> easier for you, do a search for zip files in windows titled htmlparser* ).
>
> Regards,
> Somik
From: Joshua K. <jo...@in...> - 2002-12-27 18:33:44
Sam wrote:
> The Visitor pattern sounds interesting, and I look forward to hearing
> more about it. However, duplicated code itself is not IMO necessarily
> an evil. It all depends on whether one thinks that the duplicated
> components are going to diverge in functionality in the future. If you
> are sure they are not, then fine, refactor away.

At the moment, Somik and I speculate that 40% to 50% of the current HTML parser code base is pure fat: unnecessary code that only serves to bloat the code base. Duplicate code, in subtle or not-so-subtle versions, is mostly responsible for the bloat. In my experience in industry, I'd say that 95% of the time, duplicate code is bad.

> I guess my surprise at your (or perhaps Somik's) focus on refactoring
> comes from the fact that while the htmlparser is a great piece of
> software, the javadocs and other documentation could use some attention.
> For example, I can't find any explanation in the javadocs or otherwise
> of how the filters are supposed to work with the different scanners, or
> what values they are allowed to take.

The HTML parser is in sore need of refactoring, Sam. In fact, sometimes code becomes unnecessarily complex, in which case people *really* need documentation. I confront such a problem by first asking whether the code could be simplified so we didn't need so much documentation. In the event that we do need docs, it is best to write executable documentation. Are you familiar with executable documentation?

> I generally work to "if it's not broken don't fix it", but I often add
> "before you start fixing it, make sure your documentation is up to date".

Software becomes brittle and bloated under the philosophy of "if it ain't broke, don't fix it." We have clients with code bases of 2 million lines of "working" spaghetti code. They need lots of help to learn to do continuous refactoring.

> Using the Visitor pattern may make it easier for clients to get the data
> they need, but given that the htmlparser is "working" (well it works for
> me), I would say that the more urgent issue here is making sure all the
> documentation is up to date. I have a lot of positive things to say
> about htmlparser, so don't take it the wrong way, when I say that the
> biggest problem I've had in using it in the last few weeks is inadequate
> javadocs.

I'm not content to live with the world as it is, Sam. If something isn't easy, it's wrong. The best software in the world is easy to use; it's self-explanatory. We should always strive for that. In the meantime, if you have tasks to complete and don't know how to do them because of a lack of docs, I'd suggest you ask questions here. The best result will be executable documentation for the HTML parser.

> >>Is there some efficiency reason why you want to refactor these methods
> >>or is it just for neatness?
> >
> >Duplication removal is reason #1.
>
> As I mention above. One should be careful of duplication removal for
> the sake of it.

And as I mentioned above, I completely disagree with you. I wonder how much refactoring you have done in your career, given your philosophy of "if it ain't broke, don't fix it." Have you read Martin Fowler's landmark book, Refactoring? If not, I'd suggest you study it thoroughly; you'll be a better programmer for it.

> > Removal of hard-coded logic is reason #2.
>
> This is a good reason. However I get the feeling that introduction of
> these Visitor classes will make the system conceptually more difficult
> to use rather than easier. I would feel better if the current set up
> was more fully documented before more complexity was added.

As I also said above, if it ain't simple, it's wrong. Our changes will not add complexity; that would be foolish.

> And even if the Visitor pattern is used, I would recommend leaving
> methods like toPlainTextString() etc in place, but just making them
> short cut implementations to certain kinds of visitor-using methods.
> This will allow people who have yet to grasp the Visitor pattern
> something to work with.

People can always call deprecated methods.

> If you are keen to see lots of people using htmlparser, I think that you
> don't want people to have to come to terms with too many new concepts at
> once. You say yourself that the Visitor pattern takes some getting used
> to. I think the whole scanner concept takes some getting used to ....

Our Visitor implementation is so trivial that folks won't even know they're using the pattern.

> >Simplicity is reason #3: there is little reason to fatten the interfaces of
> >tag and node classes with various data accumulation/alteration methods when
> >one method and a variety of concrete Visitors can do the job with much less
> >code.
>
> well I would agree if you could guarantee that there will be no
> divergence whatsoever in how the different methods will be used. If you
> can create a flexible enough implementation of the Visitor pattern then
> I guess that will support any possible divergence in the separate
> methods. However, I think there is a reason to have a fatter interface,
> in that convenience methods lower the barrier to entry for new users.

Wouldn't it be marvelous if the barrier to entry was low and our code wasn't bloatware?

> Perhaps ideally one has a well implemented Visitor pattern that supports
> a raw method access, and a number of convenience methods?

The changes will make the code easier to use. If they don't, the really cool thing about software is that it is soft: we can change it.

> A well implemented Visitor pattern will, I assume, support all sorts of
> different operations, but I would feel much happier if the htmlparser
> had a complete javadoc and documentation review before any refactoring
> took place. People are trying to use the existing system and having
> trouble not because of the lack of refactoring, but a lack of well
> described methods. Well I say people, I mean me, I don't know if anyone
> else feels the same. Maybe it's just me :-)

Executable documentation is a term we use for Customer Tests. A Customer Test shows how a body of code gets used to perform real tasks. You have real-world tasks to perform. We can perform those tasks via automated tests. Then, when we refactor, we must make sure our executable documentation is up to date (i.e. it passes its tests). We find this ideal for making sure that documentation reflects what the code is actually doing. This also leaves room for a document that gives some graphical representation of how things work, which must be maintained by someone, lest it get out of date. We just minimize the need for such documents by keeping our code simple and small and surrounding it with easy-to-understand executable documentation.

Best regards,
jk
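The thread argues about replacing per-class methods such as toHTML() and toPlainTextString() with one visitor-accepting method, while Sam asks that toPlainTextString() survive as a shortcut built on a visitor. The following is a minimal sketch of both ideas on a toy node tree. All names here (Node, TextNode, TagNode, NodeVisitor, accept, PlainTextVisitor, HtmlVisitor) are hypothetical and are not the htmlparser API; the point is only that the recursion through child tags is written once, and each kind of output becomes a small concrete visitor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical visitor interface: one callback per kind of node event.
interface NodeVisitor {
    void visitText(TextNode text);
    void visitTagStart(TagNode tag);
    void visitTagEnd(TagNode tag);
}

// Hypothetical base class; not the htmlparser Node type.
abstract class Node {
    // Single traversal entry point that replaces per-operation methods.
    abstract void accept(NodeVisitor visitor);

    // Convenience shortcut for newcomers, implemented on top of a visitor.
    String toPlainTextString() {
        PlainTextVisitor visitor = new PlainTextVisitor();
        accept(visitor);
        return visitor.getText();
    }
}

class TextNode extends Node {
    final String text;
    TextNode(String text) { this.text = text; }

    @Override
    void accept(NodeVisitor visitor) { visitor.visitText(this); }
}

class TagNode extends Node {
    final String name;
    final List<Node> children = new ArrayList<>();

    TagNode(String name, Node... kids) {
        this.name = name;
        Collections.addAll(children, kids);
    }

    @Override
    void accept(NodeVisitor visitor) {
        visitor.visitTagStart(this);
        // The recursion through child nodes lives here, once, for every visitor.
        for (Node child : children) {
            child.accept(visitor);
        }
        visitor.visitTagEnd(this);
    }
}

// Accumulates text content only; tags are ignored.
class PlainTextVisitor implements NodeVisitor {
    private final StringBuilder text = new StringBuilder();
    public void visitText(TextNode node) { text.append(node.text); }
    public void visitTagStart(TagNode tag) { }
    public void visitTagEnd(TagNode tag) { }
    String getText() { return text.toString(); }
}

// Rebuilds simple markup (tag names only, no attributes) from the same traversal.
class HtmlVisitor implements NodeVisitor {
    private final StringBuilder html = new StringBuilder();
    public void visitText(TextNode node) { html.append(node.text); }
    public void visitTagStart(TagNode tag) { html.append('<').append(tag.name).append('>'); }
    public void visitTagEnd(TagNode tag) { html.append("</").append(tag.name).append('>'); }
    String getHtml() { return html.toString(); }
}

public class VisitorSketch {
    public static void main(String[] args) {
        Node doc = new TagNode("p",
                new TextNode("Hello, "),
                new TagNode("b", new TextNode("world")));

        System.out.println(doc.toPlainTextString()); // Hello, world

        HtmlVisitor html = new HtmlVisitor();
        doc.accept(html);
        System.out.println(html.getHtml()); // <p>Hello, <b>world</b></p>
    }
}
```

In this sketch, supporting a new kind of accumulation (say, collecting link targets) means writing one more visitor class rather than adding a method to every node class, and the toPlainTextString() wrapper shows how a convenience method can sit on top without exposing the pattern to the caller.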
From: Somik R. <so...@ya...> - 2002-12-27 02:43:21
Hi Navid,

> [3] go to the source directory where you see
> build.xml - and type ant in the command line.
>
> last step does not make sense to me, If I run ANT on
> command line, and CVS is running on Eclipse, what is
> the connection between these two? I think I should
> copy or download something or build.xml from CVS.

CVS is not running in Eclipse. Eclipse supports CVS through a plugin. There is no connection between Ant and CVS. Ant is a build tool (have you read the docs at http://jakarta.apache.org/ant/ ?)

Eclipse also supports Ant. Simply right-click on build.xml and choose Run Ant. That should create the release in the distribution directory. Then look under your Eclipse workspace and navigate to that directory (or, if it's easier for you, do a Windows search for zip files named htmlparser*).

Regards,
Somik
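Somik's fallback, searching for zip files named htmlparser*, can also be done with a few lines of Java if the Windows search is awkward. This is a minimal sketch assuming Java 8 or later; the class name FindRelease is hypothetical, and since the thread does not give the exact name of the distribution directory or of the zip file, it simply lists every htmlparser*.zip found below the folder you pass in (for example, your Eclipse workspace).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FindRelease {
    /** Returns every regular file under root whose name matches htmlparser*.zip, sorted by path. */
    static List<Path> findReleaseZips(Path root) throws IOException {
        // Files.walk keeps directory handles open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return name.startsWith("htmlparser") && name.endsWith(".zip");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the workspace folder as the first argument; defaults to the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path zip : findReleaseZips(root)) {
            System.out.println(zip);
        }
    }
}
```

Run it as `java FindRelease <workspace-folder>` after the Ant build has finished; if nothing is printed, the build most likely did not produce a release zip.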
From: Navid H.L. <na...@ya...> - 2002-12-27 00:13:44
Thank you, Somik, that was very interesting. I could access CVS and see the directories, but I still have a problem. I am installing Ant now, but I do not understand the next step:

[1] Check the CVS page of htmlparser - there are instructions for doing an anonymous checkout
[2] Install Ant -
[3] go to the source directory where you see build.xml - and type ant in the command line.

The last step does not make sense to me. If I run Ant on the command line, and CVS is running in Eclipse, what is the connection between the two? I think I should copy or download something, or build.xml, from CVS. Please let me know what I should do.

Thank you very much,
Navid

--- Somik Raha <so...@ya...> wrote:
> Hi Navid,
> Here's an easy way to do it. Get the IDE Eclipse.
> That has inbuilt CVS integration, so all you have to
> do is specify your repository (Go in anonymous pserver
> mode).
> Honestly speaking, I do not know why you were using
> putty. It is not required at all. I think you might be
> making a simple mistake- what is the command you are
> issuing to cvs for checking out code ? Remember, you
> should be trying anonymous pserver, and not the
> developer mode.
>
> Regards,
> Somik
=== message truncated ===
From: Somik R. <so...@ya...> - 2002-12-26 21:41:00
Hi Navid,

Here's an easy way to do it: get the Eclipse IDE. It has built-in CVS integration, so all you have to do is specify your repository (use anonymous pserver mode). Honestly, I do not know why you were using PuTTY; it is not required at all. I think you might be making a simple mistake. What command are you issuing to CVS to check out the code? Remember, you should be using anonymous pserver access, not the developer mode.

Regards,
Somik

--- "Navid H.Langaroudi" <na...@ya...> wrote:
> Hi Somik,
> unfortunatly I could not access the CVS as discribed
> on CVS page. I do not know what I am doing Wrong. I
> tried putty.exe with ssh and slow I installed a
> version of CVS on my XP, but none of them worked.
> I would like to learn this eventually.
> But for now, I desperately need version 1.3 which
> Derrick mensioned. Is there any way I get this?
> I greatly appreciate your help.
>
> Thanks you,
> Regards,
> Navid
=== message truncated ===
From: Navid H.L. <na...@ya...> - 2002-12-26 19:42:45
Hi Somik,

Unfortunately, I could not access CVS as described on the CVS page. I do not know what I am doing wrong. I tried putty.exe with SSH, and I also installed a version of CVS on my XP machine, but neither worked. I would like to learn this eventually, but for now I desperately need version 1.3, which Derrick mentioned. Is there any way I can get it? I greatly appreciate your help.

Thank you,
Regards,
Navid

--- Somik Raha <so...@ya...> wrote:
> Hi Navid,
> [1] Check the CVS page of htmlparser - there are
> instructions for doing an anonymous checkout
> [2] Install Ant - http://jakarata.apache.org/ant/
> [3] go to the source directory where you see
> build.xml - and type ant in the command line.
> [4] If the above steps are too hard, wait for a
> week for the integration release for 1.3.
>
> Regards,
> Somik
> --- "Navid H.Langaroudi" <na...@ya...> wrote:
> > Hi Derrick,
> > Thank you very much, But i could not find the version
> > 1.3, and is it possible to rebuilt it on windows?
> > Somik told me before, for unreleased versions I should
> > run a the build.xml using ANT. I need some more info
> > to do so. Thank you
> >
> > Regards,
> > Navid
> >
> > --- Derrick Oswald <Der...@ro...> wrote:
> > > David,
> > >
> > > There is now code in the repository to do this (version 1.3 only
> > > available straight out of CVS at the moment).
> > > There are now 'URLConnection constructors' on the HTMLParser object.
> > > So you would do something like:
> > >
> > >     url = new URL ("http://www.teamstore.com");
> > >     connection = (HttpURLConnection)url.openConnection ();
> > >     connection.setRequestProperty ("User-Agent",
> > >         "Mozilla/3.0(Windows NT 4.0; U) Opera 6.0 [en]");
> > >     // ... and whatever else is required
> > >     parser = new HTMLParser (connection);
> > >     for (enumeration = parser.elements (); enumeration.hasMoreNodes ();)
> > >         // ... process your nodes
> > >
> > > See the test case HTMLParserTest.testPOST() for a working example.
> > >
> > > Derrick
> > >
> > > >Date: Mon, 23 Dec 2002 11:44:51 -0800 (PST)
> > > >From: "Navid H.Langaroudi" <na...@ya...>
> > > >To: htm...@li...
> > > >Subject: [Htmlparser-user] a possible need feature
> > > >Reply-To: htm...@li...
> > > >
> > > >Hi Somik,
> > > >I had a problem accessing this site
> > > >(www.teamstore.com) which was giving me HTTP error
> > > >501. I tried to access the site with Java's
> > > >URLConnection class, and again I got same result,
> > > >error 501!
> > > >
> > > >But once I used this setting (see line 3 below), I
> > > >gained access to the site, an no more errors:
> > > >
> > > >1-URL mysite = new URL("http://www.teamstore.com");
> > > >2-URLConnection yc = mysite.openConnection();
> > > >3-yc.setRequestProperty("User-Agent", "Mozilla/3.0
> > > >(Windows NT 4.0; U) Opera 6.0 [en]") ;
> > > >
> > > >I was wonder if it is possible to do the same,
> > > >"setRequestProperty" with HTMLParser objects?
> > > >
> > > >I really appreciate it if you could let me know this.
> > > >
> > > >By the way, I wish you and all others involved in this
> > > >project a Happy Xmas and New Year!
> > > >
> > > >With Best wishes
> > > >Navid
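Derrick's snippet relies on the HTMLParser constructors that, according to his message, were at that point only available in the 1.3 code straight out of CVS; their exact signatures are not confirmed here. The part that needs only the standard library, opening an HttpURLConnection and setting the User-Agent header before any request is sent, can be sketched on its own. The class and method names below are hypothetical; the header value is the one quoted in Navid's original message.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class UserAgentExample {
    // Header value quoted in Navid's message; any browser-like string can be used.
    static final String USER_AGENT = "Mozilla/3.0 (Windows NT 4.0; U) Opera 6.0 [en]";

    /**
     * Opens, but does not yet connect, an HTTP connection that will send the given
     * User-Agent. Expects an absolute http:// or https:// address.
     */
    static HttpURLConnection openWithUserAgent(String address, String userAgent) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        // Request properties must be set before the request is actually sent.
        connection.setRequestProperty("User-Agent", userAgent);
        return connection;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java UserAgentExample <http-url>");
            return;
        }
        HttpURLConnection connection = openWithUserAgent(args[0], USER_AGENT);
        try {
            // getResponseCode() is the call that performs the network request.
            System.out.println(args[0] + " -> HTTP " + connection.getResponseCode());
        } finally {
            connection.disconnect();
        }
    }
}
```

In Derrick's suggestion, a connection configured this way is what gets handed to `new HTMLParser (connection)`; how the resulting nodes are iterated depends on the 1.3 API and is not reproduced in this sketch.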
From: <dha...@or...> - 2002-12-26 09:32:51
|
Hi, I agree with Sam when he says "don't fix it when it's not broken". At the same time, I see the need to make the code better, more readable and simpler. However, as Sam suggests, the Visitor pattern seems likely to make things harder to understand. Like Sam, I too had a problem with HTMLParser initially. It may have been the documentation, but this whole tag-scanner thing was extremely confusing in the beginning (though I have since come to like scanners so much that I have written a few myself without any problem). Hence I think that even if the Visitor pattern is used, users must have some simple, easy-to-use methods to get their work done rather than having to understand the Visitor pattern. Like Sam, these are just my two cents. Cheers, Dhaval -----Original Message----- From: gaijin [mailto:ga...@yh...] Sent: Tuesday, December 24, 2002 12:21 PM To: htmlparser-developer Cc: gaijin; htmlparser-user Subject: Re: [Htmlparser-developer] toPlainTextString() feedback requested |
From: Somik R. <so...@ya...> - 2002-12-24 07:01:48
|
Hi Sam, > I guess my surprise at your (or perhaps Somik's) focus on refactoring > comes from the fact that while the htmlparser is a great piece of > software, the javadocs and other documentation could use some attention. > For example, I can't find any explanation in the javadocs or otherwise > of how the filters are supposed to work with the different scanners, or > what values they are allowed to take. For the last few weeks, I've only been adding docs... The Sample Programs are new; you will find an example of using filters here: http://htmlparser.sourceforge.net/samples/linksEmbedded.html From the javadoc: http://htmlparser.sourceforge.net/javadoc/org/htmlparser/HTMLNode.html (check collectInto). I guess more javadoc should be written about this, but I didn't bother to write it because the filters were mostly used for demonstration from the command line. When you type java -jar htmlparser.jar, you will see a help menu describing each filter. It's only with the introduction of Collection Parameter that we've been using filter strings for actual collection of data, and that has been documented. If it's not important, it's not documented :). > I generally work to "if it's not broken don't fix it", but I often add > "before you start fixing it, make sure your documentation is up to date". > Using the Visitor pattern may make it easier for clients to get the data > they need, but given that the htmlparser is "working" (well it works for > me), I would say that the more urgent issue here is making sure all the > documentation is up to date. I have a lot of positive things to say > about htmlparser, so don't take it the wrong way, when I say that the > biggest problem I've had in using it in the last few weeks is inadequate > javadocs. Sam - feel free to post as often as you like on the list. I'd be glad to help you out.
The lack of adequate documentation is of course my responsibility, and in light of my explanation above, if you can suggest other areas where the docs are inadequate, I'll look into it. > >Duplication removal is reason #1. > > > As I mention above. One should be careful of duplication removal for > the sake of it. > > > Removal of hard-coded logic is reason #2. > > > This is a good reason. However I get the feeling that introduction of > these Visitor classes will make the system conceptually more difficult > to use rather than easier. I would feel better if the current set up > was more fully documented before more complexity was added. > > And even if the Visitor pattern is used, I would recommend leaving > methods like toPlainTextString() etc in place, but just making them > short cut implementations to certain kinds of visitor-using methods. > This will allow people who have yet to grasp the Visitor pattern > something to work with. Don't worry, we'll leave toPlainTextString() alone; in fact, we've already begun doing the shortcut implementations. :) The purpose of my initial mail was not to alarm you, but to learn more about realistic "customer" stories. > If you are keen to see lots of people using htmlparser, I think that you > don't want people to have to come to terms with too many new concepts at > once. You say yourself that the Visitor pattern takes some getting used > to. I think the whole scanner concept takes some getting used to .... > By the way, have you gone through the Sample Programs? I would have thought this new addition would make life much simpler. If not, we probably need more docs. Regards, Somik |
From: Sam J. <ga...@yh...> - 2002-12-24 06:35:41
|
Hi Somik and Joshua Joshua Kerievsky wrote: >>Could you explain why you want to refactor these methods? Remember the >>danger of premature refactoring ... you lose flexibility that then has >>to be re-added later on, making more work in the long run. >> >> >There's a good deal of duplicate code in way the two toHTML methods and the >toPlainTextString method do their work. The central theme is information >accumulation/alteration. That involves outputing tag and node results and >recusing through tags. The refactoring to Visitor allows us to > >* remove many lines of duplicate code, spread across many classes >* remove hard-coded accumulation/alteration logic, thereby making it easier >for clients to get the data they need > >Visitor takes some getting used to. I rarely use the pattern. In this case, >IMO, it was a good fit. > The Visitor pattern sounds interesting, and I look forward to hearing more about it. However, duplicated code itself is not IMO necessarily an evil. It all depends on whether one thinks that the duplicated components are going to diverge in functionality in the future. If you are sure they are not, then fine, refactor away. I guess my surprise at your (or perhaps Somik's) focus on refactoring comes from the fact that while the htmlparser is a great piece of software, the javadocs and other documentation could use some attention. For example, I can't find any explanation in the javadocs or otherwise of how the filters are supposed to work with the different scanners, or what values they are allowed to take. I generally work to "if it's not broken don't fix it", but I often add "before you start fixing it, make sure your documentation is up to date". Using the Visitor pattern may make it easier for clients to get the data they need, but given that the htmlparser is "working" (well it works for me), I would say that the more urgent issue here is making sure all the documentation is up to date. 
I have a lot of positive things to say about htmlparser, so don't take it the wrong way when I say that the biggest problem I've had in using it in the last few weeks is inadequate javadocs. >>Is there some efficiency reason why you want to refactor these methods >>or is it just for neatness? >> >> > >Duplication removal is reason #1. > As I mentioned above, one should be careful of removing duplication for its own sake. > Removal of hard-coded logic is reason #2. > This is a good reason. However, I get the feeling that introducing these Visitor classes will make the system conceptually more difficult to use rather than easier. I would feel better if the current setup were more fully documented before more complexity was added. And even if the Visitor pattern is used, I would recommend leaving methods like toPlainTextString() etc. in place, but making them shortcut implementations of certain kinds of visitor-using methods. This will give people who have yet to grasp the Visitor pattern something to work with. If you are keen to see lots of people using htmlparser, I think that you don't want people to have to come to terms with too many new concepts at once. You say yourself that the Visitor pattern takes some getting used to. I think the whole scanner concept takes some getting used to .... >Simplicity is reason #3: there is little reason to fatten the interfaces of >tag and node classes with various data accumulation/alteration methods when >one method and a variety of concrete Visitors can do the job with much less >code. > Well, I would agree if you could guarantee that there will be no divergence whatsoever in how the different methods will be used. If you can create a flexible enough implementation of the Visitor pattern, then I guess it will support any possible divergence in the separate methods. However, I think there is a reason to have a fatter interface, in that convenience methods lower the barrier to entry for new users.
Perhaps ideally one would have a well-implemented Visitor pattern that supports raw method access, plus a number of convenience methods? A well-implemented Visitor pattern will, I assume, support all sorts of different operations, but I would feel much happier if the htmlparser had a complete javadoc and documentation review before any refactoring took place. People are trying to use the existing system and having trouble not because of a lack of refactoring, but because of a lack of well-described methods. Well, when I say people, I mean me; I don't know if anyone else feels the same. Maybe it's just me :-) Just my two cents. CHEERS> SAM |
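Sam's suggestion (keep toPlainTextString() as a shortcut sitting on top of a general visitor method) might look roughly like the minimal sketch below. Node, TextNode, TagNode and TextVisitor are hypothetical stand-ins, not htmlparser 1.2 classes, and the actual refactoring may well differ.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical visitor: receives each piece of text found during traversal.
interface TextVisitor {
    void visitText(String text);
}

// Hypothetical node base class (not the real htmlparser HTMLNode).
abstract class Node {
    // General entry point: walk this node and report text to the visitor.
    abstract void accept(TextVisitor visitor);

    // Convenience shortcut for users who never want to see a visitor.
    String toPlainTextString() {
        StringBuilder buffer = new StringBuilder();
        accept(text -> buffer.append(text));
        return buffer.toString();
    }
}

class TextNode extends Node {
    private final String text;

    TextNode(String text) { this.text = text; }

    @Override
    void accept(TextVisitor visitor) { visitor.visitText(text); }
}

class TagNode extends Node {
    private final List<Node> children;

    TagNode(Node... children) { this.children = Arrays.asList(children); }

    @Override
    void accept(TextVisitor visitor) {
        // Tags contribute no text of their own; just recurse into the children.
        for (Node child : children) {
            child.accept(visitor);
        }
    }
}

class ShortcutDemo {
    public static void main(String[] args) {
        Node page = new TagNode(new TextNode("Hello, "), new TagNode(new TextNode("world")));
        System.out.println(page.toPlainTextString()); // prints: Hello, world
    }
}
```

Casual users call toPlainTextString() and never see a visitor; anyone who needs something else (counting text fragments, say) passes their own TextVisitor to accept().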
From: Joshua K. <jo...@in...> - 2002-12-24 05:40:23
|
> Could you explain why you want to refactor these methods? Remember the danger of premature refactoring ... you lose flexibility that then has to be re-added later on, making more work in the long run. Hi Sam, There's a good deal of duplicate code in the way the two toHTML methods and the toPlainTextString method do their work. The central theme is information accumulation/alteration. That involves outputting tag and node results and recursing through tags. The refactoring to Visitor allows us to:
* remove many lines of duplicate code, spread across many classes
* remove hard-coded accumulation/alteration logic, thereby making it easier for clients to get the data they need
Visitor takes some getting used to. I rarely use the pattern. In this case, IMO, it was a good fit. > Is there some efficiency reason why you want to refactor these methods or is it just for neatness? Duplication removal is reason #1. Removal of hard-coded logic is reason #2. Simplicity is reason #3: there is little reason to fatten the interfaces of tag and node classes with various data accumulation/alteration methods when one method and a variety of concrete Visitors can do the job with much less code. best regards jk |
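A minimal sketch of the duplication Joshua describes being removed: the recursion through tags is written once in accept(), and each kind of accumulation (plain text, HTML) becomes a small visitor. All class names here are hypothetical and heavily simplified (no attributes, no unclosed tags); this is not the htmlparser code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical double-dispatch visitor; names are illustrative, not htmlparser's.
interface NodeVisitor {
    void visitText(TextNode node);
    void visitTagStart(TagNode node);
    void visitTagEnd(TagNode node);
}

abstract class Node {
    abstract void accept(NodeVisitor visitor);
}

class TextNode extends Node {
    final String text;
    TextNode(String text) { this.text = text; }
    @Override void accept(NodeVisitor visitor) { visitor.visitText(this); }
}

class TagNode extends Node {
    final String name;
    final List<Node> children;
    TagNode(String name, Node... children) {
        this.name = name;
        this.children = Arrays.asList(children);
    }
    // The recursion is written once here, instead of once per output method.
    @Override void accept(NodeVisitor visitor) {
        visitor.visitTagStart(this);
        for (Node child : children) child.accept(visitor);
        visitor.visitTagEnd(this);
    }
}

// Text only: the accumulation toPlainTextString() does.
class PlainTextVisitor implements NodeVisitor {
    final StringBuilder out = new StringBuilder();
    public void visitText(TextNode node) { out.append(node.text); }
    public void visitTagStart(TagNode node) { }
    public void visitTagEnd(TagNode node) { }
}

// Markup as well: the accumulation toHTML() does (attributes omitted).
class HtmlVisitor implements NodeVisitor {
    final StringBuilder out = new StringBuilder();
    public void visitText(TextNode node) { out.append(node.text); }
    public void visitTagStart(TagNode node) { out.append('<').append(node.name).append('>'); }
    public void visitTagEnd(TagNode node) { out.append("</").append(node.name).append('>'); }
}

class VisitorDemo {
    public static void main(String[] args) {
        Node page = new TagNode("p", new TextNode("Hello, "), new TagNode("b", new TextNode("world")));
        PlainTextVisitor text = new PlainTextVisitor();
        page.accept(text);
        HtmlVisitor html = new HtmlVisitor();
        page.accept(html);
        System.out.println(text.out); // prints: Hello, world
        System.out.println(html.out); // prints: <p>Hello, <b>world</b></p>
    }
}
```

Adding another kind of output (a link collector, for instance) means writing one more visitor without touching the node classes, which is the "removal of hard-coded logic" in reason #2.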
From: Somik R. <so...@ya...> - 2002-12-24 03:37:22
|
Hi Sam, > Could you explain why you want to refactor these methods? Remember the danger of premature refactoring ... you lose flexibility that then has to be re-added later on, making more work in the long run. There is so much duplication in the parser that only Merciless_Refactoring seems likely to clean up the code. > Is there some efficiency reason why you want to refactor these methods or is it just for neatness? Readability and efficiency are important factors, but far more important is having a simple way of getting data out of the parser. The idea is to pass your own visitors into the parser, which could extract either text or HTML. If you need different behaviour, you can modify these visitors to customize it. The parser would become lighter, the code duplication would go away, and it would be easier to customize. > Either way, I would certainly recommend a deprecation step. Will keep that in mind. Cheers, Somik |
From: Sam J. <ga...@yh...> - 2002-12-24 02:39:57
|
Hi Somik, I tend to use toPlainTextString() in all three ways you described. I use the first to generate a plain text overview, the second for debugging statements, and the third for when I want to add my own additional formatting to the final text. Could you explain why you want to refactor these methods? Remember the danger of premature refactoring ... you lose flexibility that then has to be re-added later on, making more work in the long run. Is there some efficiency reason why you want to refactor these methods or is it just for neatness? Either way, I would certainly recommend a deprecation step. CHEERS> SAM |
From: Somik R. <so...@ya...> - 2002-12-24 02:20:37
|
Hi Folks, We are performing a series of refactorings, which will involve merging the functionality of toPlainTextString() and toHTML(HTMLRenderer), and perhaps toHTML() itself. We'd like to know user stories of how you've been using toPlainTextString(). Do you enumerate through the nodes and:
[1] collect all the text data using toPlainTextString() into a buffer?
[2] make calls to toPlainTextString() but not collect the data?
[3] collect all text data using toPlainTextString() into a buffer, but only after having processed the string in some way?
Another question: if we remove toPlainTextString() [and provide you an alternative], will it hurt, or would you prefer a deprecation step (in 1.3) before removal (in 1.4)? It would be good for us to know more details about your applications, so that we can make our design modifications correctly. Feel free to tell us your stories in detail. (Remember to click Reply All so the replies go to both lists.) Regards, Somik |
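The three usage stories Somik asks about can be sketched as follows, assuming only that a node exposes toPlainTextString(). PlainTextNode and PlainTextUsage are hypothetical names for illustration; stories [1] and [3] differ only in the processing step applied before appending.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a parsed node; only toPlainTextString() matters here.
interface PlainTextNode {
    String toPlainTextString();
}

class PlainTextUsage {
    // Story [1]: collect the text of every node into one buffer.
    static String collectAll(List<PlainTextNode> nodes) {
        return collectProcessed(nodes, UnaryOperator.identity());
    }

    // Story [2]: call toPlainTextString() but keep nothing (e.g. debug output).
    static void debugPrint(List<PlainTextNode> nodes) {
        for (PlainTextNode node : nodes) {
            System.out.println("node text: " + node.toPlainTextString());
        }
    }

    // Story [3]: process each node's text before appending it to the buffer.
    static String collectProcessed(List<PlainTextNode> nodes, UnaryOperator<String> process) {
        StringBuilder buffer = new StringBuilder();
        for (PlainTextNode node : nodes) {
            buffer.append(process.apply(node.toPlainTextString()));
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        List<PlainTextNode> nodes = Arrays.<PlainTextNode>asList(() -> "Hello ", () -> "world");
        debugPrint(nodes);
        System.out.println(collectAll(nodes));                            // prints: Hello world
        System.out.println(collectProcessed(nodes, String::toUpperCase)); // prints: HELLO WORLD
    }
}
```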
From: Somik R. <so...@ya...> - 2002-12-24 01:58:49
|
Hi Navid,
[1] Check the CVS page of htmlparser; there are instructions for doing an anonymous checkout.
[2] Install Ant - http://jakarta.apache.org/ant/
[3] Go to the source directory where you see build.xml and type ant on the command line.
[4] If the above steps are too hard, wait a week for the 1.3 integration release.
Regards, Somik |
From: Navid H.L. <na...@ya...> - 2002-12-23 23:02:55
|
Hi Derrick, Thank you very much, but I could not find version 1.3. Is it possible to rebuild it on Windows? Somik told me before that for unreleased versions I should run the build.xml using Ant. I need some more info to do so. Thank you. Regards, Navid |
From: Derrick O. <Der...@ro...> - 2002-12-23 20:59:13
|
Navid,
There is now code in the repository to do this (version 1.3 only, available straight out of CVS at the moment). There are now 'URLConnection constructors' on the HTMLParser object, so you would do something like:

    url = new URL ("http://www.teamstore.com");
    connection = (HttpURLConnection)url.openConnection ();
    connection.setRequestProperty ("User-Agent", "Mozilla/3.0(Windows NT 4.0; U) Opera 6.0 [en]");
    // ... and whatever else is required
    parser = new HTMLParser (connection);
    for (enumeration = parser.elements (); enumeration.hasMoreNodes ();)
        // ... process your nodes

See the test case HTMLParserTest.testPOST() for a working example.
Derrick |
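Both Navid's and Derrick's snippets rely on the same java.net rule: request headers must be set on the URLConnection before it connects. The sketch below uses only the JDK; handing the resulting connection to the parser is left to the HTMLParser(URLConnection) constructor Derrick describes for the 1.3 CVS code, which is not called here. BrowserLikeConnection is a hypothetical helper name, and whether a given server stops answering 501 with this header is not something the sketch can confirm.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

class BrowserLikeConnection {
    // User-Agent string from Navid's message; some servers reject Java's default agent.
    static final String USER_AGENT = "Mozilla/3.0 (Windows NT 4.0; U) Opera 6.0 [en]";

    // Creates a connection object and sets the header. openConnection() does not
    // contact the server; the request goes out on connect(), getInputStream(), etc.
    static HttpURLConnection open(String address) throws IOException {
        URLConnection connection = URI.create(address).toURL().openConnection();
        if (!(connection instanceof HttpURLConnection)) {
            throw new IOException("not an HTTP URL: " + address);
        }
        connection.setRequestProperty("User-Agent", USER_AGENT);
        return (HttpURLConnection) connection;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection connection = open("http://www.teamstore.com");
        // Request properties can still be read back because we have not connected yet.
        System.out.println(connection.getRequestProperty("User-Agent"));
        // Next step: pass the connection to the parser, or call
        // connection.getInputStream(), which performs the actual request.
    }
}
```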
From: Navid H.L. <na...@ya...> - 2002-12-23 19:44:56
|
Hi Somik, I had a problem accessing this site (www.teamstore.com), which was giving me HTTP error 501. I tried to access the site with Java's URLConnection class, and again I got the same result, error 501! But once I used this setting (see line 3 below), I gained access to the site, and no more errors:
1-URL mysite = new URL("http://www.teamstore.com");
2-URLConnection yc = mysite.openConnection();
3-yc.setRequestProperty("User-Agent", "Mozilla/3.0 (Windows NT 4.0; U) Opera 6.0 [en]");
I was wondering if it is possible to do the same, "setRequestProperty", with HTMLParser objects? I would really appreciate it if you could let me know. By the way, I wish you and all others involved in this project a Happy Xmas and New Year! With Best wishes, Navid |
From: Somik R. <so...@ya...> - 2002-12-22 18:45:54
|
Hi Visarut, Check http://htmlparser.sourceforge.net/samples/text.html for this. In general, see http://htmlparser.sourceforge.net/samples/ for a list of other commonly used programs. Regards, Somik |
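The sample Somik links is the htmlparser way to do this. For comparison, the core idea (keep only the characters outside `<...>`) can be sketched without the library. NaiveTextExtractor is a hypothetical name and is not htmlparser's StringExtractor: it does not decode entities, skip `<script>`/`<style>` bodies, or cope with a `>` inside a quoted attribute value.

```java
class NaiveTextExtractor {
    // Keeps only characters outside <...> and collapses runs of whitespace.
    static String extractText(String html) {
        StringBuilder out = new StringBuilder();
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (inTag) {
                // Inside markup: drop everything until the closing '>'.
                if (c == '>') {
                    inTag = false;
                }
            } else if (c == '<') {
                inTag = true;
            } else {
                // Plain text, including a stray '>' that is not closing a tag.
                out.append(c);
            }
        }
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello <b>world</b></p>\n<p>again</p></body></html>";
        System.out.println(extractText(html)); // prints: Hello world again
    }
}
```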
From: Visarut T. <piz...@ho...> - 2002-12-22 17:00:48
|
Help: I want to parse only the text (not the tags) of a website. How can I do this, or what should I modify? Please tell me quickly; I want to use this in my project. Thank you very much |
From: Somik R. <so...@ya...> - 2002-12-22 03:24:26
|
Hi Folks, Finally, after 8 months of hard work, we have the next production release of the parser. 1.2 has tons of bug fixes and features. The change log difference between 1.2 and 1.1 is too big to be listed in this mail; check the change log when you are downloading (it's also in the download package). Documentation has been considerably improved (the Sample programs are the place to start), and there's a section on the patterns in action as well. You can modify the rendering process for links and images, as well as provide collecting parameters to pick up the nodes you want (currently only images and links are supported). Below is the change log (compared to last week's integration release):
Production Release 1.2
----------------------
[1] Rewrote HTMLLinkProcessor.extract() so the URL class does all the heavy lifting
[2] Partially fixed bug 654746 - HTMLLinkScanner error, code review needed
[3] Rendering bug fixed - allowing uniform rendering for links and images
[4] Fixed bug 655917, made HTMLParameterParser.parseParameters() thread-safe
[5] Refactored HTMLFormTag (introduced POST and GET static members)
[6] Bug fixed in HTMLFormTag.getInputTag() (NullPointerException when input tag has no name)
[7] Added ability to get textarea tag from HTMLFormTag
[8] Added search capability in HTMLFormTag
[9] Fixed bug 655627 - JSP tags with < sign (for loops) were not being parsed correctly
[10] Fixed bug 655603 - JSP tags within src of script not recognized correctly when using single apostrophes
[11] Fixed bug 655580 - JSP tags within title tags not recognized correctly
[12] Fixed bug 655599 - Erroneous end-of-line characters were being added in string nodes
[13] Fixed bug 656870 - HTMLFormScanner goes into infinite loop if a previous link has not been closed
Thanks to Derrick Oswald and Dhaval Udani for their work on the last few releases. Thanks to Joe Robins for pointing out an important bug in HTMLFormScanner.
A special mention for Dhaval: all his bug reports come with test cases, making it really easy for us to reproduce and fix the bugs. Regards, Somik |
From: Navid H.L. <na...@ya...> - 2002-12-20 22:06:07
|
Hi, I am getting a 501 error when trying to access this site using the parser; I even tried it with a simple Java URLConnection and got the same result. The site is www.teamstore.com. I am sure I am missing some protocol detail here. I would appreciate any comments or guidance. Navid |
From: Somik R. <so...@ya...> - 2002-12-15 09:29:43
Hi Folks,
Candidate 6 is out, and there are some goodies in this one. Thanks to Derrick Oswald and Leslie Rohde (our two new developers), who have put in their time.
From the Change Log:
Integration Build 1.2 - 20021215
--------------------------------
[1] Modified API of HTMLImageTag (refactored name of image loc) and HTMLLinkTag (added getters)
[2] Fixed bug 650457 - removeEscapeCharacters() incorrect
[3] Fixed bug 652263 - HTMLParser and null feedback
[4] Changed encoding used from 8859_4 to 8859_1
[5] HTMLRemarkNode returns string data in toPlainTextString() (this is a rollback)
[6] Fixed bug 652746 - HTMLFormTag gets links correctly now
[7] Fixed bug 653720 - HTMLNode uses Sun-specific class
[8] Improved StringExtractor parser application
[9] Major design improvement: implemented the Collection-Parameter pattern in HTMLNode.collectInto()
[10] Fixed reset crash bug. Reader providers now have to call mark and reset explicitly. This is documented in HTMLParser.java.
[11] Fixed bug 649269 in HTMLLinkTag.isHttpLink(); it now correctly identifies relative links as HTTP links.
A major API improvement has occurred: HTMLNode now has a new method, collectInto(), which uses a collection parameter to collect nodes. A sample program demonstrating this feature is at:
http://htmlparser.sourceforge.net/samples/linksEmbedded.html
Thanks to everyone who participated in the discussions and architecture changes. There has also been a rollback: we've taken out the mark and reset mechanism, and this is now the responsibility of the reader supplier.
Cheers,
Somik
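The Collection-Parameter idea behind collectInto() can be sketched with a toy node tree: the caller creates one collection and passes it down, and every node adds itself if it matches before handing the same collection to its children. SimpleNode, CollectDemo and the collectInto(List, String) signature are hypothetical stand-ins, not the htmlparser 1.2 classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal node tree; not the htmlparser API.
class SimpleNode {
    final String tagName;                    // e.g. "A", "IMG"; null for a text node
    final List<SimpleNode> children = new ArrayList<>();

    SimpleNode(String tagName) { this.tagName = tagName; }

    SimpleNode add(SimpleNode child) { children.add(child); return this; }

    // Adds this node to the caller-supplied collection if its tag matches,
    // then passes the same collection down to every child.
    void collectInto(List<SimpleNode> collection, String wantedTag) {
        if (wantedTag.equalsIgnoreCase(tagName)) {
            collection.add(this);
        }
        for (SimpleNode child : children) {
            child.collectInto(collection, wantedTag);
        }
    }
}

public class CollectDemo {
    public static void main(String[] args) {
        SimpleNode form = new SimpleNode("FORM")
            .add(new SimpleNode("A"))
            .add(new SimpleNode("INPUT"));
        SimpleNode body = new SimpleNode("BODY")
            .add(new SimpleNode("A"))
            .add(form);

        List<SimpleNode> links = new ArrayList<>();
        body.collectInto(links, "a");
        System.out.println("Links found: " + links.size()); // prints "Links found: 2"
    }
}
```

Because the collection is owned by the caller, the same traversal can gather links, images or any other tag without each node building and merging its own result list.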
From: Somik R. <so...@ya...> - 2002-12-13 20:51:34
No. You could write your own scanner to do this. Regards, Somik
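As a rough illustration of what such a scanner would need to pull out, here is a stand-alone sketch that extracts the HREF of each &lt;area&gt; inside a &lt;map&gt; using java.util.regex. It deliberately swaps in regular expressions instead of htmlparser's scanner classes, whose extension API is not described in this thread; ImageMapExtractor and areaLinks are hypothetical names, and the sketch assumes quoted attribute values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-alone sketch, not an htmlparser scanner: collects AREA hrefs inside MAP blocks.
public class ImageMapExtractor {
    // Captures everything between <map ...> and the next </map>.
    private static final Pattern MAP =
        Pattern.compile("<map\\b[^>]*>(.*?)</map>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Captures the quoted HREF value of an <area ...> tag.
    private static final Pattern AREA_HREF =
        Pattern.compile("<area\\b[^>]*?\\bhref\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    static List<String> areaLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher map = MAP.matcher(html);
        while (map.find()) {
            Matcher area = AREA_HREF.matcher(map.group(1));
            while (area.find()) {
                links.add(area.group(1));
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<map name=\"nav\"><area shape=\"rect\" href=\"home.html\">"
                    + "<AREA HREF='shop.html'></map><a href=\"other.html\">x</a>";
        System.out.println(areaLinks(html)); // prints [home.html, shop.html]
    }
}
```

Regular expressions are fragile on real-world HTML (unquoted attributes, nested maps), which is why a scanner working on the parser's tag stream is the more robust route.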
From: Navid H.L. <na...@ya...> - 2002-12-13 20:32:02
Hi, does htmlparser scan image maps (<map><area><area></map>)? Or does any other scanner handle these? Navid
From: Navid H.L. <na...@ya...> - 2002-12-13 20:28:49
Hi, does htmlparser scan image maps (<map><area><area></map>)? Or does any other scanner handle these? Navid