htmlparser-developer Mailing List for HTML Parser (Page 22)
Brought to you by:
derrickoswald
From: Somik R. <so...@ya...> - 2002-12-20 18:04:40
|
Hi Sam, This is interesting, because I was doing much the same thing last evening. I've added a lot of searching methods to HTMLFormTag, as I was using it to do test-first development of an XSLT stylesheet. I was happy with the results - I was actually able to develop the stylesheet test-first. > I have started creating a TestCodeGenerator using > HtmlParser. I am > using an HTMLFormScanner to strip out all the form > details, and from > this I hope to generate testing and handling java > classes .... I will probably add some more utility methods to HTMLParserTestCase that will make life easier - but even in its current form, you might find it useful. I've documented it here : http://htmlparser.sourceforge.net/design/tests.html Regards, Somik |
From: Sam J. <ga...@yh...> - 2002-12-20 12:16:07
|
Some investigation of HttpUnit has turned up that it includes the NekoHTML parser. Also, there does not seem to be any mechanism to apply HttpUnit to a local file. It looks like it is grab a page off a web server or nothing .... I have started creating a TestCodeGenerator using HtmlParser. I am using an HTMLFormScanner to strip out all the form details, and from this I hope to generate testing and handling java classes .... CHEERS> SAM p.s. I am still not sure what the filters do in scanners. The filter parameters in HTMLTagScanner and HTMLFormScanner don't seem to be used for anything .... Sam Joseph wrote: > What I mean is that HttpUnit has to contain something similar to > Htmlparser somewhere in its code. |
From: Sam J. <ga...@yh...> - 2002-12-20 09:42:05
|
Somik Raha wrote: >Sam Joseph wrote : > >>Have you looked at HTTPUnit? http://httpunit.sourceforge.net/ >> >>They have to deal with a lot of similar problems and there may be >>synergies. > >I am curious to hear more about this - I am going to be using HttpUnit real >soon - what sort of problems did you face ? > I didn't face problems as such. What I mean is that HttpUnit has to contain something similar to Htmlparser somewhere in its code. Htmlparser lets you parse HTML. So does HttpUnit, but HttpUnit lets you interact with the HTML forms that you have parsed out of the HTML. For example: WebConversation o_wc = new WebConversation(); WebResponse x_jdoc = o_wc.getResponse(new GetMethodWebRequest("some_url")); WebForm x_form = x_jdoc.getFormWithName("my_form"); assertTrue(x_form.hasParameterNamed("some_param")); SubmitButton x_submit_button = x_form.getSubmitButton("Submit"); x_submit_button.click(); will open a URL, grab the HTML data coming off the response, and then let you create objects like WebForms, SubmitButtons etc. You can then manipulate these, setting parameters on the forms, submitting them etc. This is how HttpUnit allows you to create unit tests for your HTML interfaces. Anyway, so my point is that underneath the API HttpUnit must be doing something similar to HtmlParser in order to get access to the HTML data, i.e. they are both parsing HTML. The main difference is the level of the API. Currently I have had this weird idea (which I mailed to the NinJava list), which is to use the HTML templates that I build my web management screens from to generate Java outlines for the code that handles the web forms and also the test code itself. Synchronistically, in order to implement such a thing, I would need to start off by parsing the templates, which are themselves HTML. I guess I could use either HttpUnit or HtmlParser to do this, but I'm not sure if HttpUnit can be used to parse local files .... 
Anyway, for those of you reading down this far, the idea would be that you could define your web form structure in one place, and then the tedious parts of writing the support class and the test class would be removed, allowing one to produce reliable web management screens much faster. Naturally look and feel would be farmed out to CSS, and the logical extension would be to define your web form structures in XML, in fact to generate them directly off a data model like the ones used in Torque and Turbine .... Then providing web management screens would just be a question of choosing which forms/objects to allow which users access to. Although I guess we would still want the ability to specify aggregate forms that gave users access to data made up of components of more fundamental data structures. Apologies for the long post ... CHEERS> SAM |
From: Somik R. <so...@ya...> - 2002-12-19 07:47:04
|
> > I've used this POST mechanism many times, not for testing a site. > > A typical example is fetching a postalcode by hitting the (for me) > http://www.canadapost.ca/tools/pcl/bin/default-e.asp site and posting a > filled in form. This has to be parsed, and a table element extracted. I understand now! This sounds like a pretty useful feature to me - and if it is written test-first, I can't think of why this can't go into 1.2. The table element though is better off for 1.3.. a lot of clean-up needed in existing system. Regards, Somik |
From: Somik R. <so...@ya...> - 2002-12-19 07:44:55
|
Hi Derrick, > You might want to do what's necessary to CVS (tag, branch, > label...whatever) to freeze version 1.2 and open up the 1.3 version as > the head revision. I've been swamped for a while - but I will get to this tomorrow. Actually, I was kind of hoping that we could put in some work to wrap up 1.2. If we get even a week without bug reports, we can close 1.2 - but the reports just keep coming and keep coming. This leads me to believe that we may not have enough test cases, and we may need to do some merciless refactorings... What do you think about cleaning it up before we release - taking time till Jan 1 to do this ? It'd be really good to have more eyeballs going over the code and performing refactorings, etc. before we add any new features. Regards, Somik |
From: Somik R. <so...@ya...> - 2002-12-19 07:40:22
|
Sorry for the other mail - it's getting late here and at such times Outlook Express misinterprets any action of mine as Ctrl+Enter. > > <INPUT name="foo" value="foobar" type="text" /=""> > > There are obviously no side effects but probably this could be avoided. Because we assume dirty HTML, not clean HTML :) This is definitely a bug! Go ahead and file it. > One more issue I came across while doing some hardcore HTML+JSP parsing. > > Consider a tag like this > > <INPUT <%=someValue%> name="foo" value="foobar" type="text"> > > This tag is parsed as <INPUT>. The problem being that the beginning JSP > tag is considered to be the beginning of another tag. > Also I was not too sure whether the scope of the parser extends to solve > these problems hence have not put it into the bug report. Let me know if > I should put it into the bug report. This issue is a real pain!! I really went crazy solving this stuff for HTMLLinkTag. Sometimes I think the parser is not as intelligent as it could be - why can't it think like us and decide that this tag is really an INPUT tag ? I am seriously considering using Neural Networks in the parser to add some intelligence - perhaps Bayesian Networks (maybe v1.4). I've recently been checking out SpamAssassin and that uses a Bayesian network - it is so good that I've even installed it for my yahoo pop account. Sam--> If I remember correctly, the Neurogrid project uses Bayesian networks .. Do you recommend us going down this path ? I think this whole problem of correcting dirty HTML is non-linear - and there need to be some innovative approaches to handling it. Regards, Somik |
From: Somik R. <so...@ya...> - 2002-12-19 07:32:58
|
> One more issue I came across while doing some hardcore HTML+JSP parsing. > > Consider a tag like this > > <INPUT <%=someValue%> name="foo" value="foobar" type="text"> > > This tag is parsed as <INPUT>. The problem being that the beginning JSP > tag is considered to be the beginning of another tag. > > Also the following valid HTML : > > <INPUT name="foo" value="foobar" type="text" /> > > is reprinted using toHTML() as: > > Also I was not too sure whether the scope of the parser extends to solve > these problems hence have not put it into the bug report. Let me know if > I should put it into the bug report. > > Regards, > > Dhaval Udani > Senior Analyst > M-Line, QPEG > OrbiTech Solutions Ltd. > +91-22-28290019 Extn. 1457 > > > > > -----Original Message----- > > From: Udani, Dhaval H. > > Sent: Thursday, December 19, 2002 9:51 AM > > To: htmlparser-developer > > Cc: Udani, Dhaval H. > > Subject: RE: [Htmlparser-developer] Registering Scanners > > > > > > > OTOH, from the design perspective, it serves for > > > better encapsulation to have child scanners within the > > > parent scanners - otherwise - it'd be really difficult > > > to determine dependencies of scanners. e.g. looking at > > > the code of global registrations, how would you be > > > able to tell that an option tag can appear only in a > > > select tag? > > > > > > Question is - should one care? That is open to debate. > > > For some scanners, it does not matter (like the link > > > or form scanners, was links and forms can contain just > > > about anything inside them). But for select tags, that > > > is not the case. > > > > > > > I agree that the parent-child relationship is really good but > > as I said that there are so many possibilities for data > > inside a parent tag that we may not be able to retrieve all > > the information and subsequently reproduce it like the > > original. 
As for select tags as I said earlier it is possible > > to have either comment tags or jsp tags and the situation is > > true for any other type of tag as well. > > > > > Regarding potential for having bugs - I have noticed > > > that whenever we've had something without tests, it > > > had bugs (mostly due to my inconsistency in following > > > the practice). I am trying to always always write > > > tests - hence you'd have noticed that there is a > > > considerable effort on improving the testing > > > mechanism. I'd say that good testcases are the > > > pre-requisite for good code. Code that runs without > > > tests is really a fluke. I guess the aim is to combine > > > solid tests with a simple, natural design. > > > > I completely agree with you out here. My point just being > > that there are so may possibilties many of which are > > unthinkable at this point of time that it is probably better > > to be safe about the parsing. Any increase in test cases is > > always welcome. > > > > Regards, > > Dhaval > > > > > > |
From: <dha...@or...> - 2002-12-19 05:30:21
|
One more issue I came across while doing some hardcore HTML+JSP parsing. Consider a tag like this <INPUT <%=someValue%> name="foo" value="foobar" type="text"> This tag is parsed as <INPUT>. The problem being that the beginning JSP tag is considered to be the beginning of another tag. Also the following valid HTML : <INPUT name="foo" value="foobar" type="text" /> is reprinted using toHTML() as: <INPUT name="foo" value="foobar" type="text" /=""> There are obviously no side effects but probably this could be avoided. Also I was not too sure whether the scope of the parser extends to solve these problems hence have not put it into the bug report. Let me know if I should put it into the bug report. Regards, Dhaval Udani Senior Analyst M-Line, QPEG OrbiTech Solutions Ltd. +91-22-28290019 Extn. 1457 > -----Original Message----- > From: Udani, Dhaval H. > Sent: Thursday, December 19, 2002 9:51 AM > To: htmlparser-developer > Cc: Udani, Dhaval H. > Subject: RE: [Htmlparser-developer] Registering Scanners > > > > OTOH, from the design perspective, it serves for > > better encapsulation to have child scanners within the > > parent scanners - otherwise - it'd be really difficult > > to determine dependencies of scanners. e.g. looking at > > the code of global registrations, how would you be > > able to tell that an option tag can appear only in a > > select tag? > > > > Question is - should one care? That is open to debate. > > For some scanners, it does not matter (like the link > > or form scanners, was links and forms can contain just > > about anything inside them). But for select tags, that > > is not the case. > > > > I agree that the parent-child relationship is really good but > as I said that there are so many possibilities for data > inside a parent tag that we may not be able to retrieve all > the information and subsequently reproduce it like the > original. 
As for select tags as I said earlier it is possible > to have either comment tags or jsp tags and the situation is > true for any other type of tag as well. > > > Regarding potential for having bugs - I have noticed > > that whenever we've had something without tests, it > > had bugs (mostly due to my inconsistency in following > > the practice). I am trying to always always write > > tests - hence you'd have noticed that there is a > > considerable effort on improving the testing > > mechanism. I'd say that good testcases are the > > pre-requisite for good code. Code that runs without > > tests is really a fluke. I guess the aim is to combine > > solid tests with a simple, natural design. > > I completely agree with you out here. My point just being > that there are so may possibilties many of which are > unthinkable at this point of time that it is probably better > to be safe about the parsing. Any increase in test cases is > always welcome. > > Regards, > Dhaval > > |
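The `<INPUT <%=someValue%> ...>` case above comes down to where the scan for the closing `>` stops. As a rough illustration only (this is not how the parser is implemented; `JspAwareTagEnd` and `findTagEnd` are hypothetical names), one way to find the end of a tag is to step over any embedded `<% ... %>` section as a unit. The sketch ignores `>` characters inside quoted attribute values.

```java
// Hypothetical helper for illustration; not part of the HTML Parser code base.
final class JspAwareTagEnd {
    private JspAwareTagEnd() {}

    /**
     * Returns the index of the '>' that closes the tag starting at
     * {@code start} (which should point at '<'), skipping any embedded
     * JSP sections of the form <% ... %>. Returns -1 if no end is found.
     */
    static int findTagEnd(String html, int start) {
        int i = start + 1;
        while (i < html.length()) {
            if (html.startsWith("<%", i)) {
                int jspEnd = html.indexOf("%>", i + 2);
                if (jspEnd < 0) {
                    return -1; // unterminated JSP section
                }
                i = jspEnd + 2; // resume scanning after "%>"
            } else if (html.charAt(i) == '>') {
                return i;
            } else {
                i++;
            }
        }
        return -1;
    }
}
```

With a helper like this, the whole text between the opening `<` and the returned index could be handed to the attribute parser, so the tag keeps its name, value and type attributes instead of being cut short at the JSP expression.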
From: <dha...@or...> - 2002-12-19 04:41:30
|
> OTOH, from the design perspective, it serves for > better encapsulation to have child scanners within the > parent scanners - otherwise - it'd be really difficult > to determine dependencies of scanners. e.g. looking at > the code of global registrations, how would you be > able to tell that an option tag can appear only in a > select tag? > > Question is - should one care? That is open to debate. > For some scanners, it does not matter (like the link > or form scanners, as links and forms can contain just > about anything inside them). But for select tags, that > is not the case. > I agree that the parent-child relationship is really good, but as I said, there are so many possibilities for data inside a parent tag that we may not be able to retrieve all the information and subsequently reproduce it like the original. As for select tags, as I said earlier, it is possible to have either comment tags or JSP tags inside them, and the same is true for any other type of tag as well. > Regarding potential for having bugs - I have noticed > that whenever we've had something without tests, it > had bugs (mostly due to my inconsistency in following > the practice). I am trying to always always write > tests - hence you'd have noticed that there is a > considerable effort on improving the testing > mechanism. I'd say that good testcases are the > pre-requisite for good code. Code that runs without > tests is really a fluke. I guess the aim is to combine > solid tests with a simple, natural design. I completely agree with you here. My point is just that there are so many possibilities, many of which are unthinkable at this point in time, that it is probably better to be safe about the parsing. Any increase in test cases is always welcome. Regards, Dhaval |
From: Somik R. <so...@ya...> - 2002-12-18 17:54:00
|
> I had a certain design point to discuss out here. > Many times we register > scanners of one tag during the scan() operation of > another tag. An > example is registration of the HTMLOptionTagScanner > during scan() of > HTMLSelectTagScanner. > > This has an advantage of hierarchy of tags but it is > not foolproof since > between tags a host of other tags can also be > present. > > Using the specific example mentioned above, a jsp > tag or a comment tag > could be present within the select tags apart from > the option tag. > > Hence I think it may be a good idea to just extract > all the data from > between the 2 tags and let an application register > more scanners and > work on this extracted data if he so requires. We > should probably just > not get into that business coz a solution has the > potential to be buggy. > This is specially so if I am trying to reproduce the > tag as output. For > only parsing procedures I guess the hierarchy is > useful but for > reproduction the solution has the tendency to be > buggy. If this had been the older architecture, I wouldn't have agreed. But as it's the newer one - with a table of scanners - the time taken to find the right scanner is O(1). Hence, it does not matter whether the scanners are registered globally (in HTMLParser) or locally (in the particular scanner). This is from the performance perspective. OTOH, from the design perspective, it serves for better encapsulation to have child scanners within the parent scanners - otherwise - it'd be really difficult to determine dependencies of scanners. e.g. looking at the code of global registrations, how would you be able to tell that an option tag can appear only in a select tag? Question is - should one care? That is open to debate. For some scanners, it does not matter (like the link or form scanners, as links and forms can contain just about anything inside them). But for select tags, that is not the case. 
Regarding potential for having bugs - I have noticed that whenever we've had something without tests, it had bugs (mostly due to my inconsistency in following the practice). I am trying to always always write tests - hence you'd have noticed that there is a considerable effort on improving the testing mechanism. I'd say that good testcases are the pre-requisite for good code. Code that runs without tests is really a fluke. I guess the aim is to combine solid tests with a simple, natural design. Regards, Somik |
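To make the two points in this message concrete, here is a minimal sketch of a scanner table with constant-time lookup, plus a parent scanner that keeps its child scanner local. It is only an illustration under stated assumptions: `TagScanner`, `ScannerRegistry` and `SelectScanner` are hypothetical names, not the parser's real classes, and the real scanners do far more than report a tag name.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a scanner; not the real HTMLTagScanner API.
interface TagScanner {
    String tagName();
}

// Hypothetical registry: a table of scanners keyed by upper-cased tag name.
final class ScannerRegistry {
    private final Map<String, TagScanner> scanners = new HashMap<>();

    void register(TagScanner scanner) {
        scanners.put(scanner.tagName().toUpperCase(Locale.ROOT), scanner);
    }

    // Hash lookup: average cost does not grow with the number of scanners.
    TagScanner find(String tagName) {
        return scanners.get(tagName.toUpperCase(Locale.ROOT));
    }
}

// A parent scanner that owns its child scanners, so the dependency
// "OPTION belongs inside SELECT" is visible in one place.
final class SelectScanner implements TagScanner {
    private final ScannerRegistry children = new ScannerRegistry();

    SelectScanner() {
        children.register(() -> "OPTION");
    }

    @Override
    public String tagName() {
        return "SELECT";
    }

    TagScanner childFor(String tagName) {
        return children.find(tagName);
    }
}
```

In this arrangement a global registry never sees the OPTION scanner; whether comment or JSP tags inside a select should fall back to the global table is exactly the open question raised in the thread.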
From: <dha...@or...> - 2002-12-18 10:55:37
|
Hi, I am trying to write a scanner, but for some nodes I am getting the node.elementBegin() value as -1. I do not understand how my node could begin from -1. I guess the lowest number possible is 2.1. Can anyone tell me when this kind of output can be expected? Dhaval -----Original Message----- From: Udani, Dhaval H. Sent: Wednesday, December 18, 2002 3:38 PM To: htmlparser-developer Cc: Udani, Dhaval H. Subject: [Htmlparser-developer] Registering Scanners Hi all, I had a certain design point to discuss out here. Many times we register scanners of one tag during the scan() operation of another tag. An example is registration of the HTMLOptionTagScanner during scan() of HTMLSelectTagScanner. This has an advantage of hierarchy of tags but it is not foolproof since between tags a host of other tags can also be present. Using the specific example mentioned above, a jsp tag or a comment tag could be present within the select tags apart from the option tag. Hence I think it may be a good idea to just extract all the data from between the 2 tags and let an application register more scanners and work on this extracted data if he so requires. We should probably just not get into that business coz a solution has the potential to be buggy. This is specially so if I am trying to reproduce the tag as output. For only parsing procedures I guess the hierarchy is useful but for reproduction the solution has the tendency to be buggy. I would like to know your views on the same. Regards, Dhaval Udani Senior Analyst M-Line, QPEG OrbiTech Solutions Ltd. +91-22-28290019 Extn. 1457 |
From: <dha...@or...> - 2002-12-18 10:08:16
|
Hi all, I had a certain design point to discuss out here. Many times we register scanners of one tag during the scan() operation of another tag. An example is registration of the HTMLOptionTagScanner during scan() of HTMLSelectTagScanner. This has an advantage of hierarchy of tags but it is not foolproof since between tags a host of other tags can also be present. Using the specific example mentioned above, a jsp tag or a comment tag could be present within the select tags apart from the option tag. Hence I think it may be a good idea to just extract all the data from between the 2 tags and let an application register more scanners and work on this extracted data if he so requires. We should probably just not get into that business coz a solution has the potential to be buggy. This is specially so if I am trying to reproduce the tag as output. For only parsing procedures I guess the hierarchy is useful but for reproduction the solution has the tendency to be buggy. I would like to know your views on the same. Regards, Dhaval Udani Senior Analyst M-Line, QPEG OrbiTech Solutions Ltd. +91-22-28290019 Extn. 1457 |
From: <dha...@or...> - 2002-12-18 06:55:37
|
Hi everyone, I have one more suggestion for the test cases. It would be nice to have a main method in every test case class, so that the class could be tested individually instead of running it as part of a suite along with other classes. Something as simple as adding (the example is for the HTMLTag class) public static void main(String[] args) { new junit.awtui.TestRunner().start(new String[] {HTMLTag.class.getName()}); } to every test case class so that it can be tested independently outside the suite as well. I think this would be of great help in testing individual classes. Regards, Dhaval Udani Senior Analyst M-Line, QPEG OrbiTech Solutions Ltd. +91-22-28290019 Extn. 1457 |
From: <dha...@or...> - 2002-12-17 14:29:40
|
Hi Somik, I just downloaded Candidate 6 and unzipped the src.zip file. I found that it contains a src folder, and also a docs folder which is already present in the main package and a resources folder whose files are present in the bin directory of the main package. Is there any reason that the files are repeated? Regards, Dhaval Udani Senior Analyst M-Line, QPEG OrbiTech Solutions Ltd. +91-22-28290019 Extn. 1457 -----Original Message----- From: DerrickOswald [mailto:Der...@ro...] Sent: Tuesday, December 17, 2002 6:51 PM To: htmlparser-developer Cc: DerrickOswald Subject: Re: [Htmlparser-developer] version 1.3 I've used this POST mechanism many times, not for testing a site. A typical example is fetching a postalcode by hitting the (for me) http://www.canadapost.ca/tools/pcl/bin/default-e.asp site and posting a filled in form. This has to be parsed, and a table element extracted. Derrick Somik Raha wrote: >>Derrick Oswald wrote: >> >>>POST constructor. >>>The basically two constructors that HTMLParser has either take a >>>string URL or a HTMLReader. This shifts the onus on performing HTTP >>>to the API user for POST operations. It might be good to have a >>>HttpURLConnection or URLConnection argument constructor, where a >>>primed and loaded connection is passed to the parser. > >Like Sam said - this sounds like HttpUnit. Are you using the parser for >making tests ? > >Regards, >Somik |
From: Derrick O. <Der...@ro...> - 2002-12-17 13:13:18
|
I've used this POST mechanism many times, not for testing a site. A typical example is fetching a postalcode by hitting the (for me) http://www.canadapost.ca/tools/pcl/bin/default-e.asp site and posting a filled in form. This has to be parsed, and a table element extracted. Derrick Somik Raha wrote: >>Derrick Oswald wrote: >> >> >> >>>POST constructor. >>>The basically two constructors that HTMLParser has either take a >>>string URL or a HTMLReader. This shifts the onus on performing HTTP >>>to the API user for POST operations. It might be good to have a >>>HttpURLConnection or URLConnection argument constructor, where a >>>primed and loaded connection is passed to the parser. >>> >>> > >Like Sam said - this sounds like HttpUnit. Are you using the parser for >making tests ? > >Regards, >Somik > > > |
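For readers unfamiliar with what a "primed and loaded connection" involves, the following sketch shows one way a caller might prepare a form POST using only `java.net`. It is an assumption-laden illustration: `FormPost` is a hypothetical helper, the field names in any real use depend on the target form, and the constructor that would accept the resulting connection is the proposal under discussion, not an existing API, so the sketch stops at returning the connection.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper; names are illustrative only.
final class FormPost {
    private FormPost() {}

    // Encodes form fields as application/x-www-form-urlencoded.
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"));
                body.append('=');
                body.append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        return body.toString();
    }

    // Opens an HTTP connection and writes the form body. The caller can then
    // read the response from connection.getInputStream().
    static HttpURLConnection open(URL url, Map<String, String> fields) throws IOException {
        byte[] body = encode(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true);
        connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = connection.getOutputStream()) {
            out.write(body);
        }
        return connection;
    }
}
```

Passing the connection itself, rather than a URL string, keeps the HTTP details (method, headers, body) with the caller while letting the parser read the response.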
From: Derrick O. <Der...@ro...> - 2002-12-17 13:07:38
|
This looks like a good place to start. This will mean touching a ton of files though. Somik, You might want to do what's necessary to CVS (tag, branch, label...whatever) to freeze version 1.2 and open up the 1.3 version as the head revision. Derrick Craig Raw wrote: >Take a look at the logging wrapper provided by Jakarta. It provides a >thin bridge between different logging APIs. > >http://jakarta.apache.org/commons/logging.html > >Craig > > |
From: Craig R. <cr...@qu...> - 2002-12-17 07:14:20
|
Take a look at the logging wrapper provided by Jakarta. It provides a thin bridge between different logging APIs. http://jakarta.apache.org/commons/logging.html Craig > > Great to have a discussion going! I'd like to branch off all the issues > into > separate threads so that we could deal with them separately. > > > > Logging > > > The use of a feedback object is adequate, but JDK version 1.4 has a > > > rich API, java.util.logging, that we might want to emulate (presuming > > > we don't want to force JDK 1.4 usage). > > > > I would be against forcing JDK 1.4 usage. I would recommend log4j > > http://jakarta.apache.org/log4j/docs/index.html > > > > Using either JDK 1.4 or log4j ties you down to a specific logging API. The > latter will add to the weight of the parser. (I was actually considering > log4j sometime back, but Claude Duguay convinced me otherwise) > > If however, more logging support is needed, I guess it could be added > using > a facade (or adapter) with JDK 1.4 (or log4j), externally. This is of > course open to discussion. > > Regards, > Somik > |
From: Somik R. <so...@ya...> - 2002-12-17 06:59:32
|
To add an issue of my own: Refactoring testing mechanism - I've been thinking about the testcases that the parser is using now, and I feel it is not so intuitive, even with the utility methods like createParser()... , to create a testcase quickly. I was thinking that it would be important to refactor this asap - as the parser seems to be growing fast now. How about if we keep the tests in files - with the test HTML and expected results in text files - which should be easy to author and add. Also, I think we are looking at a shipping date of Jan 1 for 1.2. So, it would be good if we can have some help reviewing the current design, tests, and suggesting any last minute modifications that are essential. Derrick - thanks a ton for your work - I can see the no of tests has jumped from 219 to 254! Cheers, Somik |
From: <dha...@or...> - 2002-12-17 06:59:22
|
> > I agree with you. This should be added to our to-do list for > 1.3. But you > have to volunteer to help us do it :). > No problemo!!! |
From: Somik R. <so...@ya...> - 2002-12-17 06:52:38
|
Derrick Oswald wrote: > beans > It might be nice to create one or more java beans that can be used > within GUI IDE's. The predefined behavior might be what the > parserapplications do now, but exposing some accessors on HTMLParser and > providing a zero arg constructor may also prove useful. > > executable jar > There is no default application for the htmlparser.jar, i.e. java -jar > htmlparser.jar doesn't do anything at the moment. A little GUI > application might be nice. I'm not talking a browser, but rather a demo > of the applications (i.e. a tree view of the links a la robot, a text > view a la StringExtractor, a list of mail addresses a la ripper etc. ). > This would utilize the beans mentioned above. Both are good ideas. Let's do this for 1.3. Regards, Somik |
From: Somik R. <so...@ya...> - 2002-12-17 06:51:25
|
Dhaval Udani wrote: > Currently the parser does not store any tabs or newlines that may be present on > the HTML page. However if one wants to parse the page and reproduce it, it is > imperative that the formatting remains the same i.e. the look and feel of the > parsed page and the unparsed page do not have any difference(obviously unless > added during the parsing routine). > > I think it is worthwhile giving a thought to this. I may be very selfish in > suggesting it since my usage requires a production of the HTML page after > parsing it and adding some information depending on the tags. I agree with you. This should be added to our to-do list for 1.3. But you have to volunteer to help us do it :). Regards, Somik |
From: Somik R. <so...@ya...> - 2002-12-17 06:50:02
|
> Derrick Oswald wrote: > > > POST constructor. > > The basically two constructors that HTMLParser has either take a > > string URL or a HTMLReader. This shifts the onus on performing HTTP > > to the API user for POST operations. It might be good to have a > > HttpURLConnection or URLConnection argument constructor, where a > > primed and loaded connection is passed to the parser. Like Sam said - this sounds like HttpUnit. Are you using the parser for making tests ? Regards, Somik |
From: <dha...@or...> - 2002-12-17 06:47:15
|
> -----Original Message----- > From: somik [mailto:so...@ya...] > Sent: Tuesday, December 17, 2002 12:11 PM > To: htmlparser-developer > Cc: somik > Subject: Logging (Was: [Htmlparser-developer] version 1.3) > > > Great to have a discussion going! I'd like to branch off all > the issues into > separate threads so that we could deal with them separately. > > > > Logging > > > The use of a feedback object is adequate, but JDK version > 1.4 has a > > > rich API, java.util.logging, that we might want to > emulate (presuming > > > we don't want to force JDK 1.4 usage). > > > > I would be against forcing JDK 1.4 usage. I would recommend log4j > > http://jakarta.apache.org/log4j/docs/index.html > > > > Using either JDK 1.4 or log4j ties you down to a specific > logging API. The > latter will add to the weight of the parser. (I was actually > considering > log4j sometime back, but Claude Duguay convinced me otherwise) > > If however, more logging support is needed, I guess it could > be added using > a facade (or adapter) with JDK 1.4 (or log4j), externally. This is of > course open to discussion. > Yeah, I think that is a good idea. Have an interface for logging and allow people to plug in their own implementations. Default console-based, JDK 1.4 and Log4j based implementations can be provided. |
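The facade idea could look roughly like the sketch below: a small interface the parser would depend on, a console-based default, and an adapter onto `java.util.logging`. All three type names (`ParserLog`, `ConsoleLog`, `JdkLog`) are hypothetical and chosen for this illustration; a log4j adapter would follow the same pattern in a separate class so that log4j stays an optional dependency.

```java
import java.io.PrintStream;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical facade; the parser would depend only on this interface.
interface ParserLog {
    void info(String message);
    void warning(String message);
}

// Default implementation: plain stream output, no extra dependencies.
final class ConsoleLog implements ParserLog {
    private final PrintStream out;

    ConsoleLog() {
        this(System.out);
    }

    ConsoleLog(PrintStream out) {
        this.out = out;
    }

    @Override
    public void info(String message) {
        out.println("INFO: " + message);
    }

    @Override
    public void warning(String message) {
        out.println("WARNING: " + message);
    }
}

// Adapter onto java.util.logging, for users who are on JDK 1.4 or later.
final class JdkLog implements ParserLog {
    private final Logger logger;

    JdkLog(String name) {
        this.logger = Logger.getLogger(name);
    }

    @Override
    public void info(String message) {
        logger.log(Level.INFO, message);
    }

    @Override
    public void warning(String message) {
        logger.log(Level.WARNING, message);
    }
}
```

Because only the adapter classes touch a concrete logging API, the core parser would not be tied to JDK 1.4 or to log4j, which is the concern raised earlier in the thread.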
From: Somik R. <so...@ya...> - 2002-12-17 06:46:55
|
Derrick Oswald wrote : > > charset > > Currently the charset directive within the HTML page is ignored. There > > may be a need to honour this parameter on the Content-Type field. I think this is the way to go. We're getting a nice to-do list for 1.3 :) Regards, Somik |
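As a small illustration of what honouring the charset parameter involves, the sketch below pulls the charset out of a Content-Type value such as the one found in an HTTP header or in a META http-equiv directive. `ContentTypeCharset` and `charsetOf` are hypothetical names, and the fallback value is the caller's choice; nothing here reflects how the parser would actually wire this in.

```java
import java.util.Locale;

// Hypothetical helper; not part of the HTML Parser code base.
final class ContentTypeCharset {
    private ContentTypeCharset() {}

    /**
     * Extracts the charset parameter from a Content-Type value such as
     * "text/html; charset=ISO-8859-1". Returns {@code fallback} when the
     * value is null or carries no charset parameter.
     */
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String parameter = part.trim();
            if (parameter.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = parameter.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }
}
```

The returned name could then be passed to something like `new InputStreamReader(stream, name)` when the page is re-read with the declared encoding.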
From: Somik R. <so...@ya...> - 2002-12-17 06:45:53
|
Derrick Oswald wrote : > > Tables > > The current version flattens tables, pushing the onus on the API user > > to syntactically walk through the table data to get to a certain table > > entry. It may be useful to nest table entries, similar to what the > > the FORM tag does now, but have it correctly generate rows and columns. That's a good idea. We should have a table scanner next. This would be a good feature for 1.3. Sam Joseph wrote : > Have you looked at HTTPUnit? http://httpunit.sourceforge.net/ > > They have to deal with a lot of similar problems and there may be synergies. > I am curious to hear more about this - I am going to be using HttpUnit real soon - what sort of problems did you face ? It will be great (as always) if you can share your vision. Regards, Somik |
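To show what "nesting table entries" might give an API user, here is a minimal sketch of a row-and-column container that a table scanner could fill in as it meets row and cell tags. `TableModel` and its methods are hypothetical names for this illustration only; a real table tag would presumably hold parsed nodes rather than plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical container; not part of the HTML Parser code base.
final class TableModel {
    private final List<List<String>> rows = new ArrayList<>();

    // Called when a row start tag is seen.
    void startRow() {
        rows.add(new ArrayList<>());
    }

    // Called for each cell; tolerates a cell that appears before any row.
    void addCell(String text) {
        if (rows.isEmpty()) {
            startRow();
        }
        rows.get(rows.size() - 1).add(text);
    }

    int rowCount() {
        return rows.size();
    }

    int columnCount(int row) {
        return rows.get(row).size();
    }

    // Direct access by position, instead of walking a flattened node list.
    String cell(int row, int column) {
        return rows.get(row).get(column);
    }
}
```

With a structure like this, the postal code lookup described elsewhere in the thread would reduce to a call such as `cell(1, 0)` once the right table is found.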