From: Mike B. <mb...@Ga...> - 2003-06-02 17:51:00
|
As you've probably already noticed, CVS commit messages are now being directed to this list. New bug reports and feature requests should also be showing up here automatically, but I haven't experimented with that yet. If anyone knows how to pretty up the title of the commit messages then please let me know. I tried a couple of things but every time I changed the filter from the default value, it stopped sending out notifications. -- Mike Bowler Principal, Gargoyle Software Inc. Voice: (416) 822-0973 | Email : mb...@Ga... Fax : (416) 822-0975 | Website: http://www.GargoyleSoftware.com |
From: Christian S. <chr...@ne...> - 2003-06-20 13:59:01
|
Hello, has anybody ever considered implementing a dedicated HTML DOM for use with HtmlUnit? Right now HU uses Xerces/Neko to parse the input and create an internal DOM. However, HtmlUnit uses that DOM only to look up structural information. Additionally, a parallel DOM is maintained that keeps the extra information required by HtmlUnit, and there is a constant mapping between the two. Here's what I find: 1. There's no need to let Xerces create an HTML DOM, as it currently does. It would suffice to use a simple XML DOM, because that is all HtmlUnit requires (and uses). This would improve efficiency during parsing, and could be achieved by configuring Neko accordingly. 2. There would be a really significant performance improvement, and a simplification of the code, if the two DOMs were unified into one. Basically this would require implementing the whole DOM interface in HtmlUnit - which isn't that hard. I am considering using HU in a load test scenario - that's why I am concerned about performance. I realize this may not be the case with many others. Comments? Christian |
From: Mike B. <mb...@Ga...> - 2003-06-20 15:02:25
|
> has anybody ever considered implementing a dedicated HTML DOM for use > with HtmlUnit? Perhaps I'm just slow today but I'm not exactly sure what you're proposing. Could you elaborate a bit on this with some specifics? Part of what confuses me is the reference to two DOM's. There are three hierarchies (HtmlElement, Element, SimpleScriptable) but only one of them is a DOM. > I am considering to use HU in a load test scenario - thats why I am > concerned about performance. I realize this may not be the case with > many others. I *do* use HtmlUnit for load testing and yes, there are performance problems although I suspect that these are due more to poor memory usage than anything else. I get OutOfMemoryError's regularly when stress testing and this is a problem I've been trying to isolate. -- Mike Bowler Principal, Gargoyle Software Inc. Voice: (416) 822-0973 | Email : mb...@Ga... Fax : (416) 822-0975 | Website: http://www.GargoyleSoftware.com |
From: Christian S. <chr...@ne...> - 2003-06-21 14:35:14
|
Mike Bowler wrote: > > has anybody ever considered implementing a dedicated HTML DOM for use > > with HtmlUnit? > > Perhaps I'm just slow today but I'm not exactly sure what you're > proposing. Could you elaborate a bit on this with some specifics? > > Part of what confuses me is the reference to two DOM's. There are three > hierarchies (HtmlElement, Element, SimpleScriptable) but only one of > them is a DOM. I was using the term DOM a bit loosely - i.e. not in the strict sense of org.w3c.dom.*. I was referring to the HtmlElement and Element hierarchies. Basically, every Element gets a corresponding HtmlElement (lazily created). It seems that combining these into one hierarchy would yield great benefit. Christian |
From: Christian S. <chr...@ne...> - 2003-06-23 22:39:15
|
Hi, I'd like to get back to this question, as it was not answered. Would it make sense to unify the HtmlElement and Element hierarchies? It seems to me there would be some potential for both maintenance and resource optimization. Christian |
From: Mike B. <mb...@Ga...> - 2003-06-24 00:21:21
|
Christian Sell wrote: > I'd like to get back to this question, as it was not answered. I've actually been trying to respond but your ISP is blocking emails coming from my ISP, apparently because of "net abuse" which my ISP claims they can't do anything about. > Would it > make sense to unify the HtmlElement and Element hierarchies? It seems to > me there would be some potential for both maintenance and resource > optimization. I assume from your last private email that NekoHTML can be configured to build a DOM tree from HtmlElement classes rather than from the w3c DOM classes. Is this correct? If so, where would I find information on it? -- Mike Bowler Principal, Gargoyle Software Inc. Voice: (416) 822-0973 | Email : mb...@Ga... Fax : (416) 822-0975 | Website: http://www.GargoyleSoftware.com |
From: Christian S. <chr...@ne...> - 2003-06-24 08:25:42
|
Mike Bowler wrote: > Christian Sell wrote: > >> I'd like to get back to this question, as it was not answered. > > > I've actually been trying to respond but your ISP is blocking emails > coming from my ISP, apparently because of "net abuse" which my ISP > claims they can't do anything about. You must be on Packard Bell!? I had a lengthy discussion with my provider about whether their spam strategy was appropriate - simply blocking out half the internet community... Anyway, my intention was never to send private email - the reply-to setting of this list caused it. > >> Would it make sense to unify the HtmlElement and Element hierarchies? >> It seems to me there would be some potential for both maintenance and >> resource optimization. > > > I assume from your last private email that NekoHTML can be configured to > build a DOM tree from HtmlElement classes rather than from the w3c DOM > classes. Is this correct? If so, where would I find information on it? > Well, I see two ways to achieve this: 1. Use Neko's SAX interface. This would allow you to build your own custom hierarchy without ever caring about the w3c interfaces. 2. Use the DOM interface. There is a property in Neko/Xerces that allows you to set the org.w3c.dom.Document implementation class. That class's methods are called during parsing to create the individual nodes. Of course this would mean you have to implement the full org.w3c.dom interfaces. The property name is "http://apache.org/xml/properties/dom/document-class-name" Regards, Christian |
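The first option above (building a custom hierarchy straight from SAX events) might look roughly like the sketch below. It rests on assumptions: the JDK's built-in SAX parser stands in for NekoHTML, so the input has to be well-formed markup, and SketchNode and SketchTreeBuilder are hypothetical names, not HtmlUnit classes.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for an HtmlElement-style node; not HtmlUnit's real class. */
class SketchNode {
    final String tagName;
    final List<SketchNode> children = new ArrayList<>();
    final StringBuilder text = new StringBuilder();

    SketchNode(String tagName) {
        this.tagName = tagName;
    }
}

/** Builds a SketchNode tree directly from SAX events, with no w3c DOM in between. */
class SketchTreeBuilder extends DefaultHandler {
    // Stack of elements that have been opened but not yet closed.
    private final Deque<SketchNode> open = new ArrayDeque<>();
    private SketchNode root;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        SketchNode node = new SketchNode(qName);
        if (open.isEmpty()) {
            root = node;                      // first element becomes the root
        } else {
            open.peek().children.add(node);   // attach to the current parent
        }
        open.push(node);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!open.isEmpty()) {
            open.peek().text.append(ch, start, length);
        }
    }

    /** Parses well-formed markup; the JDK parser is an assumption standing in for Neko. */
    static SketchNode parse(String wellFormedMarkup) {
        try {
            SketchTreeBuilder builder = new SketchTreeBuilder();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(wellFormedMarkup)), builder);
            return builder.root;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }
}
```

A call such as SketchTreeBuilder.parse("<html><body><p>hi</p></body></html>") returns the root node. The second option would keep the parser's DOM builder and swap in a custom Document class instead, at the cost of implementing the org.w3c.dom interfaces.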
From: Mike B. <mb...@Ga...> - 2003-06-24 10:15:14
|
Christian Sell wrote: > you must be on packard bell!? Rogers.com, a Canadian cable modem provider. > well, I see 2 ways to achieve this: > > 1. Use Neko's SAX interface. This would allow you to build your own > custom hierarchy without ever caring about the w3c interfaces > > 2. Use the DOM interface. There is a property in neko/xerces that > allows you to set the org.w3c.dom.Document implementation > class. That class's methods are called during parsing > to create the individual nodes. Of course this would mean you have to > implement the full org.w3c.dom interfaces. > > The property name is > "http://apache.org/xml/properties/dom/document-class-name" Ok, I think I understand what you're proposing. The first approach seems cleaner (not using a DOM at all) but I've got this nagging thought that not having a DOM would break something important. I'll think about this and see if I can figure out what it would break if we did this. In the meantime, comments from others would be welcome. This would be a fairly significant change. -- Mike Bowler Principal, Gargoyle Software Inc. Voice: (416) 822-0973 | Email : mb...@Ga... Fax : (416) 822-0975 | Website: http://www.GargoyleSoftware.com |
From: Christian S. <chr...@ne...> - 2003-06-24 10:32:56
|
Mike Bowler wrote: > Christian Sell wrote: > > you must be on packard bell!? > > Rogers.com, a Canadian cable modem provider. > > > well, I see 2 ways to achieve this: > > > > 1. Use Neko's SAX interface. This would allow you to build your own > > custom hierarchy without ever caring about the w3c interfaces > > > > 2. Use the DOM interface. There is a property in neko/xerces that > > allows you to set the org.w3c.dom.Document implementation > > class. That class's methods are called during parsing > > to create the individual nodes. Of course this would mean you have to > > implement the full org.w3c.dom interfaces. > > > > The property name is > > "http://apache.org/xml/properties/dom/document-class-name" > > Ok, I think I understand what you're proposing. The first approach > seems cleaner (Not use DOM at all) but I've got this nagging thought > that not having a DOM would break something important. I'll think about > this see if I can figure out what it would break if we did this. I am not sure how writing to the DOM from within JavaScript is done. If those routines depend on the w3c DOM, then that would be a problem. > > In the meantime, comments from others would be welcome. This would be a > fairly significant change. > |
From: Mike B. <mbr...@vi...> - 2003-06-24 15:33:01
|
> In the meantime, comments from others would be welcome. This would be a > fairly significant change. I don't understand the ramifications of this enough to have a useful opinion, however if possible it would be nice to make it possible to use Apache's XPath API on the resulting data structure. I assume this means that it must be a DOM, but I'm only assuming. Mike Bresnahan |
From: Christian S. <chr...@ne...> - 2003-06-24 16:04:39
|
Mike Bresnahan wrote: >>In the meantime, comments from others would be welcome. This would be a >>fairly significant change. > > > I don't understand the ramifications of this enough to have a useful > opinion, however if possible it would be nice to make it possible to use > Apache's XPath API on the resulting data structure. I assume this means > that it must be a DOM, but I'm only assuming. I agree that XPath is really useful. Almost all the XPath implementations I know (Apache JXPath, Jaxen) support plugging in custom models. The XPath implementation that comes as part of Xerces (never used) probably doesn't. In any case one can assume that w3c DOM is already implemented, while everything else requires more work. That indeed speaks for either implementing the w3c DOM interfaces (why not?), or leaving everything as is, IMO. > > Mike Bresnahan > > > > ------------------------------------------------------- > This SF.Net email is sponsored by: INetU > Attention Web Developers & Consultants: Become An INetU Hosting Partner. > Refer Dedicated Servers. We Manage Them. You Get 10% Monthly Commission! > INetU Dedicated Managed Hosting http://www.inetu.net/partner/index.php > _______________________________________________ > HtmlUnit-develop mailing list > Htm...@li... > https://lists.sourceforge.net/lists/listinfo/htmlunit-develop > |
From: Mike B. <mbr...@vi...> - 2003-06-24 16:16:22
|
> I agree that XPath is really useful. Almost all the XPath > implementations I know (apache JXPath, Jaxen) support plugging in custom > models. The XPath implementation that comes as part of Xerces (never > used) probably doesn't. In any case one can assume that w3c DOM is > already implemented, while everything else requires more work. > > That indeed speaks for either implementing the w3c DOM interfaces (why > not?), or leaving everything as is, IMO. I guess I don't care which XPath implementation is used. I wasn't actually aware of any other than the one that comes with Xerces. How does the custom model plugging work with JXPath and Jaxen? How much work would it be to plug in HtmlUnit's current data structure? Mike |
From: Christian S. <chr...@ne...> - 2003-06-24 16:57:18
|
Mike Bresnahan wrote: >>I agree that XPath is really useful. Almost all the XPath >>implementations I know (apache JXPath, Jaxen) support plugging in custom >>models. The XPath implementation that comes as part of Xerces (never >>used) probably doesn't. In any case one can assume that w3c DOM is >>already implemented, while everything else requires more work. >> >>That indeed speaks for either implementing the w3c DOM interfaces (why >>not?), or leaving everything as is, IMO. > > > I guess I don't care which XPath implementation is used. I wasn't actually > aware of any other than the one that comes with Xerces. How does the > custom model plugging work with JXPath and Jaxen? How much work would it be > to plug in HtmlUnit's current data structure? Implementing a custom navigator (Jaxen terminology) is quite easy (about 1 or 2 hours). JXPath should be no different. For using XPath with HtmlUnit as is, no work is required, because the w3c DOM is accessible from the HtmlPage. Of course, there is always the mapping between Elements and HtmlElements which has to be done, depending on which one you need. > > Mike |
From: Christian S. <chr...@ne...> - 2003-07-01 16:31:50
|
Hello all, I would like to return to the subject of unifying the internal element hierarchies in HtmlUnit, in particular the HtmlElement and Element hierarchies. I am considering taking on this task. Our discussion showed that there would be advantages in having the HtmlElement hierarchy implement the w3c.dom interfaces, as this would offer a standard API - in particular all XPath implementations will support w3c.dom out of the box. It would also enable us to simply configure the XML parser with the custom DOM implementation and thus have it construct the correct tree right away. I have looked into this, and come to the conclusion that cannibalizing an existing DOM implementation (I chose crimson), and subclassing the HtmlElements off from there would be the easiest way to go. Of course this is still an operation that goes through the internals of HtmlUnit. To avoid conflicts I would rename the packages of the cannibalized DOM implementation into the HtmlUnit namespace. As I said, I am willing to take this on. Does anyone have comments/recommendations/objections? Is there a likelihood that the result will be merged back into the code base? Thanks, Christian |
From: Mike B. <mb...@Ga...> - 2003-07-01 20:57:08
|
Christian Sell wrote: > Our discussion showed that there > would be advantages in having the HtmlElement hierarchy implement the > w3c.dom interfaces, as this would offer a standard API - in particular > all XPath implementations will support w3c.dom out of the box. I'm not convinced that there is any significant benefit in this respect. The DOM is already available which means that you can do XPath right now. There *are* benefits to removing one of the three hierarchies but offering a "standard api" isn't one of them IMO. The proposed change would affect just about every part of HtmlUnit so I'm reluctant to accept something like this without seeing significant benefits. The only significant benefit to this proposal that I can see is a reduction of objects required to represent one html page in memory. Am I overlooking something? > I chose crimson Why crimson? > Is there a likelihood that the > result will be merged back into the code base? If the change does not break backwards compatibility with existing users of HtmlUnit and the final result is better in memory usage or ease of maintainability then I expect it would be accepted. If it does require breaking backward compatibility then it would depend on how serious that break was. Clearly I want to accept patches that make the product better but I will be very leery of accepting a patch that makes such fundamental changes to the way the product works. It would have to go through a much higher level of regression testing than most changes would. So the short answer is, yes it would be accepted if I was confident that it didn't break anything. Making me feel confident about such a drastic change may be difficult. -- Mike Bowler Principal, Gargoyle Software Inc. Voice: (416) 822-0973 | Email : mb...@Ga... Fax : (416) 822-0975 | Website: http://www.GargoyleSoftware.com |
From: Christian S. <chr...@ne...> - 2003-07-02 09:02:47
|
Mike Bowler wrote: > Christian Sell wrote: > >> Our discussion showed that there would be advantages in having the >> HtmlElement hierarchy implement the w3c.dom interfaces, as this would >> offer a standard API - in particular all XPath implementations will >> support w3c.dom out of the box. > > > I'm not convinced that there is any significant benefit in this respect. > The DOM is already available which means that you can do XPath right now. The benefit here is over implementing a non-w3c DOM. With regard to HtmlUnit, the (small) benefit would be that you don't need to go through the "getXmlElement, applyXPath, getHtmlElement" steps when doing XPath. > > There *are* benefits to removing one of the three hierarchies but > offering a "standard api" isn't one of them IMO. The proposed change > would affect just about every part of HtmlUnit so I'm reluctant to > accept something like this without seeing significant benefits. I don't think the changes would be that sweeping. I see them mostly (if not completely) limited to the com.gargoylesoftware.htmlunit.html package. > > The only significant benefit to this proposal that I can see is a > reduction of objects required to represent one html page in memory. Am > I overlooking something? And maintenance, as you also mention. And a slightly simplified API (see above). Some performance as well. > >> I chose crimson > > > Why crimson? Because it seemed small and easy to isolate (I have already done it). > >> Is there a likelihood that the >> result will be merged back into the code base? > > > If the change does not break backwards compatibility with existing users > of HtmlUnit and the final result is better in memory usage or ease of > maintainability then I expect it would be accepted. If it does require > breaking backward compatibility then it would depend how serious that > break was. 
Clearly I want to accept patches that make the product > better but I will be very leery of accepting a patch that makes such > fundamental changes to the way the product works. I think compatibility is rather easy to achieve, as it would mostly amount to having both getXmlElement and getHtmlElement methods just return the parameter/callee as is. I would however prefer deprecating both APIs and adding a simple getElement. > It would have to go > through a much higher level of regression testing than most changes would. > > So the short answer is, yes it would be accepted if I was confident that > it didn't break anything. Making me feel confident about such a drastic > change may be difficult. > I certainly understand your reservations. It would be "open heart surgery", that much is true. And the benefits aren't immediately obvious either. So, before I set out, it would be good to hear what exactly you require to make you confident. |
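The "getXmlElement, applyXPath, getHtmlElement" round trip mentioned above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: Wrapper is a hypothetical stand-in for an HtmlElement, the lazily filled map mimics the Element-to-HtmlElement mapping described earlier in the thread, and the JDK's javax.xml.xpath API (which postdates this discussion) stands in for whichever XPath library is actually used.

```java
import java.io.StringReader;
import java.util.IdentityHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class XPathRoundTripSketch {
    /** Hypothetical wrapper standing in for an HtmlElement; not HtmlUnit's class. */
    static final class Wrapper {
        final Element xmlElement;

        Wrapper(Element xmlElement) {
            this.xmlElement = xmlElement;
        }
    }

    // One wrapper per w3c Element, created lazily on first request.
    private final Map<Element, Wrapper> wrappers = new IdentityHashMap<>();

    Wrapper wrap(Element element) {
        return wrappers.computeIfAbsent(element, Wrapper::new);
    }

    /**
     * Step 1: unwrap to the w3c Element. Step 2: evaluate the XPath there.
     * Step 3: map the result back to its wrapper. The expression is assumed
     * to select an element; returns null when nothing matches.
     */
    Wrapper selectFirst(Wrapper context, String xpath) {
        try {
            Element start = context.xmlElement;
            Element hit = (Element) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, start, XPathConstants.NODE);
            return hit == null ? null : wrap(hit);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + xpath, e);
        }
    }

    /** Helper for the sketch: parses well-formed markup into a w3c Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }
}
```

With a unified hierarchy, steps 1 and 3 would disappear, since the node returned by the XPath engine would already be the HtmlElement.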
From: Mike B. <mb...@Ga...> - 2003-07-02 10:23:11
|
Christian Sell wrote: > So, before I set out, it would be good to hear what exactly you require > to make you confident. Obviously all the existing unit tests would have to pass (ignoring those tests that may become redundant due to the changes) and new tests would have to be added as needed. I'd want to do an "experimental" build with those changes to let users of HtmlUnit see if it broke any of their code. The answer I'd want to hear from these people is "nothing changed - all my code still just works fine". Normally just the unit tests would be enough but this is a large enough change that I'd want to take extra precautions. You've probably already seen this document but in case you haven't, please read it before submitting patches -> http://htmlunit.sourceforge.net/submittingPatches.html -- Mike Bowler Principal, Gargoyle Software Inc. Voice: (416) 822-0973 | Email : mb...@Ga... Fax : (416) 822-0975 | Website: http://www.GargoyleSoftware.com |
From: Christian S. <chr...@ne...> - 2003-07-15 15:45:57
|
Hello, does HtmlUnit honor the meta attribute below? If not, plans to do so? <meta http-equiv=refresh content="0; url='/ams/servlet/session.init'"> thanks, christian |
From: Mike B. <mb...@Ga...> - 2003-07-15 16:23:08
|
Christian Sell wrote: > does HtmlUnit honor the meta attribute below? If not, plans to do so? > > <meta http-equiv=refresh content="0; url='/ams/servlet/session.init'"> No, it doesn't. Please open a feature request. -- Mike Bowler Principal, Gargoyle Software Inc. Voice: (416) 822-0973 | Email : mb...@Ga... Fax : (416) 822-0975 | Website: http://www.GargoyleSoftware.com |
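For anyone writing up that feature request, the core of honoring a refresh directive is parsing the content attribute into a delay and an optional target URL. The sketch below is only an illustration of that parsing step; MetaRefreshSketch is a hypothetical name, not part of HtmlUnit, and the pattern covers just the common "seconds; url=..." forms with optional quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: parses the content attribute of a meta refresh tag. */
class MetaRefreshSketch {
    final int delaySeconds;
    final String url; // null when the content holds only a delay

    // Delay in seconds, optionally followed by ";" or "," and url=..., where the
    // URL may be wrapped in single or double quotes.
    private static final Pattern CONTENT = Pattern.compile(
            "\\s*(\\d+)\\s*(?:[;,]\\s*url\\s*=\\s*(['\"]?)(.*?)\\2\\s*)?",
            Pattern.CASE_INSENSITIVE);

    private MetaRefreshSketch(int delaySeconds, String url) {
        this.delaySeconds = delaySeconds;
        this.url = url;
    }

    static MetaRefreshSketch parse(String content) {
        Matcher m = CONTENT.matcher(content);
        if (!m.matches()) {
            throw new IllegalArgumentException("unrecognized refresh content: " + content);
        }
        return new MetaRefreshSketch(Integer.parseInt(m.group(1)), m.group(3));
    }
}
```

For the tag quoted above, MetaRefreshSketch.parse("0; url='/ams/servlet/session.init'") yields a delay of 0 and the URL /ams/servlet/session.init; resolving that URL against the page and loading it after the delay would be the remaining work.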
From: Mike B. <mb...@Ga...> - 2003-06-24 16:18:57
|
Mike Bresnahan wrote: > I don't understand the ramifications of this enough to have a useful > opinion, however if possible it would be nice to make it possible to > use Apache's XPath API on the resulting data structure. I assume this > means that it must be a DOM, but I'm only assuming. Good point. Whatever is done, I don't want to prevent xpath support. -- Mike Bowler Principal, Gargoyle Software Inc. Voice: (416) 822-0973 | Email : mb...@Ga... Fax : (416) 822-0975 | Website: http://www.GargoyleSoftware.com |