Thread: [Htmlparser-developer] Java Performance question
From: Somik R. <so...@ya...> - 2003-01-18 08:33:19
|
Hi Folks,

I was tinkering around with instanceof and, under the impression that it causes a performance hit, I tried replacing it with a polymorphic mechanism, by which HTMLNode has a method getType(), and so do the other basic nodes. A match is then attempted like so:

    if (node.getType()==HTMLTag.TYPE)

instead of

    if (node instanceof HTMLTag)

I have taken care that getType does not do object creation - it is a static object. One would expect the former to be faster.

But in a performance test (InstanceofPerformanceTest in org.htmlparser.tests) I find the opposite behaviour. Here's a graph showing the response of instanceof in blue and getType()==HTMLTag.TYPE in pink:
http://htmlparser.sourceforge.net/design/pics/performance.gif

Does anyone have an explanation?

Regards,
Somik
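The two checks being compared can be sketched as follows. This is only a minimal illustration, assuming hypothetical stand-in classes Node and Tag in place of the real HTMLNode and HTMLTag; the actual getType() implementation in org.htmlparser is not shown in this thread.

```java
// Hypothetical stand-ins for HTMLNode and HTMLTag (assumptions, not the real classes).
final class TypeCheckSketch {
    static class Node {
        static final String TYPE = "NODE";
        // Returns a shared constant, so no object is created per call.
        String getType() { return TYPE; }
    }

    static class Tag extends Node {
        static final String TYPE = "TAG";
        @Override
        String getType() { return TYPE; }
    }

    // Check 1: ask the JVM whether the runtime class is Tag or a subclass of it.
    static boolean isTagByInstanceof(Node node) {
        return node instanceof Tag;
    }

    // Check 2: one virtual call, then a reference comparison against the constant.
    static boolean isTagByType(Node node) {
        return node.getType() == Tag.TYPE;
    }

    public static void main(String[] args) {
        Node plain = new Node();
        Node tag = new Tag();
        System.out.println(isTagByInstanceof(tag) + " " + isTagByType(tag));     // true true
        System.out.println(isTagByInstanceof(plain) + " " + isTagByType(plain)); // false false
    }
}
```

Note that the two checks are not fully equivalent: instanceof also matches subclasses of Tag, while the getType() comparison matches only classes that return exactly Tag.TYPE.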
From: Derrick O. <Der...@ro...> - 2003-01-18 14:29:44
|
Somik,

I think there are a couple of reasons.

First, your instanceof test always succeeds immediately. The penalty for instanceof comes when it has to walk the inheritance hierarchy (usually all the way up to Object) to determine failure, which would happen often if you were trying to determine what to do with an unknown node type.

Second, your getType() involves a virtual method call that would normally not be done more than once. That is, you would typically get the unknown type once and compare it to each of the final types you are aware of, which would effectively move line 45 of the second disassembly below (generated by "javap -c org.htmlparser.tests.InstanceofPerformanceTest") out of the kernel loop and replace it with an "aload" of a local variable, probably:

Method void doInstanceofTest(long[], int, long)
<snip>
  35 lconst_0      // for (i = 0
  36 lstore 7
  38 goto 57
  41 aload_0       // this
  42 getfield #10 <Field org.htmlparser.HTMLNode node>    // get InstanceofPerformanceTest 'node' member variable
  45 instanceof #21 <Class org.htmlparser.tags.HTMLTag>   // node instanceof HTMLTag
  48 ifeq 51       // { }
  51 lload 7       // i++
  53 lconst_1
  54 ladd
  55 lstore 7
  57 lload 7       // i < numTimes
  59 lload_3
  60 lcmp
  61 iflt 41       // repeat
</snip>

Method void doGetTypeTest(long[], int, long)
<snip>
  35 lconst_0      // for (i = 0
  36 lstore 7
  38 goto 59
  41 aload_0       // this
  42 getfield #10 <Field org.htmlparser.HTMLNode node>    // get InstanceofPerformanceTest 'node' member variable
  45 invokevirtual #23 <Method java.lang.String getType()>  // getType() virtual method call
  48 ldc #24 <String "NODE">   // 'retrieve' String "NODE" from HTMLNode, but since it's final it's a local copy
  50 if_acmpne 53  // ==
  53 lload 7       // i++
  55 lconst_1
  56 ladd
  57 lstore 7
  59 lload 7       // i < numTimes
  61 lload_3
  62 lcmp
  63 iflt 41       // repeat
</snip>

A fairer test might be:

    type = node.getType();
    for (...
        if (type == "BOGUS")
        {}
        else if (type == "FAKE")
        {}
        else if (type == "NODE")
        {}

vs.

    for (...
        if (node instanceof HTMLFrameTag)
        {}
        else if (node instanceof HTMLFormTag)
        {}
        else if (node instanceof HTMLNode)
        {}

Derrick
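The fairer comparison described in this message could be sketched roughly as below. This is an illustration under stated assumptions, not the InstanceofPerformanceTest from org.htmlparser.tests: the classes Node, FormTag and FrameTag and the type strings are hypothetical stand-ins, and the timing in main is naive (a JIT compiler may optimize away much of either loop), so the numbers it prints are only a rough indication.

```java
// Hypothetical stand-ins for HTMLNode, HTMLFormTag and HTMLFrameTag (assumptions).
final class FairerComparisonSketch {
    static class Node { String getType() { return "NODE"; } }
    static class FormTag extends Node { @Override String getType() { return "FORM"; } }
    static class FrameTag extends Node { @Override String getType() { return "FRAME"; } }

    // getType() is called once, outside the loop; each iteration only compares references.
    // Comparing with == works here because all values are interned string literals.
    static long countByType(Node node, long numTimes) {
        String type = node.getType();
        long matches = 0;
        for (long i = 0; i < numTimes; i++) {
            if (type == "FRAME") { }
            else if (type == "FORM") { }
            else if (type == "NODE") { matches++; }
        }
        return matches;
    }

    // The same chain using instanceof; the first two tests fail for a plain Node.
    static long countByInstanceof(Node node, long numTimes) {
        long matches = 0;
        for (long i = 0; i < numTimes; i++) {
            if (node instanceof FrameTag) { }
            else if (node instanceof FormTag) { }
            else if (node instanceof Node) { matches++; }
        }
        return matches;
    }

    public static void main(String[] args) {
        Node node = new Node();
        long numTimes = 10_000_000L;

        long start = System.nanoTime();
        long byType = countByType(node, numTimes);
        long middle = System.nanoTime();
        long byInstanceof = countByInstanceof(node, numTimes);
        long end = System.nanoTime();

        System.out.println("getType (hoisted): " + byType + " matches, "
                + (middle - start) / 1_000_000 + " ms");
        System.out.println("instanceof:        " + byInstanceof + " matches, "
                + (end - middle) / 1_000_000 + " ms");
    }
}
```

Both loops fall through two failing tests before the final match, which is the case the original test did not exercise.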
From: Somik R. <so...@ya...> - 2003-01-20 07:19:32
Attachments:
test2.gif
|
Hi Derrick,

It was really nice to read your reply. I tried a more accurate test (no, I didn't include instanceof HTMLNode, as our matches are at most one level up). The results (attached graph) show that the two are almost the same - there is no perceptible improvement in this case. I guess if one goes a couple of layers up, the benefits would start to show.

Which brings me to the next question - knowing that we have no perceptible improvement to gain, should we recommend the use of the object-oriented way?

Regards,
Somik
From: Derrick O. <Der...@ro...> - 2003-01-20 13:20:35
|
Somik,

My instincts would say to use the simplest mechanism possible. In this case it would be instanceof, since the getType() way involves extra fields and accessor methods.

But what problem are you trying to solve? Is it the "if (node instanceof HTMLLinkTag)" that seems to be needed everywhere? Perhaps HTMLNode should have a "getLink()" method that returns null but is overridden in HTMLLinkTag?

Similarly, rationalization of toString(), getPlainTextString(), getHTML() and any required new methods to return appropriate renditions of the text within the node could eliminate the instanceof operations in StringExtractor and elsewhere.

My $0.02 worth.

Derrick
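The getLink() idea could look something like the sketch below. The classes Node and LinkTag are hypothetical stand-ins for HTMLNode and HTMLLinkTag, and the getLink() signature is an assumption based only on the description in this message; it is not taken from the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for HTMLNode and HTMLLinkTag (assumptions, not the real classes).
final class GetLinkSketch {
    static class Node {
        // Default behaviour: an ordinary node carries no link.
        String getLink() { return null; }
    }

    static class LinkTag extends Node {
        private final String link;

        LinkTag(String link) { this.link = link; }

        @Override
        String getLink() { return link; }
    }

    // Callers ask every node for its link and skip nulls; no instanceof or cast is needed.
    static List<String> collectLinks(List<Node> nodes) {
        List<String> links = new ArrayList<String>();
        for (Node node : nodes) {
            String link = node.getLink();
            if (link != null) {
                links.add(link);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
                new Node(),
                new LinkTag("http://example.com/a"),
                new Node(),
                new LinkTag("http://example.com/b"));
        System.out.println(collectLinks(nodes)); // [http://example.com/a, http://example.com/b]
    }
}
```

The trade-off is that the base class gains a method that is meaningful for only one subclass, in exchange for removing the type checks from calling code.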
From: Joshua K. <jo...@in...> - 2003-01-20 17:19:13
|
> Similarly, rationalization of toString(), getPlainTextString(),
> getHTML() and any required new methods to return appropriate renditions
> of the text within the node could eliminate the instanceof operations in
> StringExtractor and elsewhere.

Speaking of the StringExtractor, are there no tests for that class? I couldn't find any.

thanks,
jk
From: Somik R. <so...@ya...> - 2003-01-21 21:51:31
|
Hi Derrick,

> But what problem are you trying to solve?

My sole intention was to get a performance improvement if "instanceof" turned out to be guilty of being a bottleneck. But it looks like the performance of instanceof is really good, so I'm now thinking of sticking to that (as you suggested as well).

Regards,
Somik