[Htmlparser-developer] Solution of Stack Overflow bug
From: Somik R. <so...@ya...> - 2003-04-28 03:39:28
Hi Team,
First, Derrick - thanks for taking over. I cannot tell you how relieved I am.
Second, I've fixed a stack overflow bug - but not in time for this release.
I thought I should share the solution, as it could be important for future
work in this area. It actually took me quite some time to fix.
The problem:
**************
When faced with tags like:
<ul>
<li>
<ul>
<li>
<li>
<li>
... 200 more <li> tags
</ul>
</li>
</ul>
we'd end up with a stack overflow.
There were multiple problems. The first big problem was the missing ability
to close tags on encountering end tags. Prior to this fix, CompositeTagScanner
was only handling begin tags.
But the second and more dangerous problem was the correction algorithm
itself. The decision to insert an end tag was made only after the next tag
had been parsed. This caused recursion until the stack limit was reached - we
got to see it because of a good bug report about a page with tons of li tags.
After trying all sorts of ideas, I was about to settle on the need for a tree
holding the stack trace, when I figured that the relationship can be
represented simply with a stack inside BulletScanner and BulletListScanner.
This was a special case, as there were rules like:
[1] <ul> can have <li> children
[2] <li> can have <ul> children
[3] <li> cannot have <li> children
You can look at the code in BulletScanner and BulletListScanner.
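As a rough illustration of how rules [1] to [3] can be applied with an explicit stack instead of recursion, here is a minimal Java sketch. It is not the actual BulletScanner/BulletListScanner code: the class BulletStackSketch, the method normalize, and the flat list of tag-name tokens ("ul", "li", "/ul", "/li") are all assumptions made up for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch, not part of htmlparser.
class BulletStackSketch {

    /** Inserts the missing end tags into a flat list of "ul", "li", "/ul", "/li" tokens. */
    static List<String> normalize(List<String> tags) {
        Deque<String> open = new ArrayDeque<>(); // currently open tags, innermost on top
        List<String> out = new ArrayList<>();
        for (String tag : tags) {
            if (tag.startsWith("/")) {
                String name = tag.substring(1);
                if (!open.contains(name)) {
                    continue; // stray end tag with no matching open tag: ignore it
                }
                // Close everything opened inside the tag being ended, then the tag itself.
                String closed;
                do {
                    closed = open.pop();
                    out.add("/" + closed);
                } while (!closed.equals(name));
            } else {
                // Rule [3]: <li> cannot contain <li>, so end the open <li> first.
                if (tag.equals("li") && "li".equals(open.peek())) {
                    out.add("/" + open.pop());
                }
                open.push(tag);
                out.add(tag);
            }
        }
        // Close whatever is still open at the end of the input.
        while (!open.isEmpty()) {
            out.add("/" + open.pop());
        }
        return out;
    }

    public static void main(String[] args) {
        // <ul><li><ul> followed by 200 unclosed <li> tags, then </ul></li></ul>
        List<String> tags = new ArrayList<>(Arrays.asList("ul", "li", "ul"));
        for (int i = 0; i < 200; i++) {
            tags.add("li");
        }
        tags.addAll(Arrays.asList("/ul", "/li", "/ul"));
        System.out.println(normalize(tags).size() + " tags after correction");
    }
}
```

In this sketch each new <li> closes the previous one before it is pushed, so for the input built in main the stack of open tags never holds more than four entries (ul, li, ul, li), however many <li> tags follow.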
Regards,
Somik