[Htmlparser-developer] Solution of Stack Overflow bug
From: Somik R. <so...@ya...> - 2003-04-28 03:39:28
Hi Team,

First, Derrick - thanks for taking over. I cannot tell you how relieved I am.

Second, I've fixed a Stack Overflow bug - but not in time for this release. I thought I should share the solution, as it could be important for future work in this area. It took me quite some time to fix, actually.

The problem:
**************
When faced with tags like:

    <ul>
    <li>
    <ul>
    <li>
    <li>
    <li>
    ... 200 more <li> tags
    </ul>
    </li>
    </ul>

we'd end up with a stack overflow.

There were multiple problems. The first big problem was the missing ability to close tags on encountering end tags. Prior to this, CompositeTagScanner was only tackling begin tags.

But the second and more dangerous problem was the correction algorithm itself. The decision to put in an end tag would happen only after the next tag was parsed. This would cause recursion until the stack limit was reached - we got to see it because of a good bug report about a page with tons of <li> tags. After trying all sorts of ideas, I was about to settle on the necessity of a tree holding the stack trace, when I figured that the relationship can be represented simply with a stack inside BulletScanner and BulletListScanner.

This was a special case, as there were rules like:

[1] <ul> can have <li> children
[2] <li> can have <ul> children
[3] <li> cannot have <li> children

You can look at the code in BulletScanner and BulletListScanner.

Regards,
Somik
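
For readers who want to see the idea without opening the scanners, here is a minimal sketch of how rules [1]-[3] can be enforced with an explicit stack instead of recursion. It is not the code in BulletScanner or BulletListScanner. The class name BulletStackSketch, the method balance, and the representation of tags as plain strings are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; not part of the HTMLParser code base.
class BulletStackSketch {

    /** Returns the tag stream with the implied </li> end tags inserted. */
    static List<String> balance(List<String> tags) {
        Deque<String> open = new ArrayDeque<>(); // currently open ul/li tags
        List<String> out = new ArrayList<>();
        for (String tag : tags) {
            switch (tag) {
                case "<ul>":
                    // Rules [1] and [2]: a list may open anywhere, including inside an <li>.
                    open.push("ul");
                    out.add(tag);
                    break;
                case "<li>":
                    // Rule [3]: an <li> cannot contain an <li>, so close the open one first.
                    if ("li".equals(open.peek())) {
                        open.pop();
                        out.add("</li>");
                    }
                    open.push("li");
                    out.add(tag);
                    break;
                case "</li>":
                    // Honour the end tag only if an <li> is actually open; drop strays.
                    if ("li".equals(open.peek())) {
                        open.pop();
                        out.add(tag);
                    }
                    break;
                case "</ul>":
                    // Closing a list also closes its last, still-open item.
                    if ("li".equals(open.peek())) {
                        open.pop();
                        out.add("</li>");
                    }
                    if ("ul".equals(open.peek())) {
                        open.pop();
                        out.add(tag);
                    }
                    break;
                default:
                    out.add(tag); // anything else passes through untouched
            }
        }
        // Close whatever is still open at end of input.
        while (!open.isEmpty()) {
            out.add("</" + open.pop() + ">");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList(
            "<ul>", "<li>", "<ul>", "<li>", "<li>", "<li>", "</ul>", "</li>", "</ul>");
        System.out.println(balance(in));
    }
}
```

Because the decision to insert an end tag is made by looking at the top of the stack when the next <li> arrives, the loop never recurses, so the depth of the Java call stack does not grow with the number of <li> tags on the page.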