CheckNodeIntegrity() has O(N^2) performance in the
number of children of the given node. We observed this
issue on wikipedia.org manifest itself by seeing the
site crash every time some spammer posted 500 KB of
links. Because of its shocking time order,
CheckNodeIntegrity() will likely be the dominant part
of any parsing operation on a large document. For my
test case with 10,000 links, it used about 98% of the
execution time.
Luckily, the problem is easily fixed, and I've done so,
see the attached patch. It's also available at:
http://wikimedia.org/~tstarling/tidy/faster_integrity_check.patch
It's also possible to fix the problem by defining
NO_NODE_INTEGRITY_CHECK, but I thought you might be
happier with optimisation rather than removal.
-- Tim Starling
Nobody/Anonymous
HTML/XHTML Parser
Current - all platforms
Public
|
Date: 2008-12-12 02:34 comment6, <a |
|
Date: 2008-12-12 02:19 comment6, <a |
|
Date: 2008-12-12 02:05 comment1, <a |
|
Date: 2008-12-11 20:37 comment6, <a href="http://breakstuff.awardspace.com/grace-slick.html">grace |
|
Date: 2008-12-11 20:23 comment3, <a href="http://faithlb.awardspace.com/bridges.html">bridges</a>, |
|
Date: 2008-12-11 20:09 comment5, <a |
|
Date: 2008-12-11 19:58 comment2, <a |
|
Date: 2008-12-11 19:47 comment3, <a |
|
Date: 2008-12-11 19:35 comment1, <a |
|
Date: 2008-12-11 15:11 comment2, <a |
|
Date: 2008-12-11 15:00 comment3, <a |
|
Date: 2008-12-11 14:50 comment2, <a |
|
Date: 2008-12-11 07:07 uRtDmZ <a href="http://zhdhhjffoqzz.com/">zhdhhjffoqzz</a>, |
|
Date: 2008-11-26 18:51 comment4, <a |
|
Date: 2008-11-26 16:55 comment2, <a |
|
Date: 2008-11-26 16:40 , <a href="http://www.lahoria.com/forums/member.php?u=323">koeb viagra</a>, |
|
Date: 2008-11-26 14:59 comment4, <a href="http://www.vbdesigns.de/member.php?u=39251">viagra ohne |
|
Date: 2008-11-26 14:44 , <a href="http://www.liveforum.dk/forum/member.php?u=7449">buy viagra |
|
Date: 2008-11-26 13:00 comment2, <a href="http://forum.lasante.net/members/viagra.html">viagra |
|
Date: 2008-11-26 12:46 , <a href="http://www.lahoria.com/forums/member.php?u=323">viagra</a>, |
|
Date: 2008-11-26 11:02 comment6, <a |
|
Date: 2008-11-26 10:50 , <a href="http://www.liveforum.dk/forum/member.php?u=7449">generic |
|
Date: 2008-11-26 09:08 comment5, <a href="http://www.liveforum.dk/forum/member.php?u=7450">koeb |
|
Date: 2008-11-26 08:57 , <a href="http://www.lahoria.com/forums/member.php?u=323">koeb viagra</a>, |
|
Date: 2008-11-26 07:11 comment5, <a href="http://www.lahoria.com/forums/member.php?u=324">kГёb |
|
Date: 2008-11-26 05:17 comment5, <a href="http://www.lahoria.com/forums/member.php?u=324">kГёb |
|
Date: 2008-11-24 19:19 LeIP0C <a href="http://nhjcexklhvil.com/">nhjcexklhvil</a>, |
|
Date: 2008-11-13 10:06 t7BKAU <a href="http://pllgiecmhbcz.com/">pllgiecmhbcz</a>, |
|
Date: 2008-11-02 01:30 6zW6Bg <a href="http://zjhthbedwccp.com/">zjhthbedwccp</a>, |
|
Date: 2008-10-28 09:01 I know that you believe that you understood what you think I said, but I am |
|
Date: 2008-10-10 06:27 zGp6pB <a href="http://lypmztnobpzw.com/">lypmztnobpzw</a>, |
|
Date: 2007-12-05 20:55 Logged In: NO |
|
Date: 2007-11-28 19:35 Logged In: NO |
|
Date: 2005-12-03 03:20 Logged In: YES |
|
Date: 2005-11-02 09:31 Logged In: YES |
|
Date: 2005-11-02 07:30 Logged In: YES |
| Filename | Description | Download |
|---|---|---|
| faster_integrity_check.patch | patch | Download |
| Field | Old Value | Date | By |
|---|---|---|---|
| status_id | Pending | 2005-12-03 03:20 | sf-robot |
| close_date | 2005-11-02 09:31 | 2005-12-03 03:20 | sf-robot |
| close_date | - | 2005-11-02 09:31 | arnaud02 |
| status_id | Open | 2005-11-02 09:31 | arnaud02 |
| resolution_id | None | 2005-11-02 09:31 | arnaud02 |
| File Added | 154732: faster_integrity_check.patch | 2005-11-02 06:58 | timstarling |
Copyright © 2010 Geeknet, Inc. All rights reserved. Terms of Use