PHP Simple HTML DOM Parser

Tracker: Patches

Allow handling pages with a lot of comments - ID: 2892467
Last Update: Comment added ( guglielmocelata )

When a page with a lot of comments is parsed, the remove_noise and
restore_noise functions handle only the first 1000 patterns correctly.
Once the number of *noise* patterns exceeds that, the results are wrong:
HTML code containing comments ends up misplaced.

The patch raises the limit to 100,000.
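
The failure mode described above can be illustrated with a small sketch. This is not the library's PHP code: it is a Python model that assumes noise fragments are swapped for a placeholder made of a marker plus a fixed-width counter, and that restoring reads back exactly that many digits. The names `MARKER`, `remove_noise`, `restore_noise`, `round_trips` and the `width` parameter are hypothetical and chosen only for illustration.

```python
import re

MARKER = "___noise___"  # hypothetical placeholder prefix (assumption)


def remove_noise(doc, pattern, width):
    """Replace each match with MARKER + a counter padded to `width` digits."""
    noise = {}

    def stash(match):
        # The counter is padded, but never truncated: once it needs more
        # than `width` digits, the key silently becomes longer.
        key = MARKER + str(len(noise)).zfill(width)
        noise[key] = match.group(0)
        return key

    return re.sub(pattern, stash, doc), noise


def restore_noise(text, noise, width):
    """Put stashed fragments back, reading exactly `width` digits per key."""
    out = []
    pos = 0
    while True:
        hit = text.find(MARKER, pos)
        if hit == -1:
            out.append(text[pos:])
            return "".join(out)
        end = hit + len(MARKER) + width
        key = text[hit:end]
        out.append(text[pos:hit])
        # An over-long key is cut short here, so the wrong fragment is
        # restored and the leftover digit stays in the output.
        out.append(noise.get(key, key))
        pos = end


def round_trips(n_comments, width):
    """True if removing and restoring comments reproduces the document."""
    doc = "".join(f"<p>{i}</p><!-- c{i} -->" for i in range(n_comments))
    stripped, noise = remove_noise(doc, r"<!--.*?-->", width)
    return restore_noise(stripped, noise, width) == doc
```

Under these assumptions, a three-digit counter round-trips a document with 1000 comments but not one with 1001, while a five-digit counter handles both. Widening the counter, as the patch does, raises the ceiling but does not remove it.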


Submitted: Guglielmo Celata ( guglielmocelata ) - 2009-11-05 09:27
Priority: 5
Status: Open
Resolution: None
Assigned: Nobody/Anonymous
Category: None
Group: None
Visibility: Public


Comment ( 1 )




Date: 2009-11-05 09:31
Sender: guglielmocelata

The issue is described in a more detailed way here:
http://dlogging.wordpress.com/2009/11/05/patching-simple-html-dom-php-library-to-have-it-work-with-files-with-a-lot-of-noise/



Attached File ( 1 )

Filename               Description
simple_html_dom.patch  Patch to correctly parse pages with a lot of comments.

Change ( 1 )

Field       Old Value                      Date              By
File Added  349609: simple_html_dom.patch  2009-11-05 09:27  guglielmocelata