#95 Minimizing pcre recursion limit

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2016-01-06
Created: 2016-01-06
Creator: R22
Private: No

Thank you for your great work!
I just added a new topic on the forum: https://sourceforge.net/p/phpcrawl/discussion/307696/thread/c3966b6a/
Could you add a feature that strips unneeded tags from the HTML source before searching it for references? For example, in PHPCrawlerLinkFinder.class.php, preg_match_all() can need a lot of iterations to find references, but if we run something like $html_source = strip_tags($html_source, ''); before it, the number of iterations will be much lower. Of course, the allowed-tags argument (shown here as '') would come from user configuration, e.g. $crawler->setAllowedTags(['a', 'img']);
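Here is a minimal sketch of what I mean, not a patch against PHPCrawl itself. The helper names buildAllowedTagsString() and stripToAllowedTags() are made up for illustration, and the regex is a deliberately simplified stand-in for the patterns in PHPCrawlerLinkFinder.class.php. Only strip_tags() and preg_match_all() are real PHP functions here.

```php
<?php
// Hypothetical helper (not part of PHPCrawl): turns ['a', 'img'] into '<a><img>',
// the string form that strip_tags() accepts as its allowed-tags argument.
function buildAllowedTagsString(array $tagNames): string
{
    $allowed = '';
    foreach ($tagNames as $name) {
        $allowed .= '<' . $name . '>';
    }
    return $allowed;
}

// Hypothetical helper: drops every tag except the configured ones,
// so the link-finding regex has far less markup to scan.
function stripToAllowedTags(string $htmlSource, array $tagNames): string
{
    return strip_tags($htmlSource, buildAllowedTagsString($tagNames));
}

$html_source = '<div><p>Intro <b>text</b></p>'
    . '<a href="/page.html">Link</a><img src="/pic.png"></div>';

// ['a', 'img'] stands in for what a setAllowedTags() setting would provide.
$reduced = stripToAllowedTags($html_source, ['a', 'img']);

// Simplified example pattern (an assumption, not PHPCrawl's real regex):
// captures quoted href/src values of <a> and <img> tags.
$pattern = '/<(?:a|img)\b[^>]*\b(?:href|src)\s*=\s*["\']([^"\']+)["\']/i';
preg_match_all($pattern, $reduced, $matches);

$links = $matches[1];
```

Note that strip_tags() removes only the tags and keeps the text between them, so this reduces markup rather than the whole document; whether that is enough to stay under the PCRE limits would need testing on real pages.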

In my case, setLinkExtractionTags(), enableAggressiveLinkSearch(false), and excludeLinkSearchDocumentSections(\PHPCrawlerLinkSearchDocumentSections::ALL_SPECIAL_SECTIONS) do not help.
Best regards.
