
#95 Minimizing pcre recursion limit

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2016-01-06
Created: 2016-01-06
Creator: R22
Private: No

Thank you for your great work!
I just added a new topic on the forum: https://sourceforge.net/p/phpcrawl/discussion/307696/thread/c3966b6a/
Could you add a feature that strips unneeded tags from the HTML source before searching it for references? For example, in PHPCrawlerLinkFinder.class.php, preg_match_all() can need a large number of iterations to find references. If something like $html_source = strip_tags($html_source, ''); ran before it, the number of iterations would be much lower. The allowed-tags argument (shown as '' here) would of course be user-configurable, e.g. $crawler->setAllowedTags(['a', 'img']);
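
A minimal sketch of the idea, under stated assumptions: stripToAllowedTags() is a hypothetical helper, not part of PHPCrawl, and the regular expression is a simplified stand-in for whatever pattern PHPCrawlerLinkFinder.class.php actually uses. It only illustrates reducing the document with strip_tags() before handing it to preg_match_all().

```php
<?php
// Hypothetical helper (not part of PHPCrawl): keep only the tags that can
// carry links, so the link-finding regex has less text to scan.
function stripToAllowedTags(string $htmlSource, array $allowedTags): string
{
    // Build the allow-list in the '<a><img>' string form that strip_tags()
    // accepts on all PHP versions.
    $allowList = '';
    foreach ($allowedTags as $tag) {
        $allowList .= '<' . $tag . '>';
    }
    return strip_tags($htmlSource, $allowList);
}

$htmlSource = '<div><p>Intro <b>text</b></p>'
    . '<a href="/page.html">Page</a><img src="/logo.png"></div>';

// Only <a> and <img> survive; attributes of allowed tags are kept.
$reduced = stripToAllowedTags($htmlSource, ['a', 'img']);

// Simplified stand-in pattern, assumed for illustration only.
preg_match_all('/(?:href|src)\s*=\s*"([^"]*)"/i', $reduced, $matches);
$links = $matches[1];
```

One trade-off of this approach: links that appear only in tags left out of the allow-list are no longer found, so the list would need to cover every tag the crawl should follow.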

In my case, setLinkExtractionTags(), enableAggressiveLinkSearch(false), and excludeLinkSearchDocumentSections(\PHPCrawlerLinkSearchDocumentSections::ALL_SPECIAL_SECTIONS) do not help.
Best regards.
