Is there any way to skip over comments within the html and not crawl any links within them?
Thanks
Steve
I think I've solved the problem:
add
just above the line
in the page request file.
Steve
Hi Steve,
Yes, your fix (stripping out all HTML comments) will work fine in most cases.
But you may run into problems if an HTML comment is really long, because of the way phpcrawl
searches for links.
The crawler searches for links "in the HTML stream", which means it only looks for links in
one portion of the HTML code at a time (10 KB portions, I think).
So if an HTML comment starts in one portion and ends in the next, your fix will fail.
I don't have a solution for that right now, but I will think about it.
Thanks!
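
To illustrate the chunk-boundary problem described above, here is a rough sketch in Python (not PHP, and not phpcrawl's actual code). It compares stripping comments from each portion on its own with a small stripper that remembers whether it is still inside a comment when the next portion arrives. All names here (`strip_comments_per_chunk`, `StreamingCommentStripper`, `strip_comments_streaming`) are made up for the example, and the portion size is arbitrary.

```python
import re

COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments_per_chunk(chunks):
    """Naive approach: strip comments from every chunk independently.

    A comment that opens in one chunk and closes in the next is missed.
    """
    return "".join(COMMENT_RE.sub("", chunk) for chunk in chunks)


def _partial_suffix_len(text, marker):
    """Length of the longest proper prefix of `marker` that `text` ends with."""
    for k in range(min(len(marker) - 1, len(text)), 0, -1):
        if text.endswith(marker[:k]):
            return k
    return 0


class StreamingCommentStripper:
    """Hypothetical helper that keeps comment state between chunks."""

    def __init__(self):
        self._in_comment = False
        self._pending = ""  # held-back chars that may start "<!--" or "-->"

    def feed(self, chunk):
        """Return the comment-free text that is safe to emit for this chunk."""
        data = self._pending + chunk
        self._pending = ""
        out = []
        pos = 0
        while True:
            if not self._in_comment:
                start = data.find("<!--", pos)
                if start == -1:
                    # Hold back a trailing "<", "<!" or "<!-" for the next chunk.
                    keep = _partial_suffix_len(data[pos:], "<!--")
                    cut = len(data) - keep
                    out.append(data[pos:cut])
                    self._pending = data[cut:]
                    break
                out.append(data[pos:start])
                self._in_comment = True
                pos = start + 4
            else:
                end = data.find("-->", pos)
                if end == -1:
                    # Hold back a trailing "-" or "--" for the next chunk.
                    keep = _partial_suffix_len(data[pos:], "-->")
                    self._pending = data[len(data) - keep:]
                    break
                self._in_comment = False
                pos = end + 3
        return "".join(out)

    def finish(self):
        """Flush held-back text; an unterminated comment is dropped."""
        tail = "" if self._in_comment else self._pending
        self._pending = ""
        return tail


def strip_comments_streaming(chunks):
    stripper = StreamingCommentStripper()
    return "".join(stripper.feed(c) for c in chunks) + stripper.finish()
```

With a page split so that the comment straddles two portions, `strip_comments_per_chunk` leaves the commented-out link in place, while `strip_comments_streaming` removes it. Whether something like this can be wired into phpcrawl's link search depends on how the crawler hands the portions around, which I have not checked.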
Hi,
I want to skip searching for links within <script></script> tags (bug 3565565).
Where can I add the proper regexp? I use PHPCrawl 0.80 beta.
Regards,