
Skipping HTML Comments <!-- -->

Help
2012-08-07
2013-04-09
  • Steve Jones

    Steve Jones - 2012-08-07

    Is there any way to skip over comments in the HTML so that links inside them are not crawled?

    Thanks
    Steve

     
  • Steve Jones

    Steve Jones - 2012-08-08

    I think I've solved the problem.

    Add

        $source_read = preg_replace('/(?s)<!--.*?-->/', '', $source_read);
    

    just above the line

        if ($stream_to_memory == true)
    

    in the page request file.

    Steve
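To see what that regex does in isolation, here is a minimal standalone sketch. The helper name `strip_html_comments` and the sample HTML are made up for illustration and are not part of PHPCrawl.

```php
<?php
// Hypothetical helper, not part of PHPCrawl: removes HTML comments from a string.
function strip_html_comments(string $html): string
{
    // (?s) lets "." match newlines, so multi-line comments are removed too.
    // ".*?" is lazy, so each comment ends at the first "-->" that follows it.
    $result = preg_replace('/(?s)<!--.*?-->/', '', $html);

    // preg_replace() returns null on a PCRE error; fall back to the input then.
    return $result ?? $html;
}

$html = "<a href=\"/keep\">keep</a>\n"
      . "<!-- <a href=\"/skip\">skip</a>\n-->\n"
      . "<a href=\"/also\">also</a>";

$clean = strip_html_comments($html);
echo $clean, "\n";
```

After the call, `$clean` still contains the `/keep` and `/also` links, while the commented-out `/skip` link is gone.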

     
  • Uwe Hunfeld

    Uwe Hunfeld - 2012-08-08

    Hi Steve,

    Yes, your fix (stripping away all HTML comments) will work fine in most cases.
    But you may get problems if an HTML comment is really long, because of the way phpcrawl
    searches for links.
    The crawler searches for links "in the HTML stream", which means it only ever looks at one
    portion of the HTML code at a time (10 KB portions, I think).
    So if an HTML comment starts in one portion and ends in the next one, your fix will fail.

    I don't have a solution for that right now, but I will think about it.

    Thanks!
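One possible way around the boundary problem described above is to carry unfinished comments over from one portion to the next. The following is only a sketch under assumptions: the class `StreamingCommentStripper`, its `feed()`/`finish()` methods, and the sample chunks are hypothetical, and nothing here reflects how PHPCrawl actually reads its stream.

```php
<?php
// Hypothetical sketch, not PHPCrawl code: strips HTML comments from text that
// arrives in chunks, even when a comment spans a chunk boundary.
final class StreamingCommentStripper
{
    /** Unprocessed tail carried over to the next chunk. */
    private string $pending = '';

    /** Returns the comment-free text that is safe to scan for links so far. */
    public function feed(string $chunk): string
    {
        $buf = $this->pending . $chunk;
        $out = '';
        $pos = 0;

        while (($start = strpos($buf, '<!--', $pos)) !== false) {
            $out .= substr($buf, $pos, $start - $pos);
            $end = strpos($buf, '-->', $start + 4);
            if ($end === false) {
                // Comment is still open: hold it back until more data arrives.
                $this->pending = substr($buf, $start);
                return $out;
            }
            $pos = $end + 3;
        }

        $rest = substr($buf, $pos);

        // Hold back a trailing "<", "<!" or "<!-" that might begin a comment.
        $keep = 0;
        foreach (['<!-', '<!', '<'] as $prefix) {
            if (substr($rest, -strlen($prefix)) === $prefix) {
                $keep = strlen($prefix);
                break;
            }
        }
        $this->pending = $keep > 0 ? substr($rest, -$keep) : '';

        return $out . substr($rest, 0, strlen($rest) - $keep);
    }

    /** Call once at end of stream; an unterminated comment is dropped. */
    public function finish(): string
    {
        $tail = strpos($this->pending, '<!--') === 0 ? '' : $this->pending;
        $this->pending = '';
        return $tail;
    }
}

// The comment (and its closing "-->") is split across three chunks.
$chunks = [
    '<a href="/a">a</a><!-- <a href="/hid',
    'den">x</a> --',
    '><a href="/b">b</a>',
];

$stripper = new StreamingCommentStripper();
$clean = '';
foreach ($chunks as $chunk) {
    $clean .= $stripper->feed($chunk);
}
$clean .= $stripper->finish();
echo $clean, "\n";
```

The trade-off is that a very long comment stays buffered (and is rescanned on every call) until its closing `-->` arrives, so a real implementation would probably want a size cap on the held-back text.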

     
  • Nobody/Anonymous

    Hi,
    I want to skip searching for links within <script></script> tags (bug 3565565).
    Where can I add the proper regexp? I use PHPCrawl 0.80 beta.

    Regards,
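The thread does not say where this would hook into PHPCrawl 0.80, but a regexp in the same spirit as the comment-stripping fix earlier in the thread might look like the sketch below. The helper name `strip_script_blocks` and the sample HTML are hypothetical, and the same caveat about blocks that span two portions of the stream would presumably apply.

```php
<?php
// Hypothetical helper, not part of PHPCrawl: removes <script>...</script> blocks.
function strip_script_blocks(string $html): string
{
    // "i" ignores tag case, "s" lets "." span newlines,
    // and the lazy ".*?" stops at the first closing </script> tag.
    $result = preg_replace('/<script\b[^>]*>.*?<\/script\s*>/is', '', $html);

    // preg_replace() returns null on a PCRE error; fall back to the input then.
    return $result ?? $html;
}

$html = '<a href="/keep">keep</a>'
      . '<SCRIPT type="text/javascript">var u = "<a href=\'/skip\'>x</a>";</SCRIPT>';

$clean = strip_script_blocks($html);
echo $clean, "\n";
```

A regexp like this is only a heuristic: it will be fooled by a literal `</script>` inside a JavaScript string, for example.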

     
