
#52 PHPCrawl collects strange links

Status: open
Milestone: None
Priority: 5
Updated: 2015-03-01
Created: 2013-08-14
Creator: Anonymous
Private: No

Hello!

Currently I see a lot of strange URLs in my crawled-URL logs, for example:

http://raredevice.net/collections/living?page=5.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed
http://raredevice.net/collections/living?page=4.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed.oembed

Discussion

  • Anonymous - 2013-08-14

    Hi!

    Did you use the current version, 0.81?

     
  • Anonymous - 2013-08-29

    Yes, I did.

     
  • CFlorin - 2014-01-09

    It seems that wherever an anchor has an onclick="..." attribute, the crawler collects strange links...

     
  • Anonymous - 2014-01-09

    Hi!

    I'm not able to reproduce this bug.
    I crawled http://raredevice.net for quite a while, giving priority to links containing "oembed", but the links mentioned in the first post don't show up here.

    It would be very useful if you (or anyone) could post the referer URL of the page containing these links
    ($DocInfo->referer_url).

     
  • Uwe Hunfeld - 2015-03-01

    The problem described here should be fixed by using the new setting excludeLinkSearchDocumentSections(), available since PHPCrawl 0.83:

    $crawler->excludeLinkSearchDocumentSections(PHPCrawlerLinkSearchDocumentSections::ALL_SPECIAL_SECTIONS);
    

    This prevents the crawler from searching for links in script sections and in JavaScript-triggering attributes such as "onClick=".
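    To illustrate why excluding these sections helps, here is a small self-contained sketch. It is NOT PHPCrawl's internal code: findLinksNaively(), stripSpecialSections(), the sample HTML and the regular expression are all hypothetical. The naive finder picks up every quoted string that looks like a path, including strings inside a <script> block and an onclick attribute (the kind of source the reports above point to); stripping those sections with PHP's DOM extension first leaves only the real href.

```php
<?php
// Illustrative sketch only, NOT PHPCrawl's implementation.
// findLinksNaively() and stripSpecialSections() are hypothetical helpers.

// Deliberately naive link finder: returns every quoted string starting with
// "http(s)://", "/" or "." anywhere in the markup, including <script> blocks
// and event-handler attributes.
function findLinksNaively(string $html): array
{
    preg_match_all('/["\']((?:https?:\/\/|\/|\.)[^"\'\s<>]*)["\']/', $html, $matches);
    return $matches[1];
}

// Removes <script> elements and on* attributes (onclick, onmouseover, ...)
// so that a link finder only sees regular markup.
function stripSpecialSections(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // silence warnings about sloppy HTML
    // NODEFDTD stops libxml from adding a DOCTYPE whose DTD URL would be "found" too
    $doc->loadHTML($html, LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Copy the node lists before modifying the tree
    foreach (iterator_to_array($xpath->query('//script')) as $script) {
        $script->parentNode->removeChild($script);
    }
    foreach (iterator_to_array($xpath->query('//@*[starts-with(name(), "on")]')) as $attr) {
        $attr->ownerElement->removeAttribute($attr->nodeName);
    }
    return $doc->saveHTML();
}

// Hypothetical sample page fragment
$html = '<a href="/collections/living?page=5" onclick="loadMore(\'.oembed\')">Next</a>'
      . '<script>var api = "/collections/living.oembed";</script>';

print_r(findLinksNaively($html));                        // href + onclick + script strings
print_r(findLinksNaively(stripSpecialSections($html)));  // href only
```

    The first call returns the href plus the ".oembed" strings from the onclick attribute and the script; the second returns only "/collections/living?page=5". Within PHPCrawl itself, the one-line setting shown above is the way to get this behavior rather than preprocessing pages yourself.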

     

    Last edit: Uwe Hunfeld 2015-03-01
