External links

  • Anonymous

    Anonymous - 2013-10-29

    Hi,

    I've got a question. How do I configure PHPCrawl for this kind of task?
    I want to crawl a whole site from a given URL and find all URLs that point to external sites, and only those. I don't want to list internal links, only external ones. Then the whole process should stop (the crawler should not follow those external links, just list them).

    Thanks for any reply.

     
  • Anonymous

    Anonymous - 2013-10-29

    Hi!

    The crawler returns all links it finds in the array PHPCrawlerDocumentInfo->links_found_url_descriptors.

    This array contains ALL found links, both internal and external.

    So you simply have to iterate over this array and pick out the external links (e.g. by matching each link's host against the hostname of the site you are crawling with a regular expression).
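
The filtering step described above could look roughly like the sketch below. It is not part of PHPCrawl: filterExternalUrls() is a hypothetical helper, and it compares hosts with PHP's parse_url() instead of a regular expression. It assumes you have already collected the absolute URL of each found link as a string (as far as I know, each descriptor carries it in its url_rebuild property, but please check that against the PHPCrawl class reference). It also assumes that subdomains of the crawled host count as internal, which may not match your needs.

```php
<?php
// Hypothetical helper, not part of PHPCrawl: returns the unique URLs whose
// host differs from the host of the site being crawled.
// Assumption: subdomains of $siteHost are treated as internal.
function filterExternalUrls(array $urls, string $siteHost): array
{
    $siteHost = strtolower($siteHost);
    $suffix   = '.' . $siteHost;
    $external = [];

    foreach ($urls as $url) {
        $host = parse_url($url, PHP_URL_HOST);
        if (!is_string($host) || $host === '') {
            continue; // relative or malformed URL, no host to compare
        }
        $host = strtolower($host);

        // Internal if the host matches exactly or ends with ".<siteHost>".
        $isInternal = $host === $siteHost
            || substr($host, -strlen($suffix)) === $suffix;

        if (!$isInternal) {
            $external[$url] = true; // array key removes duplicates
        }
    }

    return array_keys($external);
}
```

You would call it once per processed document with the link URLs of that document and the host you started the crawl on, for example filterExternalUrls($urls, 'example.com'), and append the result to your own list of external links.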

     
