Is there a way to get the number of redirects the crawler followed to reach a given page?

Help
Ed Eliot
2013-06-07
2013-06-29
  • Ed Eliot

    Ed Eliot - 2013-06-07

    I couldn't see the relevant info in the headerInfo or pageInfo objects.

     
  • Anonymous

    Anonymous - 2013-06-07

    Hi Ed Eliot,

    no, there isn't such a property.

    But you can easily do it yourself in your extended crawler class, something like this:

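    class MyCrawler extends PHPCrawler 
    {
      // Running total of redirect responses seen so far in this crawl
      protected $redirect_count = 0;
    
      function handleDocumentInfo($DocInfo) 
      {
        // ...
        // Count every 301/302 response the crawler receives
        if ($DocInfo->http_status_code == 301 || $DocInfo->http_status_code == 302)
        {
          $this->redirect_count++;
        }
    
        // A 200 response is a "real" page: report the count so far
        if ($DocInfo->http_status_code == 200)
        {
          echo "Redirects to this URL: ".$this->redirect_count;
        }
        // ...
      } 
    }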
    
     

    Last edit: Anonymous 2013-06-07
  • Ed Eliot

    Ed Eliot - 2013-06-11

    Fabulous, thanks.

     
  • Anonymous

    Anonymous - 2013-06-28

    Isn't the number of redirects just going to keep incrementing with every page request, giving you an invalid result? For example:

    Start
    Get page A, 2 redirects
    $redirect_count eq 2
    Follow link to page B, 1 redirect
    $redirect_count eq 3 (bad)

    On the second page the count should be 1

     

    Last edit: Pan European 2013-06-28
  • Anonymous

    Anonymous - 2013-06-29

    Yes, you are right.

    I thought the question was how many redirects it took to get to the FIRST real page.

    If you are running in single-process mode, you could simply reset the counter after a status code of 200 occurred (am I right?).
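
    A minimal sketch of the reset idea for single-process mode follows. The RedirectCounter class and its record() method are hypothetical helpers, not part of PHPCrawl. The sketch assumes that the crawler requests a redirect target directly after the redirecting response, which is the open question above. It also resets on any non-redirect response rather than only on 200, so that a chain ending in an error page does not leak into the next page's count.

```php
<?php
// Hypothetical helper (not part of PHPCrawl): counts consecutive
// 301/302 responses and resets once a non-redirect response arrives.
class RedirectCounter
{
    private $count = 0;

    // Returns the number of redirects that preceded this response,
    // or null if this response is itself a redirect.
    public function record($statusCode)
    {
        $code = (int) $statusCode;

        // Same codes as in the snippet earlier in the thread;
        // other redirect codes could be added here if needed.
        if ($code === 301 || $code === 302) {
            $this->count++;
            return null;
        }

        $redirects = $this->count;
        $this->count = 0; // start fresh for the next page
        return $redirects;
    }
}
```

    Inside handleDocumentInfo($DocInfo) you would keep one RedirectCounter instance as a property of the crawler class and call record($DocInfo->http_status_code) on it. A non-null return value is then the redirect count for the page that was just received. This only holds if responses arrive in chain order, so it does not carry over to multi-process mode.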

    If you are using multi-process mode, this gets difficult, since the pages don't get crawled in a "straight" order (i.e. process 1 gets the first redirect, process 4 gets the second redirect, and process 2 gets the final page after other processes have requested completely different pages in the meantime).

    Right now I can't think of an "easy" solution for that last situation, sorry.

     
