
Fatal error: Call to undefined method stdClas

Help
Alan
2010-06-15
2013-04-09
  • Alan

    Alan - 2010-06-15

    Hello,

    I seem to be getting the following error when I have multiple objects:

    "Fatal error: Call to undefined method stdClass::receivePage()"

    The $crawler object crawls fine; however, the error above occurs when $anothercrawler->go(); is executed.

    Can someone please help? Thank you.

    $crawler = &new MyCrawler();
    $crawler->setURL($url);
    $crawler->setFollowMode(1);
    $crawler->addReceiveContentType("/text\/html/");
    $crawler->addNonFollowMatch("/.(jpg|gif|png)$/ i");
    $crawler->setCookieHandling(true);
    $crawler->go();

    $anothercrawler = &new MyCrawler();
    $anothercrawler->setURL($url);
    $anothercrawler->setFollowMode(1);
    $anothercrawler->addReceiveContentType("/text\/html/");
    $anothercrawler->addNonFollowMatch("/.(jpg|gif|png)$/ i");
    $anothercrawler->setCookieHandling(true);
    $anothercrawler->go();

     
  • Uwe Hunfeld

    Uwe Hunfeld - 2010-06-21

    Hi alanchau,

    I can confirm that issue here; this is a bug!

    To fix it, simply change the following lines of code in the file "classes/phpcrawler.class.php"
    (from line 120):

    Original code:

    // PageRequest-class
    if (!class_exists("PHPCrawlerPageRequest"))
    {
      include_once($classpath."/phpcrawlerpagerequest.class.php");
      
      // Initiate a new PageRequestor
      $this->pageRequest = new PHPCrawlerPageRequest();
    }
    

    Fixed code:

    // PageRequest-class
    if (!class_exists("PHPCrawlerPageRequest"))
    {
      include_once($classpath."/phpcrawlerpagerequest.class.php");
    }
    // Initiate a new PageRequestor
    $this->pageRequest = new PHPCrawlerPageRequest();
    

    This should work!
    And thanks for the report!
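
The difference between the two versions can be reproduced in isolation. The following is only a minimal sketch: `DemoPageRequest`, `demo_load_page_request()`, `BuggyCrawler` and `FixedCrawler` are hypothetical stand-ins, not phpcrawl classes. It shows that when the object is created inside the `class_exists` guard, only the first crawler in a script gets a page-request object; every later one skips the whole block, which would explain why the second crawler fails.

```php
<?php
// Hypothetical stand-in for including phpcrawlerpagerequest.class.php:
// declares the class the first time it is called.
function demo_load_page_request()
{
    if (!class_exists("DemoPageRequest", false)) {
        class DemoPageRequest
        {
            public function receivePage() { return "page"; }
        }
    }
}

// Mirrors the original code: the object is created inside the guard.
class BuggyCrawler
{
    public $pageRequest = null;

    public function __construct()
    {
        if (!class_exists("DemoPageRequest", false)) {
            demo_load_page_request();
            // Only runs while the class is still undefined,
            // i.e. for the first crawler object in the script.
            $this->pageRequest = new DemoPageRequest();
        }
    }
}

// Mirrors the fixed code: the object is created outside the guard.
class FixedCrawler
{
    public $pageRequest = null;

    public function __construct()
    {
        if (!class_exists("DemoPageRequest", false)) {
            demo_load_page_request();
        }
        // Runs for every crawler object.
        $this->pageRequest = new DemoPageRequest();
    }
}

$first  = new BuggyCrawler();
$second = new BuggyCrawler();
$third  = new FixedCrawler();

// Show which objects actually received a page-request instance.
var_dump($first->pageRequest instanceof DemoPageRequest);
var_dump($second->pageRequest instanceof DemoPageRequest);
var_dump($third->pageRequest instanceof DemoPageRequest);
```

In this sketch the second `BuggyCrawler` is left without a page-request object, while `FixedCrawler` always has one regardless of how many crawlers were created before it.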

     
  • Alan

    Alan - 2010-06-21

    Hello huni,

    Thank you for your reply. Please do let me know if you accept donations; I would love to support this project.

    Also, I have a quick question: if I want to crawl more than one URL in the same script, is it OK to set the URL again and then call $crawler->go(); again? That is, are all the old variables properly destroyed? For example, the following:

    $crawler = &new MyCrawler();
    $crawler->setURL("google.com");
    $crawler->go();
    $crawler->setURL("yahoo.com");
    $crawler->go();
    

    Also, I would like to mention that I have encountered some websites that give a segmentation fault.
    One of them is airsilver.net. For reference, the OS is CentOS 5.4 and the PHP version is 5.2.12. I've noticed that whenever I get a segmentation fault, the URL which I am crawling is in a foreign language; this may or may not be causing it.

     
  • Uwe Hunfeld

    Uwe Hunfeld - 2010-06-23

    Hello alanchau again,

    Thank you very much for your willingness to donate to this project; I appreciate that!
    I will enable the donate option when I release a new, up-to-date version of phpcrawl.
    Thanks!

    I don't recommend using the same instance of the crawler class for crawling more than one URL.
    It is better to create a new instance, so you get a really clean object.
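
One way to follow the one-instance-per-URL advice is to build the crawler inside the loop. This is a sketch only: `StubCrawler` and `crawl_each()` are hypothetical names used so the example runs without phpcrawl installed; only `setURL()` and `go()` mirror method names from this thread. With the real library, the factory would return a new `MyCrawler` instead.

```php
<?php
// Hypothetical stand-in for a PHPCrawler subclass.
class StubCrawler
{
    public $url = null;
    public $visited = array();

    public function setURL($url) { $this->url = $url; }
    public function go() { $this->visited[] = $this->url; }
}

// Crawl each URL with its own, freshly constructed crawler object,
// so no state from a previous crawl can leak into the next one.
function crawl_each(array $urls, $makeCrawler)
{
    $results = array();
    foreach ($urls as $url) {
        $crawler = call_user_func($makeCrawler); // new instance per URL
        $crawler->setURL($url);
        $crawler->go();
        $results[$url] = $crawler->visited;
        unset($crawler); // drop the reference before the next iteration
    }
    return $results;
}

$results = crawl_each(
    array("google.com", "yahoo.com"),
    function () { return new StubCrawler(); }
);
print_r($results);
```

Passing a factory rather than an object keeps the loop from ever reusing a crawler by accident.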

    I also just did a quick test and crawled some pages from "airsilver.net". I don't get any problems or segfaults over here
    (old Ubuntu 8.04.4, PHP 5.2.4), but I will run some tests in a newer environment soon.

    Maybe there is a crawler-setting you are using that's causing the problem?!
    Could you post your setup?

    Thanks!

     
  • Alan

    Alan - 2010-06-23

    Hello,

    This is the script I am using; I run it from the command line:

    // php crawler.php airsilver.net
    set_time_limit(0);
    $domain = $argv[1];
    include ("classes/phpcrawler.class.php");
    class MyCrawler extends PHPCrawler
    {
        public $pageCount = 0;
        function handlePageData(&$page_data)
        {
            $this->pageCount++;
            echo $page_data['url'] . ' ' . $this->pageCount . "\n";
        }
    }
    $crawler = &new MyCrawler();
    $crawler->setURL($domain);
    $crawler->setFollowMode(1);
    $crawler->setPageLimit(100);
    $crawler->setContentSizeLimit(1024 * 10000);
    $crawler->setCookieHandling(true);
    $crawler->disableExtendedLinkInfo(true);
    $crawler->addReceiveContentType("/text\/html/");
    $crawler->addNonFollowMatch("/.(jpg|jpeg|gif|png|bmp|js|css|swf)$/ i");
    $crawler->go();
    

    The following is the output of the above command:

    http://airsilver.net/ 1
    http://airsilver.net/lor.html 2
    http://www.airsilver.net/cold.html 3
    http://www.airsilver.net/crags.html 4
    http://www.airsilver.net/throat.2.html 5
    http://www.airsilver.net/throat.1.html 6
    http://www.airsilver.net/snor.html 7
    http://www.airsilver.net/throat.html 8
    http://www.airsilver.net/ears.2.html 9
    http://www.airsilver.net/ears.3.html 10
    http://www.airsilver.net/noses.html 11
    http://www.airsilver.net/silve.html 12
    http://www.airsilver.net/crags.1.html 13
    http://www.airsilver.net/ears.1.html 14
    http://www.airsilver.net/cold.1.html 15
    http://www.airsilver.net/Binfl.html 16
    http://www.airsilver.net/index-A.html 17
    http://www.airsilver.net/index-B.html 18
    http://www.airsilver.net/ch13A1.html 19
    http://www.airsilver.net/ch13B.html 20
    http://airsilver.net/ear.html 21
    http://airsilver.net/crag.html 22
    http://airsilver.net/nose.html 23
    http://airsilver.net/silver.html 24
    http://airsilver.net/operation.html 25
    http://airsilver.net/sendrequest.php 26
    http://airsilver.net/index-A.html 27
    http://airsilver.net/index-B.html 28
    http://airsilver.net/ch13A1.html 29
    http://airsilver.net/ch13B.html 30
    http://airsilver.net/admin/avemaria 31
    http://www.airsilver.net/lor.html 32
    http://www.airsilver.net/admin/avemaria 33
    http://www.airsilver.net/sendrequest.php 34
    http://www.airsilver.net/ch3AB.html 35
    http://www.airsilver.net/ 36
    http://www.airsilver.net/dr.html 37
    http://www.airsilver.net/ch6.html 38
    http://www.airsilver.net/ch2A.html 39
    http://www.airsilver.net/ch2.html 40
    http://www.airsilver.net/ch25.html 41
    http://www.airsilver.net/ch27.html 42
    http://www.airsilver.net/ch27A.html 43
    http://www.airsilver.net/ch13.html 44
    http://www.airsilver.net/ch12.html 45
    Segmentation fault
    
     
  • Uwe Hunfeld

    Uwe Hunfeld - 2010-06-25

    Hi alanchau,

    Yes, I'm able to reproduce the segfault now!

    With PHP 5.2.4 (Ubuntu 8.04.4) and PHP 5.2.3 (Debian 4.0), the segfault occurs over here when phpcrawl tries to extract the links contained in the HTML source of the page "http://www.airsilver.net/ch27A.html".

    The PCRE call (preg_match_all) used for that exits with a segmentation fault on that page.

    It seems that this was a bug in PHP and/or in the bundled PCRE-library.
    (http://bugs.php.net/bug.php?id=41796 / http://bugs.php.net/bug.php?id=45735 …)

    Running the same script with PHP 5.2.10 (Ubuntu 9.10), the segfault does NOT occur anymore.

    I'm not sure in which version of PHP the bug was fixed exactly, but just try to upgrade PHP to a newer version
    if possible.
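
Before upgrading, it can help to see which PHP and PCRE versions a machine is actually running. The sketch below prints both and then tries a commonly used mitigation for PCRE crashes, lowering `pcre.recursion_limit`. Two things are assumptions here and are not confirmed anywhere in this thread: that the crash comes from deep regex recursion, and that a limit of 10000 is suitable. `count_links()` and its pattern are hypothetical and much simpler than whatever phpcrawl uses internally.

```php
<?php
// Report the versions relevant to the segfault discussed above.
echo "PHP:  " . PHP_VERSION . "\n";
echo "PCRE: " . PCRE_VERSION . "\n";

// Assumption: the crash is caused by deep recursion inside PCRE.
// With a lower limit, PCRE should give up with an error instead.
ini_set("pcre.recursion_limit", "10000");

// Hypothetical helper: counts href targets with a simplified pattern.
// Returns false if the regex engine reported an error.
function count_links($html)
{
    $n = preg_match_all('/<a\s[^>]*href\s*=\s*["\']?([^"\'\s>]+)/i', $html, $m);
    if ($n === false || preg_last_error() !== PREG_NO_ERROR) {
        return false;
    }
    return $n;
}

$count = count_links('<a href="a.html">A</a> <a href=b.html>B</a>');
var_dump($count);
```

If `count_links()` returns false on a page that previously crashed the script, that would point at the regex engine rather than at the crawler settings.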

    Thanks for the report again!

     
