setPageLimit and setFollowRedirectsTillContent together

Help
Anonymous
2013-11-17
2013-11-18
  • Anonymous

    Anonymous - 2013-11-17

    First, amazing project... thanks so much!

    In a certain part of my project I only need to scrape one page from a site, so I am using setPageLimit to set the limit to 1. However, I have noticed that if the page I am trying to scrape uses a redirect, I receive an error even though setFollowRedirectsTillContent is set to true.

    One such site I have come across is http://www.marketingpower.com/. Is there a workaround for this?

  • Anonymous

    Anonymous - 2013-11-18

    Hi!

    Seems like it's the same problem as the one described here:
    http://sourceforge.net/p/phpcrawl/bugs/55/, right?

    The problem is that "setPageLimit" is really more of a "setRequestLimit": if a page responds with a redirect and setPageLimit is set to 1, the crawler stops because it has already made ONE request (the one that returned the redirect), before any actual content was received.

    In the next version there will be a REAL "setPageLimit" and a "setRequestLimit".
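    Until then, a possible workaround is to leave setPageLimit unset and stop the crawl manually once a document with real content has arrived. This is only a sketch: it relies on phpcrawl's documented handleDocumentInfo() override and its abort-on-negative-return behavior, and the class name "OnePageCrawler" and the $pages_received counter are invented for illustration.

    ~~~
    <?php
    // Workaround sketch: instead of setPageLimit(1), count only documents
    // that actually delivered content and abort the crawl manually.

    class OnePageCrawler extends PHPCrawler
    {
      private $pages_received = 0;

      public function handleDocumentInfo($DocInfo)
      {
        // Redirect responses (3xx) should not count as received pages.
        if ($DocInfo->http_status_code >= 300 && $DocInfo->http_status_code < 400)
          return;

        $this->pages_received++;

        // ... process $DocInfo->content here ...

        // Returning a negative value aborts the crawling process.
        if ($this->pages_received >= 1)
          return -1;
      }
    }

    $crawler = new OnePageCrawler();
    $crawler->setURL("http://www.marketingpower.com/");
    $crawler->setFollowRedirectsTillContent(true);
    // Note: no setPageLimit() call, so the redirect request cannot
    // exhaust the limit before any content is received.
    $crawler->go();
    ~~~

    This way the redirect request is "free" and the crawl only ends after one real page has been handled.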

