First, amazing project... thanks so much!
In a certain part of my project, I only need to scrape 1 page from a site, so I am using setPageLimit to set the limit to 1. However, I have noticed that if I attempt to scrape 1 page from a site that uses a redirect on that page, I receive an error even though setFollowRedirectsTillContent is set to true.
One such site I have come across is http://www.marketingpower.com/. Is there a workaround for this?
Hi!
Seems like it's the same problem as described here:
http://sourceforge.net/p/phpcrawl/bugs/55/, right?
The problem is that "setPageLimit" currently behaves more like a "setRequestLimit". So if a page uses a redirect and you set setPageLimit to 1, the crawler stops because it already made ONE request (the one that only returned the redirect, not the content).
In the next version there will be a REAL "setPageLimit" and a "setRequestLimit".
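Until then, a possible workaround is to skip setPageLimit() entirely and count received documents yourself, aborting the crawl from handleDocumentInfo(). This is just a minimal sketch, assuming the 0.8x API (the "received" flag on PHPCrawlerDocumentInfo, and the documented behavior that returning a negative value from handleDocumentInfo() stops the crawling-process); the class name OnePageCrawler is only an example:

<?php
include("libs/PHPCrawler.class.php");

class OnePageCrawler extends PHPCrawler
{
  private $pages_received = 0;

  function handleDocumentInfo(PHPCrawlerDocumentInfo $DocInfo)
  {
    // Redirect-responses carry no content, so they don't count here
    if ($DocInfo->received == true)
    {
      $this->pages_received++;
      echo "Received: ".$DocInfo->url."\n";
    }

    // Returning a negative value aborts the crawling-process
    if ($this->pages_received >= 1) return -1;
  }
}

$crawler = new OnePageCrawler();
$crawler->setURL("http://www.marketingpower.com/");
$crawler->setFollowRedirectsTillContent(true);
// No setPageLimit(1) here, it would count the redirect-request itself
$crawler->go();
?>

This way the redirect-request itself doesn't count against the limit; only a document whose content actually arrives does.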