I have set up a cron job that runs example.php every 20 minutes. What I want to do is save the retrieved links to a database after each crawl.
How do I stop the script from re-crawling the URLs that are already in the database on the next run?
Basically, I want to check my database before crawling any URL the script finds.
Last edit: Anonymous 2013-12-23
What I did was add the crawl data to an array before placing it in the database. Then I used in_array to check whether the link was already in the array.
If not, I added the link, so next time it would not be crawled again, or at least no duplicate entries would be placed in the database.
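A minimal sketch of that approach, assuming phpcrawl 0.8's PHPCrawler / handleDocumentInfo API; the include path, the PDO connection details, and the links table (with a UNIQUE index on url) are example assumptions:

```php
<?php
include("libs/PHPCrawler.class.php");

class DedupCrawler extends PHPCrawler
{
    private $seen = array(); // links already collected during this run
    private $pdo;

    public function __construct(PDO $pdo)
    {
        parent::__construct();
        $this->pdo = $pdo;
    }

    public function handleDocumentInfo(PHPCrawlerDocumentInfo $DocInfo)
    {
        // in_array check: skip links we have already stored this run
        if (in_array($DocInfo->url, $this->seen)) {
            return;
        }
        $this->seen[] = $DocInfo->url;

        // INSERT IGNORE together with a UNIQUE index on `url` also
        // prevents duplicates left over from earlier cron runs
        $stmt = $this->pdo->prepare("INSERT IGNORE INTO links (url) VALUES (?)");
        $stmt->execute(array($DocInfo->url));
    }
}

$pdo = new PDO("mysql:host=localhost;dbname=crawler", "user", "pass");
$crawler = new DedupCrawler($pdo);
$crawler->setURL("www.example.com");
$crawler->go();
```

Note that the in_array check only deduplicates within a single run; it is the UNIQUE index that keeps the database clean across cron runs.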
Hi!
I know what your problem is: there is a callback method missing in phpcrawl that gets called just BEFORE a request is made. Inside such a method you could check whether the URL was already requested before (through your MySQL table or something else) and abort/skip the request if so.
This is still on the list of feature requests and hopefully will be implemented in one of the next versions
(http://sourceforge.net/p/phpcrawl/feature-requests/16/)
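Just to illustrate what that could look like once implemented: the handleRequestUrl method below is hypothetical, it does not exist in phpcrawl's API, and its "return false to skip" contract is invented for this sketch.

```php
<?php
// NOTE: purely hypothetical. phpcrawl does not offer this callback yet
// (see feature request #16 above); the method name handleRequestUrl and
// its return-value contract are invented for illustration.
class SkippingCrawler extends PHPCrawler
{
    private $pdo;

    public function __construct(PDO $pdo)
    {
        parent::__construct();
        $this->pdo = $pdo;
    }

    // Imagined hook, called just before each HTTP request is sent
    public function handleRequestUrl($url)
    {
        $stmt = $this->pdo->prepare("SELECT 1 FROM links WHERE url = ?");
        $stmt->execute(array($url));

        // Returning false would tell the crawler to abort/skip this request
        return $stmt->fetchColumn() === false;
    }
}
```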