Hi,
I need to extract the whole content of one domain. This means that I have to get $DocInfo->links_found and then iterate through this array to get every $DocInfo->content.
The problem is that I always get only the first page's content, and if I try to execute the go method again I get an error.
Many thanks in advance
Hi!
There's no need to iterate over the links_found-array.
PHPCrawl automatically follows every link of the domain and returns its content, so you only have to handle it.
Just make sure you set the follow mode to 1 (stay in domain) or 2 (stay in host, which is the default).
http://cuab.de/classreferences/PHPCrawler/method_detail_tpl_method_setFollowMode.htm
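To illustrate, here is a minimal sketch of that setting in use (the include path and start URL are placeholders for your own setup; the crawler subclass follows the usual pattern of extending PHPCrawler):

<?php
// Path depends on where PHPCrawl is installed in your project.
include("libs/PHPCrawler.class.php");

// A minimal crawler that simply reports each document it receives.
class MyCrawler extends PHPCrawler
{
  function handleDocumentInfo($DocInfo)
  {
    echo $DocInfo->url . "\n";
  }
}

$crawler = new MyCrawler();
$crawler->setURL("www.example.com"); // placeholder start URL
$crawler->setFollowMode(1);          // 1 = stay in domain, 2 = stay in host (default)
$crawler->go();                      // call go() only once
?>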
Hi!!
Thanks a lot for your answer, but I am a little bit confused. If I understand correctly, "PHPCrawlerDocumentInfo" is a class that is returned when you execute the "go" method, and one of its properties is "content"; but that is the content of the URL that you have set. In addition, "PHPCrawlerDocumentInfo" has an array with all the links that the website contains (links_found).
I need to get the content of the URL that I have set as well as the contents of all the URLs that the crawler returns in the "links_found" array.
You say that "PHPCrawl automatically follows every link of the domain and returns its content". My question is: where do I receive the content of the additional URLs found on the website?
Thanks in advance!
The handleDocumentInfo method will get called SEVERAL times after you have executed the "go" method once: once for EVERY URL the crawler finds on its way.
The crawler starts with the root URL, gets all links from it and follows them (depending on your settings), then it again follows all URLs it finds on those pages, and so on. For each of these URLs/links it calls the handleDocumentInfo method.
Just take a look at the example and execute it; then it becomes clear.
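As a rough sketch of what that looks like in practice (the include path and start URL are again placeholders, and collecting every page's content into one string is just one possible way to handle it):

<?php
// Path depends on where PHPCrawl is installed.
include("libs/PHPCrawler.class.php");

class WholeDomainCrawler extends PHPCrawler
{
  public $whole_content = "";

  // This method is called once per URL/document the crawler receives,
  // NOT just once for the root URL.
  function handleDocumentInfo($DocInfo)
  {
    echo "Received: " . $DocInfo->url . " (HTTP " . $DocInfo->http_status_code . ")\n";

    // Append this page's content to the collected content of the domain.
    $this->whole_content .= $DocInfo->content;
  }
}

$crawler = new WholeDomainCrawler();
$crawler->setURL("www.example.com");                // placeholder start URL
$crawler->addContentTypeReceiveRule("#text/html#"); // only receive HTML documents
$crawler->setFollowMode(1);                         // stay within the domain
$crawler->go();                                     // one call; handleDocumentInfo fires per page

echo "Total collected content: " . strlen($crawler->whole_content) . " bytes\n";
?>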
Hope I could help.
Hi dear colleague,
Your support has been very useful to me. Thanks a lot!