PHP Simple HTML DOM Parser / Feature Requests / #23 follow link


Anonymous - 2011-06-18

There is no need to modify the parser; you need to use cURL or another library that can follow redirects and save cookies.
See http://www.php.net/manual/en/book.curl.php
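
As a rough illustration of that suggestion, here is a minimal sketch of a cURL fetch that follows redirects and keeps cookies in a file. The helper names curl_fetch_options() and fetch_with_curl(), the redirect limit and the timeout are assumptions made for this example; none of them come from simple_html_dom.

```php
<?php
// Sketch only: both helpers are hypothetical names, not part of simple_html_dom.

/**
 * Build cURL options that follow redirects and persist cookies.
 * $cookieFile is any writable path chosen by the caller (an assumption).
 */
function curl_fetch_options(string $cookieFile, int $maxRedirects = 10): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,          // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,          // follow HTTP redirects
        CURLOPT_MAXREDIRS      => $maxRedirects, // stop after this many redirects
        CURLOPT_COOKIEJAR      => $cookieFile,   // write received cookies here
        CURLOPT_COOKIEFILE     => $cookieFile,   // send previously stored cookies
        CURLOPT_TIMEOUT        => 30,            // arbitrary timeout in seconds
    ];
}

/**
 * Fetch a URL and return the response body as a string.
 */
function fetch_with_curl(string $url, string $cookieFile): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, curl_fetch_options($cookieFile));
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}
```

The returned string can then be handed to the parser, for example with str_get_html($body) or $html->load($body), so the parser itself stays unchanged.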

Last edit: Anonymous 2015-02-09


nickl- - 2012-09-11

Unless this has changed drastically, the site is not trying to set a cookie. That is merely a modal overlay, which is a pain when you're trying to view the site, but for simple_html_dom this is not an issue.

You see:

php> $html = new simple_html_dom();

php> $html->load_file('http://e-redus.ro/');
This is probably the image that needs to be clicked in the modal:
php> echo $html->find('img', 0)->outertext;

But the rest of the html is there as well and we have no problem getting the images underneath, see:
php> echo $html->find('img', 1)->outertext;

php> echo $html->find('img', 2)->outertext;

php> echo $html->find('img', 3)->outertext;

php> echo $html->find('img', 4)->outertext;

php> echo $html->find('img', 5)->outertext;

php> echo $html->find('img', 6)->outertext;

php> echo $html->find('img', 7)->outertext;

php> echo $html->find('img', 8)->outertext;

php> echo $html->find('img', 9)->outertext;

php> echo $html->find('img', 10)->outertext;

That said, I could see a use for follow link though, not onclick per se, as we cannot execute JavaScript, but we could retrieve href or even src attributes.

I'm thinking something like

$html2 = $html->find('a',0)->follow();
or
$image = $html->find('img',0)->retrieve();

which saves you from doing the following instead

$html2 = new simple_html_dom();
$html2->load_file($html->find('a',0)->href);
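
The long form above only works as written when the href is already an absolute URL. A follow-style helper would probably need to resolve relative links against the page URL first. Below is a minimal sketch of that step; resolve_href() is a hypothetical name invented for this example and covers only the common cases (absolute, protocol-relative, root-relative and path-relative hrefs), without normalizing "../" segments or handling query-only and fragment-only links.

```php
<?php
// Sketch only: resolve_href() is a hypothetical helper, not part of simple_html_dom.

/**
 * Resolve an href found on a page against that page's URL.
 * Does not normalize "./" or "../" segments.
 */
function resolve_href(string $baseUrl, string $href): string
{
    // Already absolute: starts with a scheme such as "http:" or "https:".
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return $href;
    }

    $base   = parse_url($baseUrl);
    $scheme = $base['scheme'] ?? 'http';
    $host   = $base['host'] ?? '';
    $port   = isset($base['port']) ? ':' . $base['port'] : '';
    $origin = $scheme . '://' . $host . $port;

    // Protocol-relative: "//cdn.example.com/x.png" reuses the base scheme.
    if (strpos($href, '//') === 0) {
        return $scheme . ':' . $href;
    }

    // Root-relative: "/about" is appended to the origin.
    if (strpos($href, '/') === 0) {
        return $origin . $href;
    }

    // Path-relative: replace everything after the last "/" of the base path.
    $path = $base['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}
```

With the page URL kept in a variable such as $pageUrl (also an assumption), the second document could be loaded with something like file_get_html(resolve_href($pageUrl, $html->find('a',0)->href)).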

Does that look like something you would use?


John Schlick - 2012-10-10

To be honest, I am not that interested in implementing a ->follow function in the DOM. What I see as the original issue is rather that the user can't get the DOM to load in the first place.

I have seen this when there are some strange redirects that file_get_contents can't follow, and it's often not about cookies.

In this case, you will have to write your own routine that is capable of getting the URL contents. Please comment out line 76 of the simple_html_dom.php file and uncomment line 78, and then write a cURL routine that does exactly what you want for the site in question.

status: open --> closed

milestone: --> Next_Release