Hi, sorry for my English. I use this great PHP project in many web apps, but I have a problem with one website.
I want to parse a website, but the first time I visit it I have to follow a link, because the site wants to set some cookies.
What I need is for the script to follow that link and then parse the HTML.
The page is http://e-redus.ro/ and the first time you open it you get that message.
How can I skip this step in PHP?
Thanks.
There is no need to modify the parser; you need to use cURL or another library that can follow redirects and save cookies.
See http://www.php.net/manual/en/book.curl.php
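For illustration, here is a minimal sketch of such a cURL fetch. The function name fetch_page, the cookie file path and the user agent string are placeholders made up for this example, not anything from simple_html_dom, and the timeouts are arbitrary.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): download a page with
// cURL, following redirects and keeping cookies in a file between requests.
function fetch_page(string $url, string $cookieFile): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,         // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,         // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 10,           // but not forever
        CURLOPT_COOKIEJAR      => $cookieFile,  // write received cookies here
        CURLOPT_COOKIEFILE     => $cookieFile,  // send stored cookies back
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 30,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; example-scraper)', // placeholder
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}

// Usage (needs network access and simple_html_dom.php, so left commented out):
// $markup = fetch_page('http://e-redus.ro/', sys_get_temp_dir() . '/cookies.txt');
// $html   = str_get_html($markup);
```

The downloaded string can then be handed to str_get_html() instead of calling load_file() on the URL directly.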
Last edit: Anonymous 2015-02-09
Unless this has changed drastically, the site is not trying to set a cookie; that is merely a modal overlay, which is a pain when you're trying to view the site, but for simple_html_dom it is not an issue.
You see:
php> $html = new simple_html_dom();
php> $html->load_file('http://e-redus.ro/');
This is probably the image that needs to be clicked to dismiss the modal:
php> echo $html->find('img', 0)->outertext;
But the rest of the html is there as well and we have no problem getting the images underneath, see:
php> echo $html->find('img', 1)->outertext;
php> echo $html->find('img', 2)->outertext;
php> echo $html->find('img', 3)->outertext;
php> echo $html->find('img', 4)->outertext;
php> echo $html->find('img', 5)->outertext;
php> echo $html->find('img', 6)->outertext;
php> echo $html->find('img', 7)->outertext;
php> echo $html->find('img', 8)->outertext;
php> echo $html->find('img', 9)->outertext;
php> echo $html->find('img', 10)->outertext;
That said, I could see a use for a follow-link feature, though not onclick per se, as we cannot execute JavaScript, but we could retrieve href or even src attributes.
I'm thinking something like
$html2 = $html->find('a',0)->follow();
or
$image = $html->find('img',0)->retrieve();
which saves you from doing the following instead
$html2 = new simple_html_dom();
$html2->load_file($html->find('a',0)->href);
Does that look like something you would use?
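One thing the manual version has to deal with is that an href is often relative, so it cannot always be passed to load_file() as-is. Here is a rough sketch of how that could be handled. resolve_href is a made-up name for this example, it is not in simple_html_dom, and it deliberately ignores "../" segments, fragments and other corner cases.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): turn an href found in a
// page into an absolute URL, using the URL of that page as the base.
function resolve_href(string $base, string $href): string
{
    // Already absolute: starts with a scheme such as http:// or https://
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $href)) {
        return $href;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $host   = $parts['host'] ?? '';
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';
    $origin = "$scheme://$host$port";

    // Protocol-relative: //cdn.example.com/img.png
    if (strpos($href, '//') === 0) {
        return "$scheme:$href";
    }
    // Root-relative: /some/page
    if (strpos($href, '/') === 0) {
        return $origin . $href;
    }
    // Path-relative: append to the directory part of the base path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}

// Usage with simple_html_dom (left commented out, needs the library and network):
// $html2 = new simple_html_dom();
// $html2->load_file(resolve_href('http://e-redus.ro/', $html->find('a', 0)->href));
```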
To be honest, I am not that interested in implementing a ->follow function in the DOM. As I see it, the original issue is rather that the user can't get the DOM to load in the first place.
I have seen this when there are some strange redirects that file_get_contents can't follow, and it's often not about cookies.
In this case, you will have to write your own routine that is capable of getting the URL contents. Please comment out line 76 of the simple_html_dom.php file and uncomment line 78, and then write a cURL routine that does exactly what you want for the site in question.
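As a starting point, a site-specific routine could look something like the sketch below. It assumes the site wants you to hit an entry page first and then request the real page in the same session. fetch_via_entry_page is a made-up name, and which two URLs to use for a given site is something you have to find out yourself.

```php
<?php
// Hypothetical routine (not part of simple_html_dom): request $entryUrl first,
// then fetch $targetUrl on the same cURL handle so that any cookies set by
// the first response are sent along with the second request.
function fetch_via_entry_page(string $entryUrl, string $targetUrl): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 10,
        CURLOPT_COOKIEFILE     => '',    // empty string: keep cookies in memory only
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 30,
    ]);

    $body = '';
    foreach ([$entryUrl, $targetUrl] as $url) {
        curl_setopt($ch, CURLOPT_URL, $url);
        $body = curl_exec($ch);
        if ($body === false) {
            throw new RuntimeException('cURL failed for ' . $url . ': ' . curl_error($ch));
        }
    }
    return $body; // body of the last request, i.e. $targetUrl
}

// Usage (left commented out, needs network access and simple_html_dom.php;
// both URLs here are only placeholders):
// $markup = fetch_via_entry_page('http://e-redus.ro/', 'http://e-redus.ro/');
// $html   = str_get_html($markup);
```

Keeping the cookies in memory on a single handle avoids leaving a cookie file on disk, at the cost of losing the session when the script ends.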