
#23 follow link

Unassigned
closed
nobody
Functions (10)
2019-04-18
2011-01-15
Anonymous
No

Hi, sorry for my English. I use this great PHP project in many web apps, but I have a problem with one website.
I want to parse a website, but the first time I visit it I need to follow a link because the site wants to set some cookies.
What I need is for the script to follow a link and then parse the HTML.

The page is http://e-redus.ro/ and the first time you open it you get that message.

How can I skip this step in PHP?
Thanks.

Discussion

  • nickl-

    nickl- - 2012-09-11

    Unless this has changed drastically, the site is not trying to set a cookie. What you see is merely a modal overlay, which is a pain when you're trying to view the site in a browser, but for simple_html_dom it is not an issue.

    You see:

    php> $html = new simple_html_dom();

    php> $html->load_file('http://e-redus.ro/');
    This is probably the image that needs to be clicked to dismiss the modal:
    php> echo $html->find('img', 0)->outertext;

    But the rest of the html is there as well and we have no problem getting the images underneath, see:
    php> echo $html->find('img', 1)->outertext;

    php> echo $html->find('img', 2)->outertext;

    php> echo $html->find('img', 3)->outertext;

    php> echo $html->find('img', 4)->outertext;

    php> echo $html->find('img', 5)->outertext;

    php> echo $html->find('img', 6)->outertext;

    php> echo $html->find('img', 7)->outertext;

    php> echo $html->find('img', 8)->outertext;

    php> echo $html->find('img', 9)->outertext;

    php> echo $html->find('img', 10)->outertext;

    That said, I could see a use for a follow-link feature. Not onclick per se, as we cannot execute JavaScript, but we could retrieve the targets of href or even src attributes.

    I'm thinking something like

    $html2 = $html->find('a',0)->follow();
    or
    $image = $html->find('img',0)->retrieve();

    which saves you from doing the following instead

    $html2 = new simple_html_dom();
    $html2->load_file($html->find('a',0)->href);

    Does that look like something you would use?
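
Until something like that exists, the same idea can be sketched as a standalone helper outside the library. Note that follow_link() is a hypothetical name, not part of simple_html_dom, and the loader is passed in as a callable (for example the library's file_get_html function), so the helper only assumes that the element exposes an href attribute. Relative hrefs are not resolved here and would need to be made absolute first.

```php
<?php
// Hypothetical helper, not part of simple_html_dom.
// $element: a node such as the result of $html->find('a', 0), or null.
// $loader:  any callable that takes a URL and returns the loaded document.
function follow_link($element, callable $loader)
{
    // find() returns null when nothing matches; isset() also covers that case.
    $href = isset($element->href) ? trim($element->href) : '';
    if ($href === '') {
        return null; // nothing to follow
    }
    // Assumption: $href is already an absolute URL the loader can open.
    return $loader($href);
}
```

With the library loaded, usage would look like $html2 = follow_link($html->find('a', 0), 'file_get_html');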

     
  • John Schlick

    John Schlick - 2012-10-10

    To be honest, I am not that interested in implementing a ->follow function in the DOM. What I see as the original issue is rather that the user can't get the DOM to load in the first place.

    I have seen this when there are some strange redirects that file_get_contents can't follow, and it's often not about cookies.

    In this case, you will have to write your own routine that is capable of getting the URL contents. Please comment out line 76 of the simple_html_dom.php file and uncomment line 78, and then write a curl routine that does exactly what you want for the site in question.
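
A curl routine of that kind might look roughly like the sketch below. It assumes the PHP curl extension is available; fetch_with_curl() and the cookie-file path are hypothetical names chosen for illustration, and the options shown (following redirects, keeping cookies between requests, sending a browser-like user agent) are guesses at what a site like this may need rather than a confirmed fix.

```php
<?php
// Sketch only: fetch_with_curl() is a hypothetical name, not part of simple_html_dom.
// Returns the response body as a string, or throws if the transfer fails.
function fetch_with_curl($url, $cookieFile)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 10,          // but not forever
        CURLOPT_COOKIEJAR      => $cookieFile, // write received cookies here
        CURLOPT_COOKIEFILE     => $cookieFile, // and send them back on later requests
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; example-scraper)',
        CURLOPT_TIMEOUT        => 30,
    ));

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('curl failed: ' . curl_error($ch));
    }
    return $body;
}
```

The returned string can then be handed to the parser without going through file_get_contents, for example $html = str_get_html(fetch_with_curl('http://e-redus.ro/', sys_get_temp_dir() . '/cookies.txt'));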

    • status: open --> closed
    • milestone: --> Next_Release
     
