Crawling data after a form has been sent

Help
MathP
2015-01-06
2015-01-12
  • MathP

    MathP - 2015-01-06

    Hey,

    I am trying to crawl some data after a search form has been sent.

    -- search form

    <form name="suche" action="./search.html?0-1.IFormSubmitListener-html-body-searchForm" method="post" enctype="application/x-www-form-urlencoded">
    
    <input type="hidden" name="ausschreibungssuche_hf_0" id="ausschreibungssuche_hf_0"></div>
    <input id="searchString" type="text" value="" name="searchString">
    
    <select name="publishDateRange">
    <option selected="selected" value="ALL">Alle</option>
    ...
    </select>
    <input type="submit" name="submitButton" id="id3" title="Suche ausführen" value="Suchen">
    </form>
    

    After the form has been sent, the result table shows up.

    $crawler = new MyCrawler();
    $crawler->setURL("xy");
    $post_data = array("ausschreibungssuche_hf_0" => "", "searchString" => "", "publishDateRange" => "ALL","submitButton" => 1);
    $crawler->addPostData("#http://www\.xy\.de/search\.html\?4-1\.IBehaviorListener\.0-html-body-searchForm-submitButton#", $post_data);

    $crawler->addContentTypeReceiveRule("#text/html#");
    $crawler->addURLFilterRule("#\.(jpg|jpeg|gif|png)$# i");
    $crawler->addURLFilterRule("#\.(css|js)$# i");
    $crawler->enableCookieHandling(true);
    $crawler->setUserAgentString('Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0');
    $crawler->setFollowMode(0);
    $crawler->setFollowRedirects(TRUE);
    $crawler->setTrafficLimit(1000 * 1024);
    $crawler->go();

    The crawler receives some data, but not the result table. Something is wrong, but I have no idea what.
    Any ideas? If you need the full URL, I will send it by email or PM.

     
  • Uwe Hunfeld

    Uwe Hunfeld - 2015-01-06

    Hi!

    Could you send me the URL for a short test? (PM)

     
  • Anonymous

    Anonymous - 2015-01-09

    Hi!

    I got your PM but haven't had time to run a test yet (lots to do). I will in the next few days. Just wanted to let you know.

     
  • Anonymous

    Anonymous - 2015-01-12

    Hi again,

    I just tested the site you mentioned.

    The site you are talking about submits its search form via AJAX (JavaScript).
    What you get in your script is an XML document containing the search result.

    The JavaScript on the page receives this XML after the POST, inserts its content into the DOM of the search page, and displays it in an HTML table.

    So there is no way for a crawler to get the contents of the result page directly (as far as I can see), but you do get the results as XML, which is even better, I'd say.
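
    To make use of that XML, something along these lines might work. This is only a rough sketch: the sample response, the `<ajax-response>`/`<component>` element names, the idea that the result rows arrive as an HTML fragment inside CDATA, and the helper name `extractResultRows` are all assumptions for illustration, so check them against what the site really returns. In PHPCrawl, the string would presumably come from `$DocInfo->content` inside your `handleDocumentInfo()` method.

```php
<?php
// Hypothetical sample response; the real element names must be checked
// against the XML the site actually returns.
$xml = '<ajax-response><component id="results"><![CDATA['
     . '<table><tr><td>Tender A</td><td>2015-01-05</td></tr>'
     . '<tr><td>Tender B</td><td>2015-01-06</td></tr></table>'
     . ']]></component></ajax-response>';

// Returns one array of cell texts per table row found inside <component> elements.
function extractResultRows(string $xml): array
{
    libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
    $rows = [];
    $doc = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);
    if ($doc === false) {
        return $rows; // response was not well-formed XML
    }
    foreach ($doc->component as $component) {
        $fragment = trim((string) $component);
        if ($fragment === '') {
            continue; // nothing embedded in this component
        }
        $html = new DOMDocument();
        $html->loadHTML($fragment); // parses the embedded HTML fragment
        foreach ($html->getElementsByTagName('tr') as $tr) {
            $cells = [];
            foreach ($tr->getElementsByTagName('td') as $td) {
                $cells[] = trim($td->textContent);
            }
            if ($cells !== []) {
                $rows[] = $cells;
            }
        }
    }
    return $rows;
}

$rows = extractResultRows($xml);
print_r($rows);
```

    If the response is served as text/xml rather than text/html, the content-type receive rule in your script may also need to allow it, otherwise the crawler might not receive the body at all.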

    Hope I could help!

     
