http: fill application/x-www-form-urlencoded

Help
Greenpine
2008-02-10
2012-09-04
  • Greenpine

    Greenpine - 2008-02-10

    I'm using Web-Harvest to extract data, and I need to use POST to submit a form.
    The content type is set to application/x-www-form-urlencoded, so the HTTP field
    "Line-based text data: application/x-www-form-urlencoded" needs to be filled. For some reason, special characters such as '=' and '&' are always translated into %3D, etc.,
    and the server then rejects my POST.
    How can I disable this translation and put the following string into the field unchanged?
    "center_name=&ddlbCounty=ALL&city_name=&zip_code=&star_level=All&type_facility=both&ebt_storesbyzip=Search+for+Child+Care"

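The symptom described above is what happens when an already assembled form body is percent-encoded a second time as if it were a single value. A minimal sketch in Python (not Web-Harvest code; the variable names `fields`, `body`, and `double_encoded` are illustrative assumptions) that contrasts encoding each name/value pair separately with encoding the finished string:

```python
from urllib.parse import urlencode, quote_plus

# Form fields taken from the string in the post, as (name, value) pairs.
fields = [
    ("center_name", ""),
    ("ddlbCounty", "ALL"),
    ("city_name", ""),
    ("zip_code", ""),
    ("star_level", "All"),
    ("type_facility", "both"),
    ("ebt_storesbyzip", "Search for Child Care"),
]

# Encode each name and value on its own, then join with '=' and '&'.
# The separators stay literal; only the field contents are escaped.
body = urlencode(fields)

# Encode the finished body as one opaque value.
# Now the '=' and '&' separators themselves are escaped (%3D, %26).
double_encoded = quote_plus(body)

print(body)
print(double_encoded)
```

If the tool accepts form fields as separate name/value parameters, supplying them that way usually avoids the second case, because the separators are then added after encoding rather than before.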
     
    • Vladimir Nikic

      Vladimir Nikic - 2008-02-10

      I'm not quite sure where the problem is.
      Can you post the part of your configuration XML that makes the POST request?

      Vladimir.

       
    • Greenpine

      Greenpine - 2008-02-11

      Actually, it happens with both GET and POST. I tried to download this page:

      http://www.dss.virginia.gov/facility/search/licensed.cgi?rm=Search;search_require_client_code-2106=1;search_require_client_code-2102=1;search_require_client_code-2101=1;Start=26

      It works fine in IE, but with Web-Harvest the page cannot be downloaded.
      I used Wireshark to capture the HTTP requests and found the difference:

      The GET request line sent by Web-Harvest:
      GET /facility/search/licensed.cgi?rm=Search%3Bsearch_require_client_code-2106%3D1%3Bsearch_require_client_code-2102%3D1%3Bsearch_require_client_code-2101%3D1%3BStart%3D26 HTTP/1.1\r\n

      The GET request line sent by IE6:
      GET /facility/search/licensed.cgi?rm=Search;search_require_client_code-2106=1;search_require_client_code-2102=1;search_require_client_code-2101=1;Start=26 HTTP/1.1\r\n

      I found that this behavior has to do with URL encoding (percent-encoding). Here is a reference:

      http://www.w3schools.com/tags/ref_urlencode.asp

      Is there any way to disable this encoding?

      The strange thing is that when I use the same code to download a page from Google,
      it works fine, with no encoding.
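One possible explanation for the difference, stated as an assumption and not as confirmed Web-Harvest behavior: if the client splits the query string on '&' and on the first '=' of each pair, then percent-encodes every value, a query that uses ';' as its separator is seen as a single parameter `rm` whose value contains all the remaining ';' and '=' characters. The Python sketch below (the function `reencode_query` is hypothetical) reproduces the request line captured above for the Virginia URL and leaves the Google URL untouched:

```python
from urllib.parse import quote

def reencode_query(url):
    """Hypothetical re-encoder: split the query on '&' and on the first '='
    of each pair, then percent-encode each value."""
    base, sep, query = url.partition("?")
    if not sep:
        return url  # no query string, nothing to do
    parts = []
    for pair in query.split("&"):
        name, eq, value = pair.partition("=")
        # safe="" means ';' and '=' inside the value are escaped as well.
        parts.append(name + eq + quote(value, safe=""))
    return base + "?" + "&".join(parts)

virginia = ("http://www.dss.virginia.gov/facility/search/licensed.cgi"
            "?rm=Search;search_require_client_code-2106=1"
            ";search_require_client_code-2102=1"
            ";search_require_client_code-2101=1;Start=26")
google = ("http://maps.google.com/maps"
          "?f=l&hl=en&geocode=&q=daycare&near=oregon&ie=UTF8&z=7&om=0")

print(reencode_query(virginia))
print(reencode_query(google))
```

Under this assumption the Google URL survives because it uses '&' separators and its values contain only letters and digits, so there is nothing to escape.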

      ===============

      <?xml version="1.0" encoding="UTF-8"?>

      <config>
      <var-def name="gurl">
      <![CDATA[http://maps.google.com/maps?f=l&hl=en&geocode=&q=daycare&near=oregon&ie=UTF8&z=7&om=0]]>
      </var-def>
      <var-def name="page">
      <http url="${gurl}"></http>
      </var-def>
      <file action="write" path="gtest.xml">
      <html-to-xml>
      <var name="page"></var>
      </html-to-xml>
      </file>

      <var-def name="myurl">
          <![CDATA[http://www.dss.virginia.gov/facility/search/licensed.cgi?rm=Search;search_require_client_code-2106=1;search_require_client_code-2102=1;search_require_client_code-2101=1;Start=26]]>
      </var-def>
      <var-def name="page">
          <http url="${myurl}"></http>
      </var-def>
      <file action="write" path="test1.xml">
          <html-to-xml>
              <var name="page"></var>
          </html-to-xml>
      </file>
      

      </config>

       
