Binary error downloading pdf using httpunit

2009-06-24
2013-04-26
  • Oliver Wahlen
    2009-06-24

    I want to use HttpUnit to log into a web application, navigate through it, and download an Excel list.
    As a first test I wrote the code below, which searches Google for a specific PDF document and downloads it.
    However, the downloaded file is not byte-for-byte identical to the file one gets when downloading it with Firefox.
    The file downloaded via HttpUnit cannot even be opened in Acrobat Reader (acroread).
    It seems that HttpUnit modifies certain bytes of the downloaded file.
    Is this a bug, or do I need to change my code?

    ---------- Code Example ----------
      import com.meterware.httpunit.WebConversation;
      import com.meterware.httpunit.WebForm;
      import com.meterware.httpunit.WebLink;
      import com.meterware.httpunit.WebRequest;
      import com.meterware.httpunit.WebResponse;
      import java.io.FileOutputStream;

      WebConversation conversation = new WebConversation();
      WebResponse response = conversation.getResponse("http://www.google.de");
      WebForm form = response.getForms()[0];
      form.setParameter("q", "filetype:pdf C Compiler Aided Design of Application-Specific Instruction-Set Architectures");
      response = form.submit(form.getSubmitButton("btnG"));
      WebLink link = response.getLinkWith("C Compiler Aided Design of Application-Specific");
      WebRequest request = link.getRequest();
      response = conversation.getResponse(request);
      // Using a debugger I found out that this String already contains the wrong bytes.
      String content = response.getText();
      FileOutputStream fos = new FileOutputStream("diss.pdf");
      fos.write(content.getBytes());
      fos.close();

    P.S.:
    I also tried adding the following calls before the code example above, without success:
      HttpUnitOptions.setDefaultContentType("application/octet-stream");
      HttpUnitOptions.setDefaultCharacterSet("utf-8");
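
A possible cause worth checking: `getText()` decodes the response body into a `String` using a character set, and `getBytes()` encodes it again, so bytes that are not valid in that character set may not survive the round trip. The sketch below uses only standard-library classes to show the effect on a made-up byte array standing in for a PDF body, next to a raw stream copy that avoids it. `BinaryCopyDemo`, `copy`, and `roundTripThroughString` are hypothetical names for illustration, and UTF-8 is an assumed charset, not necessarily the one HttpUnit selects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryCopyDemo {

    // Copies raw bytes from in to out without any character decoding.
    public static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    // Mimics the getText()/getBytes() pattern: decode bytes to a String,
    // then encode the String back to bytes (charset is an assumption).
    public static byte[] roundTripThroughString(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Made-up stand-in for a PDF body: an ASCII header followed by
        // bytes that do not form valid UTF-8 sequences.
        byte[] original = {'%', 'P', 'D', 'F',
                (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(original), out);

        System.out.println("stream copy identical: "
                + Arrays.equals(original, out.toByteArray()));
        System.out.println("string round trip identical: "
                + Arrays.equals(original, roundTripThroughString(original)));
    }
}
```

With HttpUnit, the input to `copy` would presumably be `response.getInputStream()` and the output the `FileOutputStream` for `diss.pdf`. This is an untested suggestion for the download above, not a confirmed fix.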