HtmlUnit is a pure-Java, GUI-less browser that allows high-level manipulation of web pages, such as filling in forms, clicking links, and accessing the attributes and values of specific elements within a page. You do not have to build lower-level TCP/IP or HTTP requests: you just call getPage(url), find a hyperlink, and click() it, and the HTML, JavaScript, and Ajax are processed automatically.
The most common use of HtmlUnit is automated testing of web pages, but it can also be used for web scraping or for downloading website content.
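The getPage(url), find a hyperlink, click() flow described above can be sketched roughly as follows. This is a minimal illustration rather than an official example: the URL and the link text are hypothetical placeholders, and the imports assume a recent HtmlUnit 2.x release on the classpath (the 3.x releases use the `org.htmlunit` package instead, and older releases may not support try-with-resources on `WebClient`).

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical values, chosen only for illustration.
        String url = "http://www.example.org/";
        String linkText = "Next";

        // Assumption: WebClient is AutoCloseable in the HtmlUnit version in use.
        try (WebClient webClient = new WebClient()) {
            // Load the page; HtmlUnit issues the HTTP request and parses the response.
            HtmlPage page = webClient.getPage(url);
            System.out.println("Title: " + page.getTitleText());

            // Find a hyperlink by its visible text.
            // Throws ElementNotFoundException if no such link exists on the page.
            HtmlAnchor link = page.getAnchorByText(linkText);

            // Follow the link; the result is the page the click navigated to.
            HtmlPage next = link.click();
            System.out.println(next.asXml());
        }
    }
}
```

No sockets or HTTP messages are handled directly; the code works only with pages, anchors, and clicks.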
JavaOne 2009, the biggest Java conference, held in San Francisco (May 31 - June 05), will include a session titled "HtmlUnit: An Efficient Approach to Testing Web Applications", presented by committers Daniel Gredler and Ahmed Ashour.
Attendees will learn about:
- The two approaches to Web app integration testing: browser simulation and browser driving
- The cons of the browser simulation approach
- The pros of the browser simulation approach
- Key extension points provided by HtmlUnit
- Wrappers that enable you to hedge your bets and switch between the two approaches
More information can be found at http://java.sun.com/javaone/2009/sessions.jsp