jsoup is a Java library for working with real-world HTML. It provides a convenient API for fetching URLs and for extracting and manipulating data, using HTML5 DOM methods and CSS selectors. jsoup implements the WHATWG HTML5 specification and parses HTML to the same DOM as modern browsers do.

jsoup is designed to deal with all varieties of HTML found in the wild, from pristine and validating to invalid tag soup, and it will create a sensible parse tree. The parser makes every attempt to produce a clean parse from the HTML you provide, whether or not that HTML is well formed.

A typical use case: you have HTML in a Java String and want to parse it to get at its contents, to check that it is well formed, or to modify it. The String may have come from user input, a file, or the web.
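As a minimal sketch of the String-parsing use case described above, the following example feeds deliberately malformed HTML to `Jsoup.parse` and reads the text of each paragraph from the resulting tree. The class name `ParseSketch`, the helper `paragraphTexts`, and the sample markup are illustrative assumptions, not part of jsoup.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class for illustration only.
public class ParseSketch {

    // Parses an HTML string and returns the text of every <p> element.
    static List<String> paragraphTexts(String html) {
        // Jsoup.parse builds a full Document (html, head, body) even
        // when the input is a fragment with unclosed tags.
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element p : doc.select("p")) {
            texts.add(p.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Neither <p> nor <b> is closed in this sample input.
        String html = "<p>First <b>bold<p>Second";
        for (String text : paragraphTexts(html)) {
            System.out.println(text);
        }
    }
}
```

The second `<p>` implicitly closes the first, as it would in a browser, so the sample yields two paragraphs rather than one nested structure.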

Features

  • Scrape and parse HTML from a URL, file, or string
  • Find and extract data, using DOM traversal or CSS selectors
  • Manipulate the HTML elements, attributes, and text
  • Clean user-submitted content against a safelist, to prevent XSS attacks
  • Output tidy HTML
  • Parse and traverse documents
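
Several of the features above can be sketched in a few lines each: extracting data with a CSS selector, manipulating attributes, and cleaning untrusted markup. The class and method names (`FeatureSketch`, `linkTargets`, `addNoFollow`, `sanitize`) and the sample inputs are assumptions made for illustration. The example also assumes jsoup 1.14.2 or later, where the cleaning policy class is `org.jsoup.safety.Safelist`; earlier releases named it `Whitelist`.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Safelist;

// Hypothetical helper class for illustration only.
public class FeatureSketch {

    // Extract: collect the href value of every link that has one.
    static List<String> linkTargets(String html) {
        Document doc = Jsoup.parse(html);
        List<String> hrefs = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            hrefs.add(link.attr("href"));
        }
        return hrefs;
    }

    // Manipulate: set rel="nofollow" on every link, return the body HTML.
    static String addNoFollow(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("a[href]").attr("rel", "nofollow");
        return doc.body().html();
    }

    // Clean: keep only the tags and attributes allowed by Safelist.basic().
    static String sanitize(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Safelist.basic());
    }

    public static void main(String[] args) {
        String html = "<p><a href='https://example.com/a'>A</a> <a>no href</a></p>";
        System.out.println(linkTargets(html));
        System.out.println(addNoFollow(html));
        System.out.println(sanitize("<p>Hi<script>alert(1)</script></p>"));
    }
}
```

`Safelist.basic()` is only one of the predefined policies; which policy fits depends on how much formatting the application wants to allow in user content.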

License

MIT License

Additional Project Details

Programming Language

Java

Related Categories

Java HTML XHTML, Java Libraries, Java Web Scrapers

Registered

2021-06-29