Audience
Anyone searching for a library to extract data from HTML and XML using XPath and CSS selectors.
About parsel
Parsel is a BSD-licensed Python library for extracting and removing data from HTML and XML using XPath and CSS selectors, optionally combined with regular expressions. To use it, create a selector object for the HTML or XML text that you want to parse, then use CSS or XPath expressions to select elements.

CSS is a language for applying styles to HTML documents; it defines selectors that associate those styles with specific HTML elements. XPath is a language for selecting nodes in XML documents, and it can also be used with HTML. You can use either one. CSS is usually more readable, but some things can only be done with XPath.

Because parsel is built on top of lxml, its selectors support some EXSLT extensions and come with pre-registered namespaces for use in XPath expressions. Selectors can also be chained, so most of the time you can select by class using CSS and switch to XPath only when needed.