Lexbor is an open source HTML renderer library under development
Dominate is a Python library for creating and manipulating HTML documents
A large annotated semantic parsing corpus for developing natural language interfaces
A Python package for building the DOM of HTML documents
HTML parser which can be used for screen-scraping applications