htmlcxx is a simple non-validating HTML parser library for C++. It can reproduce the original HTML document, character by character, from the parse tree. It also has an intuitive tree traversal API.
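To make the round-trip property concrete, here is a toy sketch of a lossless parse tree. It uses only the C++ standard library, not htmlcxx, and it is not how htmlcxx is implemented. `Node`, `tag_name`, `parse`, and `to_source` are hypothetical names chosen for this illustration. The idea is that every node keeps its source text verbatim, so a pre-order walk that concatenates those pieces gives back the input.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node type for this sketch (not htmlcxx's own node class).
struct Node {
    bool is_tag = false;
    std::string name;     // lower-cased tag name; empty for text, comments, doctype
    std::string text;     // verbatim source: the opening tag or the text run
    std::string closing;  // verbatim closing tag; empty if none was matched
    std::vector<Node> children;
};

// Extract the lower-cased name from "<name ...>" or "</name>".
static std::string tag_name(const std::string& tag) {
    std::size_t i = 1;
    if (i < tag.size() && tag[i] == '/') ++i;
    std::string name;
    while (i < tag.size() && std::isalnum(static_cast<unsigned char>(tag[i]))) {
        name += static_cast<char>(std::tolower(static_cast<unsigned char>(tag[i])));
        ++i;
    }
    return name;
}

// Non-validating parse: never rejects input, never drops a character.
Node parse(const std::string& html) {
    Node root;
    std::vector<Node*> open{&root};  // path from the root to the innermost open element
    std::size_t pos = 0;
    while (pos < html.size()) {
        // Cut the next piece: a "<...>" tag, or a run of text up to the next '<'.
        std::size_t end = std::string::npos;
        if (html[pos] == '<') end = html.find('>', pos);
        const bool is_tag = (end != std::string::npos);
        if (is_tag) {
            ++end;
        } else {
            end = html.find('<', pos + 1);
            if (end == std::string::npos) end = html.size();
        }
        const std::string piece = html.substr(pos, end - pos);
        pos = end;

        if (is_tag && piece[1] == '/') {
            // Closing tag: attach it to the nearest open element with the same name.
            const std::string name = tag_name(piece);
            std::size_t k = open.size();
            while (k > 1 && open[k - 1]->name != name) --k;
            if (k > 1) {
                open[k - 1]->closing = piece;
                open.resize(k - 1);  // implicitly closes anything nested deeper
                continue;
            }
            // No match: fall through and keep the stray closing tag as a leaf.
        }

        Node node;
        node.is_tag = is_tag;
        node.text = piece;
        if (is_tag) node.name = tag_name(piece);
        // Only named, non-closing, non-self-closing tags can hold children.
        const bool opens = is_tag && piece[1] != '/' && !node.name.empty() &&
                           piece[piece.size() - 2] != '/';
        assert(!open.empty());
        Node* parent = open.back();
        parent->children.push_back(node);
        if (opens) open.push_back(&parent->children.back());
    }
    return root;
}

// Pre-order walk that appends each node's verbatim source text.
static void append_source(const Node& node, std::string& out) {
    out += node.text;
    for (const Node& child : node.children) append_source(child, out);
    out += node.closing;
}

std::string to_source(const Node& root) {
    std::string out;
    append_source(root, out);
    return out;
}
```

Calling `to_source(parse(html))` should return a string equal to `html`, even for malformed markup such as an unclosed `<b>`. The sketch ignores HTML void elements, scripts, and attribute parsing, which a real parser has to handle.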
Very easy to use.
A fast, solid, robust and easy-to-use HTML parser in C++. Nice use of tree.hh, an STL-like tree class, makes this parser really easy to use. Great work!