Dave Raggett's excellent HTML Tidy lives here at SourceForge! We collect all the bugs and patches and have refactored Tidy into a free-standing C library.
This project worked well in the past with HTML 4, but it has a severe design flaw: internally it maintains detailed information about every HTML tag and its attributes, and when it encounters an unknown tag, it removes it. This means it stumbles over every new tag that gets added to HTML. You can declare user-defined tags, but this is cumbersome and the work never ends. The last change to the code was in 2009. This is a dead project.
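To make the complaint concrete, here is a minimal sketch of the kind of design described above: a fixed table of known tags, with anything else treated as unknown unless the user declares it. This is not Tidy's actual source; the table contents and the function name `is_known_tag` are hypothetical and chosen only for illustration. In Tidy itself, user-defined tags are declared through configuration options such as `new-blocklevel-tags` and `new-inline-tags`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a built-in tag table; NOT Tidy's real data. */
static const char *known_tags[] = { "p", "div", "span", "a" };

/*
 * Returns 1 if `name` is in the built-in table or in the caller-supplied
 * list of extra (user-defined) tags, and 0 otherwise. A cleaner built on
 * this check would drop every element for which it returns 0.
 */
static int is_known_tag(const char *name, const char **extra, size_t n_extra)
{
    size_t i;

    for (i = 0; i < sizeof known_tags / sizeof known_tags[0]; i++) {
        if (strcmp(name, known_tags[i]) == 0)
            return 1;
    }
    for (i = 0; i < n_extra; i++) {
        if (strcmp(name, extra[i]) == 0)
            return 1;
    }
    return 0;
}
```

With no extra tags, `is_known_tag("section", NULL, 0)` returns 0, so a newer element like `section` would be discarded. It is accepted only once the caller passes it in the `extra` list, and that list has to be extended by hand for every tag the built-in table does not know.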