#103 Fixing document errors

Jean-Marc Molina


I'm trying to clean an HTML page exported from MS
Publisher, and Tidy outputs:
line 418 column 97 - Warning: missing </span> before
line 429 column 189 - Error: <o:p> is not recognized!
line 429 column 189 - Warning: discarding unexpected <o:p>
60 warnings, 19 errors were found! Not all
warnings/errors were shown.

This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

However, neither that last message nor the Tidy help
explains how the document errors can be fixed. Isn't Tidy
supposed to fix them? How can I ask it to remove or
ignore the unrecognized <o:p> elements?

My goal is to remove all the tables and the "magic" markup
that MS Publisher adds.

Kind regards,


    Please check the quick reference on http://tidy.sf.net for the various Word options; using them, you might be able to have Tidy clean up the document. You could also try the --force-output yes option and check whether that produces acceptable results.
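
To make the suggestion above concrete, here is a minimal sketch of how those options might be combined in a single Tidy run, driven from Python. The option names (word-2000, bare, clean, drop-proprietary-attributes, force-output) are from the Tidy quick reference. The file names and the helper names build_tidy_command and run_tidy are made up for illustration. Whether this set is enough for MS Publisher output is an assumption, so treat it as a starting point to experiment with.

```python
import shutil
import subprocess

# Options from the Tidy quick reference. Which of them actually help with
# MS Publisher exports is an assumption; adjust after inspecting the output.
TIDY_OPTIONS = {
    "word-2000": "yes",                    # strip Microsoft Office specific markup
    "bare": "yes",                         # strip more Microsoft specific HTML
    "clean": "yes",                        # replace presentational markup with CSS
    "drop-proprietary-attributes": "yes",  # remove vendor-specific attributes
    "force-output": "yes",                 # write output even if errors remain
}


def build_tidy_command(src, dst, options=None):
    """Return the argument list for a Tidy run that reads src and writes dst."""
    opts = TIDY_OPTIONS if options is None else options
    cmd = ["tidy", "-quiet"]
    for name, value in opts.items():
        cmd += ["--" + name, value]
    cmd += ["-output", dst, src]
    return cmd


def run_tidy(src, dst):
    """Run Tidy (hypothetical helper); return (exit_status, diagnostics)."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy was not found on PATH")
    result = subprocess.run(
        build_tidy_command(src, dst), capture_output=True, text=True
    )
    # Tidy exits with 0 when clean, 1 when there were warnings, 2 on errors.
    return result.returncode, result.stderr


if __name__ == "__main__":
    # Placeholder file names; print the command so it can be checked first.
    print(" ".join(build_tidy_command("publisher-export.html", "cleaned.html")))
```

With force-output enabled, Tidy still reports the errors, so the result should be reviewed by hand. Removing the layout tables is a separate job that these options are not expected to do.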

