From: David G. <go...@py...> - 2002-12-07 03:32:01
David Abrahams wrote:
> I just ran HTMLTidy over the output of
>
>     python html.py test.txt test.html
>
> in the tools directory.  It had lots of complaints, many of which
> look serious, though it only flagged them as warnings.
...
| 67 warnings, 0 errors were found!

Warnings mean "I'm not sure, but I think something may be wrong here"
and it's left to the user to judge.  Many of these cases are not
really problems, just HTMLTidy being overly critical.

> In particular, I notice lots of characters which appear to be
> invalid (possibly nuls).
...
| Character codes 128 to 159 (U+0080 to U+009F) are not allowed in HTML

That's because test.html is encoded in UTF-8.  Looks like HTMLTidy
doesn't understand the ``<?xml version="1.0" encoding="utf-8" ?>``
processing instruction at the beginning of the file.  Try the "-utf8"
option.

| even if they were, they would likely be unprintable control
| characters.  Tidy assumed you wanted to refer to a character with the
| same byte value in the Windows-1252 encoding and replaced that
| reference with the Unicode equivalent.

Dangerous assumption.

> I'm not an HTML expert, which is why I use Tidy.  Are these
> worth doing something about?

Some are, some aren't.

| : Doctype given is "-//W3C//DTD XHTML 1.0 Transitional//EN"
| : Document content looks like XHTML 1.0 Strict

HTMLTidy doesn't understand the transitional DTD?  Seems odd.

| test.html:16:1: Warning: <meta> element not empty or not closed
| :17:1: Warning: <meta> element not empty or not closed

These are real errors, now corrected.

| :23:1: Warning: <table> lacks "summary" attribute
...
| The table summary attribute should be used to describe
| the table structure. It is very helpful for people using
| non-visual browsers. The scope and headers attributes for
| table cells are useful for specifying which headers apply
| to each table cell, enabling non-visual browsers to provide
| a meaningful context for each cell.

The "summary" attribute is not required by the HTML 4 spec, just
recommended.  While I sympathize with its aim, I don't know of any way
to automatically generate a summary of a table.  Eventually a formal
"table" directive may be written, and it could have a "summary"
option.

| :85:24: Warning: <a> Anchor "table-of-contents" already defined

This seems to be because both the "id" attribute of the container
element and the "name" attribute of "<a>" elements are set to the same
thing (as specified in Appendix C of the XHTML spec,
http://www.w3.org/TR/xhtml1).  HTML 4 and XHTML want elements to use
the "id" attribute, but Netscape 4 only works with the "name"
attribute on "<a>" tags.  Perhaps the id and name attributes ought to
be on the same element though...  HTML is a mishmash; can't win.
Unless there's a problem with a real browser (not just a tool like
tidy), I don't see the need to fix this.

Thanks for taking the time to run HTMLTidy and bring these to our
attention.

-- 
David Goodger  <go...@py...>  Open-source projects:
  - Python Docutils: http://docutils.sourceforge.net/
    (includes reStructuredText: http://docutils.sf.net/rst.html)
  - The Go Tools Project: http://gotools.sourceforge.net/
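
A footnote on the "Character codes 128 to 159" warning discussed above: the following is only a rough sketch of why a UTF-8 file could trigger it, assuming the checker reads the file one byte at a time as a single-byte encoding. The sample character and the helper name ``c1_range_bytes`` are made up for illustration; nothing here is taken from HTMLTidy or from test.html itself.

```python
def c1_range_bytes(text):
    """Return the UTF-8 bytes of `text` whose values fall in 128..159.

    Hypothetical helper: these are the byte values the warning
    complains about when a UTF-8 file is read byte by byte.
    """
    return [b for b in text.encode("utf-8") if 0x80 <= b <= 0x9F]


# Assumed sample: U+201C LEFT DOUBLE QUOTATION MARK.
sample = "\u201c"

# UTF-8 encodes it as the three bytes E2 80 9C; two of them land in
# the 128..159 range.
print(c1_range_bytes(sample))  # [128, 156]

# Reading those same bytes as Windows-1252 turns one character into
# three unrelated ones, which is the substitution Tidy describes.
misread = sample.encode("utf-8").decode("cp1252")
print(ascii(misread))  # '\xe2\u20ac\u0153'
```

Plain ASCII text produces no such bytes, so only the non-ASCII characters in a document would be affected.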