[Epydoc-devel] Valid XHTML patch

Hello,

I am using Epydoc as the documentation generator for my project. It is
great at extracting the documentation, and the output looks nice (when
frames are disabled), but... I have checked the HTML output with a
validating parser, and it was invalid.

I looked into the HTML code, and although it was marked as XHTML
1.0 Transitional, it did not conform to any version of HTML (according to
the W3C specifications). The code was invalid and ugly: a good example of
how not to write HTML. I would be ashamed to publish such documents on my
project's page (I guess I am not the only one). So I filed bug report
#1039049 on SourceForge.

Today I looked into the Epydoc code and fixed the HTML generation.
I have attached the patch to the bug report.

What the patch does:
- makes the generated HTML valid XHTML 1.0 Transitional and XHTML 1.0 Frameset,
- replaces deprecated elements (like <font/> and <center/>) with
  structural elements and/or proper style,
- replaces <b/> and <i/> (which are not recommended by W3C) with
  <strong/> and <em/>, or with other elements whose semantics match the
  usage (e.g. <h1>). Wherever I could guess what the markup means, I added
  a "class" attribute so the style of the element can be changed
  further.
- moves the layout definitions, which are the same in all predefined
  styles (the styles differ only in colors), into a single string, so the
  definitions are not repeated in css.py and can be modified in one place.
- escapes control characters in colorized regular expressions. Without
  that, a regexp like r'[\x00-1f]' would result in a 0 byte being included
  in the HTML output, which is invalid and makes the rendering of the page
  unpredictable (some browsers will treat the byte as EOF).
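The control-character escaping in the last item can be illustrated with a minimal Python 3 sketch. It is not the code from the patch: the helper name escape_regexp_text is hypothetical, and it assumes the colorizer passes each piece of literal regexp text through such a helper before writing it into the HTML. Most C0 control characters are not allowed in XML 1.0 (and therefore XHTML) at all, not even as numeric character references like &#0;, so the sketch replaces them with a visible \xNN sequence instead.

```python
import html
import re

# Every C0 control character plus DEL. XML 1.0 forbids most of these
# outright, so they must not reach the XHTML output as raw characters.
_CONTROL_CHARS = re.compile(r'[\x00-\x1f\x7f]')

def escape_regexp_text(text):
    """Return `text` safe for XHTML, with control characters made visible.

    Hypothetical helper, assumed to be called by the regexp colorizer.
    """
    # First escape &, <, > and quotes so the text cannot break the markup.
    escaped = html.escape(text, quote=True)
    # Then replace each control character with a visible \xNN sequence,
    # so the page shows the regexp the way it was written in the source.
    return _CONTROL_CHARS.sub(lambda m: '\\x%02x' % ord(m.group()), escaped)

# The argument contains a real NUL and a real \x1f character.
print(escape_regexp_text('[\x00-\x1f]'))  # prints [\x00-\x1f]
print(escape_regexp_text('a<b'))          # prints a&lt;b
```

Escaping markup characters first and control characters second keeps the inserted backslash sequences from being touched by html.escape; the reverse order would also work here, since backslashes are not escaped.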

I tried to make the generated pages look no different from the ones
generated without the patch, and they should not differ much in any modern
standards-compliant browser. If they do, it is a bug (unless they look
better now, of course). If they look worse in some
non-standards-compliant but "important" browser (read: IE), then some
hack may be needed. But there is no reason to use invalid HTML (ways to
do that with valid HTML can be found on the Net).

Some more things that could be done:
- Include alternate stylesheets in the output, so they can be chosen
  while browsing the documentation. That seems easy and I will probably
  do it soon.
- XHTML Strict generation. That would probably need many more code
  changes and the removal of frames support (or making it an option).
- Drop the use of tables for layout. Epydoc doesn't do that much, as most
  of its output is real tabular data.
- Unicode support. The output could always be UTF-8. But, I guess, a lot
  of Epydoc code would have to be updated, not only the HTML generation.
  Fortunately, most code documentation is English only, even in
  international projects.
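The alternate stylesheets idea can use the standard HTML mechanism: the preferred sheet is linked with rel="stylesheet" and a title, and each alternative with rel="alternate stylesheet" and its own title, so browsers that support it can offer a style menu. A minimal Python 3 sketch, assuming one CSS file per predefined color style; the function name stylesheet_links and the file names are hypothetical, not existing Epydoc names.

```python
from html import escape

def stylesheet_links(preferred, alternates):
    """Build XHTML <link/> elements for a preferred and alternate stylesheets.

    `preferred` is a (title, href) pair; `alternates` is a list of such pairs.
    """
    template = '<link rel="%s" type="text/css" href="%s" title="%s" />'
    title, href = preferred
    links = [template % ('stylesheet', escape(href), escape(title))]
    for title, href in alternates:
        # Each titled alternate becomes a selectable style in browsers
        # that support alternate stylesheets.
        links.append(template % ('alternate stylesheet',
                                 escape(href), escape(title)))
    return '\n'.join(links)

# Hypothetical file names, one per predefined color style.
print(stylesheet_links(('Default', 'epydoc.css'),
                       [('Blue', 'blue.css'), ('Grayscale', 'grayscale.css')]))
```

Because the layout is shared and only the colors differ, each alternate file could be generated from the same layout string plus one color set.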

I hope you will find my patch useful and that it will be applied to the
Epydoc code.

Greets,
        Jacek