Re: [Doxygen-develop] strategies for XHTML support


do...@ke... wrote:
> On Sun, Mar 02, 2008 at 02:00:35PM +0100, Francesco Montorsi wrote:
> 
>> btw if you are not interested in reaching 100% well-formedness in a single 
>> patch, then the one I've attached seems to work quite well in terms of 
>> output rendering (i.e. there are no big differences from the standard doxygen 
>> HTML4 output, just some spacing differences). I'm not sure, however, whether it 
>> behaves well with respect to the other output formats...
> 
> It might also break some post-processing some places use.  I know
> several companies which post-process Doxygen output to create their own
> documentation, but I don't know how robust their processing is.
I think that post-processing the HTML output will be much simpler if 
doxygen starts outputting XHTML instead of HTML4, which is not valid XML.
Companies doing this kind of post-processing will eventually need to 
change their scripts, but that is probably true after every doxygen 
release anyway, since the structure of the generated HTML is not 
guaranteed to remain the same and in fact it changes most of the time 
from one release to the next.
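
To illustrate why well-formed XHTML simplifies post-processing, here is a 
minimal sketch in Python using only the standard library. The sample page, 
the file names in it and the helper names are hypothetical; this is not 
actual doxygen output. Note also that a real page using named entities 
such as &nbsp; would need extra handling with this parser.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical page fragment for illustration; not actual doxygen output.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Sample</title></head>
<body>
<p><a href="classFoo.html">Foo</a><br/>
<a href="classBar.html">Bar</a></p>
</body>
</html>"""


def extract_links(xhtml_text):
    """Return (href, text) pairs for every <a> element in an XHTML document."""
    root = ET.fromstring(xhtml_text)
    return [(a.get("href"), a.text) for a in root.iter("{%s}a" % XHTML_NS)]


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for href, label in extract_links(SAMPLE):
        print(href, label)
    # HTML4-style markup with an unclosed <br> is rejected by an XML parser.
    print(is_well_formed("<p>one<br>two</p>"))
```

With HTML4 "tag soup" the same script would need a lenient HTML parser 
instead of a plain XML one.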

>> (*) = I wanted to enable XHTML output in order to use XSLT stylesheets 
>> over it, instead of doing it over the doxygen XML output.
> 
> Have you tried processing it with the W3C 'tidy' program?  That usually
> does a pretty good job of producing XHTML from HTML with close tags
> missing (what lynx calls "tag soup"), and will produce XML as well as
> XHTML output.  (Doing it on the number of files Doxygen creates is a
> pain and slow, though, and you need to disable its comments about how
> 'bad' the original is.)
tidy does a good job but I think it's a "dirty" solution: its output is 
not guaranteed to be the "right" one (it repairs the HTML as best it can, 
but it's still a machine and can't look at the context to understand 
what the right fix is) and may produce rendering artefacts (caused by 
syntactically correct but semantically wrong markup).

It's true that running 'tidy' on the XHTML generated for the doxygen 
samples (I'm testing with my patch applied) shrinks the validation 
errors from about 700 to about 30 (great!!), but those 30 still need 
human revision. In the bigger project which I'm trying to convert to 
Doxygen (FYI it's wxWidgets), there would still be hundreds of errors to 
handle by hand. Not feasible.
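
For anyone who wants a rough count on their own output tree, a sketch 
along these lines may help. It assumes Python and its standard XML parser, 
and the function name is made up for this example. It only checks XML 
well-formedness, which is weaker than the DTD validation the numbers above 
refer to, so it will report fewer problems than a real validator.

```python
import os
import xml.etree.ElementTree as ET


def find_malformed(directory, suffix=".html"):
    """Return a sorted list of (path, error message) for files under
    `directory` that end in `suffix` and fail to parse as XML."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            try:
                ET.parse(path)
            except ET.ParseError as exc:
                # The message includes the line and column of the first error.
                failures.append((path, str(exc)))
    return sorted(failures)


if __name__ == "__main__":
    import sys

    # The directory to scan is given on the command line, e.g. the HTML output dir.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, message in find_malformed(target):
        print("%s: %s" % (path, message))
```

Only the first error in each file is reported, since the parser stops at 
the first well-formedness violation.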

It's the doxygen output which should be correct without any further 
processing.

Doxygen cannot continue to produce HTML4 forever (*)!
Technologies are evolving, and I think the switch from HTML4 to XHTML is 
worth some trouble/regressions.

It's just that sometimes I think that all doxygen sources should be 
entirely rewritten and reorganized (with more comments!!) in order to 
fix all of these errors.

In conclusion: I need a break and some help to complete this patch :)

What's your (the doxygen team's) interest in XHTML?
Isn't it one of your priorities?

Francesco

(*) = I also strongly doubt it produces VALID HTML4 now; testing it is 
not easy, as doing an HTML4 validation test is much more difficult than 
doing an XHTML validation test and requires me to upload the generated 
output file by file to the W3C validator.