Re: [Doxygen-develop] strategies for XHTML support

do...@ke... wrote:
> On Sun, Mar 02, 2008 at 04:00:10PM +0100, Francesco Montorsi wrote:
> 
>> I think that the postprocessing of the HTML output will be much 
>> simplified if doxygen starts outputting XHTML instead of HTML4, which is 
>> not valid XML.
> 
> Certainly it should be easier to parse if it is valid XML, if they were
> starting from scratch.  The problem is with existing translators
> expecting the non-valid format.
This is not, IMO, a good reason to continue generating HTML4 instead of XHTML...

>> Companies doing this kind of postprocessing will eventually need some 
>> changes to their scripts, but this is probably true after every doxygen 
>> release, since the structure of the generated HTML is not guaranteed to 
>> remain the same and in fact, most times it changes from one release to 
>> the next.
> 
> I wonder how much it does.  I don't know, I'm not directly in touch with
> those places which do that sort of transformation.
I'm not either, so I don't know for sure...

>> Doxygen cannot continue to produce HTML4 forever (*)!
>> Technologies are evolving and the switch from HTML4 to XHTML I think is 
>> worth some troubles/regressions.
>>
>> It's just that sometimes I think that all doxygen sources should be 
>> entirely rewritten and reorganized (with more comments!!) in order to 
>> fix all of these errors.
> 
> I have had the feeling that what it should be producing is just XML, and
> then have back-ends which produce whatever other formats people want
> (XSLT could do most of them).
Does any backend based on the doxygen XML output exist?
I fear that with XSLT alone it's going to be very difficult to generate 
something which resembles the current doxygen HTML output.

> "Rewrite from scratch" is my mantra with almost everything (especially
> my own code), but the time and effort to do that tends to be
> prohibitive.  Especially when what's there is 'almost' right.
However, I think that with the current startXXX()/endXXX() paradigm it's 
too easy to make errors and forget, e.g., a closing tag somewhere.

If doxygen used a more object-oriented approach:

   OutputNode *n = outputList->appendRootNode();
   OutputNode *p = n->appendParagraph();
   p->writeClassMemberList();
   ...

   outputList->dumpOutputTree();

it would be impossible to forget closing tags (for HTML, each output node 
would write <myself>[children nodes]</myself>) or to generate invalid 
trees. Obviously this approach works well only for tree-structured docs 
like (X)HTML. I fear that Doxygen (at least when it was started) placed 
too much emphasis on the generation of other formats like man or LaTeX.
Now HTML is by far the most important format it generates, and if the 
*def.cpp files were coded along the lines sketched above, the 
generated HTML would be of higher quality with less programming effort 
(shorter and more readable code).
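
To make the idea a bit more concrete, here is a minimal, self-contained sketch of such a node tree. Everything in it is hypothetical: OutputNode, appendChild(), appendText(), dump() and renderExample() are illustrative names, not existing doxygen classes or functions, and a real design would also need attributes and per-format back-ends.

```cpp
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type for illustration only; not part of doxygen.
class OutputNode {
public:
    explicit OutputNode(std::string tag) : tag_(std::move(tag)) {}

    // Append a child element and return it so that calls can be nested.
    OutputNode *appendChild(const std::string &tag) {
        children_.push_back(std::make_unique<OutputNode>(tag));
        return children_.back().get();
    }

    OutputNode *appendParagraph() { return appendChild("p"); }

    // Append character data as a leaf node, escaping XML special characters.
    void appendText(const std::string &text) {
        auto node = std::make_unique<OutputNode>("");
        node->text_ = escape(text);
        children_.push_back(std::move(node));
    }

    // Each element writes <tag>[children]</tag> (or <tag/> when empty),
    // so a closing tag cannot be forgotten by the caller.
    void dump(std::ostream &os) const {
        if (tag_.empty()) { os << text_; return; }            // text leaf
        if (children_.empty()) { os << '<' << tag_ << "/>"; return; }
        os << '<' << tag_ << '>';
        for (const auto &child : children_) child->dump(os);
        os << "</" << tag_ << '>';
    }

private:
    static std::string escape(const std::string &in) {
        std::string out;
        for (char ch : in) {
            switch (ch) {
                case '&': out += "&amp;"; break;
                case '<': out += "&lt;";  break;
                case '>': out += "&gt;";  break;
                default:  out += ch;
            }
        }
        return out;
    }

    std::string tag_;   // empty for text leaves
    std::string text_;  // already-escaped character data
    std::vector<std::unique_ptr<OutputNode>> children_;
};

// Build a tiny tree and serialize it in one pass.
std::string renderExample() {
    OutputNode root("div");
    OutputNode *p = root.appendParagraph();
    p->appendText("a < b && c");
    root.appendChild("hr");
    std::ostringstream os;
    root.dump(os);
    return os.str();
}
```

Calling renderExample() returns a single balanced fragment with the text escaped. The point is that tag balancing lives in one place (dump()) instead of being spread over every startXXX()/endXXX() pair in the callers.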

>> (*) = I also strongly doubt it produces VALID HTML4 now; testing it is 
>> not easy, as an HTML4 validation test is much more difficult than 
>> an XHTML validation test and requires me to upload the generated 
>> output file by file to the W3C validator.
> 
> Don't they do a stand-alone validator?  Most people do want to validate
> entire sites or at least sets of pages.
There's no free command-line validator for HTML4 AFAIK.
The W3C publishes the sources of its validator, but you can install it 
locally only by setting up an Apache installation. And anyway you still 
have to validate each file by hand.

Not feasible for projects with big documentation file sets.

XHTML is way easier to validate. To the patch I proposed I attached an 
archive containing a simple script which validates an arbitrary number 
of HTML files from the command line and nicely reports all errors in a 
log file.

Francesco