Re: [Epydoc-devel] XML Docbook output


On 2/1/07, Saint Germain <sai...@gm...> wrote:
> Hello,
>
> I'm currently an active user of Python and XML Docbook and would
> like to have my Python documentation generated in XML Docbook.
>
> Epydoc seems to be a really fine documentation generator and I would
> like to help you add an XML Docbook output.
>
> I know Python, XML Docbook and LaTeX and I'm a heavy user of the
> dblatex project (Docbook to LaTeX, and it's written in Python !) :
> http://dblatex.sourceforge.net
>
> Would you like me to add such a feature and if yes can someone help to
> guide me through Epydoc ?
>
> If the LaTeX output is OK, I can just analyse this part and convert it
> in XML Docbook ?

If you want to create a Docbook generator, the best starting point is
probably the LaTeX generator.

Honestly, I have not been using the LaTeX generator as often as the HTML
generator, but it is stable enough to generate at least the documentation
for Epydoc itself. Furthermore, most of the hard work (parsing the
distinct pieces and merging them together) is carried out before the
docwriter is even reached.

A sketch of the epydoc structure, and of where you may want to put your
hands if you want to implement this, is the following::

             Markups                 Writers

             epytext |             | LaTeX
             javadoc | -> build -> | HTML
           plaintext |             | plaintext
    restructuredtext |

Each markup is implemented by a ParsedDocstring subclass. Writers are
more free-form beasts: there is no base class for writers, whose
interface is a single function called by the cli.py module after
options have been parsed and a docindex generated.

The ParsedDocstring interface offers methods to retrieve a docstring
in the target output format: currently the to_plaintext(), to_html()
and to_latex() methods are exposed. The concrete subclasses are
responsible for implementing these methods (while a fallback must be
provided by the base class).

Writing a Docbook generator should require the following steps:
1. add a new writer implementing the sequence of calls that walks over
the built set of documents;
2. add a new method ParsedDocstring.to_docbook() implementing a
default behavior, which typically is to return little more than the
plain-text version;
3. implement the details of converting the specific markups into Docbook format.
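
Step 2 might look roughly like the following. This is only a minimal
sketch: the ParsedDocstring class here is a stand-in, not epydoc's real
base class, the method signatures are assumed, and PlainDocstring is a
hypothetical subclass added just to exercise the fallback.

```python
from xml.sax.saxutils import escape


class ParsedDocstring:
    """Stand-in for epydoc's base class (assumption: simplified signatures)."""

    def to_plaintext(self, **options):
        # Concrete markups are expected to override this.
        raise NotImplementedError

    def to_docbook(self, **options):
        # Fallback: escape the plain text and show it verbatim, so every
        # markup yields valid Docbook even before it gets its own converter.
        text = self.to_plaintext(**options)
        return "<programlisting>%s</programlisting>" % escape(text)


class PlainDocstring(ParsedDocstring):
    """Hypothetical markup that only knows how to produce plain text."""

    def __init__(self, text):
        self._text = text

    def to_plaintext(self, **options):
        return self._text


if __name__ == "__main__":
    print(PlainDocstring("Return True if a < b.").to_docbook())
```

A markup that later gains a real Docbook converter would simply override
to_docbook() in its own subclass.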

2. should be easy. You may wrap the to_plaintext() output in
<programlisting> tags, I guess; see ParsedDocstring.to_html() for an
example.
1. can be cargo-culted from the LaTeX writer, but this would probably
lead to a heavier maintenance burden. A nicer approach would probably be
to refactor the LaTeX writer, creating a base class that implements the
strategy for creating a document from the parsed documentation (e.g.
write the start; for each class: write it; write the end...). By
"document" I mean something that can be read from beginning to end,
which is very different from the hypertext created by the HTML writer.
The base class would delegate the details of how to write the leaves
(such as a paragraph or a string) to concrete subclasses (cf. the
strategy pattern). Such a base class may be subclassed into concrete
writers for LaTeX, Docbook and plain text (which is currently less
maintained than the other writers), and later on single-page HTML,
reST...
3. could require more knowledge of the individual markups. On the
plus side, it can be accomplished gradually, because there is always a
fallback that would appear as monospaced text. The current to_html()
implementations will of course help you.
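
To make the refactoring idea under 1. more concrete, here is a rough
sketch of a base writer that fixes the order of calls and leaves the
output syntax to subclasses. Nothing here is existing epydoc code:
LinearWriter, DocbookWriter, PlaintextWriter and their methods are
hypothetical names, and a plain dict of class name -> description stands
in for the real docindex.

```python
import io
from xml.sax.saxutils import escape


class LinearWriter:
    """Hypothetical base class: decides *when* things are written, not *how*."""

    def write(self, out, title, classes):
        # `classes` is a stand-in for the docindex: {class name: description}.
        self.write_start(out, title)
        for name in sorted(classes):
            self.write_class(out, name, classes[name])
        self.write_end(out)

    def write_start(self, out, title):
        raise NotImplementedError

    def write_class(self, out, name, description):
        raise NotImplementedError

    def write_end(self, out):
        raise NotImplementedError


class DocbookWriter(LinearWriter):
    """Leaves rendered as Docbook elements."""

    def write_start(self, out, title):
        out.write("<book><title>%s</title>\n" % escape(title))

    def write_class(self, out, name, description):
        out.write("<section><title>%s</title><para>%s</para></section>\n"
                  % (escape(name), escape(description)))

    def write_end(self, out):
        out.write("</book>\n")


class PlaintextWriter(LinearWriter):
    """Same traversal, different leaves."""

    def write_start(self, out, title):
        out.write("%s\n%s\n" % (title, "=" * len(title)))

    def write_class(self, out, name, description):
        out.write("\n%s\n  %s\n" % (name, description))

    def write_end(self, out):
        pass  # nothing to close in plain text


if __name__ == "__main__":
    docs = {"Writer": "Writes things.", "Parser": "Parses <things>."}
    for writer in (DocbookWriter(), PlaintextWriter()):
        buf = io.StringIO()
        writer.write(buf, "API", docs)
        print(buf.getvalue())
```

With this layout a LaTeX writer would be one more subclass, and a fix to
the traversal order would reach every linear format at once.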

I'd ask Edward if he would welcome the refactoring described in step
1. I think that creating a Docbook writer the naive way would lead to
harder maintenance, inconsistencies between formats and too much
code duplication.

Let me know if I can help you, but I'd like to hear Ed's advice first.

Good luck!

Daniele