Thread: [Epydoc-devel] XML Docbook output
From: Saint G. <sai...@gm...> - 2007-02-01 00:26:11
Hello,

I'm currently an active user of Python and XML Docbook and would like to have my Python documentation generated in XML Docbook.

Epydoc seems to be a really fine documentation generator and I would like to help you add an XML Docbook output.

I know Python, XML Docbook and LaTeX, and I'm a heavy user of the dblatex project (Docbook to LaTeX, and it's written in Python!): http://dblatex.sourceforge.net

Would you like me to add such a feature, and if so, can someone help guide me through Epydoc?

If the LaTeX output is OK, could I just analyse that part and convert it to XML Docbook?

Thanks in advance,
From: Daniele V. <dan...@gm...> - 2007-02-05 16:42:44
On 2/1/07, Saint Germain <sai...@gm...> wrote:
> Hello,
>
> I'm currently an active user of Python and XML Docbook and would
> like to have my Python documentation generated in XML Docbook.
>
> Epydoc seems to be a really fine documentation generator and I would
> like to help you add an XML Docbook output.
>
> I know Python, XML Docbook and LaTeX, and I'm a heavy user of the
> dblatex project (Docbook to LaTeX, and it's written in Python!):
> http://dblatex.sourceforge.net
>
> Would you like me to add such a feature, and if so, can someone help
> guide me through Epydoc?
>
> If the LaTeX output is OK, could I just analyse that part and convert
> it to XML Docbook?

If you want to create a Docbook generator, the best starting point is by far the LaTeX generator. Honestly, I haven't used it as often as the HTML generator, but it is stable enough to generate at least the documentation for Epydoc itself. Furthermore, most of the hard work (parsing the distinct pieces and merging them together) is carried out before the docwriter is even reached.

Here is a sketch of the epydoc structure, showing where you may want to put your hands::

    Markups                          Writers

    epytext          |             | LaTeX
    javadoc          | -> build -> | HTML
    plaintext        |             | plaintext
    restructuredtext |

Each markup is implemented by a ParsedDocstring subclass. Writers are more free-form beasts: there is no base class for writers, and their interface is a single function called by the cli.py module after the options have been parsed and a docindex generated.

The ParsedDocstring interface offers methods to retrieve a docstring in the target output format: currently to_plaintext(), to_html() and to_latex() are exposed. The concrete subclasses are responsible for implementing these methods (while a fallback must be provided by the base class).

Writing a Docbook generator should require the following steps:

1. add a new writer implementing the sequence of calls needed to walk through the built set of documents;
2. add a new method ParsedDocstring.to_docbook() implementing a default behavior, which typically returns little more than the text version;
3. implement the details of converting the specific markups into Docbook format.

Step 2 should be easy. You may wrap the to_plaintext() output in <programlisting> tags, I guess; see ParsedDocstring.to_html() for an example.

Step 1 can be cargo-culted from the LaTeX writer, but this would probably lead to more maintenance burden. A nicer piece of work would probably be to refactor the LaTeX writer, creating a base class that implements the strategy (e.g. write the start; for each class: write it; write the end...) of creating a document from the parsed documentation (by "document" I mean something that can be read from beginning to end, which is very different from the hypertext created by the HTML writer) but delegates the details of how to write the leaves (such as a paragraph or a string) to concrete subclasses (cf. the strategy pattern). Such a base class may be subclassed into concrete writers for LaTeX, Docbook, plain text (which is currently less maintained than the other writers), and later single-page HTML, reST...

Step 3 could require more knowledge of the individual markups. On the plus side, it can be accomplished gradually, because there is always a fallback that would appear as monospaced text. The current to_html() implementations would help you, of course.

I'd ask Edward if he would welcome the refactoring described in step 1. I think that creating a Docbook writer the naive way would lead to harder maintenance, inconsistencies between formats and too much code duplication.

Let me know if I can help you, but I'd like to hear Ed's advice first.

Good luck!

Daniele
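
The fallback suggested for ParsedDocstring.to_docbook() could look roughly like the sketch below. It is a minimal, self-contained illustration, not epydoc's code: the ParsedDocstring class here is a stand-in that only stores raw text, the real epydoc method signatures are not reproduced, and SimpleMarkupDocstring is a hypothetical subclass invented to show how a specific markup could later override the fallback.

```python
from xml.sax.saxutils import escape


class ParsedDocstring:
    """Stand-in for epydoc's base class (assumption: it only keeps raw text)."""

    def __init__(self, text):
        self._text = text

    def to_plaintext(self):
        return self._text

    def to_docbook(self):
        # Fallback: escape the plain text and wrap it in <programlisting>,
        # so every markup produces at least monospaced Docbook output.
        return "<programlisting>%s</programlisting>" % escape(self.to_plaintext())


class SimpleMarkupDocstring(ParsedDocstring):
    """Hypothetical markup whose only structure is blank-line-separated paragraphs."""

    def to_docbook(self):
        # Markup-specific conversion replacing the fallback.
        paragraphs = [p.strip() for p in self._text.split("\n\n") if p.strip()]
        return "".join("<para>%s</para>" % escape(p) for p in paragraphs)
```

With this split, markups that have no Docbook conversion yet keep working through the base-class fallback, and each one can be converted separately.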
From: Edward L. <ed...@gr...> - 2007-02-06 02:01:04
On 2/1/07, Saint Germain <sai...@gm...> wrote:
>> I know Python, XML Docbook and LaTeX, and I'm a heavy user of the
>> dblatex project (Docbook to LaTeX, and it's written in Python!):
>> http://dblatex.sourceforge.net
>>
>> Would you like me to add such a feature, and if so, can someone help
>> guide me through Epydoc?

If you haven't already, you should read through the front page of the epydoc API docs:

http://epydoc.sourceforge.net/api/

It provides a brief overview of how epydoc is structured.

Daniele Varrazzo wrote:
> If you want to create a Docbook generator, the best starting point is
> by far the LaTeX generator.

Agreed. And feel free to suggest changes to the LaTeX generator -- it doesn't get nearly as much use as the HTML generator, and so it isn't as well developed.

Daniele's description of how you should go about adding a new docwriter should be enough to get you started, but if you get stuck or have further questions, you can certainly email us.

Daniele Varrazzo wrote:
> Step 1 can be cargo-culted from the LaTeX writer, but this would
> probably lead to more maintenance burden. A nicer piece of work would
> probably be to refactor the LaTeX writer [...]
>
> I'd ask Edward if he would welcome the refactoring described in step
> 1. I think that creating a Docbook writer the naive way would lead to
> harder maintenance, inconsistencies between formats and too much
> code duplication.

I think this type of refactoring would be a good idea. I expect the LaTeX and Docbook outputs to be structured very similarly. And there may be other output formats that would also share a similar structure.

-Edward
From: Saint G. <sai...@gm...> - 2007-02-06 02:04:51
On Mon, 5 Feb 2007 17:41:43 +0100, "Daniele Varrazzo" <dan...@gm...> wrote:

> > I'm currently an active user of Python and XML Docbook and would
> > like to have my Python documentation generated in XML Docbook.
> >
> > Epydoc seems to be a really fine documentation generator and I would
> > like to help you add an XML Docbook output.

> Here is a sketch of the epydoc structure, showing where you may want
> to put your hands::
>
>     Markups                          Writers
>
>     epytext          |             | LaTeX
>     javadoc          | -> build -> | HTML
>     plaintext        |             | plaintext
>     restructuredtext |
>
> Each markup is implemented by a ParsedDocstring subclass. Writers are
> more free-form beasts: there is no base class for writers, and their
> interface is a single function called by the cli.py module after the
> options have been parsed and a docindex generated.

OK, I understand.

> The ParsedDocstring interface offers methods to retrieve a docstring
> in the target output format: currently to_plaintext(), to_html() and
> to_latex() are exposed. The concrete subclasses are responsible for
> implementing these methods (while a fallback must be provided by the
> base class).
>
> Writing a Docbook generator should require the following steps:
> 1. add a new writer implementing the sequence of calls needed to walk
> through the built set of documents;
> 2. add a new method ParsedDocstring.to_docbook() implementing a
> default behavior, which typically returns little more than the text
> version;
> 3. implement the details of converting the specific markups into
> Docbook format.
>
> Step 2 should be easy. You may wrap the to_plaintext() output in
> <programlisting> tags, I guess; see ParsedDocstring.to_html() for an
> example.

OK, it's a start.

> Step 1 can be cargo-culted from the LaTeX writer, but this would
> probably lead to more maintenance burden. A nicer piece of work would
> probably be to refactor the LaTeX writer, creating a base class that
> implements the strategy (e.g. write the start; for each class: write
> it; write the end...) of creating a document from the parsed
> documentation (by "document" I mean something that can be read from
> beginning to end, which is very different from the hypertext created
> by the HTML writer) but delegates the details of how to write the
> leaves (such as a paragraph or a string) to concrete subclasses (cf.
> the strategy pattern). Such a base class may be subclassed into
> concrete writers for LaTeX, Docbook, plain text (which is currently
> less maintained than the other writers), and later single-page HTML,
> reST...

Seems reasonable.
Does that mean that the LaTeX and HTML writers are currently completely independent? And how is the LaTeX writer currently built? Do you just emit the LaTeX markup piece by piece in a procedural way?

> Step 3 could require more knowledge of the individual markups. On the
> plus side, it can be accomplished gradually, because there is always
> a fallback that would appear as monospaced text. The current
> to_html() implementations would help you, of course.

That step could be quite long but rather easy: Docbook markup is really clear, and there are no subtleties or magic as with LaTeX.

> I'd ask Edward if he would welcome the refactoring described in step
> 1. I think that creating a Docbook writer the naive way would lead to
> harder maintenance, inconsistencies between formats and too much
> code duplication.

I agree, of course. I can help a few hours per week (let's say 3-5) at most.

> Let me know if I can help you, but I'd like to hear Ed's advice first.

Well, I would have started even without a refactoring, just for the fun of it, but it's better to wait and see if you want to refactor first...

Regards,
From: Daniele V. <dan...@gm...> - 2007-02-06 02:55:51
Saint Germain wrote:
>> Step 1 can be cargo-culted from the LaTeX writer, but this would
>> probably lead to more maintenance burden. A nicer piece of work would
>> probably be to refactor the LaTeX writer, creating a base class that
>> implements the strategy (e.g. write the start; for each class: write
>> it; write the end...) of creating a document from the parsed
>> documentation (by "document" I mean something that can be read from
>> beginning to end, which is very different from the hypertext created
>> by the HTML writer) but delegates the details of how to write the
>> leaves (such as a paragraph or a string) to concrete subclasses (cf.
>> the strategy pattern). Such a base class may be subclassed into
>> concrete writers for LaTeX, Docbook, plain text (which is currently
>> less maintained than the other writers), and later single-page HTML,
>> reST...
>
> Seems reasonable.
> Does that mean that the LaTeX and HTML writers are currently
> completely independent?

Yes, they are. They actually create very different output: the HTML writer creates a hypertext with many index pages summarizing different facets of a package (the modules, the classes, all the names...), and each module gets a matching HTML page, with links to other pages containing annotated and colored code, and some JavaScript thrown in. The LaTeX writer (into which I don't have much insight: I have rarely used it or worked on its source) also creates many files, but they are tied together into a document suitable for printing.

There are some common services that both the LaTeX and HTML writers could benefit from, but I see many more similarities between two writers of linear documents, as the LaTeX and Docbook writers would be.

> And how is the LaTeX writer currently built? Do you just emit the
> LaTeX markup piece by piece in a procedural way?

I'll try to answer, but I don't know whether I understood your question correctly.

The writer (LaTeX or any other) receives a DocIndex instance holding the whole result of the analysis Epydoc carried out on the code that was fed to it. It's the writer's responsibility to decide what to use out of that bulk of information, and how: the writer is a view of the model in the DocIndex instance.

The LaTeX document generation is directed by the write() method, which creates a top-level file and then iterates over the DocIndex nodes, running the proper generation function for each class and module it meets. In this "generation strategy" there are almost no LaTeXisms (just about the only thing that smells like LaTeX is the ".tex" extension of the generated files).

The functions called by the write() method on a selected detail decide what to write about that detail; so for a "class" there is an iteration over its methods, and for each of them a proper write_something() function is called. Apart from sporadic snippets of LaTeX code you can find here and there, the navigation through the DocIndex nodes is basically the same one you would carry out to generate Docbook output (assuming you want to put the same information in such documents, of course).

The leaf functions called during the index navigation are the ones where you will find most of the \stuff{\like\this} you can expect in a LaTeX generator, which should be replaced by <stuff><like/><this/></stuff> to create a Docbook document.

> Well, I would have started even without a refactoring, just for the
> fun of it, but it's better to wait and see if you want to refactor
> first...

Then, please, start and have a good time while coding :)

Please don't wait for a refactoring movement pouring down from above, mainly because there's no point in creating a base class for a single subclass: it would be programming in a vacuum. Instead, trying to adapt the LaTeX writer into a Docbook writer would give you a precise idea of what is common to both writers and what is specific to each one.

I'd effectively proceed this way:

- copy the current LatexWriter class into a LinearDocumentWriter class;
- create an empty subclass DocbookWriter(LinearDocumentWriter);
- walk the code from the entry point write(): each time you stumble upon a LaTeX output string, translate that output into the Docbook idiom, but put the updated strings into the DocbookWriter.

You will end up with a base class dispatching actions to a concrete subclass, which is actually an implementation of the strategy design pattern.

See for example the LatexWriter.write_class() method: it's just about independent of the output format, except for a single statement:

    # Label our current location.
    out(' \\label{%s}\n' % self.label(doc))

You may replace it with a call:

    self.write_current_location(out, self.label(doc))

and write a matching implementation in the Docbook subclass:

    def write_current_location(self, out, label):
        # pretending I know Docbook markup...
        out(' <a name="%s" />\n' % label)

Where to dispose of all those messy LaTeX strings? Probably another concrete subclass would be the ideal bin :)

Hope this helps. Feel free to write if you need help.

Regards,

Daniele
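
Put together, the refactoring described above might end up shaped like the following sketch. It is a toy under stated assumptions rather than epydoc's actual code: the writers receive a plain list of class names instead of a DocIndex, label() uses an invented naming scheme, and every hook except write_current_location() (write_start, write_end, write_class_start, write_class_end) is a hypothetical name. For the location marker it uses DocBook 4's <anchor id="..."/> element.

```python
from xml.sax.saxutils import escape, quoteattr


class LinearDocumentWriter:
    """Format-independent traversal: decides what is written and in which order."""

    def write(self, out, class_names):
        # `out` is a callable taking a string, as in the snippets above.
        self.write_start(out)
        for name in class_names:
            self.write_class(out, name)
        self.write_end(out)

    def write_class(self, out, name):
        self.write_class_start(out, name)
        # Label our current location (delegated to the concrete writer).
        self.write_current_location(out, self.label(name))
        self.write_class_end(out)

    def label(self, name):
        # Hypothetical label scheme: dotted name with dashes.
        return name.replace(".", "-")

    # Leaf hooks: each concrete writer supplies the format-specific strings.
    def write_start(self, out):
        raise NotImplementedError

    def write_end(self, out):
        raise NotImplementedError

    def write_class_start(self, out, name):
        raise NotImplementedError

    def write_class_end(self, out):
        raise NotImplementedError

    def write_current_location(self, out, label):
        raise NotImplementedError


class LatexWriter(LinearDocumentWriter):
    """The bin for the LaTeX strings (no LaTeX escaping in this sketch)."""

    def write_start(self, out):
        out("\\begin{document}\n")

    def write_end(self, out):
        out("\\end{document}\n")

    def write_class_start(self, out, name):
        out("\\section{Class %s}\n" % name)

    def write_class_end(self, out):
        pass  # a LaTeX section has no closing markup

    def write_current_location(self, out, label):
        out("  \\label{%s}\n" % label)


class DocbookWriter(LinearDocumentWriter):
    def write_start(self, out):
        out("<article>\n")

    def write_end(self, out):
        out("</article>\n")

    def write_class_start(self, out, name):
        out("<section><title>Class %s</title>\n" % escape(name))

    def write_class_end(self, out):
        out("</section>\n")

    def write_current_location(self, out, label):
        out("  <anchor id=%s/>\n" % quoteattr(label))


def render(writer, class_names):
    """Collect everything a writer emits into one string."""
    chunks = []
    writer.write(chunks.append, class_names)
    return "".join(chunks)
```

Calling render(DocbookWriter(), ["pkg.Foo"]) and render(LatexWriter(), ["pkg.Foo"]) runs the same traversal twice; only the leaf strings differ, which is the split between common and format-specific code that the message describes.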