[Doxygen-develop] Doxygen and XML... (was: Status of XML development? (was Adding o f..))
From: Prikryl,Petr <PRI...@sk...> - 2001-07-23 12:29:00
Hi,

Dimitri wrote...
>
> I see XML output as an intermediate interface, that would allow
> several front-ends to produce specific output (e.g. html output or
> something very different like code metrics). The XML output would
> contain all information, the front-ends will then pick the appropriate
> information and transform that into the actual output.

The more I think about it, the more I also incline towards an internal
XML DTD (say, doxygen internal XML).

> In theory there are plenty of XML tools that can transform XML output
> into something else. In practice these tools are just not there (at
> least I haven't seen them). All that there really is, is an easy way to
> parse XML and build up the structure contained in an XML file into
> structures in memory. So the plan is to provide a C++/Qt based XML
> parser that understands doxygen's XML output. People that wish to
> add support for another output format can do so by using the structures
> build up by this parser.

I am very new to XML, but there are tools used with DocBook XML, and
they are more general than tools that support only DocBook. This
requires better analysis.

> With respect to DocBook format: I have looked at it, but I think it
> covers only 20% of what doxygen will produce. So any docbook tool
> (which are currently all SGML based by the way), wouldn't be very
> useful.

I am not sure (just starting with DocBook), but I think that DocBook
is much richer than, say, HTML or LaTeX, and it is very suitable for
producing the end documents. It may not fit as the internal XML format,
but I would see it as the main final output format. Let's think about
the following approach:

  input sources
    |
    +--> doxygen internal XML (by doxygen parsers)
           |
           +--> DocBook XML
           |      |
           |      +--> HTML
           |      +--> RTF
           |      +--> jadetex --> DVI, PDF, PS
           |      +--> etc.
           |
           +--> some other postprocessing of the internal doxygen XML

The important thing to note is that DocBook is not exclusively SGML based.
While this may have been true in the past, the majority of DocBook users
probably use DocBook XML these days. Norman Walsh, one of the DocBook
leaders, also considers XML to be the future of DocBook. I suggest
focusing on DocBook XML exclusively (instead of thinking about DocBook
SGML). What should be clarified is the mentioned 20% coverage by DocBook
of what doxygen will produce.

> I do not know how these ideas match/conflict with the character
> encoding problems mentioned by Petr. Would using XML like this still
> solve all those problems?

I guess yes -- XML will always help to solve the problems. At least,
the first parsing phase can be done without problems with respect to
encoding. Once we have the correctly marked internal XML, all problems
with languages and encoding are covered by the XML standard.

What I see as extremely important here is to use the encoding attribute
and the xml:lang attribute correctly. This implies that the XML output
must be split into separate files, at least by encoding, if the standard
approach is chosen. Here are the reasons:

a) If an XML document consists of more than one file, one of the files
   is the main one (it contains the DTD identification); the other files
   are read as so-called "external entities" (basically, &myfile1; is
   expanded to the content; the content of the &myfile1; entity is
   stored in one separate file).

b) Each XML file implicitly assumes UTF-8 encoding. If another encoding
   is used, the first line should contain:

     <?xml version="1.0" encoding="windows-1250"?>

   for the main file or

     <?xml encoding="windows-1250"?>

   for the other files (i.e. the external entities). Then, the rest of
   the file contains the text encoded in the mentioned encoding.

   This also means that a new Doxyfile option should be used for the
   implicit encoding of the input sources. Also, a new doxygen tag
   should be introduced for explicitly marking the file content encoding.
   This way, it would be possible to process project files with
   different encodings (legacy and OS dependency reasons).

c) The language-specific text (i.e. not the encoding-specific text, but
   really things like English, French, Portuguese) can be marked as such
   in any element using the xml:lang attribute. Example (here in <para>,
   but this can be inside "any" element):

     <chapter xml:lang="en">
       <para>Some text in English</para>
       <para xml:lang="fr">Bonjour (i.e. some exceptional text
         in French -- excusez-moi; I have close to zero
         knowledge of French ;-).</para>
       <para xml:lang="pt-BR">Brazilian Portuguese</para>
       ...
     </chapter>

   This also means that doxygen could define new tags for marking text
   in a language other than the base (human) language of the sources.
   The Doxyfile should define a new option that says what the implicit
   language of the input sources is -- possibly INPUT_LANGUAGE. This can
   (of course) be different from the existing OUTPUT_LANGUAGE.

The sentences generated by doxygen translators can be produced as
named entity definitions in one file -- this would require further
analysis.

IMPORTANT: The output could even (possibly) be generated independently
of the languages, and the translators could possibly collapse into one
general class. The internationalization can possibly be done via
language-dependent entity rendering via DSSSL or XSL files (I am not
very good here yet, but at least for DSSSL it is done this way in
DocBook).

I still think that DocBook XML should be the main output to files.
The internal XML could be so intermediate that it exists only in
memory, in the form supported by some standard XML library. For that
reason, the internal XML should use DocBook tags unless the tags need
to be somehow more special (in the sense of preferring <para> to <p>).

> The nice thing about having an intermediate
> file is that the parser and front-end could also be written in another
> language such as Python.
> Furthermore, other input parsers could produce
> the same XML output and benefit from the available front-ends.
>
> In summary doxygen would consist of the following:
>
> - the main engine as a library
> - the xml parser as a library
> - an extendable configuration parser as a library (contains the
>   config options for the engine, but can be dynamically extended by the
>   front-ends to support more options).
> - a number of front-ends, either as libraries or as standalone tools
> - some glue to make a user friendly tool out of these.

As far as I understand, the internal XML format will not contain
any sentences generated by doxygen translators. Things like the
text around, say, the list of places from which a method is called
are not generated into the internal XML. Am I right? I would like
to be ;-)

Regards,
  Petr

--
Petr Prikryl, SKIL, spol. s r.o., pri...@sk...
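
A minimal sketch of point c) above, assuming a front-end written in
Python (which the quoted text mentions as a possibility) and using the
standard xml.etree.ElementTree module. The helper texts_by_language is
hypothetical, and the sample input only reuses the <chapter>/<para>
example from this message; it is not doxygen's actual XML output. The
sketch groups text by the effective xml:lang value, which elements
inherit from their ancestors unless they override it.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined "xml" prefix as this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative input only: the example from the message, shortened.
SAMPLE = """<chapter xml:lang="en">
  <para>Some text in English</para>
  <para xml:lang="fr">Bonjour</para>
  <para xml:lang="pt-BR">Brazilian Portuguese</para>
</chapter>"""


def texts_by_language(element, inherited=None, out=None):
    """Group element text by the effective xml:lang of each element."""
    if out is None:
        out = {}
    # An element without its own xml:lang inherits the ancestor's value.
    lang = element.get(XML_LANG, inherited)
    text = (element.text or "").strip()
    if text:
        out.setdefault(lang, []).append(text)
    for child in element:
        texts_by_language(child, lang, out)
    return out


result = texts_by_language(ET.fromstring(SAMPLE))
for lang, texts in result.items():
    print(lang, texts)
```

A front-end could use such a grouping to pick, per language, which
rendering rules or translated boilerplate to apply; how the real
internal format would mark languages is still an open question here.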