From: Julian S. <js...@ac...> - 2004-08-28 14:11:23
Greetings.
Converting the documentation into XML
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I've been looking into converting all of Valgrind's documentation into
XML. It seems to me that having the originals in html is not a good
long-term solution and that XML is a better starting format since:
- Conversion from html to other formats (pdf, dvi, ps, tex) is too
difficult, and there have been requests for the docs in those
formats.
- Using xml allows generation of the documentation in a book form,
which looks a lot more professional, is easier to use, and prints
nicely. Printing the html pages does not give what one would
normally think of as a decent manual.
- xml documents must be well-formed and validated prior to processing;
therefore all resultant html is guaranteed to be well-formed
(ie. w3c compliant), which in turn means accessible by any browser.
If invalid xml is committed, it simply won't build.
- One day we may be able to edit the xml masters directly in
OpenOffice (wysiwyg), which would save loads of time and effort. At
the moment, one has to edit the html and constantly reload it in a
browser to see that the right thing is happening.
- XML may make it easier to translate the documentation into different
languages, should that become necessary.
- XML allows consistent, checked, cross-referencing across the entire
documentation set. Similarly, out-of-date references to fixed bits
of information (old email addresses, dates, version numbers, etc)
go away because we declare one single entity thus:
<!ENTITY vg-lifespan "2000-2004">
then everywhere we need to put copyright stuff, we can say:
Copyright (C) &vg-lifespan; Valgrind Developers
- And finally, using xml allows us to separate content from
presentation (always a Good Thing).
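A minimal sketch of the entity mechanism from the list above, assuming
Python's standard-library parser as a stand-in for the real DocBook
toolchain. The tiny <book>/<legalnotice> wrapper and the function name are
made up for illustration; only the entity declaration and the copyright
line come from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document: the entity is declared once in the
# internal DTD subset and referenced wherever the copyright line is needed.
DOC = """<!DOCTYPE book [
  <!ENTITY vg-lifespan "2000-2004">
]>
<book>
  <legalnotice>Copyright (C) &vg-lifespan; Valgrind Developers</legalnotice>
</book>
"""

def copyright_line(xml_text):
    """Parse the document and return the legal notice with entities expanded."""
    root = ET.fromstring(xml_text)
    return root.find("legalnotice").text

print(copyright_line(DOC))
```

Changing the one ENTITY line changes every place that references it, which
is the point being made about dates and version numbers.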
Changes in repo structure and build process
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Roughly speaking, the changes in repo structure are as follows:
- All the html files disappear and are replaced with xml:
    Docs specific to each tool:
      valgrind/<toolname>/docs/<file(s)>.xml
    Main /docs/ dir:
      valgrind/docs/
    Distribution docs in plain text format, ie. README, etc.
      valgrind/docs/dist_files/
    Any images used in the docs:
      valgrind/docs/images/
    Stylesheets, catalogs, parsing/formatting scripts:
      valgrind/docs/lib/
    Top-level xml files: docs/xml/
      vg-bookset.xml: top-level bookset wrapper
      faq.xml, howto.xml: also kept in here.
      vg-entities.xml: various strings, dates etc. used by the various docs
      tool-template.xml: template file for writing new tool docs.
      xml-hints: quick reference for frequently-used xml tags.
- The build process is entirely driven from a single Makefile in
valgrind/docs/. Building documentation is done by:
(cd docs && make target)
where target is one of html, pdf, ps, or all.
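For anyone scripting the build, a small sketch of the same invocation,
assuming it is run from the top of the valgrind tree. The function names
are hypothetical; the only things taken from the text are the docs/
directory and the four target names.

```python
import subprocess

# Targets named in the message; anything else is rejected before make runs.
TARGETS = ("html", "pdf", "ps", "all")

def docs_build_command(target):
    """Return the argv for building one documentation target."""
    if target not in TARGETS:
        raise ValueError(f"unknown docs target: {target!r}")
    return ["make", target]

def build_docs(target, docs_dir="docs"):
    """Equivalent of `(cd docs && make target)`; raises if make fails."""
    subprocess.run(docs_build_command(target), cwd=docs_dir, check=True)
```

Calling build_docs("html") would then do the same as typing the shell
one-liner by hand.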
What you get
~~~~~~~~~~~~
What you get is a sequence of "books", where each book is a
self-contained entity with its own table of contents and, if you want
one, an index. Links both within and across books are handled consistently.
The main books generated are:
(user docs)
- The User's Manual
- The FAQ
- The Howto
(tech docs)
- The design and implementation of valgrind (old)
- How cachegrind works
- Writing a new Valgrind Tool
(reference)
- Distribution documents (the old README* files)
- GNU General Public License
Clearly we can mash around the top-level book structure as needed, but
this seems like a good start.
A tarball of the final results is available at
http://www.dancedetails.org/vgdocs.tar.bz2
This contains the generated .pdf, .ps and .html for the user manual
and FAQ, so you can get a good idea what the result looks like.
Toolchain dependencies
~~~~~~~~~~~~~~~~~~~~~~
Some time was spent on the docbook-apps list in order to ascertain the
most-useful / widely-available / least-fragile / advanced toolchain.
Basically, everything has problems of one sort or another, so I ended
up going with what I felt was the least-problematical of the various
options.
It's unrealistic to expect end-users to have these tools available on
their systems. Therefore the plan is to have two different kinds of
build: end-user builds, and developer builds. Developers must have
the required tools, and will be able to build pdf, ps, html, etc (any
supported target). For end-users building from tarballs, we will
pre-generate the pdf/ps/html, and these will simply be copied by 'make
install', so there is no further fragility there.
Developers need to ensure the following tools are present:
  XML validation
    - xmllint: libxml version 20607
  XML translation top-level driver (.xml -> .html or .xml -> .fo)
    - xsltproc: libxml 20607, libxslt 10102 and libexslt 802
  Converts .fo to .pdf
    - pdfxmltex: pdfTeX (Web2C 7.4.5) 3.14159-1.10b
    - and therefore TeX
  Converts .pdf to .ps
    - pdftops: version 3.00
  DocBook (schema, DTD, style sheets, macros, etc)
    - DocBook: version 4.2
  Converts .html into .txt
    - lynx
  Needed at some point in the process
    - bzip2
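A rough sketch of how a developer might check that the executables in the
list above are on PATH before attempting a build. It only tests for
presence, not for the version numbers quoted above, and it does not cover
the DocBook DTD and stylesheets or the TeX installation, which are not
single executables. The names REQUIRED_TOOLS and missing_tools are
hypothetical.

```python
import shutil

# Executables named in the list above, keyed by the job each one does.
REQUIRED_TOOLS = {
    "xmllint": "XML validation",
    "xsltproc": "XML translation (.xml -> .html or .xml -> .fo)",
    "pdfxmltex": "converts .fo to .pdf",
    "pdftops": "converts .pdf to .ps",
    "lynx": "converts .html into .txt",
    "bzip2": "needed at some point in the process",
}

def missing_tools(which=shutil.which):
    """Return the required executables that cannot be found on PATH."""
    return [name for name in REQUIRED_TOOLS if which(name) is None]

if __name__ == "__main__":
    for name in missing_tools():
        print(f"missing: {name} ({REQUIRED_TOOLS[name]})")
```

The lookup function is passed in as a parameter so the check can be
exercised without the tools actually being installed.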
A big problem is latency. DocBook is constantly being updated, but
the tools available in Linux distros tend to lag behind. It is
important that the versions get on with each other. If you decide to
upgrade something, you need to ensure that things still work nicely -
something which cannot be assumed.
All feedback gratefully received. It would be handy for folks to
look at the results, at http://www.dancedetails.org/vgdocs.tar.bz2.
J
From: Nicholas N. <nj...@ca...> - 2004-08-28 14:45:58
On Sat, 28 Aug 2004, Julian Seward wrote:
> Distribution docs in plain text format, ie. README, etc.
> valgrind/docs/dist_files/
I'm not so keen on this -- it's totally standard for README, INSTALL,
COPYING, TODO, etc. to be in the top-level directory. Nobody will look
for them in valgrind/docs/dist_files/.
As for the FAQ, it's nice to have a copy of that in the top-level
directory as well, in plain-text, if possible.
> (tech docs)
> - The design and implementation of valgrind (old)
> - How cachegrind works
> - Writing a new Valgrind Tool
I'm not sure if the first two need to be in there any more. If they are,
the first should definitely be marked at the top with a big "out of date,
many details no longer true!" warning.
The rest seems good to me.
> All feedback gratefully received. It would be handy for folks to
> look at the results, at http://www.dancedetails.org/vgdocs.tar.bz2.
Is that the original XML, or the resulting HTML? It would be good to have
both up for viewing.
N
From: Donna R. <do...@te...> - 2004-08-28 15:31:56
> I'm not so keen on this -- it's totally standard for README, INSTALL,
> COPYING, TODO, etc. to be in the top-level directory. Nobody will look
> for them in valgrind/docs/dist_files/.
sorry - I didn't explain this very well. At build-time, FAQ.txt *also*
gets generated from the xml, and put into dist_files/. After the other
formats are generated, the entire contents of dist_files gets moved into
the top-level dir valgrind/
> As for the FAQ, it's nice to have a copy of that in the top-level
> directory as well, in plain-text, if possible.
see above.
> > (tech docs)
> > - The design and implementation of valgrind (old)
> > - How cachegrind works
> > - Writing a new Valgrind Tool
> I'm not sure if the first two need to be in there any more. If they are,
> the first should definitely be marked at the top with a big "out of date,
> many details no longer true!" warning.
um, well, praps *all* the docs need a facelift?
> Is that the original XML, or the resulting HTML? It would be good to have
> both up for viewing.
This is just a tarball of files in .pdf, .ps and .html format so people
can get an idea of what it all comes out like.
d
From: Bob F. <bfr...@si...> - 2004-08-28 15:08:27
On Sat, 28 Aug 2004, Julian Seward wrote:
>
> - One day we may be able to edit the xml masters directly in
> OpenOffice (wysiwyg), which would save loads of time and effort. At
> the moment, one has to edit the html and constantly reload it in a
> browser to see that the right thing is happening.
OpenOffice already uses XML format for its document files. Do an
'unzip -l' on an OpenOffice .sxw file to see the content. The XML is
very compact and ugly, but I notice that there is a configuration
option to format the XML for humans (resulting in larger files).
Bob
======================================
Bob Friesenhahn
bfr...@si...
http://www.simplesystems.org/users/bfriesen
From: Donna R. <do...@te...> - 2004-08-28 15:33:20
On Saturday 28 August 2004 16:08, Bob Friesenhahn wrote:
> OpenOffice already uses XML format for its document files.
At this point I think OO can only deal with DOCTYPE article. Most of
the vg-docs are DOCTYPE books, so no-go at present.
d
From: Tom H. <th...@cy...> - 2004-08-28 16:14:54
In message <200...@ac...>
Julian Seward <js...@ac...> wrote:
> A tarball of the final results is available at
>
> http://www.dancedetails.org/vgdocs.tar.bz2
>
> This contains the generated .pdf, .ps and .html for the user manual
> and FAQ, so you can get a good idea what the result looks like.
Did somebody do s/code/computeroutput/ or something? Only there
are lots of instances of the strange word computeroutput in that
documentation ;-)
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
From: Donna R. <do...@te...> - 2004-08-28 20:29:42
On Saturday 28 August 2004 17:14, Tom Hughes wrote:
> Did somebody do s/code/computeroutput/ or something? Only there
> are lots of instances of the strange word computeroutput in that
> documentation ;-)
that's the closest I could get to <code>blah</code>
it's not great.
FYI - xml to html markup transformations:
<programlisting> --> <pre class="programlisting">
<screen>         --> <pre class="screen">
<computeroutput> --> <tt class="computeroutput">
<literal>        --> <tt>
<emphasis>       --> <i>
<command>        --> <b class="command">
<blockquote>     --> <div class="blockquote">
                       <blockquote class="blockquote">
I claim to be a total non-expert at all this.
If anyone has better ideas, tell me please.
d
From: Tom H. <th...@cy...> - 2004-08-28 21:00:48
In message <200...@te...>
Donna Robinson <do...@te...> wrote:
> On Saturday 28 August 2004 17:14, Tom Hughes wrote:
> > Did somebody do s/code/computeroutput/ or something? Only there
> > are lots of instances of the strange word computeroutput in that
> > documentation ;-)
>
> that's the closest I could get to <code>blah</code>
> it's not great.
What I meant was that it is not only tags that have been changed. Lots of
instances of the word code in the text itself have been changed.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
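One way to avoid the problem described in this message is to anchor the
replacement on the tag delimiters rather than on the bare word. A minimal
sketch, assuming the source tags are plain <code> and </code> with no
attributes; the sample sentence and function name are made up for
illustration.

```python
import re

# Matches only an opening or closing <code> tag, never the bare word "code".
CODE_TAG = re.compile(r"<(/?)code>")

def rename_code_tags(text):
    """Rename <code> tags to <computeroutput>, leaving body text untouched."""
    return CODE_TAG.sub(r"<\1computeroutput>", text)

sample = "The <code>--leak-check</code> option checks client code for leaks."

# A plain search and replace also rewrites the word in running text.
print(sample.replace("code", "computeroutput"))
# The anchored pattern changes the tags only.
print(rename_code_tags(sample))
```

Tags that carry attributes would need a looser pattern or a real XML
parser; the regular expression here is deliberately the simplest thing
that separates tags from text.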
From: Donna R. <do...@te...> - 2004-08-28 21:06:48
On Saturday 28 August 2004 22:00, Tom Hughes wrote:
> What I meant was that it is not only tags that have been changed. Lots of
> instances of the word code in the text itself have been changed.
thanks, yes, on further investigation you are right.
mea culpa - that's what you get for not paying attention when
doing search & replace. will fix asap. (blush)
d