Thread: [Docutils-develop] Re: [Doc-SIG] ASCII to Word?

Sorry for the long delay on this response, but I've been too
scatter-brained to concentrate on this.  I'm combining responses to two
separate messages.

On Tue, Jun 04, 2002, David Goodger wrote:
> Aahz wrote:
>>
>> All right, I'm ready to get started on writing my book.
> 
> Great!  What's the title and/or subject, if you can say?  Are there
> any special features you'll need?

Working title is _Effective Python_, playing off _Effective C++_ and
_Effective Java_.  The subject should be pretty obvious if you're
familiar with those, though the emphasis will be a bit different.  If
you're not familiar with those books, it's basically a cross between a
tutorial and a reference book for the intermediate programmer, short
enough to carry around easily (targeted at 250 pages).  The final title
will change because my publisher is Manning and we don't want to look
like we're poaching on Addison-Wesley.

>> My publisher wants the document in Word format.
> 
> My condolences.

Well, if we do this right, it won't matter, since the ASCII source will
be the same.

>> I'm starting in some form of ASCII, quite likely reST.
> 
> Glad to hear it.  I (and hopefully others involved with Docutils) will
> be happy to help.
> 
> But are there any other requirements other than "Word"?  When I wrote
> a chapter on Python for Wrox [*]_, they sent me an elaborate (and
> broken) stylesheet to use.  Anything like that?  Could be significant.
> 
> .. [*] *Professional Linux Programming*, chapter 15.  My mug is at the
>    far right, second from the top.

It's not clear, but it shouldn't matter -- if I send them the correct
style tags, they should be able to handle it properly.  Note that Word
is actually not the end product; they then convert to Frame for
typesetting.  (They just do their copy-editing in Word.  <sigh>)

> Once you produce a Word document, I assume your publisher will add
> comments to the Word files and send them back to you for revision.
> (Wrox loved using Word's comment feature.)  Will your publisher
> insist on preserving the comments and revision history?  If they do
> insist on "read-write" Word files, you're screwed, and are stuck with
> Word.  If they'll accept fresh, uncommented drafts each time, then
> using a toolchain to generate Word files as a final, read-only display
> format should be feasible.  On second thought, I think you can merge
> versions of documents in Word, so you might be safe either way.

I've made it clear to them that this is going to be a toolchain-based
formatting system, and they haven't said anything about me being able to
deal with Word documents.

> [Engelbert Gruber]
>>> for going to word html might be not a bad option as word would read
>>> it as i heard. what means word actually, a word readable format is
>>> definitely something different ? would rtf be word enough ?
> 
> [Aahz]
>> RTF is an old format with few features; in particular, it doesn't
>> support index tags.
> 
> If anyone cares to dig in deep, the RTF 1.6 spec is online at:
> http://msdn.microsoft.com/library/?url=/library/en-us/dnrtfspec/html/rtfspec.asp
> The 1.7 spec can be downloaded from:
> http://download.microsoft.com/download/Word2002/Install/1.7/W98NT42KMeXP/EN-US/W2KRTFSF.exe

Yeah, I've been thinking about that.  I've been informed that I'm wrong
about RTF not supporting index tags, but one advantage of using
OpenOffice as the intermediate format is that it's editable.

>> I suspect HTML would have the same problem; I'll assume it does
>> unless someone can tell me otherwise from direct experience.
> 
> With the "class" attribute on "span" and "group" elements, HTML can
> represent literally anything, if indirectly and inelegantly.  The
> problem is, can the software *reading* the HTML (Word, in this case)
> *do* anything with this information?  Probably not.  You'll need more
> direct access to native features.

Yes, exactly.

>> In the absence of better advice, I'm going to convert to
>> OpenOffice's XML format, then use OpenOffice to convert to Word.
> 
> I found docs on the StarOffice XML spec (which is the draft for
> OpenOffice) at http://xml.openoffice.org/.  It looks much more
> promising than RTF.  If it's a reasonable markup -- and it appears to
> be so -- then I don't foresee any major problems with an "OpenOffice
> Writer" for Docutils.  Any takers?

It'll probably end up being me, either by writing a direct converter or
by writing XSLT for the DocBook format.  One problem I've discovered
with OpenOffice's "XML format" is that it isn't, really.  It's a ZIP
package of several XML documents.  Not a huge problem, but a bit
annoying.
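
To illustrate the "ZIP package of several XML documents" point, here is a
minimal sketch using only the Python standard library.  The member names
(content.xml, styles.xml) and the element names are placeholders chosen
for illustration, not the actual OpenOffice schema; a real writer would
have to follow the published spec.

```python
import io
import xml.dom.minidom
import zipfile


def build_package(parts):
    """Write a dict of {member name: XML string} into an in-memory ZIP."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, xml_text in parts.items():
            zf.writestr(name, xml_text)
    return buf.getvalue()


def read_part(package_bytes, name):
    """Parse one XML member of the package and return its DOM."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        return xml.dom.minidom.parseString(zf.read(name))


# Hypothetical two-member package; the XML is a stand-in, not real markup.
package = build_package({
    "content.xml": "<document><p>Hello</p></document>",
    "styles.xml": "<styles/>",
})
doc = read_part(package, "content.xml")
```

So a converter would probably generate each XML part separately and zip
them up as a final step, which is extra bookkeeping rather than a real
obstacle.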

>> What I'd really like is a DocBook-to-Word converter, but I haven't
>> seen anything like that.  If I don't get anything here, my next step
>> is to ask on comp.text.xml.
> 
> I wouldn't be surprised if DocBook-to-Word converters *do* exist.  The
> question then becomes, how to produce the DocBook?  Would you still
> use reStructuredText?  A "DocBook Writer" for Docutils shouldn't be
> too challenging.

Yes, I'm committed to starting with some kind of simple ASCII, no matter
what.  Haven't seen any evidence for DocBook-to-Word converters, and
it's definitely true that Jade's RTF converter doesn't support index
tags.

> Did that scare you off?  If not, I'm sure that together we can come up
> with a decent system.  Input from real-life usage like this is
> invaluable.  Docutils will undoubtedly benefit.

Not scared, but not sure how much I'll be able to contribute other than
feedback and the OpenOffice converter.  I'm willing to do a fair bit,
because writing my OSCON slides gave me graphic evidence of how effective
structured text can be for rapid generation of formatted text.  (I used
my own structured text language and utility.)

On Mon, Jun 24, 2002, David Goodger wrote:
> Aahz wrote:
>>
>> Do you have a mechanism for generating index entries?
> 
> I doubt Oliver's DocBook writer does, because reStructuredText and the
> rest of DocBook doesn't, yet.  You'll be needing these?

Absolutely.  If it weren't for index tags, I could probably use a lot of
other options.

> For your purposes, setting a default role of "index" may do the trick.
> If all of your index entries will appear verbatim in the text, this
> should be sufficient.  If not (e.g., if you want "Van Rossum, Guido"
> in the index but "Guido van Rossum" in the text), we'll have to figure
> out a supplemental mechanism, perhaps using substitutions.

There will definitely need to be directives in addition to interpreted
text, for "see" and "see also" entries; it may be easier for now to use
only directives.
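
As a rough sketch of the verbatim-versus-inverted problem described above:
assume, purely hypothetically, that index terms are marked with an "index"
interpreted-text role, and that a separate override table supplies the
form that should appear in the index.  Nothing here is implemented in
Docutils; the role name, the regular expression, and the function name are
all assumptions made for illustration.

```python
import re

# Hypothetical markup: :index:`phrase` marks a phrase for the index.
INDEX_ROLE = re.compile(r":index:`([^`]+)`")


def collect_index_entries(text, overrides=None):
    """Return sorted, de-duplicated index entries found in the text.

    `overrides` maps a phrase as written in the text to the form that
    should appear in the index.
    """
    overrides = overrides or {}
    entries = set()
    for match in INDEX_ROLE.finditer(text):
        phrase = match.group(1)
        entries.add(overrides.get(phrase, phrase))
    return sorted(entries, key=str.lower)


sample = ("Python was created by :index:`Guido van Rossum`; "
          "see also :index:`generators`.")
entries = collect_index_entries(
    sample, {"Guido van Rossum": "Van Rossum, Guido"})
```

"See" and "see also" entries have no anchor phrase in the running text, so
they would still need something like a directive.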

> Either way, no support for any of this is implemented yet.  Can you
> spell out your requirements?

Yes and no.  I think I'm going to need to spend some time figuring out
the current state of reST before I do much formal specification.  But if
reST has some corresponding functionality for every element in DocBook,
I know it can be made to work for my purposes.  (Yeah, I know that's not
going to happen. ;-)

BTW, how should one display the result of "import this"?  It's not an
enumerated list nor a bulleted list, but it's definitely a list of some
sort.
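
One possible answer, offered as a sketch rather than a settled convention:
treat the output as preformatted text and emit it as a reST literal block
(a "::" marker followed by indented lines).  The helper names below are
made up for illustration; the only stdlib facts relied on are that the
"this" module prints on import and keeps its ROT13-encoded text in this.s.

```python
import codecs
import contextlib
import io


def zen_lines():
    """Return the non-blank lines printed by ``import this``."""
    # Importing the module prints the text, so swallow that output here.
    with contextlib.redirect_stdout(io.StringIO()):
        import this
    # this.s holds the text in ROT13 form.
    text = codecs.decode(this.s, "rot13")
    return [line for line in text.splitlines() if line.strip()]


def as_literal_block(lines, indent="    "):
    """Format lines as a reST literal block: '::' plus indented text."""
    return "::\n\n" + "\n".join(indent + line for line in lines) + "\n"


block = as_literal_block(zen_lines())
```

A literal block keeps the line breaks but drops any notion of "list item",
so it sidesteps the question more than it answers it.
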
-- 
Aahz (aa...@py...)           <*>         http://www.pythoncraft.com/

Project Vote Smart: http://www.vote-smart.org/
