From: Stephen E. <ste...@ya...> - 2007-08-13 07:31:51
|
Hi,

(1) I would like to convert DocBook source files into reStructuredText. Can anyone suggest a way to do this?

(2) Could someone please explain to me how to transform reStructuredText into DocBook?

I am currently working in a Windows environment (plus Cygwin if required), but I am open to all suggestions regardless of platform.

Many thanks,

Stephen |
From: Andreas R. <reu...@we...> - 2007-08-16 20:44:57
|
Just some notes (as I see that no one else has answered these questions yet):

* There is a sandbox project for (2), rst -> dbk; have a look at

  http://svn.berlios.de/viewcvs/docutils/trunk/sandbox/oliverr/docbook/

I can't comment on the shape of that project, though.

Personally I have taken a different approach:

  rst --(via rst2xml)--> xml --(via xslt stylesheets/processor)--> dbk

I found this somewhat easier and more natural, as dbk is xml and rst is already available as xml via rst2xml.

One can break the main stylesheet down into smaller ones by inclusion, like this:

  <xsl:include href="./para.xsl" />

for paragraphs, for example. para.xsl is then responsible only for translating rst paragraphs into dbk paras:

  <?xml version="1.0" encoding="utf-8"?>
  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="paragraph">
      <para>
        <xsl:apply-templates />
      </para>
    </xsl:template>
  </xsl:stylesheet>

That's not too hard, is it?

One particular difficulty I found: both dbk and rst allow splitting a document into smaller pieces and pulling them in by inclusion. For example, mydoc.rst might look like this:

  ...
  .. include:: foo.rst
  .. include:: bar.rst

Just converting mydoc.rst -> mydoc.dbk might not be too hard. But if you want the output to resemble the input structure, i.e. mydoc.dbk to look like this:

  ...
  <xi:include xmlns:xi="http://www.w3.org/2003/XInclude" href="foo.dbk"/>
  <xi:include xmlns:xi="http://www.w3.org/2003/XInclude" href="bar.dbk"/>

that's harder, because in order to get the references within the document right, you have to look at the whole document. Just doing foo.rst -> foo.xml will typically result in lots of errors: missing references etc.

So what I am doing is this: to produce foo.xml from foo.rst, I translate the whole document mydoc.rst to mydoc.xml and pick just the foo.xml part from within mydoc.xml. It sounds horribly inefficient, but rst2xml is fast, so it is not that bad.
The second part of the translation, foo.xml -> foo.dbk by means of the stylesheets, can then work on just that individual foo node.

My code would, however, require some cleanup; it's not at all ready for a release and I don't really have the time for it.

* dbk -> rst

I have written a parser that works like this: I parse dbk as a DOM tree. From this I create my own recursive tree consisting of Nodes, Blockquotes, Titles, etc. I found this easier than working with the DOM directly. Each of those node classes knows how to translate itself to rst, i.e. has an rst method. There is also a pretty method which shows the Node (the tree).

The code really needs some more love (and I am short on time): lots of edge cases are not treated correctly, the rst output needs love, etc. Anyway, I am posting this here AS IS in the hope that it might be useful - see below.

Hope this helps,

-Andreas

dbk2rst.py
--------------------------------------------------
#!/usr/bin/env python2.4
# -*- coding: utf-8 -*-
# Needs Python 2 and the 4Suite XML library (the Ft package).

import re
import sys
from string import Template
from StringIO import StringIO

from Ft.Lib import Uri
from Ft.Xml.Domlette import NonvalidatingReader, PrettyPrint
from Ft.Xml.XPath import Evaluate

# globals, see init()
s = None    # output buffer
doc = None  # parsed DocBook document


def init(file):
    global s, doc
    s = StringIO()
    uri = Uri.OsPathToUri(file)
    doc = NonvalidatingReader.parseUri(uri)


def dom2Node(n):
    # Pick the Node subclass that matches the DOM node name.
    if n.nodeName == 'para':
        return Para.fromDom(n)
    elif n.nodeName == 'blockquote':
        return Blockquote.fromDom(n)
    elif n.nodeName == '#text':
        return Text.fromDom(n)
    elif n.nodeName == 'footnote':
        return Footnote.fromDom(n)
    elif n.nodeName == 'title':
        return Title.fromDom(n)
    else:
        return Node.fromDom(n)


def sep(fst, separator, snd):
    # Join two strings with a separator only if both are non-empty.
    if fst and snd:
        return fst + separator + snd
    else:
        return fst + snd


class Node(object):
    def __init__(self, value=None, children=None, parent=None):
        self.value = value
        # A fresh list per instance; a shared default list would leak
        # children between nodes.
        if children is None:
            children = []
        self.children = children
        self.parent = parent  # was never set, but indent() reads it
        self.nodeStr = 'Node'

    @classmethod
    def fromDom(cls, node):
        n = Node()
        n.value = node.nodeValue
        n.children = [dom2Node(c) for c in node.childNodes]
        return n

    def indent(self):
        if self.parent:
            return self.parent.indent() + len(self.nodeStr) + 2
        else:
            return len(self.nodeStr) + 2

    def pretty(self, indent=0):
        # Debugging aid: show the tree as nested Node(...) expressions.
        t = Template("$node($valueChildren)")
        v = self.value and '"' + self.value[:20].encode('utf-8') + '"' or ''
        indent = indent + len(self.nodeStr) + 2
        if self.children:
            cn = '[' + (',\n' + indent * ' ').join(
                [c.pretty(indent) for c in self.children]) + ']'
        else:
            cn = ''
        return t.substitute(node=self.nodeStr,
                            valueChildren=sep(v, ', ', cn))

    def ws(self):
        # True for childless nodes that hold only whitespace (or nothing).
        r = re.compile(r'^\s+$')
        return bool(r.search(self.value or ' ')) and not self.children

    def rmWs(self):
        # Recursively drop whitespace-only nodes.
        self.children = [c.rmWs() for c in self.children if not c.ws()]
        return self

    def rst(self):
        vstr = self.value and self.value.encode('utf-8') or ''
        cstr = ''.join([c.rst() for c in self.children])
        return vstr + cstr


class Para(Node):
    def __init__(self, children=None):
        super(Para, self).__init__(children=children)
        self.nodeStr = 'Para'

    @classmethod
    def fromDom(cls, node):
        p = Para()
        p.value = node.nodeValue
        p.children = [dom2Node(c) for c in node.childNodes]
        return p

    def rst(self):
        return super(Para, self).rst() + '\n\n'


def indent(text, num=2):
    # Indent every line of text by num spaces.
    lines = text.split('\n')
    lines = [num * ' ' + l for l in lines]
    # return num*' '+('\n'+num*' ').join(lines)
    return '\n'.join(lines) + '\n'


class Blockquote(Node):
    def __init__(self, children=None):
        super(Blockquote, self).__init__(children=children)
        self.nodeStr = 'Blockquote'

    @classmethod
    def fromDom(cls, node):
        b = Blockquote()
        b.value = node.nodeValue
        b.children = [dom2Node(c) for c in node.childNodes]
        return b

    def rst(self):
        return indent(super(Blockquote, self).rst())


class Title(Node):
    def __init__(self, children=None):
        super(Title, self).__init__(children=children)
        self.nodeStr = 'Title'

    @classmethod
    def fromDom(cls, node):
        t = Title()
        # the title element has no nodeValue
        # t.value=node.nodeValue
        t.children = [dom2Node(c) for c in node.childNodes]
        return t

    def rst(self):
        # Underline as long as the first child, which is assumed to be
        # the title text.
        txt = self.children[0]
        underline = Text('\n' + '=' * len(txt.value))
        return Para(self.children + [underline]).rst()


class Footnote(Node):
    def __init__(self, children=None):
        super(Footnote, self).__init__(children=children)
        self.nodeStr = 'Footnote'

    @classmethod
    def fromDom(cls, node):
        f = Footnote()
        f.value = node.nodeValue
        f.children = [dom2Node(c) for c in node.childNodes]
        return f


C = """
f() and f_() are mutually recursive

later, if possible, also handle the case where a Footnote
has several Paras as children
"""


def f_(nodelist):
    # Replace each footnote by an auto-numbered reference and append
    # the footnote bodies after the other nodes.
    done = []
    done2 = []
    for n in nodelist:
        if isinstance(n, Footnote):
            done.append(Text(' [#]_ '))
            done2.append(Text('\n\n.. [#] '))
            para = n.children[0]
            done2.append(para.children[0])
        else:
            done.append(f(n))
    return done + done2


def f(n):
    n.children = f_(n.children)
    return n


class Text(Node):
    def __init__(self, value=None):
        super(Text, self).__init__(value)
        self.nodeStr = 'Text'

    @classmethod
    def fromDom(cls, node):
        t = Text()
        t.value = node.nodeValue
        # Text has no children
        # t.children=[dom2Node(c, parent=t) for c in node.childNodes]
        return t


def run():
    root = doc.documentElement
    elems = Evaluate('/*', root)  # only the section
    for e in elems:
        n = Para.fromDom(e)
        n.rmWs()
        n = f(n)
        s.write(n.rst())
    print s.getvalue()


C = """
just some debugging:

sect=doc.childNodes[0]
para=sect.childNodes[3]

s=Para.fromDom(sect)
p=Para.fromDom(para)
p.rmWs()
"""

if __name__ == '__main__':
    init(sys.argv[1])
    run()
 |
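The per-element translation that para.xsl performs in the rst -> xml -> dbk route can also be sketched in plain Python. This is only an illustration under assumptions, not the stylesheet-based code described in the message: the `TAG_MAP` table and the `to_docbook` helper are hypothetical names, the mapping covers just two docutils element names, and attributes are dropped on purpose.

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately tiny mapping from docutils XML element names
# to DocBook element names; a real converter needs many more entries.
TAG_MAP = {
    "paragraph": "para",
    "block_quote": "blockquote",
}


def to_docbook(elem):
    """Return a copy of elem with docutils tag names mapped to DocBook ones.

    Attributes are not copied, because docutils attributes generally have
    no direct DocBook equivalent.
    """
    out = ET.Element(TAG_MAP.get(elem.tag, elem.tag))
    out.text = elem.text
    out.tail = elem.tail
    for child in elem:
        out.append(to_docbook(child))
    return out


source = (
    "<block_quote><paragraph>Hello <emphasis>world</emphasis>."
    "</paragraph></block_quote>"
)
result = ET.tostring(to_docbook(ET.fromstring(source)), encoding="unicode")
print(result)
```

Each entry in the table plays the role of one small included stylesheet: unknown elements pass through unchanged, so the mapping can grow one element at a time.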
From: Andreas R. <reu...@we...> - 2007-08-16 22:48:27
|
dbk -> rst: actually, if I had to do it again, I would probably use lxml.

-Andreas |
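A rough idea of what a dbk -> rst converter on an ElementTree-style API might look like follows. It is a minimal sketch, not Andreas's code: it uses the standard library's `xml.etree.ElementTree` so that it runs without extra packages (the calls used here, `fromstring`, `itertext` and `.tag`, are also available in `lxml.etree`), and the helper names `inline_text` and `to_rst` are made up for the example. Only section, title, para and blockquote are handled; inline markup is flattened to plain text and every title gets the same underline.

```python
import xml.etree.ElementTree as ET


def inline_text(elem):
    """Join the text of elem and its descendants, collapsing whitespace."""
    return " ".join("".join(elem.itertext()).split())


def to_rst(elem, depth=0):
    """Render the children of elem as rst blocks separated by blank lines."""
    blocks = []
    for child in elem:
        if child.tag == "title":
            text = inline_text(child)
            blocks.append(text + "\n" + "=" * len(text))
        elif child.tag == "para":
            # Two spaces of indentation per enclosing blockquote.
            blocks.append("  " * depth + inline_text(child))
        elif child.tag == "blockquote":
            blocks.append(to_rst(child, depth + 1))
        else:
            # Unknown containers (e.g. nested sections): just descend.
            blocks.append(to_rst(child, depth))
    return "\n\n".join(b for b in blocks if b)


sample = """<section>
  <title>Intro</title>
  <para>Some <emphasis>plain</emphasis> text.</para>
  <blockquote><para>A quote.</para></blockquote>
</section>"""

rst = to_rst(ET.fromstring(sample))
print(rst)
```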
From: Lele G. <le...@na...> - 2007-08-20 08:03:14
|
At Mon, 13 Aug 2007 08:31:12 +0100 (BST), Stephen Eastham <ste...@ya...> wrote:
>
> Hi,
>
> (1) I would like to convert docbook source files into
> restructured text. Can anyone suggest a way to do
> this?

Never tried this, but it may help:

  http://sophos.berkeley.edu/macfarlane/pandoc/

ciao, lele. |
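For anyone trying the pandoc route from a script, a call might be wrapped as below. This is a sketch under assumptions: it presumes a pandoc release that offers `docbook` as an input format and `rst` as an output format, and the file names and the helpers `pandoc_command` and `convert` are placeholders invented for the example.

```python
import shutil
import subprocess


def pandoc_command(src, dst):
    """Build the pandoc argument list for a DocBook -> reStructuredText run."""
    return ["pandoc", "--from", "docbook", "--to", "rst", "--output", dst, src]


def convert(src, dst):
    """Run pandoc if it is on PATH; return True when the conversion ran."""
    if shutil.which("pandoc") is None:
        return False
    subprocess.run(pandoc_command(src, dst), check=True)
    return True


# Placeholder file names, only used to show the resulting command line.
cmd = pandoc_command("mydoc.dbk", "mydoc.rst")
print(" ".join(cmd))
```

Swapping the two format names would give the opposite direction, rst -> dbk, with the same caveat about the installed pandoc version.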