|
From: Axel K. <ax...@co...> - 2002-09-20 10:18:01
|
hi all,
i'm a contributor to drupal ( http://drupal.org ), a php content management/discussion engine. me and some others there quite like rST, and we are currently discussing ( http://drupal.org/node.php?id=507#649 , http://drupal.org/node.php?title=Structured+Text+-+filter+enhancement ) the use of (re)StructuredText for submitting (and maybe storing) content to the system. hence my questions:
- does a php port of (re)StructuredText exist?
- does a php port of something similar to (re)StructuredText exist (aside from http://www.keithdevens.com/software/ , StructuredText Markup)?
- does a port of rST (or something similar) exist in any other language?
- as there is probably none of the above: do you think it is possible to port rST to php? more specifically:
  . is there any general experience in porting python to php?
  . how much of / which parts of the docutils distribution would need to be ported for basic functionality, i.e. for rendering html? how much for more functionality, including xml storage?
  . is there any developer documentation about the code structure / class hierarchies / ... besides http://docutils.sourceforge.net/#docutils-internals , the code and the devel mailing list?
looking forward to your feedback. tia.
--
ax
"Everyone wants to grow old, but nobody wants to be old." (Martin Held, German actor)
|
From: Dethe E. <de...@ma...> - 2002-09-20 15:41:54
|
I'd love to see this. Projects such as porting reST to PHP (or whatever) are part of why I was asking about rigorous specs in the form of EBNF syntax, to make it more portable.
Also, if we parse directly to a DOM it makes reST more flexible and easier to port, since a DOM binding exists for most languages. Many DOMs use either SAX or Expat to build the DOM itself; my idea would be to replace the low-level parser with reST. I understand that David doesn't want to give up the simplicity of the node constructors for the verbosity of DOM calls, but nodes.py could be reimplemented as convenience functions that make the DOM calls for you.
Building up a true XML DOM internally has several advantages. More potential developers would be familiar with the API than are currently comfortable with the reST internals. Writers could be written in XSLT without knowing anything about reST besides its DTD. And my *other* project of converting existing HTML and DocBook documents into reST for maintenance would be that much easier!
Even further off-topic, the docs mention that reST has constructs which are missing from DocBook. What are they?
--Dethe
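To make the "convenience functions" idea concrete: a single helper can hide the verbosity of ``createElement``/``setAttribute``/``appendChild`` behind one constructor-style call. This is a minimal sketch using only the standard library's ``xml.dom.minidom``; the ``element()`` helper and the tag names are illustrative assumptions, not part of Docutils.

```python
from xml.dom.minidom import Document

def element(doc, tag, *children, **attributes):
    """Hypothetical convenience constructor wrapping the raw DOM calls."""
    node = doc.createElement(tag)
    for name, value in attributes.items():
        node.setAttribute(name, value)
    for child in children:
        # Plain strings become text nodes; anything else is assumed to be a DOM node.
        if isinstance(child, str):
            child = doc.createTextNode(child)
        node.appendChild(child)
    return node

doc = Document()
root = element(doc, "document",
               element(doc, "paragraph", "Hello, ",
                       element(doc, "emphasis", "world"), "."),
               source="example.txt")
doc.appendChild(root)
print(root.toxml())
```

The nested calls read much like node constructors, while the result is an ordinary DOM tree that any DOM-aware tool could consume.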
|
From: David G. <go...@us...> - 2002-09-21 02:51:42
|
Dethe Elza wrote:
> Also, if we parse directly to a DOM it makes reST more flexible and
> easier to port, since a DOM binding exists for most languages. Many
> DOMs use either SAX or Expat to build the DOM itself, my idea would
> be to replace the low-level parser with reST.
I don't see how that would improve flexibility. The parser can already build a real DOM tree; just call ``document.asdom()``. What benefits would a DOM approach provide? I'm not being defensive; I'd really like to know. If there is a benefit that outweighs the cost, it should be explored.
> Building up a true XML DOM internally has several advantages. More
> potential developers would be familiar with the API than are
> currently comfortable with the reST internals.
I don't think the document tree is the bottleneck. Rather, I think it's the complexity of the parser. Unfortunately, parsing reStructuredText *is* complex, because it has to grok two-dimensional patterns that humans understand implicitly. It's the curse of user-friendliness. ;-)
> Writers could be written in XSLT without knowing anything about reST
> besides its DTD.
This can already be done: just use ``document.asdom()`` then run that through the XSLT engine. The reason we don't go that route is because there is no XSLT engine in core Python. If PyXML is ever incorporated into the core, we can re-examine that decision.
> And my *other* project of converting existing HTML and DocBook
> documents into reST for maintenance would be that much easier!
I don't follow this at all. Can you elaborate?
> Even further off-topic, the docs mention that reST has constructs
> which are missing from DocBook. What are they?
There are plenty. Off the top of my head: field lists, option lists, decorations (headers & footers), doctest blocks, line blocks, transitions. None of these are difficult to render or approximate using regular DocBook elements, it's just that there's no one-to-one correspondence.
Even in elements where there *is* a strong correspondence, some are not completely compatible, such as definition lists. It is the goal of http://docutils.sf.net/spec/doctree.html to document all of this; any assistance would be gratefully accepted and much appreciated.
The Docutils document model was designed by me (with much input, of course), as it makes sense to me. I've had some experience with various models, including DocBook and TEI, and I've designed several DTDs before. Every document designer has different sensibilities, so differences and incompatibilities are inevitable. For example, I know of no DTD that has the equivalent of a "transition" element, although they're quite common in novels and articles.
--
David Goodger <go...@us...> Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/
|
From: Dethe E. <de...@ma...> - 2002-09-21 06:34:09
|
On Friday, September 20, 2002, at 07:55 PM, David Goodger wrote:
> Dethe Elza wrote:
>> Also, if we parse directly to a DOM it makes reST more flexible and
>> easier to port, since a DOM binding exists for most languages. Many
>> DOMs use either SAX or Expat to build the DOM itself, my idea would
>> be to replace the low-level parser with reST.
>
> I don't see how that would improve flexibility. The parser can
> already build a real DOM tree; just call ``document.asdom()``. What
> benefits would a DOM approach provide? I'm not being defensive; I'd
> really like to know. If there is a benefit that outweighs the cost,
> it should be explored.
I didn't know about asdom(); I'll have to explore that to see how expensive it is and which DOM implementation it uses.
>> Building up a true XML DOM internally has several advantages. More
>> potential developers would be familiar with the API than are
>> currently comfortable with the reST internals.
>
> I don't think the document tree is the bottleneck. Rather, I think
> it's the complexity of the parser. Unfortunately, parsing
> reStructuredText *is* complex, because it has to grok two-dimensional
> patterns that humans understand implicitly. It's the curse of
> user-friendliness. ;-)
Yes, that's certainly true, and one of the things I really like about reST is the effort it takes to make the document author's life easier.
I still think that the internals could be simplified and that this would encourage more participation in the project.
I've seen some complaints about the complexity of reST in toto, which I think could be addressed by modularizing reST, but that's another issue.
I agree that the parser is the most complex component of reST, so if it focuses on the parser and reuses architecture from the Python libraries for the rest of reST, it may be easier to grok for a programmer coming to it fresh.
>> Writers could be written in XSLT without knowing anything about reST
>> besides its DTD.
>
> This can already be done: just use ``document.asdom()`` then run that
> through the XSLT engine. The reason we don't go that route is because
> there is no XSLT engine in core Python. If PyXML is ever incorporated
> into the core, we can re-examine that decision.
I thought a version of PyXML was part of the core now, but not 4Suite.
Besides, Optik is not part of core python, but it drastically simplifies the reST code, so it's included in docutils.
>> And my *other* project of converting existing HTML and DocBook
>> documents into reST for maintenance would be that much easier!
>
> I don't follow this at all. Can you elaborate?
Sorry, that wasn't very clear. I want to think of the reST DOM as its canonical form, so I can transform XHTML and DocBook to reST via XSLT. Ideally I also want a writer to create reST from the reST DOM.
>> Even further off-topic, the docs mention that reST has constructs
>> which are missing from DocBook. What are they?
>
> There are plenty. Off the top of my head: field lists, option lists,
> decorations (headers & footers), doctest blocks, line blocks,
> transitions. None of these are difficult to render or approximate
> using regular DocBook elements, it's just that there's no one-to-one
> correspondence. Even in elements where there *is* a strong
> correspondence, some are not completely compatible, such as definition
> lists. It is the goal of http://docutils.sf.net/spec/doctree.html to
> document all of this; any assistance would be gratefully accepted and
> much appreciated.
Thanks, that's a good start. DocBook has added support for describing EBNF in documentation, as well as including modules for MathML and SVG; it is essentially a superset of XmlSpec, which is the *other* widely used XML documentation format (at least in the W3C). I just had a wild idea that instead of inventing a new XML DTD for internal structure, reST could use DocBook (or a subset of it) for its DOM representation. Like I said, a wild idea.
> The Docutils document model was designed by me (with much input, of
> course), as it makes sense to me. I've had some experience with
> various models, including DocBook and TEI, and I've designed several
> DTDs before. Every document designer has different sensibilities, so
> differences and incompatibilities are inevitable. For example, I know
> of no DTD that has the equivalent of a "transition" element, although
> they're quite common in novels and articles.
<hr /> doesn't qualify?
Thanks for the eloquent feedback.
--Dethe
|
From: David G. <go...@us...> - 2002-09-21 16:06:11
|
Dethe Elza wrote:
> I didn't know about asdom(), I'll have to explore that to see how
> expensive it is and which DOM implementation it uses.
It uses xml.dom.minidom, but it's intended to be able to use any DOM
implementation (it's parameterized). I haven't actually tried it with
any other DOM, so I don't know if it works or not. It's a case of
"don't add functionality until it's needed". So if it's needed, it
can be added. (By those who need it, of course!)
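For readers unfamiliar with the pattern, a parameterized conversion might look roughly like the sketch below. It is a toy model, not the Docutils source: the ``Node`` class is hypothetical, and it assumes that whatever DOM module is passed in exposes a minidom-compatible ``Document`` class. Each node is visited exactly once, so the conversion is linear in the size of the tree.

```python
import xml.dom.minidom

class Node:
    """Toy doctree node (hypothetical; not docutils.nodes.Node)."""

    def __init__(self, tagname, *children, **attributes):
        self.tagname = tagname
        self.children = list(children)   # Node instances or plain strings
        self.attributes = attributes

    def asdom(self, dom=xml.dom.minidom):
        """Convert the whole tree; the DOM implementation is a parameter."""
        document = dom.Document()
        document.appendChild(self._dom_node(document))
        return document

    def _dom_node(self, document):
        # One recursive visit per node: linear in the size of the tree.
        element = document.createElement(self.tagname)
        for name, value in self.attributes.items():
            element.setAttribute(name, str(value))
        for child in self.children:
            if isinstance(child, str):
                element.appendChild(document.createTextNode(child))
            else:
                element.appendChild(child._dom_node(document))
        return element

tree = Node("document", Node("paragraph", "Hello, world."), source="input.txt")
print(tree.asdom().documentElement.toxml())
```

Passing a different module as ``dom`` is the only change needed to target another implementation, provided it offers the same small set of calls.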
Please don't think of *any* part of the current Docutils
implementation as written in stone. It's all experimental, all
subject to change if we discover it's broken or deficient in some way.
That's what the "0." part of the "0.2" ("0.2.3", currently) version
number is meant to imply.
> I still think that the internals could be simplified and that this
> would encourage more participation in the project.
There are hairy parts of the parser code which could use refactoring,
true. You're working on one of them: the directive parser. In the
parser's core code (states.py), there's some duplicate code that could
be refactored. But that's low priority on my list at present. It
ain't broke (much), so it's not important (to me) to fix it (yet).
> I've seen some complaints about the complexity of reST in toto,
> which I think could be addressed by modularizing reST, but that's
> another issue.
It's just people mistaking richness for complexity, and the ubiquitous
"I don't need it, therefore it's superfluous" crap. It's mostly
bogus. We can modularize the parser on a case-by-case basis, but I
think a general modularization will only increase the perceived
complexity. Some modularization is already in place: the code behind
the "--pep-references" and "--rfc-references" options was ripped out
of the PEP reader, so that it could be used any time.
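Recognizing such references needs nothing PEP-specific, which is what lets the code live outside the PEP reader. A minimal stand-alone sketch, assuming a simple regular expression and present-day URL patterns for PEPs and RFCs; the base URLs and the function name are illustrative, not Docutils' actual implementation or defaults:

```python
import re

# Illustrative base URLs (assumptions; Docutils' own defaults may differ).
PEP_URL = "https://peps.python.org/pep-{:04d}/"
RFC_URL = "https://www.rfc-editor.org/rfc/rfc{:d}"

PATTERN = re.compile(r"\b(PEP|RFC)\s+(\d+)\b")

def find_standard_references(text):
    """Return (matched text, URL) pairs for 'PEP nnn' and 'RFC nnn' mentions."""
    results = []
    for match in PATTERN.finditer(text):
        kind, number = match.group(1), int(match.group(2))
        template = PEP_URL if kind == "PEP" else RFC_URL
        results.append((match.group(0), template.format(number)))
    return results

print(find_standard_references("See PEP 258 and RFC 2822."))
```

Because the function only needs text in and links out, any reader (or a transform) could call it.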
> I agree that the parser is the most complex component of reST, so if
> it focuses on the parser and reuses architecture from the python
> libraries for the rest of reST it may be easier to grok for a
> programmer coming to it fresh.
Ideally yes, but in practice it's too expensive. It's just too
convenient for a transform to say ``isinstance(node, Body)``, since
the element hierarchy is built in to nodes.py. This would be painful
if we were using a DOM tree.
> I thought a version of PyXML was part of the core now, but not
> 4Suite.
There's xml.dom (.minidom, .pulldom), xml.sax, and xml.parsers.expat.
That's it. Your build or distribution of Python may include it, but
it's not part of the core.
> Besides, Optik is not part of core python, but it drastically
> simplifies the reST code, so it's included in docutils.
Optik *will* be part of core Python, as of 2.3. And it's small enough
that we can include it with Docutils now. One of the goals is for
Docutils itself to be included in the core stdlib, so we can only use
modules already part of the stdlib or scheduled to be included. We
can't presume to get PyXML into the stdlib riding on Docutils'
coattails.
>>> And my *other* project of converting existing HTML and DocBook
>>> documents into reST for maintenance would be that much easier!
>>
>> I don't follow this at all. Can you elaborate?
>
> Sorry, that wasn't very clear. I want to think of the reST DOM as
> its canonical form, so I can transform XHTML and DocBook to reST
> via XSLT. Ideally I also want a writer to create reST from the reST
> DOM.
That last sentence seems to me the key, and that's exactly what's
missing. Transforming canonical Docutils DOM to internal nodes.py
doctree should be reasonably easy, but without the reStructuredText
writer, it's practically useless. If I were writing a converter from
XHTML or DocBook to reStructuredText, I would do it in a quick & dirty
way, completely independently of Docutils (as I've described before).
In addition, there's a lot going on behind the scenes that the
Docutils DTD doesn't expose. Try running ``html.py --dump-internals
input.txt ...`` to see what I mean. ("--dump-internals" is an
internal, hidden option, for debugging.)
> Thanks, that's a good start. DocBook has added support for
> describing EBNF in documentation, as well as including modules for
> MathML and SVG, it is essentially a superset of XmlSpec, which is
> the *other* widely used XML documentation format (at least in the
> W3C). I just had a wild idea that instead of inventing a new XML
> DTD for internal structure, reST could use DocBook (or a subset of
> it) for its DOM representation. Like I said, a wild idea.
I looked at DocBook, XMLSpec, TEI, and HTML (and I've worked
extensively with DocBook and TEI in the past), but none of them fit.
I've borrowed ideas from some of them. It was easier to start fresh
than try to shoehorn an existing system into use in Docutils.
>> For example, I know of no DTD that has the equivalent of a
>> "transition" element, although they're quite common in novels and
>> articles.
>
> <hr /> doesn't qualify?
That's true, that's the one example I do know of. But HTML doesn't
really count; I think of it as an output format, not as descriptive
markup. I'm still puzzled why there's no transition equivalent in
TEI, which is a DTD for publishers! They probably just use an empty
paragraph or some other kludge. I don't know of a decent/standard DTD
for novels out there.
> Thanks for the eloquent feedback.
Any time. Thank *you* for the penetrating questions.
--
David Goodger <go...@us...> Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/
|
|
From: Dethe E. <de...@ma...> - 2002-09-23 04:19:55
|
>> I didn't know about asdom(), I'll have to explore that to see how
>> expensive it is and which DOM implementation it uses.
>
> It uses xml.dom.minidom, but it's intended to be able to use any DOM
> implementation (it's parameterized).
Excellent. But reST actually converts to the DOM when you call
asdom(), right? So it's a potentially expensive operation.
> Please don't think of *any* part of the current Docutils
> implementation as written in stone. It's all experimental, all
> subject to change if we discover it's broken or deficient in some way.
> That's what the "0." part of the "0.2" ("0.2.3", currently) version
> number is meant to imply.
Point taken, and I'm glad we're able to have these discussions in that
light.
>> I still think that the internals could be simplified and that this
>> would encourage more participation in the project.
>
> There are hairy parts of the parser code which could use refactoring,
> true. You're working on one of them: the directive parser. In the
> parser's core code (states.py), there's some duplicate code that could
> be refactored. But that's low priority on my list at present. It
> ain't broke (much), so it's not important (to me) to fix it (yet).
No, of course not. Some things jump out at me because I'm new to the
project. If I find they're important enough to me, I may jump in and
fix them.
>> I agree that the parser is the most complex component of reST, so if
>> it focuses on the parser and reuses architecture from the python
>> libraries for the rest of reST it may be easier to grok for a
>> programmer coming to it fresh.
>
> Ideally yes, but in practice it's too expensive. It's just too
> convenient for a transform to say ``isinstance(node, Body)``, since
> the element hierarchy is built in to nodes.py. This would be painful
> if we were using a DOM tree.
Why is node.nodeName == 'Body' more expensive than isinstance(node, Body)?
If it's because of a string compare rather than a pointer compare, then
the strings can be interned and it's a pointer comparison again. Or is
there some other reason?
>> Ideally I also want a writer to create reST from the reST
>> DOM.
>
> That last sentence seems to me the key, and that's exactly what's
> missing. Transforming canonical Docutils DOM to internal nodes.py
> doctree should be reasonably easy, but without the reStructuredText
> writer, it's practically useless. If I were writing a converter from
> XHTML or DocBook to reStructuredText, I would do it in a quick & dirty
> way, completely independently of Docutils (as I've described before).
Yes, and that's what I've done so far. But once I'm done with the include
directives I want to take a look at a reST DOM -> reST text writer to get a
feel for how tricky it will be.
> In addition, there's a lot going on behind the scenes that the
> Docutils DTD doesn't expose. Try running ``html.py --dump-internals
> input.txt ...`` to see what I mean. ("--dump-internals" is an
> internal, hidden option, for debugging.)
OK, but without knowing more about *why* that is, it looks like a bug to me.
It's Tim Peters' koan, "explicit is better than implicit." Once we start
getting too much magic going on under the covers, Python starts veering
towards Perl. IMHO that's one of the major problems in Zope, and makes work
in Zope, beyond the very trivial, much harder to do and to understand than
it would otherwise be.
> I don't know of a decent/standard DTD for novels out there.
Hah! That's perhaps the reason I like reST so much. Right now I'm
using it for technical documentation at work, but the reason I got into
the work I'm in is because I was so frustrated with word processors and
proprietary formats that I've put my creative writing on hold for
several years to work on tools.
One of my upcoming projects is to pull one of my unfinished novels out
and serialize it on my weblog using reST. I really like the way I can
more or less forget about the tool and focus on writing--it's almost as
good as a typewriter that way!
Of course, a typewriter never gives you compile errors. That's a major
problem with XML and currently with reST. I have little hope of seeing
it solved in XML, but I think we can and should make an effort in reST
to make errors rarer, clearer, and more easily found/fixed. A casual
user should never have to see a python stack trace, for instance.
Fortunately, what's there is rich enough and stable enough to think
about things like catching all the exceptions. And it may only be that
I see a lot of exceptions because I'm working on a) very long and
complex documents, and b) actively changing the guts of the system.
TTFN
--Dethe
|
|
From: David G. <go...@us...> - 2002-09-23 04:59:48
|
Dethe Elza wrote:
> But reST actually converts to the DOM when you call asdom(), right?
Yes; what else should it do?
> So it's a potentially expensive operation.
Theoretically it's O(n), and you should only have to do it once.
>> Please don't think of *any* part of the current Docutils
>> implementation as written in stone. It's all experimental, all
>> subject to change if we discover it's broken or deficient in some way.
>> That's what the "0." part of the "0.2" ("0.2.3", currently) version
>> number is meant to imply.
>
> Point taken, and I'm glad we're able to have these discussions in that
> light.
I've found that recognizing that all code is disposable has been a great
liberator. Someone said that any decent system needs to be written three
times: once to discover the issues, again to discover the *real* issues by
trying to solve the initial ones, and a third time to write an elegant,
general solution that really works.
> Some things jump out at me because I'm new to the project. If I find they're
> important enough to me, I may jump in and fix them.
Please feel free!
> Why is node.nodeName == 'Body' more expensive than isinstance(node, Body)? If
> it's because of a string compare rather than a pointer compare, then the
> strings can be interned and it's a pointer comparison again. Or is there some
> other reason?
I neglected to point out that ``nodes.Body`` is an abstract superclass of
*many* concrete node classes. There are many such "element category"
superclasses, roughly corresponding to those listed in
http://docutils.sf.net/spec/doctree.html#element-hierarchy. The difference
is between ``isinstance(node, nodes.Body)`` and ``node.tagName in
('paragraph', 'bullet_list', 'enumerated_list', 'definition_list', ...)``.
Multiply by the maintenance required every time a new element class is
added, and the difference should be clear.
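The maintenance argument can be seen in a few lines. In this toy hierarchy (hypothetical classes loosely named after doctree elements, not ``docutils.nodes`` itself), a body element added later is recognized by the ``isinstance`` check automatically but silently missed by a hand-maintained tag list:

```python
class Element:
    """Toy base class (hypothetical; loosely modelled on docutils.nodes)."""
    tagname = "element"

class Body(Element):
    """Abstract element category: never instantiated directly."""

class Paragraph(Body):
    tagname = "paragraph"

class BulletList(Body):
    tagname = "bullet_list"

class Title(Element):
    tagname = "title"

# Tag-name approach: a hand-maintained list of every body element.
BODY_TAGS = ("paragraph", "bullet_list")

def is_body_by_tag(node):
    return node.tagname in BODY_TAGS

def is_body_by_class(node):
    # Category approach: any subclass of Body qualifies automatically.
    return isinstance(node, Body)

class DefinitionList(Body):
    """Added later; nobody remembered to update BODY_TAGS."""
    tagname = "definition_list"

print(is_body_by_class(DefinitionList()), is_body_by_tag(DefinitionList()))
```

The speed of the comparison is beside the point here; the category superclass keeps the knowledge "which elements are body elements" in one place.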
> But once I'm done with the include directives I want to take a look at a reST
> DOM -> reST text writer to get a feel for how tricky it will be.
One thing to remember: it's not a "reST DOM", it's a "Docutils DOM" (or
"Docutils doc tree", as I usually refer to it to avoid misinterpretation).
The doc tree (internal document representation) is independent of
reStructuredText.
>> In addition, there's a lot going on behind the scenes that the
>> Docutils DTD doesn't expose. Try running ``html.py --dump-internals
>> input.txt ...`` to see what I mean. ("--dump-internals" is an
>> internal, hidden option, for debugging.)
>
> OK, but without knowing more about *why* that is, it looks like a bug
> to me.
It's mostly bookkeeping. In the source text, we may see::
    Here is a reference_ to a web site.

    .. _reference: http://www.example.com/
Once parsed, the target URL has to be moved over to the "reference" text.
The internal data structures keep track of stuff like this. Any browser or
other "user agent" has to do the same: create an internal database of
references and targets in order to make hyperlinks work. That internal
database is merely an implementation detail.
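A rough model of that bookkeeping: one pass collects the targets, a second attaches each reference's URL. This sketch handles only single-word references (``name_``) and external targets (``.. _name: URL``), using simplified regular expressions that are assumptions for illustration; the real parser is far more thorough.

```python
import re

# Simplified, illustrative patterns (assumptions, not the reST grammar):
# ".. _name: URL" on its own line, and "name_" as a reference.
TARGET_RE = re.compile(r"^\.\. _([\w-]+):\s*(\S+)[ \t]*$", re.MULTILINE)
REFERENCE_RE = re.compile(r"\b([\w-]+)_\b")

def resolve_references(source):
    """Map each reference name to its target URL (None if no target exists)."""
    # Pass 1: collect targets. reST reference names are case-insensitive.
    targets = {name.lower(): url for name, url in TARGET_RE.findall(source)}
    # Pass 2: scan the remaining text for references and look each one up.
    body = TARGET_RE.sub("", source)
    return {name: targets.get(name.lower()) for name in REFERENCE_RE.findall(body)}

text = """Here is a reference_ to a web site.

.. _reference: http://www.example.com/
"""
print(resolve_references(text))
```

The ``targets`` dictionary is exactly the kind of internal database that never needs to appear in the output XML.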
> It's Tim Peters' koan, "explicit is better than implicit." Once we start
> getting too much magic going on under the covers, Python starts veering
> towards Perl. IMHO that's one of the major problems in Zope, and makes work
> in Zope, beyond the very trivial, much harder to do and to understand than it
> would otherwise be.
I don't think that's a valid comparison, but perhaps we're talking at
cross-purposes here. There's nothing wrong with not exposing implementation
details where they're not relevant, such as in the output XML. However,
these details do need more documentation. I've tried to document them in
docstrings, but until there's a docstring extraction system in place you
have to read the source.
> One of my upcoming projects is to pull one of my unfinished novels out
> and serialize it on my weblog using reST. I really like the way I can
> more or less forget about the tool and focus on writing--it's almost as
> good as a typewriter that way!
I had occasion on Friday to write a document unrelated to Docutils. I used
reStructuredText in Emacs, and the text just flowed. I was quite pleased by
how easy it was to write the words *and* markup out without having the
markup get in the way. It's good to hear that I'm not the only one. I'm
biased, so I can't trust my own good experiences without corroboration.
> Of course, a typewriter never gives you compile errors. That's a major
> problem with XML and currently with reST. I have little hope of seeing
> it solved in XML, but I think we can and should make an effort in reST
> to make errors rarer, clearer, and more easily found/fixed.
I agree completely. In fact, I'm currently working on improving the
Reporter system to always report line numbers, and to report to stderr in
the GNU Tools format: "file:lineno: message". Should be checked in soon.
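A minimal sketch of that output convention, with a hypothetical ``Reporter`` class (not the Docutils one). The level names, the "(LEVEL/n)" tag, and the default threshold are assumptions for illustration; the essential part is the "file:lineno: message" prefix that editors such as Emacs can jump to.

```python
import io
import sys

class Reporter:
    """Hypothetical reporter emitting GNU-style "file:lineno: message" lines."""

    LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "SEVERE")

    def __init__(self, source, stream=None, threshold=2):
        self.source = source
        self.stream = stream if stream is not None else sys.stderr
        self.threshold = threshold  # messages below this level are dropped

    def report(self, level, message, lineno):
        if level < self.threshold:
            return
        name = self.LEVELS[level]
        print(f"{self.source}:{lineno}: ({name}/{level}) {message}", file=self.stream)

buffer = io.StringIO()
reporter = Reporter("input.txt", stream=buffer)
reporter.report(1, "Below the threshold, so not shown.", 3)
reporter.report(2, "Title underline too short.", 12)
print(buffer.getvalue(), end="")
```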
> A casual user should never have to see a python stack trace, for instance.
Again, agreed. The only time a stack trace should occur at present is with
a "SEVERE" (level-4) system message. I suppose even that ought to be
suppressed though; just output the system message plus a line saying
"Processing stopped due to the problems reported above" or some such.
If there are any other stack traces happening, they're bugs and should be
reported and squashed. I/O error handling could use some improvement; if
you specify a nonexistent file, you'll see a stack trace.
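One way to keep stack traces away from casual users is to catch expected failures at the top level and print a short message instead. A hypothetical sketch (``SevereError``, ``process``, and ``main`` are illustrative names, not Docutils APIs), covering both a level-4 problem and the nonexistent-input-file case:

```python
import sys

class SevereError(Exception):
    """Hypothetical: signals a level-4 problem that halts processing."""

def process(path):
    """Stand-in for reading and parsing; a real parser would do much more."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    if not text.strip():
        raise SevereError(f"{path}:1: (SEVERE/4) Document is empty.")
    return text

def main(path):
    """Return an exit status; print short messages instead of tracebacks."""
    try:
        process(path)
    except SevereError as exc:
        print(exc, file=sys.stderr)
        print("Processing stopped due to the problems reported above.", file=sys.stderr)
        return 1
    except OSError as exc:
        # A nonexistent or unreadable input file: one line, no stack trace.
        print(f"{path}: cannot read input: {exc.strerror}", file=sys.stderr)
        return 1
    return 0
```

Calling ``main("no-such-file.txt")`` prints a single "cannot read input" line to stderr and returns 1; anything not caught here would still surface as a traceback, which is arguably right for genuine bugs.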
Another consequence of a "0.x" version: a bit of roughness around the edges.
> Fortunately, what's there is rich enough and stable enough to think
> about things like catching all the exceptions. And it may only be that
> I see a lot of exceptions because I'm working on a) very long and
> complex documents, and b) actively changing the guts of the system.
(b) I can understand, but (a) shouldn't cause any stack traces (except
perhaps a MemoryError if the document is *that* long!).
--
David Goodger <go...@us...> Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/
|
|
From: Richard J. <rj...@ek...> - 2002-09-20 23:51:11
|
On Fri, 20 Sep 2002 8:17 pm, Axel Kollmorgen wrote:
> i'm a contributor to drupal ( http://drupal.org ), a php content
> management/discussion engine. me and some others there quite like rST,
> and we are currently discussing ( http://drupal.org/node.php?id=507#649
> ,
> http://drupal.org/node.php?title=Structured+Text+-+filter+enhancement )
> the use of (re)StructuredText for submitting (and maybe storing) content
> to the system. hence my questions:
Sounds like a fine idea.
> - does a php port of (re)StructuredText exists?
No.
> - does a php port of something similar to (re)StructuredText exists
> (aside of http://www.keithdevens.com/software/ , StructuredText Markup)?
*shrug*
> - does any other language port of rST or similar exists?
No. There should be :)
> - as there is probably none of above: do you think it is possible to
> port rST to php? more specifically:
> . is there any general experience in porting python to php?
I'm sure there's some out there, you just need to find it :)
Seriously, I've never seen any conversion information between the two. I've seen some skin-deep comparisons, and they look similar - on the surface, PHP is Python with added punctuation ... "and bugs" according to our PHP developer :)
> . how much of / which parts of the docutils distribution would be
> required to port for basic functionality, i.e. for rendering html? how
> much for more functionality including xml-storage?
Fortunately, the design of the docutils project is quite clean and organised, and you can see how it all works at:
http://docutils.sourceforge.net/#specification
See PEP 258 for a description of the processing framework itself. That works along the basic lines of:
1. read from a source document, parsing the structure out of it and creating a DOM tree (http://docutils.sourceforge.net/spec/doctree.html)
2. optionally do some transforms on the tree (NOOP for current ReST reader)
3. pass the DOM to a writer, in this case HTML
4. perform any HTML-specific DOM transforms (insert system messages, turn references into hyperlinks, handle footnotes, ...)
5. turn the DOM into HTML
> . is there any developer documentation about the code structure /
> class hierarchies / ... beside
> http://docutils.sourceforge.net/#docutils-internals , the code and the
> devel-mailing list?
See the above specification link.
At a minimum, you could just implement a parser for ReST and your own framework for generating the HTML. If that's the case, you just need the ReST format specification:
http://docutils.sourceforge.net/spec/rst/reStructuredText.html
Hope this helps :)
Richard
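The five steps can be modelled as a tiny pipeline of plain functions. This is a toy under heavy assumptions (paragraphs separated by blank lines, a single reader-side transform, steps 3 to 5 folded into one writer function); none of the names are Docutils APIs:

```python
import html

def read(text):
    """Step 1: parse source text into a flat list of (node type, text) pairs."""
    return [("paragraph", block.strip())
            for block in text.split("\n\n") if block.strip()]

def promote_title(tree):
    """Step 2 (reader-side transform): treat the first paragraph as the title."""
    if tree and tree[0][0] == "paragraph":
        return [("title", tree[0][1])] + tree[1:]
    return tree

def write_html(tree):
    """Steps 3-5 folded together: map node types to tags and escape the text."""
    tags = {"title": "h1", "paragraph": "p"}
    return "\n".join(f"<{tags[kind]}>{html.escape(text)}</{tags[kind]}>"
                     for kind, text in tree)

def publish(text, transforms):
    tree = read(text)
    for transform in transforms:
        tree = transform(tree)
    return write_html(tree)

print(publish("Docutils & PHP\n\nA port looks possible.", [promote_title]))
```

Keeping the tree as the only thing passed between stages is what would let a port replace any one stage (for example, the writer) independently.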
|
From: David G. <go...@us...> - 2002-09-21 02:55:36
|
Richard Jones wrote:
> See PEP 258 for a description of the processing framework itself.
> That works along the basic lines of:
>
> 1. read from a source document, parsing the structure out of it and
> creating a DOM tree
> (http://docutils.sourceforge.net/spec/doctree.html)
> 2. optionally do some transforms on the tree (NOOP for current ReST
> reader)
Not precisely true. In fact, there are several transforms that are done before the doc tree is handed off to the writer: doc title, docinfo, references, footnotes, substitutions, etc.
However, please note that the present method of applying transforms is due for an overhaul. I'm thinking of moving the responsibility for running the transforms away from the Reader & Writer and into a new intermediate class, perhaps just a super-Transform. Readers, Writers, and other objects will register the transforms they want applied, into a list sorted by priority. The transforms will be applied to the doc tree in order. Transforms may insert further transforms into the list dynamically (which would be re-sorted). Transforms that are now associated with the Reader will simply have higher ("to be run earlier") priority.
--
David Goodger <go...@us...> Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/
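The proposed scheme is essentially a priority queue. A minimal sketch with a hypothetical ``Transformer`` class built on ``heapq``: here a lower number means "run earlier", the priority values are arbitrary, and a transform that schedules another transform simply pushes it onto the same queue, which keeps it ordered without an explicit re-sort.

```python
import heapq
import itertools

class Transformer:
    """Hypothetical: runs registered transforms in priority order."""

    def __init__(self):
        self._queue = []
        self._counter = itertools.count()  # tie-breaker: equal priorities keep insertion order

    def add(self, priority, transform):
        heapq.heappush(self._queue, (priority, next(self._counter), transform))

    def apply(self, document):
        while self._queue:
            _, _, transform = heapq.heappop(self._queue)
            transform(document, self)  # a transform may call self.add()
        return document

log = []

def docinfo(doc, transformer):
    log.append("docinfo")

def references(doc, transformer):
    log.append("references")
    transformer.add(850, footnotes)  # scheduled dynamically, mid-run

def footnotes(doc, transformer):
    log.append("footnotes")

def html_specific(doc, transformer):
    log.append("html_specific")

t = Transformer()
t.add(900, html_specific)  # registered by a writer: runs late
t.add(340, docinfo)        # registered by a reader: runs early
t.add(700, references)
t.apply({})
print(log)
```

Registration order no longer matters; only the priorities do, which is what frees the Reader and Writer from running transforms themselves.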
|
From: Aahz <aa...@py...> - 2002-09-21 14:26:44
|
On Fri, Sep 20, 2002, David Goodger wrote: > Richard Jones wrote: >> >> See PEP 258 for a description of the processing framework itself. >> That works along the basic lines of: >> >> 1. read from a source document, parsing the structure out of it and >> creating a DOM tree >> (http://docutils.sourceforge.net/spec/doctree.html) >> 2. optionally do some transforms on the tree (NOOP for current ReST >> reader) > > Not precisely true. In fact, there are several transforms that are > done before the doc tree is handed off to the writer: doc title, > docinfo, references, footnotes, substitutions, etc. > > However, please note that the present method of applying transforms is > due for an overhaul. I'm thinking of moving the responsibility for > running the transforms away from the Reader & Writer and into a new > intermediate class, perhaps just a super-Transform. Readers, Writers, > and other objects will register the transforms they want applied, into > a list sorted by priority. The transforms will be applied to the doc > tree in order. Transforms may insert further transforms into the list > dynamically (which would be re-sorted). Transforms that are now > associated with the Reader will simply have higher ("to be run > earlier") priority. !!! Ah, now the lightbulb strikes about why I was having such difficulty understanding things earlier -- I kept looking for the transformer and couldn't find it. Is there any place all this stuff is documented like this? (Now I understand better how my index entries need to be handled.) -- Aahz (aa...@py...) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ |
|
From: David G. <go...@us...> - 2002-09-21 16:06:26
|
Aahz wrote: > !!! Ah, now the lightbulb strikes about why I was having such difficulty > understanding things earlier -- I kept looking for the transformer and > couldn't find it. Is there any place all this stuff is documented like > this? PEP 258 has a high-level overview of the Docutils design. The diagram shows that the transforms are run by the Reader and Writer, and the text reinforces this. My idea is to disconnect the transforms from the Reader & Writer, add a new class in-between them (perhaps "Transformer"), and hang the transforms off of that. -- David Goodger <go...@us...> Open-source projects: - Python Docutils: http://docutils.sourceforge.net/ (includes reStructuredText: http://docutils.sf.net/rst.html) - The Go Tools Project: http://gotools.sourceforge.net/ |
|
From: <pf...@ar...> - 2002-09-21 15:31:41
|
Hi, [...] > > - as there is probably none of above: do you think it is possible to > > port rST to php? more specifically: > > . is there any general experience in porting python to php? > > I'm sure there's some out there, you just need to find it :) Seriously, I've > never seen any conversion information between the two. I've seen some > skin-deep comparisons, and they look similar - on the surface, PHP is Python > with added punctuation ... "and bugs" according to our PHP developer :) [...] I've only heard of projects working in the opposite direction. For example, at http://www.zope.org/Members/mjablonski/PHParser you can find a tool developed to port existing PHP scripts to Zope. Personally, I have never used any PHP. But everybody I've heard of who knows PHP or Perl as well as Python has started to port or rewrite their PHP or Perl code in Python. Most web servers nowadays run on Linux, and Linux systems usually come with Python preinstalled (the major distributor Red Hat uses Python for its installer 'anaconda', and the future Linux kernel configuration manager is also written in Python). Maybe it would be possible for you to embed the fine Python implementation of reST written by David Goodger into your PHP application with some glue code? Porting David's implementation to another language looks like a lot of work. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany) |
|
From: David G. <go...@us...> - 2002-09-21 02:50:41
|
Axel, Richard Jones gave the answers I would have given to your questions (thanks Richard!). I have a question for you. Please note, I know nothing about PHP. Before you invest the considerable time and effort [*]_ into porting the reStructuredText parser (and relevant parts of the rest of Docutils), can't you simply install Python and Docutils in parallel and call it from your PHP code? If that proved to be a great success, then the porting effort may prove worthwhile. If you do choose to port it, I wish you success. I'll be happy to help in any way I can. I really mean that, not least because most requests for assistance inevitably make their way back into the Docutils code or docs. .. [*] At least you'll have the benefit of working from an existing implementation with a mature spec and the bugs worked out! -- David Goodger <go...@us...> Open-source projects: - Python Docutils: http://docutils.sourceforge.net/ (includes reStructuredText: http://docutils.sf.net/rst.html) - The Go Tools Project: http://gotools.sourceforge.net/ |