On Tue, Jun 11, 2013 at 3:41 PM, David Goodger <goodger@python.org> wrote:
On Tue, Jun 11, 2013 at 5:11 PM, eliben <eliben@gmail.com> wrote:
> David Goodger <goodger <at> python.org> writes:
>
>>
>> On Sun, Jun 2, 2013 at 5:40 AM, Stefan Merten <smerten <at> oekonux.de>
> wrote:
>> > Hi!
>> >
>> > 3 days ago Guenter Milde wrote:
>> >> On 2013-05-29, eliben wrote:
>> >>> Hello,
>> >>
>> >>> I want to read ReST into a representation I can manipulate
> programmatically
>> >>> (say with publish_doctree) and then output the file back in ReST
> format.
>> >>> Does docutils have a "ReST writer" or some similar interface to
> achieve that?
>> >>
>> >> Unfortunately not (yet).
>> >> This is a long standing TODO issue. Some work has been done in the
>> >> "lossless ..." branch of the SVN repository.
>> >
>> > Well, indeed this question comes up once in a while. Unfortunately the
>> > wrong answer is given all the time. Guys, I understand you don't like
>> > the way this problem has been solved but IMHO answering plain "no" to
>> > people looking for a solution is really exaggerating your dislike.
>>
>> Whoa, you're casting aspersions. To my knowledge there's no dislike
>> here, just ignorance. In my case, I have no "dislike" for your
>> approach, it simply wasn't on my radar. I've undoubtedly heard of it
>> before, but haven't used it myself (I have never needed it), so I
>> didn't think to mention it because I didn't *remember* it. The problem
>> here is one of awareness. I, and probably Günter, and many others,
>> were not actively aware of the existence of xml2rst.
>
> Out of curiosity - why wasn't this needed before? As the "source code" for
> documentation, it makes sense to me to see tools that transform ReST
> programmatically.

I can only answer for myself. Why wasn't reST -> reST processing
needed before? Just because I never needed it. The only reST -> reST
transformations I ever needed were provided by Emacs and its
reStructuredText mode (to which I made only small contributions). I
treat reST the same as Python code, as source, which I edit manually.
There are source-code manipulation systems out there (e.g. refactoring
tools like Bicycle Repair Man and, I believe, features of Eclipse),
but I have never needed or used them.

Any time I have faced a situation where large-scale reST source
reworking was needed, I wrote minimal special-purpose tools to do the
job. But these tools didn't "understand" reST much at all; they just
did fancy search-and-replace.
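
A minimal sketch of what such a search-and-replace tool might look like, using only the standard library. The function name `rename_role` and the role names `api` and `func` are hypothetical, chosen for illustration. Like the tools described above, it does not parse reST: it would also rewrite matches inside literal blocks or comments.

```python
import re


def rename_role(source: str, old: str, new: str) -> str:
    """Rename an interpreted-text role marker, e.g. :old:`x` -> :new:`x`.

    Purely textual: the document structure is never parsed.
    """
    # Match ":old:" only when a backtick follows, so that field names
    # such as ":old: value" are left alone.
    pattern = re.compile(r":%s:(?=`)" % re.escape(old))
    # Use a function as the replacement so `new` is inserted literally.
    return pattern.sub(lambda match: ":%s:" % new, source)


sample = "See :api:`publish_doctree` and :api:`publish_from_doctree`."
print(rename_role(sample, "api", "func"))
```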

Do you really need reST output, or would storing the internal document
tree (doctree) for later processing be sufficient? You can do that
with the docutils.core.publish_doctree and publish_from_doctree
functions (along with pickle or an equivalent).
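
A minimal sketch of that round trip, assuming Docutils is installed. The sample document is made up for illustration. Clearing `reporter` and `transformer` before pickling is an assumption about what is needed for the tree to pickle cleanly; `publish_from_doctree` creates fresh ones when it renders. No writer is named, so the default pseudo-XML writer is used.

```python
import pickle

from docutils.core import publish_doctree, publish_from_doctree

source = """\
Title
=====

A paragraph with *emphasis*.
"""

# Parse the reST source into a document tree.
doctree = publish_doctree(source)

# Drop the objects that are only needed while processing; they are
# regenerated by publish_from_doctree and need not be stored.
doctree.reporter = None
doctree.transformer = None

# Store the tree and load it back (a file would work the same way).
blob = pickle.dumps(doctree)
restored = pickle.loads(blob)

# Render the restored tree. "unicode" asks for a str instead of bytes.
output = publish_from_doctree(
    restored, settings_overrides={"output_encoding": "unicode"}
)
print(output)
```

Note that this only defers the choice of output format; none of the standard writers emits reST.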

I really do need reST output. My case is probably typical of recent usage of reST (which, it seems to me, is starting to "suffer" from its popularity). The document in question is meant to be placed into a larger output-generation system based on Sphinx. So I really want to just have a reST file eventually, which Sphinx will transform to HTML for me.

The "minimal special-purpose tools" you mention above are exactly what I ended up using, but it made me wonder whether serializability of doctrees into reST would make such manipulations easier; certainly it would reduce a lot of boilerplate. Hence my question about the information lost during parsing.

> Is there, to the best of your knowledge, any technical reason why the
> internal "tree" that was parsed from input ReST can't be emitted back to
> ReST? Is there any semantically-important information that gets lost in the
> parsing process?

For most ordinary constructs, no and no. But for many directives,
information does get lost. For example, the "list-table" directive
produces a table in the doctree with no indication that it came from a
directive. Anything producing reST would have to choose which form of
table to emit. The "include" directive is another example. By
carefully examining the doctree you might be able to infer that
another file was included at a certain point, but it's not explicit.
And I don't guarantee that you'll always be able to do this. Such
information was only included in the doctree for debugging and
error-reporting purposes; I never considered reST -> reST processing.
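
The "list-table" case can be observed with a short sketch, assuming Docutils is installed. The two sample tables and the helper `doctree_text` are made up for illustration. Both inputs yield a table element in the doctree, and the pseudo-XML dump of the first is not expected to mention the directive it came from.

```python
from docutils.core import publish_doctree

# The same small table written with the "list-table" directive...
LIST_TABLE = """\
.. list-table::
   :header-rows: 1

   * - Name
     - Value
   * - a
     - 1
"""

# ...and as a plain "simple table".
SIMPLE_TABLE = """\
=====  =====
Name   Value
=====  =====
a      1
=====  =====
"""


def doctree_text(source):
    """Parse reST and return the pseudo-XML dump of its doctree."""
    return publish_doctree(source).pformat()


list_tree = doctree_text(LIST_TABLE)
simple_tree = doctree_text(SIMPLE_TABLE)

# Both sources produce a table element.
print("<table" in list_tree, "<table" in simple_tree)
# Does the dump record that a directive was used?
print("list-table" in list_tree)
```

A reST writer working from either tree would have to pick a table syntax on its own.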

This is a shame. Lost information means that serialization back to reST is difficult, as you say.

If this goes against the design goals of reST & docutils, I guess I'll have to stick with the custom hacking for now.

Thanks for the answers,

Eli