From: David G. <go...@py...> - 2005-06-20 23:11:33
|
[fel...@us...] > Modified: trunk/docutils/HISTORY.txt > =================================================================== ... > + - Specifying either ``--stylesheet`` or ``--stylesheet-path`` is > + mandatory now. "--stylesheet" is an *option*. That means it is *optional*. There is no such thing as a "mandatory option", it's an oxymoron. If we want the stylesheet to be mandatory, it should be implemented as a positional parameter. If left as an option, we can specify a reasonable default (e.g., "default.css"); combined with a default embed_stylesheet=1, it will cause an IOError exception if no such stylesheet exists. Please revert or modify this & related changes. We can *warn* the user about this (using the warnings module), but raising an exception is too much. -- David Goodger <http://python.net/~goodger> |
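The "warn, don't raise" suggestion above can be sketched with the standard `warnings` module. This is only an illustration, not Docutils code: `resolve_stylesheet` and `DEFAULT_STYLESHEET` are hypothetical names, and the default file name is simply the "default.css" example from the message.

```python
import os
import warnings

# Hypothetical default, taken from the "default.css" example in the message.
DEFAULT_STYLESHEET = "default.css"


def resolve_stylesheet(stylesheet=None, search_dir="."):
    """Return the stylesheet path to use.

    The option stays optional: if it is omitted, a default is used.
    A missing file produces a warning instead of an exception.
    """
    path = os.path.join(search_dir, stylesheet or DEFAULT_STYLESHEET)
    if not os.path.exists(path):
        # Warn the caller, but let processing continue.
        warnings.warn("stylesheet not found: %s" % path, UserWarning, stacklevel=2)
    return path
```

A caller that does want a hard failure can still turn the warning into an error with `warnings.simplefilter("error")`, so the stricter behaviour remains available without being forced on everyone.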
From: David G. <go...@py...> - 2005-06-27 22:17:44
|
[Felix Wiemann] > Do you think we can do it that way? We can, but we should not. It breaks the model. A writer-specific transform is the correct way to do it. -- David Goodger <http://python.net/~goodger> |
From: Felix W. <Fel...@gm...> - 2005-06-27 22:33:42
|
David Goodger wrote: > Felix Wiemann wrote: > >> [Run universal.Messages transform from within a writer.] >> Do you think we can do it that way? > > We can, but we should not. It breaks the model. Valid point; then let's not insert the warning into the document at all. > A writer-specific transform is the correct way to do it. That's too much code, really. And since the writer-specific transform would need to be applied *before* universal.Messages, you could no longer intercept the document tree between parsing and writing (as does publish_doctree) because the transforms are intermixed. We need some way to separate reader/parser, default, and writer transforms. But let's defer that issue and get back to it when we actually have a writer-specific transform. -- Felix Wiemann -- http://www.ososo.de/ |
From: David G. <go...@py...> - 2005-06-28 13:53:37
|
>> Felix Wiemann wrote: >>> [Run universal.Messages transform from within a writer.] >>> Do you think we can do it that way? > David Goodger wrote: >> We can, but we should not. It breaks the model. [Felix Wiemann] > Valid point; then let's not insert the warning into the document at > all. You miss my point. Transforms may not be *applied* by a Writer, or by any other component other than the Transformer. But transforms can be and are *specified* by any component, including a Writer. The Transformer collects transform specs from the components when the Publisher is assembled. >> A writer-specific transform is the correct way to do it. > > That's too much code, really. Not much at all, really. See revision 3617. Most of the change is support for the new transform, whose "apply" method is all of 3 logical lines long (8 physical lines). > And since the writer-specific transform would need to be applied > *before* universal.Messages, you could no longer intercept the > document tree between parsing and writing (as does publish_doctree) > because the transforms are intermixed. Not true. Transforms from all components (reader, parser, writer) are added to the Transformer's queue when the Publisher is assembled. > We need some way to separate reader/parser, default, and writer > transforms. Why do we have to separate them? We already have a way to determine the order of transform application: the default_priority attribute. > But let's defer that issue and get back to it when we actually have > a writer-specific transform. We have one now. -- David Goodger <http://python.net/~goodger> |
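The distinction drawn above (components *specify* transforms, only the Transformer *applies* them, and `default_priority` decides the order) can be illustrated with a toy model. This is a minimal sketch, not the Docutils implementation: the classes below are stand-ins, the document is a plain dict, the priority given to `StylesheetCheck` is a made-up value, and 860 for `Messages` is an assumption.

```python
class Transform:
    """Base class for the toy transforms below."""
    default_priority = None

    def __init__(self, document):
        self.document = document

    def apply(self):
        raise NotImplementedError


class StylesheetCheck(Transform):
    # Hypothetical value; all that matters here is "lower than Messages".
    default_priority = 420

    def apply(self):
        if not self.document.get("stylesheet"):
            self.document["messages"].append("WARNING: no stylesheet given")


class Messages(Transform):
    default_priority = 860  # assumed value for this sketch

    def apply(self):
        if self.document["messages"]:
            self.document["body"].append(
                ("Docutils System Messages", list(self.document["messages"])))


class Component:
    """A component only *lists* its transforms; it never applies them."""
    transforms = ()

    def get_transforms(self):
        return list(self.transforms)


class ToyWriter(Component):
    transforms = (StylesheetCheck,)


class ToyDefaults(Component):
    transforms = (Messages,)


class ToyTransformer:
    """Collects transform classes from components and applies them by priority."""

    def __init__(self, document):
        self.document = document
        self.queue = []

    def populate_from_components(self, components):
        for component in components:
            self.queue.extend(component.get_transforms())

    def apply_transforms(self):
        applied = []
        for transform_class in sorted(self.queue, key=lambda t: t.default_priority):
            transform_class(self.document).apply()
            applied.append(transform_class.__name__)
        return applied
```

Even if the writer is registered last, its transform runs before `Messages` because only the priority matters, so the warning it adds still ends up in the system messages section.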
From: Felix W. <Fel...@gm...> - 2005-06-28 16:37:07
|
David Goodger wrote: > You miss my point. Transforms may not be *applied* by a Writer, or by > any other component other than the Transformer. But transforms can be > and are *specified* by any component, including a Writer. Yes, I understood that. :-) >> And since the writer-specific transform would need to be applied >> *before* universal.Messages, you could no longer intercept the >> document tree between parsing and writing (as does publish_doctree) >> because the transforms are intermixed. > > Not true. Transforms from all components (reader, parser, writer) are > added to the Transformer's queue when the Publisher is assembled. Yes, of course, but you can no longer do the following: 1. Read, parse, apply transforms of reader, parser, and default transforms. 2. Get the doctree and store it. 3. Write the doctree with *any* writer, applying writer-specific transforms beforehand. because with the newly added transform you would have to apply the writer-specific transforms in step 1 (before the default transform universal.Message); at this step the writer is not known yet, however. >> We need some way to separate reader/parser, default, and writer >> transforms. > > Why do we have to separate them? It's very convenient for 3rd-party applications to call publish_doctree, modify the doctree, and call publish_from_doctree. And it may be necessary to read and parse a document, store it, and write it out later. (E.g. for performance reasons.) That's no longer possible now in a clean manner. It just smells bad if there is no point *between* parsing and writing. Because with the new writer-transform, before applying the transforms the parsing isn't complete (parser-specific transforms haven't been applied yet) and after applying them the writing has already begun (because writer-specific transforms have already been applied). Parsing and writing should be *completely* separate, and that's only possible if we do not mix parser- and writer-specific transforms. 
As a solution, we could apply transforms in the following order: * Reader-/parser-specific transforms, Decorations, FinalChecks. * Writer-specific transforms, Messages and FilterMessages. So we would have the writer-specific transforms before Message and FilterMessages so that writer transforms can add system_messages, and we'd still have parsing and writing completely separate. What d'you think? -- For private mail please ensure that the header contains 'Felix Wiemann'. "the number of contributors [...] is strongly and inversely correlated with the number of hoops each project makes a contributing user go through." -- ESR |
From: Martin B. <mar...@gm...> - 2005-06-28 21:59:08
|
On 6/28/05, Felix Wiemann <Fel...@gm...> wrote: > David Goodger wrote: > > > You miss my point. Transforms may not be *applied* by a Writer, or by > > any other component other than the Transformer. But transforms can be > > and are *specified* by any component, including a Writer. > > Yes, I understood that. :-) > > >> And since the writer-specific transform would need to be applied > >> *before* universal.Messages, you could no longer intercept the > >> document tree between parsing and writing (as does publish_doctree) > >> because the transforms are intermixed. > > > > Not true. Transforms from all components (reader, parser, writer) are > > added to the Transformer's queue when the Publisher is assembled. > > Yes, of course, but you can no longer do the following: > > 1. Read, parse, apply transforms of reader, parser, and default > transforms. > > 2. Get the doctree and store it. > > 3. Write the doctree with *any* writer, applying writer-specific > transforms beforehand. > > because with the newly added transform you would have to apply the > writer-specific transforms in step 1 (before the default transform > universal.Message); at this step the writer is not known yet, however. well... yes and no. i'm not quite sure i fully understand the issue with the message transform. With the current design, the results may be different if you run it at once, or in separate steps as above. I'm not sure if they are, but they "may" be. Here is why. What's happening now is: publish_doctree: the transforms from (reader, parser, default, null writer) are used to produce the document tree. (in my system, i run additional transforms before storing, but that is irrelevant here, although i feel like it may become an issue depending on how this story ends. Mostly I extract information using these transforms, until now i just add class attributes to the doctree, specific to my application.) 
publish_from_doctree: the transforms (default, chosen writer) are used to produce the output document. i suppose you can decide to run the messages transform in either of those steps, or both. essentially, what i'm saying is that doing these two steps separately *may* produce different results than a single transform run of transforms from (reader, parser, default, chosen writer). The differences will depend on interdependencies between the transforms and the order in which they are applied. i suppose it would not be too hard to check if it matters or not to separate the way i did. in other words, are the writer transforms always run after all the other transforms? > >> We need some way to separate reader/parser, default, and writer > >> transforms. > > > > Why do we have to separate them? > > It's very convenient for 3rd-party applications to call publish_doctree, > modify the doctree, and call publish_from_doctree. And it may be > necessary to read and parse a document, store it, and write it out > later. (E.g. for performance reasons.) That's no longer possible now > in a clean manner. writing later can be fun too: you can run some transforms depending on the requested output context (or user settings, say, for a web app). btw, this could also be used to provide a new meaning for "print page" links: a PDF file could be generated on the fly and returned to the user... > It just smells bad if there is no point *between* parsing and writing. > Because with the new writer-transform, before applying the transforms > the parsing isn't complete (parser-specific transforms haven't been > applied yet) and after applying them the writing has already begun > (because writer-specific transforms have already been applied). Parsing > and writing should be *completely* separate, and that's only possible if > we do not mix parser- and writer-specific transforms. 
it's possible now, but it's only possible in a predictable manner if you do the separation you mention (see explanation above). > As a solution, we could apply transforms in the following order: > > * Reader-/parser-specific transforms, Decorations, FinalChecks. > * Writer-specific transforms, Messages and FilterMessages. > > So we would have the writer-specific transforms before Message and > FilterMessages so that writer transforms can add system_messages, and > we'd still have parsing and writing completely separate. What d'you > think? conditional +1 : only if no-one can see a real or potential use of a transform from the writer and a transform from one of the other components that mingle in the way i mentioned above, which cannot be implemented in some other way. personally, i think that if there is indeed such a pair of transforms, their interdependency probably smells fishy. does anyone know of such interplay? and if so, could we change those transforms to not depend on each other, in order to perform the full separation? note that the separation that you mention above effectively puts new constraints on the transforms: it will not be possible to do that interplay between a writer transform and another transform. IMHO it is reasonable to add this restriction for the benefit of obtaining a usable doctree in between, therefore +1. if you know of cases where we need the writer transform to run before the other transforms, plz speak up (i will try to check tonite when i get back to HQ.) cheers, |
From: David G. <go...@py...> - 2005-06-29 22:26:52
|
[Felix Wiemann] > you can no longer do the following: > > 1. Read, parse, apply transforms of reader, parser, and default > transforms. > > 2. Get the doctree and store it. > > 3. Write the doctree with *any* writer, applying writer-specific > transforms beforehand. > > because with the newly added transform you would have to apply the > writer-specific transforms in step 1 (before the default transform > universal.Message); at this step the writer is not known yet, > however. On reprocessing the existing doctree, the (now known) writer transform is added to the Transformer. That takes care of step 3. The Transformer's default transforms are also added; all we have to do is make sure that no undesired transforms are there. I've added a "reprocess_transforms" attribute (a tuple of transforms) to the Transformer class, and use it to replace "default_transforms" in docutils.core.publish_from_doctree. It is the same as "default_transforms", but without the universal.Decorations transform, which probably shouldn't be applied twice. I think this solves the problem. In any case, I don't see any easy way out of this problem. The application programmer must determine which transforms should be applied during the first stage (reading/parsing) and which during the second (writing). Some transforms may be applied twice, some shouldn't be. They're not documented in that way now, since this situation has never come up before. >>> We need some way to separate reader/parser, default, and writer >>> transforms. >> >> Why do we have to separate them? > > It's very convenient for 3rd-party applications to call > publish_doctree, modify the doctree, and call publish_from_doctree. > And it may be necessary to read and parse a document, store it, and > write it out later. (E.g. for performance reasons.) That's no > longer possible now in a clean manner. It seems clean to me. Cleaner than what was there before, certainly. 
The Docutils model is Reader/Parser, Transformer, Writer -- always. Storing the doctree means using a "null" Writer. Reprocessing means using a "doctree" Reader and a "null" Parser. But both stages require the execution of the full Docutils model. Applications simply have to adjust their transform requirements to suit. > It just smells bad if there is no point *between* parsing and > writing. You're thinking about it the wrong way. There *is* a point between reading/parsing and writing, and it's the transforms. > Because with the new writer-transform, before applying the > transforms the parsing isn't complete Yes it is. > (parser-specific transforms haven't been applied yet) There are none. There are Reader-specific transforms though. > and after applying them the writing has already begun (because > writer-specific transforms have already been applied). Apply a writer-specific transform does not mean "the writing has already begun". Transforms are a separate stage of processing from reading, parsing, and writing. > Parsing and writing should be *completely* separate, and that's only > possible if we do not mix parser- and writer-specific transforms. Parsing and writing *are* separate. Why should their transforms have to be separated? ISTM that you need to adjust your thinking. > As a solution, we could apply transforms in the following order: > > * Reader-/parser-specific transforms, Decorations, FinalChecks. > * Writer-specific transforms, Messages and FilterMessages. > > So we would have the writer-specific transforms before Message and > FilterMessages so that writer transforms can add system_messages, > and we'd still have parsing and writing completely separate. What > d'you think? Take a look at the list of transforms (docs/ref/transforms.txt; should probably be in docs/dev). See if the order/separation above can be imposed. If it can, fine. The transform priorities are not written in stone; they work as they are, but they can be changed. 
The html.StylesheetCheck transform needs to be applied before universal.Messages, but I don't think there are any other dependencies (but my memory may be faulty). (The reasons for the transform priority order [interdependencies] ought to be documented explicitly.) -- David Goodger <http://python.net/~goodger> |
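The "reprocess_transforms" idea described in this message can be sketched as follows. This is a toy illustration only: transforms are represented by their names, the exact contents of the default tuple are an assumption based on the transforms named in the thread, and `transforms_for_run` is a hypothetical helper, not a Docutils function.

```python
# Names of the default transforms as mentioned in the thread
# (the exact tuple is an assumption for this sketch).
DEFAULT_TRANSFORMS = ("Decorations", "FinalChecks", "Messages", "FilterMessages")

# Same as the defaults, but without Decorations, which should not run twice.
REPROCESS_TRANSFORMS = tuple(
    name for name in DEFAULT_TRANSFORMS if name != "Decorations")


def transforms_for_run(component_transforms, reprocessing=False):
    """Return the transform names queued for one full run of the pipeline."""
    base = REPROCESS_TRANSFORMS if reprocessing else DEFAULT_TRANSFORMS
    return list(component_transforms) + list(base)


# First run: source text to doctree, with a "null" writer (no writer transforms).
first_run = transforms_for_run(["reader-specific"])

# Second run: stored doctree to output, with the now-known writer's transform.
second_run = transforms_for_run(["StylesheetCheck"], reprocessing=True)
```

In this model each run is a complete pass through the Transformer; only the set of queued transforms differs between the two passes.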
From: Felix W. <Fel...@gm...> - 2005-07-07 13:32:21
|
David Goodger wrote: > I've added a "reprocess_transforms" attribute (a tuple of transforms) > to the Transformer class, and use it to replace "default_transforms" > in docutils.core.publish_from_doctree. OK, that's solving one half of the problem (because Decorations and FinalChecks aren't run a second time when writing out the document). Now we have to make sure that in publish_doctree Messages and FilterMessages are not yet executed. This is necessary because the first run of FilterMessages might filter out messages which wouldn't be filtered out in the second run, and applying "Messages" twice could create two "Docutils System Messages" sections. I separated the default transforms into two stages and added a ``stage`` parameter to the Publisher.publish method. That's also why I split the FinalChecks transform So now we can indeed process a document in two stages. The only problems that's left is that the writer-specific transforms are intermixed with reader-specific transforms, so that processing in two stages (with publish_[from_]doctree) may give different results from processing with publish_file or so. I consider the ability to hook into the document processing with publish_doctree and publish_from_doctree extremely valuable because it enables people to poke with the node tree without needing to understand much of the inner workings of the framework -- using the doctree publishing functions is extremely simple. This makes contributing (i.e. writing code that works with Docutils) very easy, so it's a Really Good Thing[tm]. Since it is such a valuable feature, I think we should support it in the cleanest way possible. Thus we should really try to separate the transforms. So here's my silver-bullet solution: ;-) Extend the transform priority space to 2000 and make it policy that reader/parser/stage1 transforms have priorities less than 1000 and that writer/stage2 transforms have priorities greater than 1000. 
So we could assign the following priorities: * html.StylesheetCheck: 1400 (or so) * universal.Messages: 1860 * universal.FilterMessages: 1870 * universal.TestMessages: 1880 All other priorities stay the same. That's all; besides the priority changes, no implementation is necessary. > Apply a writer-specific transform does not mean "the writing has > already begun". Transforms are a separate stage of processing from > reading, parsing, and writing. Yes, they are, but you cannot apply writer-specific transforms before the writer is known, which is not the case (and should not be the case!) when using publish_doctree. This is what I meant with reading/parsing and writing should be independent. >> So we would have the writer-specific transforms before Message and >> FilterMessages so that writer transforms can add system_messages, >> and we'd still have parsing and writing completely separate. > > Take a look at the list of transforms (docs/ref/transforms.txt; should > probably be in docs/dev). See if the order/separation above can be > imposed. OK, see above for my proposal. > The html.StylesheetCheck transform needs to be applied before > universal.Messages, but I don't think there are any other dependencies > (but my memory may be faulty). I think you're right. -- For private mail please ensure that the header contains 'Felix Wiemann'. "the number of contributors [...] is strongly and inversely correlated with the number of hoops each project makes a contributing user go through." -- ESR |
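The proposed priority bands can be written down as a small sketch. The four priorities in `PROPOSED` are the ones listed in the proposal; the entries in `STAGE1_EXAMPLES` are placeholders invented for illustration, and `stage_of` / `split_by_stage` are hypothetical helper names. The proposal does not say which side a priority of exactly 1000 falls on; the sketch arbitrarily puts it in stage 2.

```python
STAGE_BOUNDARY = 1000

# Priorities taken from the proposal above.
PROPOSED = {
    "html.StylesheetCheck": 1400,
    "universal.Messages": 1860,
    "universal.FilterMessages": 1870,
    "universal.TestMessages": 1880,
}

# Placeholder entries for transforms that would stay below the boundary
# (names and values are illustrative only).
STAGE1_EXAMPLES = {
    "example.ReaderTransform": 340,
    "example.OtherDefaultTransform": 820,
}


def stage_of(priority):
    """Stage 1 = reader/parser side, stage 2 = writer side."""
    return 1 if priority < STAGE_BOUNDARY else 2


def split_by_stage(priorities):
    """Group transform names by stage, each group ordered by priority."""
    stages = {1: [], 2: []}
    for name, priority in sorted(priorities.items(), key=lambda item: item[1]):
        stages[stage_of(priority)].append(name)
    return stages


all_priorities = dict(STAGE1_EXAMPLES)
all_priorities.update(PROPOSED)
stages = split_by_stage(all_priorities)
```

Under this scheme a first pass would apply only the stage 1 group and a later pass only the stage 2 group, which is how the proposal keeps writer-specific transforms ahead of Messages while leaving them out of the first pass.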
From: David G. <go...@py...> - 2005-08-11 00:09:43
|
[David Goodger] >> I've added a "reprocess_transforms" attribute (a tuple of >> transforms) to the Transformer class, and use it to replace >> "default_transforms" in docutils.core.publish_from_doctree. [Felix Wiemann] > OK, that's solving one half of the problem (because Decorations and > FinalChecks aren't run a second time when writing out the document). > > Now we have to make sure that in publish_doctree Messages and > FilterMessages are not yet executed. This is necessary because the > first run of FilterMessages might filter out messages which wouldn't > be filtered out in the second run, That's not an issue. It's up to the user or application to ensure that settings match between runs. Think of it this way: each pass filters out system messages according to its settings, resulting in a doctree as requested by the user and/or application. > and applying "Messages" twice could create two "Docutils System > Messages" sections. That transform could simply be modified to check for an existing section first, before creating a new one, eliminating the problem. > I separated the default transforms into two stages and added a > ``stage`` parameter to the Publisher.publish method. I've thought this over a lot, and I don't like it. It smells bad, like the tail is wagging the dog. It's an artificial state parameter being imposed on an otherwise clean system. There aren't two "stages" of processing; rather, the entire system is run twice, and the output of the first run is used as the input of the second. -1 > That's also why I split the FinalChecks transform Which I still don't like. It was called "final" for a reason: to do a set of checks at the *end* of processing, after *all* the other transforms have been applied. We can't always separate transforms the way you want, and we shouldn't need to. However, the writer is the final consumer of the doctree, and is free to transform it as it likes. 
If there are any writer-specific transforms that can and must only be applied after all other transforms, the writer itself can apply them (but directly, not via the Transformer). The Transformer mechanism was not designed to be used as you seem to want/intend. The artificial split changes should be reverted. > So now we can indeed process a document in two stages. ISTM that we could do that before this addition. What was broken that this fixed? > The only problems that's left is that the writer-specific transforms > are intermixed with reader-specific transforms, That's a feature, not a problem. I'm serious. It's essential to the architecture of Docutils. If you don't get that, please don't modify the Transformer internals. > so that processing in two stages (with publish_[from_]doctree) may > give different results from processing with publish_file or so. "May", theoretically, but won't if properly used in practice. > I consider the ability to hook into the document processing with > publish_doctree and publish_from_doctree extremely valuable because > it enables people to poke with the node tree without needing to > understand much of the inner workings of the framework -- using the > doctree publishing functions is extremely simple. This makes > contributing (i.e. writing code that works with Docutils) very > easy, so it's a Really Good Thing[tm]. Agreed. But adding complexity to the Transformer mechanism is a Really Bad Thing. The framework as a whole should not suffer a kludge for the sake of a single use case. This artificial splitting of the transforms *is* a kludge. > Since it is such a valuable feature, I think we should support it in > the cleanest way possible. Thus we should really try to separate > the transforms. That's not clean, for the reasons given above. 
> So here's my silver-bullet solution: ;-) > > Extend the transform priority space to 2000 and make it policy that > reader/parser/stage1 transforms have priorities less than 1000 and > that writer/stage2 transforms have priorities greater than 1000. So > we could assign the following priorities: > > * html.StylesheetCheck: 1400 (or so) > * universal.Messages: 1860 > * universal.FilterMessages: 1870 > * universal.TestMessages: 1880 > > All other priorities stay the same. -1 >> Apply a writer-specific transform does not mean "the writing has >> already begun". Transforms are a separate stage of processing from >> reading, parsing, and writing. > > Yes, they are, but you cannot apply writer-specific transforms > before the writer is known, which is not the case (and should not be > the case!) when using publish_doctree. But the writer *is* known -- it's writers/null.py. > This is what I meant with reading/parsing and writing should be > independent. But they *are* independent, and always have been! Your goals can be achieved much more cleanly and just as easily, by fixing the Messages transform, and using the reprocess_transforms attribute (deleted in revision 3663 -- should be restored). ***** This is the kind of change that really ought to be put on a branch and discussed, before being committed to the trunk. It's a potentially controversial change, representing a real disconnect in our understanding of the underlying system. Let's chalk this up as a lesson learned. -- David Goodger <http://python.net/~goodger> |
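The fix suggested in this message for the duplicate-section problem (check for an existing "Docutils System Messages" section before creating a new one) could look roughly like the sketch below. It models the doctree as nested dicts rather than Docutils nodes, and `find_messages_section` / `apply_messages` are hypothetical names, so treat it as an outline of the idea rather than the actual transform.

```python
SECTION_TITLE = "Docutils System Messages"


def find_messages_section(document):
    """Return the existing system messages section, or None."""
    for node in document["children"]:
        if node.get("title") == SECTION_TITLE:
            return node
    return None


def apply_messages(document):
    """Move pending system messages into a single, reused section."""
    pending = document["pending_messages"]
    if not pending:
        return
    section = find_messages_section(document)
    if section is None:
        # Only create the section when an earlier run has not already done so.
        section = {"title": SECTION_TITLE, "children": []}
        document["children"].append(section)
    section["children"].extend(pending)
    document["pending_messages"] = []
```

Running `apply_messages` on a stored doctree a second time then appends to the section from the first run instead of adding a second one.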
From: Felix W. <Fel...@gm...> - 2005-08-14 14:12:36
|
David Goodger wrote: > Felix Wiemann wrote: > >> That's also why I split the FinalChecks transform > > Which I still don't like. It was called "final" for a reason: to do a > set of checks at the *end* of processing, after *all* the other > transforms have been applied. The order of application should be determined by priorities, not by names. So, "FinalChecks" is neither an appropriate nor a descriptive name: It doesn't describe what the transform does, only *when* it's done. (No, "Checks" doesn't make the name more descriptive because the FinalChecks transform didn't only contain checks. And after all, *what* checks?) >> The only problems that's left is that the writer-specific transforms >> are intermixed with reader-specific transforms, so that processing in >> two stages (with publish_[from_]doctree) may give different results >> from processing with publish_file or so. > > "May", theoretically, but won't if properly used in practice. No, it will give problems. E.g. when using publish_from_doctree and publish_doctree, a document may end up with two "Docutils System Messages" sections. You may call this a minor issue, but it shows that there's a design-problem and that we will probably run into similar problems later. I understand and agree to your objections. Please suggest a better solution. (I don't think that making the Messages transform check for an existing "Docutils System Messages" section is a solution -- it doesn't solve the general problem, only the specific issue at hand.) > This is the kind of change that really ought to be put on a branch and > discussed, before being committed to the trunk. Agreed. -- For private mail please ensure that the header contains 'Felix Wiemann'. "the number of contributors [...] is strongly and inversely correlated with the number of hoops each project makes a contributing user go through." -- ESR |
From: David G. <go...@py...> - 2005-09-09 12:56:44
|
[Felix Wiemann] >>> That's also why I split the FinalChecks transform [David Goodger] >> Which I still don't like. It was called "final" for a reason: to >> do a set of checks at the *end* of processing, after *all* the >> other transforms have been applied. [Felix Wiemann] > The order of application should be determined by priorities, not by > names. It is. Names are simply chosen to reflect the functionality. > So, "FinalChecks" is neither an appropriate nor a descriptive name: > It doesn't describe what the transform does, only *when* it's done. > (No, "Checks" doesn't make the name more descriptive because the > FinalChecks transform didn't only contain checks. And after all, > *what* checks?) It originally contained just checks, but other things were added, because it was intended to be "final", to be run last. The "Final" part of the name was intended to correspond to its position in the priority order. Sometimes you split hairs in a most annoying way ;-) >>> The only problems that's left is that the writer-specific >>> transforms are intermixed with reader-specific transforms, so that >>> processing in two stages (with publish_[from_]doctree) may give >>> different results from processing with publish_file or so. >> >> "May", theoretically, but won't if properly used in practice. > > No, it will give problems. E.g. when using publish_from_doctree and > publish_doctree, a document may end up with two "Docutils System > Messages" sections. Not if that is fixed properly. > You may call this a minor issue, but it shows that there's a > design-problem and that we will probably run into similar problems > later. I don't see a design problem, I see a problem in the understanding and interpretation of the design. There may be some issues with the details, such as the specific set of transforms specified by components, but that's solvable without drastically changing the design. > I understand and agree to your objections. Please suggest a better > solution. 
The solution is as follows: stop thinking about two-stage processing. Every run of Docutils involves a full set of components: Reader, Parser, Transformer (with transforms supplied by the other components), and Writer. Even in the case of publish_doctree, there *is* a Writer: docutils.writers.null. In publish_from_doctree, the docutils.parsers.null Parser is used. When processing to and from a doctree, Docutils is simply run twice. > (I don't think that making the Messages transform check for an > existing "Docutils System Messages" section is a solution -- it > doesn't solve the general problem, only the specific issue at hand.) What is the general problem then? I only see the specific issue. >> This is the kind of change that really ought to be put on a branch >> and discussed, before being committed to the trunk. > > Agreed. Good: please revert your transform changes from the trunk, and put them in your "transform" branch. I have neither the time nor the will to check all the transform dependencies to make sure that rearranging them hasn't introduced some kind of problem. I already did that work when creating the transforms in the first place, and I don't want to do it again. The existing system works/worked fine, and I see no reason to change it. Again: please revert your transform changes (rev. 3659). -- David Goodger <http://python.net/~goodger> |
From: Felix W. <Fel...@gm...> - 2005-09-09 21:14:42
|
David Goodger wrote: > Felix Wiemann wrote: > >> [FinalChecks:] The order of application should be determined by >> priorities, not by names. > > It is. Names are simply chosen to reflect the functionality. Correct. Names are *not* chosen to reflect the priority. >> So, "FinalChecks" is neither an appropriate nor a descriptive name: >> It doesn't describe what the transform does, only *when* it's done. >> (No, "Checks" doesn't make the name more descriptive because the >> FinalChecks transform didn't only contain checks. And after all, >> *what* checks?) > > It originally contained just checks, but other things were added, So "FinalChecks" is not appropriate anymore. That's what I'm saying. > because it was intended to be "final", to be run last. The "Final" > part of the name was intended to correspond to its position in the > priority order. See above. That's a bad idea. > Sometimes you split hairs in a most annoying way ;-) I'm not splitting hairs. >>> This is the kind of change that really ought to be put on a branch >>> and discussed, before being committed to the trunk. >> >> Agreed. > > Good: please revert your transform changes [rev. 3659] from the trunk, > and put them in your "transform" branch. It causes too many conflicts and failing tests. That's wasted effort, and it would be a regression. Let's discuss this issue and then decide what to do, not blindly revert changes that happened two months ago. As a matter of fact, I was rather thinking of the two-stage-processing feature that should have been put on a branch, not the FinalChecks refactoring. I'm absolutely *not* going to branch for simple refactorings like this one, especially as long as we have review times over two weeks. > I have neither the time nor the will to check all the transform > dependencies to make sure that rearranging them hasn't introduced some > kind of problem. I don't think it has introduced a problem. 
> I already did that work [checking all dependencies] when creating the > transforms in the first place, and I don't want to do it again. A.k.a. "it's so poorly structured that I cannot easily check whether it is correct, but it works at the moment, so won't you dare touch it because you might break it". That smells *extremely* bad. And it indicates that the code is in need of some refactoring. Renaming a reference checker from FinalChecks to DanglingReferences (compare: which name says more about what the transform actually does?) and changing its priority so it's executed together with the other reference-handling transforms is clearly a step in the right direction. Embrace Change, please. -- For private mail please ensure that the header contains 'Felix Wiemann'. "the number of contributors [...] is strongly and inversely correlated with the number of hoops each project makes a contributing user go through." -- ESR |
From: David G. <go...@py...> - 2005-09-10 04:17:54
|
[Felix Wiemann] > So "FinalChecks" is not appropriate anymore. That's what I'm > saying. Fine. I agree. I have no issue with the name change. It's the *priority* change that I take issue with. >> Sometimes you split hairs in a most annoying way ;-) > > I'm not splitting hairs. Yes, you are. I'm talking about the priority of the transforms, and you keep harping on about the name. The idea I've been trying to get across, and that you've been obstinate about, is that the name, "Final", is a reflection of the priority. It reinforces the priority. It should have given you pause before you arbitrarily changed the priority. I don't know how much clearer I can make it. >> Good: please revert your transform changes [rev. 3659] from the >> trunk, and put them in your "transform" branch. > > It causes too many conflicts and failing tests. That's wasted > effort, and it would be a regression. Let's discuss this issue and > then decide what to do, not blindly revert changes that happened two > months ago. If you look back, you'll see that I questioned the change two months ago. It should have been reverted then. > As a matter of fact, I was rather thinking of the > two-stage-processing feature that should have been put on a branch, Yes, it should have been. That was definitely a bad idea, a wrong idea. It must be removed. Will you please do that? The reason you gave for the revision 3659 split & re-prioritizing was: I tried so divide the default transforms into two stages, So those changes were a direct result of the "two stage" idea. Repeat until enlightenment: There is no "two-stage processing". There is only processing. We will make no compromises for "two-stage processing". Any compromises made for this so far were mistaken, and must be removed. We will make changes to allow a doctree to be processed multiple times. > not the FinalChecks refactoring. 
I'm absolutely *not* going to > branch for simple refactorings like this one, especially as long as > we have review times over two weeks. We absolutely *are* going to branch for any and all API changes. The trunk is currently a mess. You made it that way with your heavy-handed checkins. I'm dreading the cleanup that's required. >> I have neither the time nor the will to check all the transform >> dependencies to make sure that rearranging them hasn't introduced >> some kind of problem. > > I don't think it has introduced a problem. Perhaps. I don't know. But it wasn't broken before, and you changed it arbitrarily, on a whim. If it ain't broke, don't fix it. It could very well be that there's no problem. But at this point, that's not what I care about. It's just a symptom. I care about the "two-stage processing" misfeature that's now in the code, and must be removed. And I care about the fact that you check in your ideas too quickly, without consultation. If your ideas were universally great, that would be OK, but this one was really, really bad. >> I already did that work [checking all dependencies] when creating >> the transforms in the first place, and I don't want to do it again. > > A.k.a. "it's so poorly structured that I cannot easily check whether > it is correct, but it works at the moment, so won't you dare touch it > because you might break it". That smells *extremely* bad. No, that's not what I said nor what I meant, and you know it. The interactions between transforms are complex and tricky. Having written most of them, I know it very well. Document tree transforms are hard, mind-twisting. Don't twist my words or attempt to put words in my mouth. > And it indicates that the code is in need of some refactoring. > Renaming a reference checker from FinalChecks to DanglingReferences > (compare: which name says more about what the transform actually > does?) I don't give a $#!* about the name change. 
> and changing its priority so it's executed together with the other > reference-handling transforms *That's* what I care about. > is clearly a step in the right direction. No, it is not. Look, that particular part of the transform was given that priority *for a reason*. The fact that I don't recall the reason, just says that it should have been documented better. Mea culpa. And as I said, it could well be that there is no problem now. But that's not the point. The point is, there's no reason to arbitrarily change a working transform's priority just so that the transform priorities sort cleaner. That's simply ridiculous. > Embrace Change, please. There's a difference between arbitrary change and progress. I embrace progress. Change for change's sake is *not* progress. Felix, stop being an ass, and stop wasting my time. This comment and the one above are offensive. ***** This weekend I'm going to take the time to review the codebase as it stands, and the transforms branch. -- David Goodger <http://python.net/~goodger> |
From: Felix W. <Fel...@gm...> - 2005-09-09 21:59:57
|
David Goodger wrote: > Felix Wiemann wrote: > >> (I don't think that making the Messages transform check for an >> existing "Docutils System Messages" section is a solution -- it >> doesn't solve the general problem, only the specific issue at hand.) > > What is the general problem then? I only see the specific issue. I want the following to be possible: 1. Process an input document to Docutils' node tree. [1]_ 2. Modify the node tree using external tools. 3. Process the resulting node tree to an output document. For that, it is necessary that the document can be written and read using e.g. an XML writer and parser of Docutils. I do *not* want to place the node-tree-modifying code (step 2) in a Docutils Transform, for several reasons: * I do not want to use Python. * I might want to store the node tree. * I do not want to couple the architecture of my program with Docutils' architecture. The problem is that the three steps listed above will not yield the same result as processing a document in a single blow. This is not a theoretic problem we can ignore: It results in hard-to-find bugs like the double insertion of a "Docutils System Messages" section. We can fix these symptoms, but I would rather like to see the cause fixed. Now, is there anything *wrong* with fixing the cause? That is, do the changes in the "transforms" branch cause any new problems? If they do not, it seems to me that we should merge the branch to the trunk because the changes have the (IMO significant) advantage I pointed out above. .. [1] I think we should recognize that the node tree is a valuable format to represent the structure of a document -- i.e. it can serve real-world purposes just like DocBook, except that it probably won't be edited manually. We should not limit the use of the node tree to being *only* an auxiliary internal representation of a document. -- For private mail please ensure that the header contains 'Felix Wiemann'. "the number of contributors [...] 
is strongly and inversely correlated with the number of hoops each project makes a contributing user go through." -- ESR |
From: David G. <go...@py...> - 2005-09-13 01:44:55
|
[Felix Wiemann] > I want the following to be possible: > > 1. Process an input document to Docutils' node tree. [1]_ ... > .. [1] I think we should recognize that the node tree is a valuable > format to represent the structure of a document -- i.e. it can > serve real-world purposes just like DocBook, except that it > probably won't be edited manually. We should not limit the use > of the node tree to being *only* an auxiliary internal > representation of a document. Yes, that may be so, but there is a difference between the doctree and the output of the docutils_xml Writer. They are not one and the same. The doctree is a data structure, and its XML is just its representation: a shadow of the thing, not the thing itself. > 2. Modify the node tree using external tools. Are you talking about the doctree or the XML? If the former, then the external tools would necessarily have to be Python tools, not XSL or anything else. > 3. Process the resulting node tree to an output document. > > For that, it is necessary that the document can be written and > read using e.g. an XML writer and parser of Docutils. The XML is *not* the node tree. They are not entirely isomorphic. This would be possible with a pickled or otherwise perfectly serialized doctree. The XML output is *not* a perfect serialization. It is merely one form of output. > I do *not* want to place the node-tree-modifying code (step 2) in a > Docutils Transform, for several reasons: > > * I do not want to use Python. > * I might want to store the node tree. > * I do not want to couple the architecture of my program with > Docutils' architecture. Then sorry, you're out of luck. That's an insoluble problem, because the doctree is a *Python* data structure, and it is not just coupled with the Docutils architecture, it *is* the Docutils architecture. > The problem is that the three steps listed above will not yield the > same result as processing a document in a single blow. 
This is not > a theoretic problem we can ignore: It results in hard-to-find bugs > like the double insertion of a "Docutils System Messages" section. > We can fix these symptoms, but I would rather like to see the cause > fixed. > > Now, is there anything *wrong* with fixing the cause? That is, do > the changes in the "transforms" branch cause any new problems? Mostly, the transforms branch is good. It represents a net gain. But there are some problems with it (detailed in a separate message in the "transforms branch" thread, today), such as the "pending" node issue. The transforms branch as is represents a less-than-perfect understanding of the Docutils architecture. I hope that this reply and the one referred to above will help enlighten you. > If they do not, it seems to me that we should merge the branch to > the trunk because the changes have the (IMO significant) advantage I > pointed out above. I'm absolutely against merging the transforms branch as it stands, for the reasons I give in the above-mentioned message. -- David Goodger <http://python.net/~goodger> |
From: Martin B. <mar...@gm...> - 2005-09-10 20:10:33
|
On 9/9/05, David Goodger <go...@py...> wrote: > I don't see a design problem, I see a problem in the understanding and > interpretation of the design. There may be some issues with the > details, such as the specific set of transforms specified by > components, but that's solvable without drastically changing the > design. > > > I understand and agree to your objections. Please suggest a better > > solution. > > The solution is as follows: stop thinking about two-stage processing. Wouldn't it be nice if we could think about two-stage processing. > Every run of Docutils involves a full set of components: Reader, > Parser, Transformer (with transforms supplied by the other > components), and Writer. Even in the case of publish_doctree, there > *is* a Writer: docutils.writers.null. In publish_from_doctree, the > docutils.parsers.null Parser is used. When processing to and from a > doctree, Docutils is simply run twice. Here is the thing: there is this great conversion tool that defines a pretty nice intermediate data structure to do its job. It is already designed by component, where the input and output phases are almost completely decoupled. It is already setup for two-stage processing. Just some minor assumptions--which require some work to check through-- prevent from saying "yes you can do that". With two-stage processing, you can do new things that you could not do otherwise (i.e. parse once, output to many formats from the same data structure. Or just parse and store for later conversion). Not allowing it is missing an opportunity. David: it's your fault. You made the code too clean and separated, and the code ITSELF is calling for two-stage processing. cheers, |
From: David G. <go...@py...> - 2005-09-13 01:52:08
|
[Felix Wiemann] >>> I understand and agree to your objections. Please suggest a >>> better solution. [David Goodger] >> The solution is as follows: stop thinking about two-stage >> processing. [Martin Blais] > Wouldn't it be nice if we could think about two-stage processing. No. It isn't necessary. Not needed at all. All we need are specialized Writer and Reader classes to handle these cases, and it works just fine. We have those already, or are very close to them. Converting Docutils into a two-stage processing engine would be a huge, serious, fatal mistake. I hope my replies today convince you of that. If not, I don't know how else to express it. But I am absolutely certain, and nothing I have seen to date has come close to convincing me otherwise. (I could be wrong, I freely admit that. I'm only human. But I feel no twinge of uncertainty in this case. None.) > Here is the thing: there is this great conversion tool that defines > a pretty nice intermediate data structure to do its job. It is > already designed by component, where the input and output phases are > almost completely decoupled. It is already setup for two-stage > processing. Yes, but naturally, by assembling the correct components -- all the components, omitting none -- and **running Docutils twice**. > Just some minor assumptions--which require some work to > check through-- prevent from saying "yes you can do that". The "minor assumptions" you mention are fundamental to the architecture. > With two-stage processing, you can do new things that you could not > do otherwise (i.e. parse once, output to many formats from the same > data structure. Or just parse and store for later conversion). Not > allowing it is missing an opportunity. What can't be done now? Together with the doctree Reader (or perhaps a new pickle Reader), your pickle Writer does the job just fine. > David: it's your fault. You made the code too clean and separated, > and the code ITSELF is calling for two-stage processing. 
Thanks... I think ;-). I appreciate the premise, but I disagree with the conclusion. -- David Goodger <http://python.net/~goodger> |
From: Martin B. <bl...@fu...> - 2005-09-14 16:37:25
|
On 9/12/05, David Goodger <go...@py...> wrote: > > Just some minor assumptions--which require some work to > > check through-- prevent from saying "yes you can do that". > > The "minor assumptions" you mention are fundamental to the > architecture. I think this is the crux of the disagreement. I do not see that. To me, separating the stages explicitly makes the architecture cleaner. Why is it necessary that a writer be specified and run if I want to stop at the document structure and not create output at this point? Is there some case of weird coupling between one type of reader and one type of writer that requires such an assumption? I'm sure you've got good reasons for maintaining this standpoint, I'm just not sure what they are. > > With two-stage processing, you can do new things that you could not > > do otherwise (i.e. parse once, output to many formats from the same > > data structure. Or just parse and store for later conversion). Not > > allowing it is missing an opportunity. > > What can't be done now? Together with the doctree Reader (or perhaps > a new pickle Reader), your pickle Writer does the job just fine. The problem is: there is no guarantee that in the future someone will not create some kind of ordering or other dependency between a particular writer and a particular reader, and that it will not break applications which choose to do a two-step process to convert a document. cheers, |
From: David G. <go...@py...> - 2005-09-15 03:03:13
|
[Martin Blais] >>> Just some minor assumptions--which require some work to >>> check through-- prevent from saying "yes you can do that". >> >> The "minor assumptions" you mention are fundamental to the >> architecture. > > I think this is the crux of the disagreement. > I do not see that. You do not see what? > To me, separating the stages explicitly makes the architecture > cleaner. Not to me. :-) If you feel strongly about it, start a branch and show us some code. > Why is it necessary that a writer be specified and run if I want to > stop at the document structure and not create output at this point? What do you want out of the process? The doctree. How do you extract the doctree in mid-process? Well, you could write a new Publisher method (perhaps "publish_partially"). Or you could write a Writer that does little or nothing to the doctree, but instead just outputs the doctree as-is. The latter is what we're doing now. It's the equivalent of the former, but a lot easier. It takes advantage of the component nature of Docutils naturally, without any need for reworking the internal workings. So why bother? > Is there some case of weird coupling between one type of reader and > one type of writer that requires such an assumption? No. It's simply the designed architecture. > I'm sure you've got good reasons for maintaining this standpoint, > I'm just not sure what they are. See http://docutils.sf.net/docs/peps/pep-0258.html#docutils-project-model > The problem is: there is no guarantee that in the future someone > will create some kind of ordering or other dependency between a > particular writer and a particular reader, and that it will break > applications which choose to do a two-step process to convert a > document. Although that would probably be considered a bug, it just proves my point. In any case, let's deal with real problems if and when they occur, and not waste time worrying about or designing around hypothetical future problems. 
-- David Goodger <http://python.net/~goodger> |
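The "Writer that outputs the doctree as-is" approach described in the message above can be illustrated with a toy pipeline. This is only a sketch under stated assumptions: `publish`, `parse_lines`, `null_parser`, `null_writer`, `html_writer` and `drop_comments` are hypothetical stand-ins invented for this example, not the Docutils API, and the "doctree" is just a list of strings.

```python
from typing import Callable, List, Sequence

Tree = List[str]  # toy stand-in for a doctree: one string per node


def publish(source, parse: Callable, transforms: Sequence[Callable], write: Callable):
    """One complete run: parse the input, apply transforms, hand the tree to the writer."""
    tree = parse(source)
    for transform in transforms:
        tree = transform(tree)
    return write(tree)


def parse_lines(text: str) -> Tree:
    """Toy parser: one node per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def null_parser(tree: Tree) -> Tree:
    """The input is already a tree, so there is nothing to parse."""
    return list(tree)


def drop_comments(tree: Tree) -> Tree:
    """Toy transform: remove nodes that look like comments."""
    return [node for node in tree if not node.startswith("..")]


def null_writer(tree: Tree) -> Tree:
    """Pass-through writer: return the tree unchanged instead of rendering it."""
    return tree


def html_writer(tree: Tree) -> str:
    """Toy writer: render each node as a paragraph."""
    return "\n".join(f"<p>{node}</p>" for node in tree)


SOURCE = "Hello\n.. a comment\nWorld\n"

# Run 1: full set of components, with the pass-through writer at the end.
doctree = publish(SOURCE, parse_lines, [drop_comments], null_writer)
# Run 2: full set of components again, with the pass-through parser at the start.
two_runs = publish(doctree, null_parser, [], html_writer)
# Ordinary single run, for comparison.
one_run = publish(SOURCE, parse_lines, [drop_comments], html_writer)

print(two_runs == one_run)
```

In this toy the two runs and the single run agree because the only transform belongs to the first run; the disagreement in the thread is about what happens when transforms from both ends interleave.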
From: Martin B. <bl...@fu...> - 2005-09-15 16:38:13
|
On 9/14/05, David Goodger <go...@py...> wrote: > [Martin Blais] > >>> Just some minor assumptions--which require some work to > >>> check through-- prevent from saying "yes you can do that". > >> > >> The "minor assumptions" you mention are fundamental to the > >> architecture. > > > > I think this is the crux of the disagreement. > > I do not see that. > > You do not see what? I do not see a necessity for the three processes to always be present upon any single invocation. > > To me, separating the stages explicitly makes the architecture > > cleaner. > > Not to me. :-) > If you feel strongly about it, start a branch and show us some code. Not much needs to be done: all we need to do is make the list of transforms separate for source/reader/parser and writer/destination. Two lists. With the explicit assumption that what is in-between is always the same. (More detailed explanation below.) > > Why is it necessary that a writer be specified and run if I want to > > stop at the document structure and not create output at this point? > > What do you want out of the process? The doctree. How do you extract > the doctree in mid-process? Well, you could write a new Publisher > method (perhaps "publish_partially"). Or you could write a Writer > that does little or nothing to the doctree, but instead just outputs > the doctree as-is. The latter is what we're doing now. It's the > equivalent of the former, but a lot easier. It takes advantage of the > component nature of Docutils naturally, without any need for reworking > the internal workings. So why bother? A case where the writer runs a transform intermingled with the reader/parser could cause a dependency that is difficult to break when running the transforms in two processes. Case 1: normal uninterrupted invocation config: reader R1, parser P1, writer W1. transforms (in order): W1.t1, R1.t1, W1.t2, R1.t2 Case 2: two-step invocation, storing the doctree in-between: config: reader R1, parser P1, null writer. 
transforms (in order): R1.t1, R1.t2 step 2, read the doctree, and then: config: dummy reader, writer W1 transforms (in order): W1.t1, W1.t2 In case 1, the transforms run are: W1.t1 R1.t1 W1.t2 R1.t2 In case 2, the effect is: R1.t1 R1.t2 W1.t1 W1.t2 Note: the order of the transforms is determined by the priority of the transform, and it is possible that a writer transform has a higher priority (comes first) than a reader transform. The example above is an example of this: W1.t1 comes before R1.t1. Right now, there is no guarantee that there is no kind of interaction between, say, the changes to the tree made by W1.t1 and R1.t1. If there is, the results will be different because the transforms get run in a different order depending on if we interrupt conversion or not. The whole point of this discussion can be summarized as "a guarantee" that you will obtain the SAME result if you convert in one step or in two steps. I think that it holds in practice. There is no guarantee, however, and until we split the list of transforms into two lists, one on each side of the point where you would interrupt the conversion (e.g. to store the tree in a blob in a database, like I'm doing), we cannot make this guarantee. By splitting the list of transforms in two, we allow a point at which we know the tree is independent of the configuration of the writer. Do you see any good reason not to split the list of transforms into two lists, to insure that the transforms in the first list are always run before all the transforms in the second list? > > I'm sure you've got good reasons for maintaining this standpoint, > > I'm just not sure what they are. > > See http://docutils.sf.net/docs/peps/pep-0258.html#docutils-project-model This graph itself does not tell the whole story. The Transformer stage depends on the configuration of both the reader and the writer, by way of the list of transforms which is ordered by its priority number. 
Choose a different writer, and you might get a different doctree in-between. Not a problem if you know a-priori where the output should go, but it *may* be a problem if you store in-between and might output to different media. > > The problem is: there is no guarantee that in the future someone > > will not create some kind of ordering or other dependency between a > > particular writer and a particular reader, and that it will not break > > applications which choose to do a two-step process to convert a > > document. > > Although that would probably be considered a bug, it just proves my > point. In any case, let's deal with real problems if and when they > occur, and not waste time worrying about or designing around > hypothetical future problems. Sure, it just works now, but that you do not acknowledge the existence of a quirk in the design only stimulates more questioning and debate. I hope the example I give above makes clear precisely what my bugger is with the potential problem that could occur. cheers, |
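The ordering concern in the message above can be modelled in a few lines of Python. This is a toy sketch, not Docutils code: the `Transform` tuple, the `owner` labels and the priority numbers are assumptions made up for illustration, chosen only so that writer and reader transforms interleave as in the Case 1 / Case 2 example.

```python
from typing import List, NamedTuple


class Transform(NamedTuple):
    name: str      # label from the example, e.g. "W1.t1"
    owner: str     # "reader" or "writer" (hypothetical label)
    priority: int  # lower number runs first (hypothetical values)


# Priorities invented so that writer and reader transforms interleave.
TRANSFORMS = [
    Transform("W1.t1", "writer", 100),
    Transform("R1.t1", "reader", 200),
    Transform("W1.t2", "writer", 300),
    Transform("R1.t2", "reader", 400),
]


def one_pass(transforms: List[Transform]) -> List[str]:
    """Case 1: a single run, all transforms sorted together by priority."""
    return [t.name for t in sorted(transforms, key=lambda t: t.priority)]


def two_pass(transforms: List[Transform]) -> List[str]:
    """Case 2: reader transforms in the first run, writer transforms in the second."""
    first = sorted((t for t in transforms if t.owner == "reader"),
                   key=lambda t: t.priority)
    second = sorted((t for t in transforms if t.owner == "writer"),
                    key=lambda t: t.priority)
    return [t.name for t in first + second]


single = one_pass(TRANSFORMS)
split = two_pass(TRANSFORMS)
print(single)
print(split)
print("same order:", single == split)
```

A different application order only matters if the transforms actually interact, which is exactly the point under dispute; the sketch shows the orders can differ, not that the resulting documents do.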
From: David G. <go...@py...> - 2005-06-29 22:26:57
|
[Martin Blais] > in other words, are the writer transforms always run after all the > other transforms? No. See docs/ref/transforms.txt for the order. -- David Goodger <http://python.net/~goodger> |
From: David G. <go...@py...> - 2005-06-29 22:27:16
|
> Martin Blais wrote: >> With the current design, the results may be different if you run it >> at once, or in separate steps as above. [Felix Wiemann] > Yes, but they should not be different, and they needn't be. That's > the issue my previous posting addresses. I don't think there's any silver-bullet method to prevent differing results. It requires careful analysis. The current ordering of transforms has grown organically, determined by the transforms themselves. IOW, transform A clearly has to be applied before transform B, thus the ordering. -- David Goodger <http://python.net/~goodger> |
From: David G. <go...@py...> - 2005-06-29 22:27:21
|
[Felix Wiemann] > Another solution to make sure the stylesheet is available: > > Place it next to the HTML writer, as docutils/writers/default.css. Sure, as a data file; it's feasible. The default setting (i.e. the setting implied by ``None``) could be that stylesheet. The path would have to be computed though, since it is installation-dependent. > This would ensure it's always available and we wouldn't need to warn > about missing stylesheets then. I suppose. The user or application could supply a nonexistent stylesheet though. If embedded, it would cause an exception. If not, it would result in poor rendering. -- David Goodger <http://python.net/~goodger> |
From: David G. <go...@py...> - 2005-09-16 13:06:14
|
[Martin Blais] >>>>> Just some minor assumptions--which require some work to >>>>> check through-- prevent from saying "yes you can do that". [David Goodger] >>>> The "minor assumptions" you mention are fundamental to the >>>> architecture. [Martin Blais] >>> I think this is the crux of the disagreement. >>> I do not see that. [David Goodger] >> You do not see what? [Martin Blais] > I do not see a necessity for the three processes to always be > present upon any single invocation. (I assume you mean the three major components. It's only one process.) That's the architecture. It simplifies a lot, because it's not *just* the three components. The Reader has a source (docutils.io.Input object) attached to it, the Writer has a destination (docutils.io.Output object) attached, and all 3 major components (Reader, Parser, Writer) can specify transforms. >>> To me, separating the stages explicitly makes the architecture >>> cleaner. >> >> Not to me. :-) >> If you feel strongly about it, start a branch and show us some >> code. > > Not much needs to be done: Then start a branch, and do it. I'm not interested. And even if you *do* start a branch, I'm not guaranteeing that I'll OK it. I want to see concrete evidence that it's *useful* and *solves a real problem* first. Hypotheticals aren't enough. > all we need to do is make the list of transforms separate for > source/reader/parser and writer/destination. Two lists. That's fine, from a purely theoretical standpoint. My position is practical: the transforms have a natural ordering, and that ordering works. Careful analysis of the transforms themselves tells us what order they have to be applied in. The priorities of the transforms are a direct result of this. If dependencies do exist between transforms, any attempt to reorder them will fail. If and when something doesn't work, *then* we'll deal with it. I'm really not interested in hypotheticals. > With the explicit assumption that what is in-between is > always the same. 
(More detailed explanation below.) ... > A case where the writer runs a transform intermingled with the > reader/parser could cause a dependency that is difficult to break > when running the transforms in two processes. > > Case 1: normal uninterrupted invocation > > config: reader R1, parser P1, writer W1. > transforms (in order): W1.t1, R1.t1, W1.t2, R1.t2 > > Case 2: two-step invocation, storing the doctree in-between: > > config: reader R1, parser P1, null writer. > transforms (in order): R1.t1, R1.t2 > > step 2, read the doctree, and then: > > config: dummy reader, writer W1 > transforms (in order): W1.t1, W1.t2 > > In case 1, the transforms run are: > > W1.t1 R1.t1 W1.t2 R1.t2 > > In case 2, the effect is: > > R1.t1 R1.t2 W1.t1 W1.t2 > > Note: the order of the transforms is determined by the priority of > the transform, and it is possible that a writer transform has a > higher priority (comes first) than a reader transform. The example > above is an example of this: W1.t1 comes before R1.t1. > > Right now, there is no guarantee that there is no kind of > interaction between, say, the changes to the tree made by W1.t1 and > R1.t1. If there is, the results will be different because the > transforms get run in a different order depending on if we interrupt > conversion or not. That's hypothetically true. But is there any such case in concrete reality? If there is, then I see two possibilities: 1) The transform priorities are wrong, or the transforms themselves are flawed. They should be fixed. 2) The transform priorities are correct. The natural transform ordering prevents the division you seek. IOW, the dependencies prevent two-stage processing, and cannot be fixed. > The whole point of this discussion can be summarized as "a > guarantee" that you will obtain the SAME result if you convert in > one step or in two steps. I think that it holds in practice. Show me a case where this doesn't hold, and then we'll talk. 
> There is no guarantee, however, and until we split the list of > transforms into two lists, one on each side of the point where you > would interrupt the conversion (e.g. to store the tree in a blob in > a database, like I'm doing), we cannot make this guarantee. By > splitting the list of transforms in two, we allow a point at which > we know the tree is independent of the configuration of the writer. We make no claim of offering any such guarantee! Docutils was never designed to do two-pass processing. Either it works, or it doesn't. If it works, we have nothing to talk about. If it doesn't, we have one of the two cases above: either it can be fixed, or it can't. If it can be fixed, we'll fix it. If it can't, too bad. I have no evidence that the system doesn't work, either in regular or multiple-pass processing. Show me some concrete evidence otherwise, and I'll change my tune. > Do you see any good reason not to split the list of transforms into > two lists, to insure that the transforms in the first list are > always run before all the transforms in the second list? Yes: it works as it is now. Without a real, concrete example of it not working, it's not worth the effort. >>> I'm sure you've got good reasons for maintaining this standpoint, >>> I'm just not sure what they are. >> >> See http://docutils.sf.net/docs/peps/pep-0258.html#docutils-project-model > > This graph itself does not tell the whole story. No, of course not. But it does tell the story of the data path. There's a clear entry point for input, and a clear exit point for output. That model works well. Without it, Docutils would probably not exist today. > The Transformer stage depends on the configuration of both the > reader and the writer, by way of the list of transforms which is > ordered by its priority number. > Choose a different writer, and you might get a different doctree > in-between. 
Not a problem if you know a-priori where the output > should go, but it *may* be a problem if you store in-between and > might output to different media. "Might" and "may" aren't enough to warrant reimplementing a subsystem that works just fine now. Show it to be "do" and "is" first. >>> The problem is: there is no guarantee that in the future someone >>> will create some kind of ordering or other dependency between a >>> particular writer and a particular reader, and that it will break >>> applications which choose to do a two-step process to convert a >>> document. >> >> Although that would probably be considered a bug, it just proves my >> point. In any case, let's deal with real problems if and when they >> occur, and not waste time worrying about or designing around >> hypothetical future problems. > > Sure, it just works now, but that you do not acknowledge the > existence of a quirk in the design only stimulates more questioning > and debate. I don't see any design quirks. I see a design that tackles a real-world problem, and real-world problems are sometimes dirty. To a certain extent Docutils grew organically. I thought long and hard about the initial design of Docutils, and it has evolved since then. The addition of multi-pass processing functionality is just another stage in its evolution. And BTW, I'm not defending the design because it's mine, or refusing to own up to design quirks that I'm emotionally attached to. If a real problem does present itself, we'll tackle it. If it requires a redesign, so be it. I've thrown away a lot of my own code in the past, and I'm sure I'll throw away a lot more in the future. But not without good reason. I'm perfectly willing to see more, and more radical, evolution of the Docutils architecture, **iff** it's warranted. In any case, I've had enough of debate on this issue. Show me the evidence (input data and output results, clearly showing the problem), or give me peace! 
> I hope the example I give above makes clear precisely what my bugger > is with the potential problem that could occur. Yes, thank you for detailing the issue. Up until now, I haven't understood what the real issue was. I agree that if there really is a problem, we should fix it. But as far as I know, it works just fine now, and I see no reason to fix anything. -- David Goodger <http://python.net/~goodger> |
From: Martin B. <bl...@fu...> - 2005-09-17 04:59:42
|
On 9/16/05, David Goodger <go...@py...> wrote: > >>> To me, separating the stages explicitly makes the architecture > >>> cleaner. > >> > >> Not to me. :-) > >> If you feel strongly about it, start a branch and show us some > >> code. > > > > Not much needs to be done: > > Then start a branch, and do it. I'm not interested. And even if you > *do* start a branch, I'm not guaranteeing that I'll OK it. I want to > see concrete evidence that it's *useful* and *solves a real problem* > first. Hypotheticals aren't enough. This is not hypothetical, rather 100% practical. I've got a real application (Nabu) which is used to support creating front-ends which very likely will rely on that assumption in order to serve documents in various formats from the same doctree. This cannot be done reliably without some kind of order-independent guarantee, EVEN IF IT WORKS: you could decide to create such a dependency in the future. Otherwise there will be a latent bug in the system (and a hard-to-find one). I'm going to check the priorities of the transforms and of those in the sandbox, it will take less energy than arguing on a mailing-list. cheers, |
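A priority check of the kind mentioned at the end of the message above could look roughly like the following. It is a minimal sketch under assumptions: the transform names and priority values are hypothetical placeholders, not taken from Docutils, and `interleaved` is an illustrative helper, not an existing function. It reports every writer transform that would run before some reader transform in a single pass, which is the situation where one-pass and two-pass ordering diverge.

```python
from typing import Dict, List, Tuple


def interleaved(reader: Dict[str, int], writer: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return (writer_transform, reader_transform) pairs where the writer
    transform has the lower priority number and so would run first in one pass."""
    return [
        (w_name, r_name)
        for w_name, w_prio in sorted(writer.items())
        for r_name, r_prio in sorted(reader.items())
        if w_prio < r_prio
    ]


# Hypothetical priorities, for illustration only.
READER = {"reader.early": 200, "reader.late": 800}
WRITER = {"writer.setup": 500, "writer.finish": 900}

pairs = interleaved(READER, WRITER)
for w_name, r_name in pairs:
    print(f"{w_name} would run before {r_name} in a single pass")
print("order is split-safe:", not pairs)
```

An empty result would mean the priority order already places every reader transform ahead of every writer transform; a non-empty result only flags pairs worth inspecting by hand, since a different order does not by itself prove a different output.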