From: David A. <da...@bo...> - 2004-06-25 13:22:22
|
David Goodger <go...@py...> writes: > David Abrahams wrote: > > There are still quite a few places where a node's file is set to > > None and its line number is incorrect (field lists, definition > > lists, enumerated lists). > > > > I'd like some guidance as to how to go about fixing these. If I can > > understand enough of the system I'll fix the include directive > > problems. > > OK. There's a bit about it below; otherwise, ask away. > > > I also have some comments on the comments: > > I agree with your comments about the docstrings; please feel free to > edit the source directly. Are you saying that now that you've answered you want me to go and put your answers into the docstrings? > >> - ``content`` is a list of strings, the directive content. > > > > These are in raw form, or what? > > With maximal common leading whitespace removed, yes. It's actually an > instance of docutils.statemachine.StringList, for purposes of keeping > track of the source & line offset of the text. That was added after > the "include" directive, as were the "source" & "line" internal > attributes of the doc tree. They're all related. It surprises me to hear all that, knowing that error reporting gets messed up by the include directive anyway. > >> - ``lineno`` is the line number of the first line of the directive. > > > > > > Line number with respect to what? > > With respect to the beginning of the document. ^^^^^^^^ You should probably adopt another term. C/C++ uses "translation unit" to mean the file being compiled and everything it includes. Not that it's such a wonderful term, but you know it, it disambiguates this sort of thing perfectly. > For example, > > First > > .. include:: other.txt > > Last > > "First" is line number 1. The directive is line 3. If other.txt > contains 5 lines, it's first line will be line 5 (a blank line is > inserted after the "include" to separate them), it's last line will be > line 9, and "Last" will be line 11 (or 12; another blank line may be > inserted). > > > From what you say below, it might be the number of lines processed > > by the parser before arriving at the beginning of the directive, > > plus one. However, my observations showed lineno to be less than > > content_offset, which is inconsistent with that theory. > > Remember, line number is just one more than line offset. "Line offset"? That's a new term in this discussion (so I can't remember it, yet ;->). > The content of a directive may start several lines after the first > line of the directive itself. So lineno < content_offset is not > unusual. E.g. (1-based line numbers on the left): > > 5 .. note:: > 6 > 7 Figure title. > > ``lineno`` should be 5; ``content_offset`` should be 6 (+ 1 == 7, the > line number of the content). My content never starts more than 1 line after the first line of the directive, so this doesn't explain why I was seeing differences of more than 5. > >> - ``content_offset`` is the line offset of the first line of the > >> content from the beginning of the current input. Used when > >> initiating a nested parse. > > > > What does "the beginning of the current input" mean, precisely? > > The current input is the list of lines of text (a StringList, > acutally) which the current state machine is currently processing. > The parser is recursive. Take this input: > > Normal paragraph 1 > > Block quote para 1 > > Block quote para 2 > > Normal paragraph 2 > > After processing the first line (Normal paragraph 1), the parser sees > an indented block. It isolates the block, removes the maximal common > leading whitespace (4 spaces here), creates a block_quote node, > creates a new state machine to parse the block quote, and initializes > it with the 3 lines of the block quote. Those 3 lines become the > block-quote-parsing state machine's "current input". So the first > line (Block quote para 1) is at offset 0. Why don't you say "the beginning of the current nested parse", then? > > "Used when initiating a nested parse" is unhelpful. Used how? > > Passed to state.nested_parse. In the "note" directive example above, > state.nested_parse is called to parse the note's content. Oh... <light dawns> You mean that it's used when the directive itself needs to initiate a nested parse on its contents (?). > > Overall I get the sense that this interface has too many parameters, > > and that some grouping and ecapsulation would be beneficial to the > > architecture. For example, maybe it makes sense to store all the > > line/content_offset information on the State. Maybe access to the > > state machine could also be through the state. > > Then directive function code would have to drill down through the > ``state`` attributes, several levels deep, for information that's > commonly needed. Wouldn't that break encapsulation? Probably; I was just grasping at straws for ways to simplify. > >> Function attributes, interpreted by the directive parser (which > >> calls the directive function): > >> > >> - ``arguments``: A 3-tuple specifying the expected positional > >> arguments > ... > > This sounds awfully familiar. It seems as though some of this could > > be handled more elegantly using the Python argument-checking > > mechanisms... > > I used familiar terms to describe familiar concepts. But they're > different enough that no, we can't use getopt or optparse. I wasn't thinking of that. I always assumed that when I said "Python argument..." you'd know I meant the arguments to methods. Methods have positional and keyword parameters. > > if only the directive function itself didn't have so many darned > > parameters ;-) > > I don't see how they're related. Because using additional positional and keyword parameters to handle the directive's user interface would only bloat the interface further. > > Alternatively, you could really clean this stuff up with classes. > > Have the base directive-handler's __init__ function gather the stuff > > currently being passed as arguments and stuff it into attributes, > > and then have a method (__call__?) that accepts the Directive > > Arguments and Directive Options as positional and keyword > > parameters. > > At first glance, that just seems more complex. If you really care, > show us some code to convince us. The sketch looks something like this (I don't love the first attempt here): class DirectiveParser def __init__(name, lineno, content_offset, state, state_machine): self.name = name self.content = content self.lineno = lineno self.content_offset = content_offset self.block_text = block_text self.state = state self.state_machine = state_machine self.reporter = state_machine.reporter def parse(self, block_text, content, *directive_arguments, **directive_options): raise NotImplementedError class Topic(DirectiveParser): def parse( self , block_text , content , title # one argument, called "title" is required , class_ = None # options have trailing underscores , subtitle_ = None ): if class_: class_ = directives.class_option(class_) if not self.state_machine.match_titles: error = self.reporter.error( 'The "%s" directive may not be used within topics, sidebars, ' 'or body elements.' % name, nodes.literal_block(block_text, block_text), line=self.lineno) return [error] # ...etc... It feels like using function parameters for these things might be a stretch, but you could *still* make it all better by doing something like this: class DirectiveParser required_args = 0 optional_args = 0 option_processors = {} def __init__(name, lineno, content_offset, state, state_machine): self.name = name self.content = content self.lineno = lineno self.content_offset = content_offset self.block_text = block_text self.state = state self.state_machine = state_machine self.reporter = state_machine.reporter def parse(block_text, content, arguments, options): raise NotImplementedError class Topic(DirectiveParser): required_args = 1 option_processors = { 'class' : directives.class_option }; def parse( self , block_text , content , directive_arguments , directive_options ): if class_: class_ = directives.class_option(class_) if not self.state_machine.match_titles: error = self.reporter.error( 'The "%s" directive may not be used within topics, sidebars, ' 'or body elements.' % name, nodes.literal_block(block_text, block_text), line=self.lineno) return [error] # ...etc... You could also buy a lot with inheritance, e.g. sidebar is derived from topic. I guess I don't have the time to work out the optimal design in detail right now, but I hope you get the idea. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com |