From: Rayene B. R. <ray...@gm...> - 2009-02-23 01:57:30
|
Hi,

This one was hard to track down. I had a long document that failed to parse just because it contained the word "RFC" followed by a number. After some googling I found that there is a text role especially for RFCs:

http://docutils.sourceforge.net/docs/ref/rst/roles.html#rfc-reference

I think that docutils is trying to parse my "RFC 2462" as if it were :RFC:`2462`. Am I right?

So I replaced all the occurrences with the text role, but that does not work either: same message (see below). I also tried variants such as RFC-2462 and RFC2462, with the same message again.

>>> docutils.__version__
'0.5'

  File "/Users/rayenebenrayana/Subversion/DjangoProj/App/models.py", line 51, in __init__
    parser.parse(text, doc)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/docutils/parsers/rst/__init__.py", line 157, in parse
    self.statemachine.run(inputlines, document, inliner=self.inliner)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/docutils/parsers/rst/states.py", line 170, in run
    input_source=document['source'])
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/docutils/statemachine.py", line 232, in run
    context, state, transitions)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/docutils/statemachine.py", line 420, in check_line
    return method(match, context, next_state)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/docutils/parsers/rst/states.py", line 1366, in field_marker
    field, blank_finish = self.field(match)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/docutils/parsers/rst/states.py", line 1391, in field
    self.parse_field_body(indented, line_offset, field_body)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/docutils/parsers/rst/states.py", line 1401, in parse_field_body
    self.nested_parse(indented, input_offset=offset, node=node)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/docutils/parsers/rst/states.py", line 266, in nested_parse
    node=node, match_titles=match_titles)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/docutils/parsers/rst/states.py", line 195, in run
    results = StateMachineWS.run(self, input_lines, input_offset)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/docutils/statemachine.py", line 238, in run
    result = state.eof(context)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/docutils/parsers/rst/states.py", line 2606, in eof
    self.blank(None, context, None)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/docutils/parsers/rst/states.py", line 2598, in blank
    context, self.state_machine.abs_line_number() - 1)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/docutils/parsers/rst/states.py", line 399, in paragraph
    textnodes, messages = self.inline_text(text, lineno)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/docutils/parsers/rst/states.py", line 408, in inline_text
    return self.inliner.parse(text, lineno, self.memo, self.parent)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/docutils/parsers/rst/states.py", line 504, in parse
    processed += self.implicit_inline(remaining, lineno)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/docutils/parsers/rst/states.py", line 958, in implicit_inline
    + method(match, lineno) +
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/docutils/parsers/rst/states.py", line 936, in rfc_reference
    ref = self.document.settings.rfc_base_url + self.rfc_url % rfcnum
AttributeError: Values instance has no attribute 'rfc_base_url'

Cheers,
Rayene, |
From: David G. <go...@py...> - 2009-02-23 02:57:16
|
On Sun, Feb 22, 2009 at 20:39, Rayene Ben Rayana <ray...@gm...> wrote:
> Well, this one was hard to find out.
> I had a long document that failed at parsing just because it contained the
> word "RFC" followed by a number.

Works for me ... by which I mean, "RFC 1234" in a document does nothing special.

> After some googling I found that there is a text role especially for RFCs
> http://docutils.sourceforge.net/docs/ref/rst/roles.html#rfc-reference
>
> I think that docutils is trying to parse my "RFC 2462" as if it was "
> :RFC:`2462` " . Am I right ?

Implicit RFC parsing is an optional feature, which you can turn on, but is off by default.

Are you using standard Docutils tools, or your own code? I suspect the latter, and that you're not using runtime settings properly. See
http://docutils.sourceforge.net/docs/api/runtime-settings.html

Please send a minimal doc & code, otherwise all we can do is speculate.

--
David Goodger <http://python.net/~goodger> |
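For readers following along, the "off by default" behaviour can be checked with a short sketch like the one below. It assumes Docutils is installed and goes through the publish_doctree convenience function, passing rfc_references via settings_overrides. The helper name rfc_links and the sample sentence are made up for illustration, and the findall/traverse fallback is only there because the tree-walking method was renamed between Docutils releases.

```python
from docutils import nodes
from docutils.core import publish_doctree


def rfc_links(source, enabled):
    """Return the target URIs of all reference nodes found in `source`.

    `rfc_links` is an illustrative helper, not part of Docutils.
    """
    doctree = publish_doctree(
        source,
        # Runtime setting that switches implicit "RFC 1234" recognition on/off.
        settings_overrides={"rfc_references": enabled},
    )
    # Newer releases offer findall(); older ones only have traverse().
    find = getattr(doctree, "findall", None) or doctree.traverse
    return [ref.get("refuri", "") for ref in find(nodes.reference)]


text = "See RFC 1234 for details.\n"
print(rfc_links(text, enabled=False))  # no implicit RFC links expected
print(rfc_links(text, enabled=True))   # one link built from rfc_base_url
```

Because the publisher assembles the reader, parser, and writer settings itself, rfc_base_url is already present when the parser needs it.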
From: Rayene B. R. <ray...@gm...> - 2009-02-23 16:43:21
|
Thanks for the answer, David.

I am using standard versions of docutils and Python. By standard I mean the ones provided by MacPorts with no modifications.

Python version is 2.6.1
docutils version is 0.5

But you are right that I modified the default configuration. Here is minimal code that reproduces the problem.

A ReST file named a.rst contains just:

    RFC 1234

A Python script:

    import docutils.parsers.rst
    text = open('a.rst').read()
    parser = docutils.parsers.rst.Parser()
    doc = docutils.utils.new_document('doc')
    doc.settings.tab_width = 4
    doc.settings.pep_references = 1
    doc.settings.rfc_references = 1
    parser.parse(text, doc)

The problem was partially solved by forcing

    doc.settings.rfc_references = 0

With doc.settings.rfc_references = 0, a file containing a bare RFC 1234 string is parsed normally. But the problem remains when using the text role :RFC:`1234`.

Then I had the idea to add:

    doc.settings.rfc_base_url = 'http://www.ietf.org/rfc/'

which solved the problem. That is when I understood that my problem came from the fact that my default configuration is incomplete. In fact:

    import docutils.parsers.rst
    doc = docutils.utils.new_document('doc')
    print(dir (doc.settings))

gives:

    ['__cmp__', '__doc__', '__init__', '__module__', '__repr__', '__str__',
    '_config_files', '_destination', '_disable_config', '_source', '_update',
    '_update_careful', '_update_loose', 'auto_id_prefix', 'config', 'copy',
    'datestamp', 'debug', 'dump_internals', 'dump_pseudo_xml', 'dump_settings',
    'dump_transforms', 'ensure_value', 'error_encoding',
    'error_encoding_error_handler', 'exit_status_level', 'expose_internals',
    'footnote_backlinks', 'generator', 'halt_level', 'id_prefix',
    'input_encoding', 'input_encoding_error_handler', 'language_code',
    'output_encoding', 'output_encoding_error_handler', 'read_file',
    'read_module', 'record_dependencies', 'report_level', 'sectnum_xform',
    'source_link', 'source_url', 'strict_visitor', 'strip_classes',
    'strip_comments', 'strip_elements_with_classes', 'title', 'toc_backlinks',
    'traceback', 'update', 'warning_stream']

As you can see, rfc_references (among others) is missing.

Right now, I don't know whether the problem comes from my code (i.e. I have to tell new_document() to use the default configuration explicitly) or from the distribution (MacPorts).

I did not understand everything, but I tried:

    import docutils.parsers.rst
    from docutils.frontend import OptionParser
    settings = OptionParser().get_default_values()
    doc = docutils.utils.new_document('doc', settings)
    print(dir (doc.settings))

with no success (I got the same output).

Any clue or a code snippet?

Thanks in advance,
Rayene, |
From: David G. <go...@py...> - 2009-02-23 16:57:04
|
On Mon, Feb 23, 2009 at 11:43, Rayene Ben Rayana <ray...@gm...> wrote:
> Thanks for the answer David.
>
> I am using standard versions of docutils and python. By standard I mean the
> ones provided by macports with no modifications.
>
> python version is 2.6.1
> docutils version is 0.5
>
> But you are right when you said that I modified the default configuration.
> here is a minimal code that reproduces the problem:
>
> a ReST file named a.rst contains just :
> RFC 1234
>
> a Python script :
>
> import docutils.parsers.rst
> text = open('a.rst').read()
> parser = docutils.parsers.rst.Parser()
> doc = docutils.utils.new_document('doc')
> doc.settings.tab_width = 4
> doc.settings.pep_references = 1
> doc.settings.rfc_references = 1
> parser.parse(text, doc)

The problem is that you're not using one of the standard front-end tools (rst2html.py, etc.). Your code doesn't use the Docutils public API. You really should be using one of the publisher convenience functions, described here:
http://docutils.sourceforge.net/docs/api/publisher.html

The order of setup is important, and you're missing a step: setting up the components and thus, the runtime settings. Docutils consists of several components (reader, parser, writer, etc.), each of which may define its own runtime settings. Docutils components are assembled at runtime.

My advice: use the convenience functions. That's the public API.

> The problem was partially solved by forcing
> doc.settings.rfc_references = 0
>
> With doc.settings.rfc_references = 0, a file containing a bare RFC 1234
> string is parsed normally. But the problem remains when using the text role
> :RFC:`1234`.
>
> Then, I had the idea to add :
> doc.settings.rfc_base_url = 'http://www.ietf.org/rfc/'
>
> Which solved the problem. This is when I understood that my problem came
> from the fact that my default configuration is incomplete. In fact :
>
> import docutils.parsers.rst
> doc = docutils.utils.new_document('doc')
> print(dir (doc.settings))
>
> gives:
>
> ['__cmp__', '__doc__', '__init__', '__module__', '__repr__', '__str__',
> '_config_files', '_destination', '_disable_config', '_source', '_update',
> '_update_careful', '_update_loose', 'auto_id_prefix', 'config', 'copy',
> 'datestamp', 'debug', 'dump_internals', 'dump_pseudo_xml', 'dump_settings',
> 'dump_transforms', 'ensure_value', 'error_encoding',
> 'error_encoding_error_handler', 'exit_status_level', 'expose_internals',
> 'footnote_backlinks', 'generator', 'halt_level', 'id_prefix',
> 'input_encoding', 'input_encoding_error_handler', 'language_code',
> 'output_encoding', 'output_encoding_error_handler', 'read_file',
> 'read_module', 'record_dependencies', 'report_level', 'sectnum_xform',
> 'source_link', 'source_url', 'strict_visitor', 'strip_classes',
> 'strip_comments', 'strip_elements_with_classes', 'title', 'toc_backlinks',
> 'traceback', 'update', 'warning_stream']
>
> As you can see, rfc_references (among others) is missing.

The above does not include parser, reader, or writer settings. Those come from the components.

> Right now, I don't know if the problem comes from my code (i.e. I have to
> tell new_document() to use the default conf explicitely) or if it comes from
> the distribution (macports).

It's your code :-)

--
David Goodger <http://python.net/~goodger> |
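If driving the parser directly is still preferred, the missing step David describes (letting the components contribute their settings) might look roughly like the sketch below. It assumes Docutils is installed. It passes the reStructuredText Parser class as a component when building the default settings, which is what a bare OptionParser() call leaves out. The helper names parser_defaults and parse_rst are invented for this example, and the try/except only papers over the fact that newer releases offer get_default_settings while older ones, such as the 0.5 discussed here, rely on OptionParser.

```python
import docutils.utils
from docutils import nodes
from docutils.parsers.rst import Parser


def parser_defaults():
    """Build default settings that include the rST parser's own options.

    Illustrative helper: picks whichever settings API this Docutils has.
    """
    try:
        # Helper available in newer Docutils releases.
        from docutils.frontend import get_default_settings
        return get_default_settings(Parser)
    except ImportError:
        # Older releases: name the component explicitly.
        from docutils.frontend import OptionParser
        return OptionParser(components=(Parser,)).get_default_values()


def parse_rst(text, source_name="a.rst"):
    """Parse `text` with the bare parser, using complete parser settings."""
    settings = parser_defaults()
    settings.rfc_references = True  # opt in to implicit "RFC 1234" links
    document = docutils.utils.new_document(source_name, settings)
    Parser().parse(text, document)
    return document


doc = parse_rst("RFC 1234\n")
print(hasattr(doc.settings, "rfc_base_url"))  # the setting is now defined
find = getattr(doc, "findall", None) or doc.traverse
print([ref["refuri"] for ref in find(nodes.reference)])
```

This runs only the parser, so no reader or writer transforms are applied; the publisher convenience functions remain the simpler route when a finished document is the goal.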
From: Rayene B. R. <ray...@gm...> - 2009-02-23 17:46:49
|
Everything is clear(er) now. Thanks :-)

Just one last question before changing everything. In my code, I extract some fields, like keywords, from the DOM just before producing HTML. I use:

    parser.parse(text, doc)
    dom = doc.asdom()
    for f in dom.getElementsByTagName('field'):
        #do things

Will I still be able to do something similar easily with the standard API? In other words, publish_programmatically or no publish_programmatically?

Thanks, |
From: David G. <go...@py...> - 2009-02-23 17:51:05
|
On Mon, Feb 23, 2009 at 12:45, Rayene Ben Rayana <ray...@gm...> wrote:
> Everything is clear(er) now. Thanks :-)
> Just a last question before changing everything,
> In my code, I extract some fields like keywords from dom just before
> producing html.
> I use :
>
> parser.parse(text, doc)
> dom = doc.asdom()
> for f in dom.getElementsByTagName('field'):
> #do things
>
> Will I still be able to do something similar with the standard API easily ?
> in other words, publish_programmatically or no publish_programmatically ?

You could use the publish_doctree function, extract the info you want (and modify the doctree if you want), then finish the job with publish_from_doctree.

If your use case doesn't fit that, go ahead and use your own publishing code, but use the convenience functions as guides so you do everything you need to do in the right order.

--
David Goodger <http://python.net/~goodger> |
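The two-step approach suggested above could be sketched as follows, assuming Docutils is installed. The sample document, the field names, and the extract_fields helper are all hypothetical. The field list is deliberately placed after a paragraph so that it stays an ordinary field list rather than being treated as bibliographic data at the top of the document. The writer_name argument is the long-standing spelling; recent releases may prefer a differently named argument.

```python
from docutils import nodes
from docutils.core import publish_doctree, publish_from_doctree

# Hypothetical input: a paragraph followed by a generic field list.
SOURCE = """\
Some introductory text.

:keywords: docutils, rst
:category: tutorial
"""


def extract_fields(doctree):
    """Map each field name in the doctree to the text of its body."""
    find = getattr(doctree, "findall", None) or doctree.traverse
    fields = {}
    for field in find(nodes.field):
        # A field node holds a field_name child followed by a field_body.
        name, body = field[0], field[1]
        fields[name.astext()] = body.astext()
    return fields


# Step 1: parse and apply the standard transforms, but do not write yet.
doctree = publish_doctree(SOURCE)
fields = extract_fields(doctree)

# Step 2: hand the (possibly modified) doctree to a writer.
html = publish_from_doctree(
    doctree,
    writer_name="html",
    settings_overrides={"output_encoding": "unicode"},
)
if isinstance(html, bytes):  # some versions return encoded output
    html = html.decode("utf-8")

print(fields)
print(len(html))
```

Working on the node tree directly avoids the detour through doc.asdom(), although asdom() is still available on the doctree if DOM-based code is already in place.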