From: Peter L. S. <pe...@so...> - 2013-05-22 13:17:29
|
Hi,

I am running docutils.core.publish_string inside some Python scripts, and I need to process a lot of reST files from other people, which often contain errors. I need to do some preprocessing on the files before passing them on to docutils, which is why I call publish_string from inside Python instead of using the command-line tools.

Currently, I just see errors and warnings on the command line like:

    <string>:73: (ERROR/3) Unexpected indentation.
    <string>:74: (WARNING/2) Block quote ends without a blank line; unexpected unindent.

Are there some parameters that I can pass onto publish_string so that I can capture the error output instead of it going straight to stderr?

Or some better method than calling publish_string?

Cheers,
Peter.
From: Michael P. <mic...@gm...> - 2013-05-22 13:41:19
|
Perhaps just identify and rectify the indentation/tabbing errors during preprocessing: use Python's string split to break the reST source string into lines, then string replace to correct mis-tabbing.

I don't know of a publish method option for this, but it would be great if one existed. We sort of need a reST source string "lint".

Michael

PS: Ditto to being occasionally vexed by the seeming need for precise and consistent indentation in reST source for error-free publishing. Emacs handles this for me, but like you I will have to implement checking/rectification for documents prepared for others.

--
Michael G. Prisant <Mic...@gm...>
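The split-then-replace preprocessing suggested above might look roughly like the following. This is only a sketch: the function name `normalize_whitespace` and the `tabsize` parameter are made up for illustration, and it assumes that expanding tabs to 8 columns matches how the documents were written. It removes tab/space mixing and trailing whitespace but does not decide what the indentation *should* be.

```python
def normalize_whitespace(source: str, tabsize: int = 8) -> str:
    """Expand tabs and strip trailing whitespace, line by line.

    Leading indentation is kept (after tab expansion), because reST uses
    indentation to express structure.
    """
    cleaned = [line.expandtabs(tabsize).rstrip() for line in source.splitlines()]
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    sample = "Title\n=====\n\n\t* item one  \n\t* item two\t\n"
    print(normalize_whitespace(sample))
```

The result can then be handed to publish_string as usual; any remaining indentation problems will still be reported by docutils.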
From: Michael P. <mic...@gm...> - 2013-05-22 13:48:51
|
Sorry, this link should have been in the previous email:

http://stackoverflow.com/questions/2077897/substitute-multiple-whitespace-with-single-whitespace-in-python

And especially note this response from Alex Martelli (http://stackoverflow.com/a/2077944, answered Jan 16 '10 at 15:54):

> A simple possibility (if you'd rather avoid REs) is
>
>     ' '.join(mystring.split())
>
> The split and join perform the task you're explicitly asking about -- plus, they also do the extra one that you don't talk about but is seen in your example, removing trailing spaces;-).

Comments on that answer:

> Oh cool, I was fumbling with a similar solution, but using split(' ') and then a filter to remove empty elements. I never knew split with no arguments worked like this. This is also much faster, timeit.py gives me around 0.74usec for this, versus 5.75usec for regular expressions. -- Roman Stolper, Jan 16 '10 at 16:00
>
> @Roman, yes, x.split() (and x.split(None)) splits on *sequences of whitespace* (including tabs, newlines, etc, like re's \s) of length 1+ -- and it's pretty fast indeed. So, always glad to help! -- Alex Martelli, Jan 16 '10 at 16:25

--
Michael G. Prisant <Mic...@gm...>
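One caution when applying the split/join idiom to reST: used on a whole string it also removes leading indentation (and newlines), which reST relies on. A hedged sketch of a per-line variant that keeps the indent is shown below; the helper name `collapse_inner_whitespace` is hypothetical, and collapsing inner spaces is itself an assumption that will not be safe for tables or literal blocks, where spacing is significant.

```python
def collapse_inner_whitespace(line: str) -> str:
    """Collapse whitespace runs inside a line but keep its leading indent."""
    stripped = line.lstrip()
    indent = line[: len(line) - len(stripped)]
    return indent + " ".join(stripped.split())


if __name__ == "__main__":
    text = "  some   text\twith  gaps  "
    print(repr(" ".join(text.split())))           # whole-string idiom: indent is lost
    print(repr(collapse_inner_whitespace(text)))  # per-line variant: indent is kept
```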
From: David G. <go...@py...> - 2013-05-22 19:58:37
|
On Wed, May 22, 2013 at 8:17 AM, Peter L. Soendergaard <pe...@so...> wrote:
> Are there some parameters that I can pass onto publish_string so that I
> can capture the error output instead of it going straight to stderr?
>
> Or some better method than calling publish_string?

publish_string seems like the right API for your use case.

You can capture the text of the system messages by assigning a file-like object to the "warning_stream" setting. E.g. a StringIO.StringIO object or a custom object with a "write" method. This stream is used by docutils.utils.Reporter; the stream's write method is called once per error/warning. <stderr> is the default if no alternate stream is passed in.

Or you could set "halt_level" appropriately (and "traceback" to True) to catch exceptions in try/except blocks.

See http://docutils.sourceforge.net/docs/user/config.html

Tip: the "source_path" parameter to publish_string will let you pass the source text's filename/path, which will then be reported in the system messages (currently only "<string>", because Docutils doesn't know where the text came from).

--
David Goodger <http://python.net/~goodger>
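The two approaches described above could be sketched as follows. This is an illustration rather than the canonical way to do it: the wrapper names `publish_capturing_messages` and `publish_strict` are invented here, settings are passed through publish_string's `settings_overrides` argument, and `io.StringIO` is used as the Python 3 counterpart of StringIO.StringIO. The sample text and the file name `example.rst` are made up.

```python
import io

from docutils.core import publish_string
from docutils.utils import SystemMessage


def publish_capturing_messages(source, source_path=None):
    """Return (output, messages); system messages go to a buffer, not stderr."""
    stream = io.StringIO()
    output = publish_string(
        source,
        source_path=source_path,
        settings_overrides={"warning_stream": stream},
    )
    return output, stream.getvalue().splitlines()


def publish_strict(source, source_path=None, halt_level=3):
    """Let docutils raise SystemMessage at or above halt_level (3 = ERROR)."""
    return publish_string(
        source,
        source_path=source_path,
        settings_overrides={
            "halt_level": halt_level,
            "traceback": True,  # propagate the exception to the caller
            "warning_stream": io.StringIO(),  # keep stderr quiet
        },
    )


if __name__ == "__main__":
    bad = "First line\nsecond line\n    indented\nback again\n"
    _, messages = publish_capturing_messages(bad, source_path="example.rst")
    for message in messages:
        print(message)
    try:
        publish_strict(bad, source_path="example.rst")
    except SystemMessage as exc:
        print("halted:", exc)
```

The first function suits batch runs where every message should be collected; the second suits cases where a document with errors should be rejected outright.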
From: Peter L. S. <pe...@so...> - 2013-05-23 07:30:05
|
Thanks, this was just what I needed. I will start working on it.

Cheers,
Peter.
From: Michael P. <mic...@gm...> - 2013-05-22 20:40:32
|
I appreciate the explanation of the source path configuration, which I hadn't fully understood, and the summary of the various ways to handle errors programmatically, as well as the direct answer to the post.

Indentation errors, which are currently pretty easy to identify and correct in individual reST text documents, are harder to deal with when batch processing many documents from multiple authors. One way or another, the errors are identified when the text source is handed off to the docutils library. But has someone authored a batch corrector/reformatter for multiple reST text sources with some common author errors like indentation?

Michael

--
Michael G. Prisant <Mic...@gm...>
From: David G. <go...@py...> - 2013-05-22 20:50:24
|
On Wed, May 22, 2013 at 3:40 PM, Michael Prisant <mic...@gm...> wrote:
> But has someone authored a batch corrector/reformatter for multiple reST
> text sources with some common author errors like indentation?

I don't know of any such tool. A perfect tool is, I believe, impossible. The indentation depends on the intent of the author, and multiple indentations are possible at almost any point in the text (e.g. block quotes, definition lists). Perhaps a "good enough" tool may be possible.

-- DG
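Short of an automatic fixer, a batch run can at least report where the problems are. The sketch below parses captured message lines of the form shown at the start of this thread ("<string>:73: (ERROR/3) Unexpected indentation.") into records and counts the most common complaints. The names `Issue`, `parse_message` and `summarize` are hypothetical, and the regular expression is an assumption based only on that message format.

```python
import re
from collections import Counter
from typing import NamedTuple, Optional


class Issue(NamedTuple):
    source: str
    line: Optional[int]
    level_name: str
    level: int
    text: str


# Matches lines such as "<string>:73: (ERROR/3) Unexpected indentation."
MESSAGE_RE = re.compile(
    r"^(?P<source>.+?):(?P<line>\d*): "
    r"\((?P<name>[A-Z]+)/(?P<level>\d)\) (?P<text>.*)$"
)


def parse_message(line: str) -> Optional[Issue]:
    """Turn one captured message line into an Issue, or None if it does not match."""
    match = MESSAGE_RE.match(line.strip())
    if match is None:
        return None
    line_no = int(match["line"]) if match["line"] else None
    return Issue(match["source"], line_no, match["name"],
                 int(match["level"]), match["text"])


def summarize(lines):
    """Return (counts per message text, list of parsed issues)."""
    issues = [issue for issue in map(parse_message, lines) if issue]
    return Counter(issue.text for issue in issues), issues


if __name__ == "__main__":
    captured = [
        "<string>:73: (ERROR/3) Unexpected indentation.",
        "<string>:74: (WARNING/2) Block quote ends without a blank line; unexpected unindent.",
    ]
    counts, issues = summarize(captured)
    for issue in issues:
        print(issue.source, issue.line, issue.level_name, issue.text)
    print(counts.most_common(1))
```

A report like this can be sent back to the authors, leaving the decision about the intended indentation to them.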