Sorry, this link should have been in the previous email:

http://stackoverflow.com/questions/2077897/substitute-multiple-whitespace-with-single-whitespace-in-python

And especially note this response:

A simple possibility (if you'd rather avoid REs) is

' '.join(mystring.split())

The split and join perform the task you're explicitly asking about -- plus, they also do the extra one that you don't talk about but is seen in your example, removing trailing spaces;-).

Oh cool, I was fumbling with a similar solution, but using split(' ') and then a filter to remove empty elements. I never knew split with no arguments worked like this. This is also much faster, timeit.py gives me around 0.74usec for this, versus 5.75usec for regular expressions. – Roman Stolper Jan 16 '10 at 16:00
@Roman, yes, x.split() (and x.split(None)) splits on sequences of whitespace (including tabs, newlines, etc, like re's \s) of length 1+ -- and it's pretty fast indeed. So, always glad to help! – Alex Martelli Jan 16 '10 at 16:25
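
To make the difference concrete, here is a minimal sketch comparing the no-argument split/join idiom with a regular-expression version. The sample string and the function names are made up for illustration; only str.split, str.join and re.sub come from the discussion above.

```python
import re

# Hypothetical sample: leading/trailing spaces, a tab, a newline, runs of spaces.
messy = "  one\ttwo \n three    four  "


def collapse_whitespace(text):
    """Collapse every run of whitespace to one space using split/join."""
    # split() with no argument splits on runs of any whitespace and
    # discards the empty strings at both ends.
    return " ".join(text.split())


def collapse_whitespace_re(text):
    """Same result with a regular expression, for comparison."""
    return re.sub(r"\s+", " ", text).strip()


print(collapse_whitespace(messy))
print(collapse_whitespace_re(messy))

# split(' ') splits on every single space, so it leaves empty strings
# behind that would have to be filtered out afterwards.
print(messy.split(" "))
```

Both functions should give the same string here; split(' ') is the variant that needs the extra filter step Roman mentions.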



On Wed, May 22, 2013 at 9:41 AM, Michael Prisant <michael.prisant@gmail.com> wrote:
Perhaps just identify and rectify the indentation/tabbing errors in the preprocessing step: use Python's string split to break the reST source string into lines, then use string replace to correct the mistabbing.

I don't know of a publish method option for this, but it would be great if one existed. What is really needed is a "lint" for reST source strings.
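
A rough sketch of what such a preprocessing pass might look like, using only the standard library. The function names and the choice of four spaces per tab are assumptions for illustration, not anything docutils provides; adjust the width to whatever the authors of the files intended.

```python
def find_tab_lines(source):
    """Return the 1-based numbers of lines that contain a tab character."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if "\t" in line
    ]


def fix_leading_tabs(source, spaces_per_tab=4):
    """Replace tabs in each line's leading whitespace with spaces.

    spaces_per_tab is an assumed convention, not a docutils setting.
    Tabs that appear after the first non-blank character are left alone.
    """
    fixed = []
    for line in source.splitlines():
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)]
        indent = indent.replace("\t", " " * spaces_per_tab)
        # Also drop trailing whitespace, which never matters to reST.
        fixed.append((indent + body).rstrip())
    return "\n".join(fixed) + "\n"


sample = "Title\n=====\n\n* item\n\tcontinued  \n"
print(find_tab_lines(sample))
print(fix_leading_tabs(sample))
```

This only normalizes tabs; it does not detect indentation that is consistently spaced but wrong, which would still need the docutils error messages.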

Michael

PS Ditto to being occasionally vexed by the seeming need for precise and consistent indentation in reST source for error-free publishing. Emacs handles this for me, but like you I will have to implement checking/rectification for documents prepared for others.



On Wed, May 22, 2013 at 9:17 AM, Peter L. Soendergaard <peter@sonderport.dk> wrote:
Hi,

I am running docutils.core.publish_string inside some Python scripts and
I need to process a lot of reST files from other people, and they often
contain errors. I need to do some preprocessing on the files before
passing them on to docutils, so that is why I call publish_string from
inside Python instead of using the command line tools.

Currently, I just see errors and warnings on the command line like:

<string>:73: (ERROR/3) Unexpected indentation.
<string>:74: (WARNING/2) Block quote ends without a blank line;
unexpected unindent.

Are there some parameters that I can pass to publish_string so that I
can capture the error output instead of it going straight to stderr?

Or is there some better method than calling publish_string?

Cheers,
Peter.
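
One possible approach to the question above, offered as a sketch rather than a confirmed recipe: docutils has a warning_stream setting, and passing a file-like object for it through settings_overrides should send the system messages there instead of to stderr. The helper name and its publish parameter are hypothetical; the parameter exists only so the function can be exercised without docutils installed.

```python
import io


def publish_capturing_warnings(source, publish=None, **publish_kwargs):
    """Run a docutils-style publisher and return (output, warnings_text).

    publish defaults to docutils.core.publish_string, imported lazily.
    Extra keyword arguments are passed through to the publisher unchanged.
    """
    if publish is None:
        # Requires docutils to be installed.
        from docutils.core import publish_string as publish

    warnings = io.StringIO()
    overrides = dict(publish_kwargs.pop("settings_overrides", {}))
    # Assumption: the publisher honours the 'warning_stream' setting.
    overrides["warning_stream"] = warnings
    output = publish(source, settings_overrides=overrides, **publish_kwargs)
    return output, warnings.getvalue()
```

With docutils available, calling publish_capturing_warnings(rest_source) should return the rendered output together with text in the same "<string>:73: (ERROR/3) ..." form shown above, which can then be split into lines and parsed.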





_______________________________________________
Docutils-users mailing list
Docutils-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/docutils-users

Please use "Reply All" to reply to the list.



--
Michael G. Prisant


