From: Dave P. <dav...@gm...> - 2012-12-08 11:44:06
|
I'm having an issue with an encoding character.

Called from the cmd line with
-e utf-8

I get
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2234' in position 810: ordinal not in range(128)

when I include 0x2234 in the .md file.

What am I getting wrong please?
It seems the encoding options aren't documented anywhere?

TIA

--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk |
From: Waylan L. <wa...@gm...> - 2012-12-09 15:58:12
|
Dave,

I'll need more info to be able to help you. What version of Python-Markdown are you using? What version of Python? What specifically are you trying to do? Can you provide a minimal document and command that replicates the problem?

Without that info, I can't say for sure, but it is possible that this is the same bug I recently fixed here (the error message matches):

https://github.com/waylan/Python-Markdown/issues/158

If you apply the two commits referenced in that discussion, does that fix your problem?

--
----
\X/ /-\ `/ |_ /-\ |\|
Waylan Limberg |
From: Dave P. <dav...@gm...> - 2012-12-10 07:34:23
|
On 9 December 2012 15:57, Waylan Limberg <wa...@gm...> wrote:
> Dave,
>
> I'll need more info to be able to help you. What version of
> Python-Markdown are you using? What version of Python? What
> specifically are you trying to do? Can you provide a minimal document
> and command that replicates the problem?

python 2.7.3
md.py from a couple of days ago, so I guess latest.
I want to use utf-8 in output.

The problem is now resolved, but others may see it. Solution below.

I want a complete HTML document as output, not just the contents of the body element. I'm not sure why markdown.py doesn't do this. Is there a good reason?

So I created a bash script

bn=`nameonly $1`
op=${bn}.html
rmif tmp.tmp
echo -e """
<!DOCTYPE html>
<html>
<head>
<title> $1 </title>
<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />
<link rel=\"stylesheet\" href=\"/styles/css/md.css\" type=\"text/css\" />
</head>
<body>""" >$op
#python -m markdown -v -e utf-8 -o xhtml1 -x extra --noisy >> ${op}
python -m markdown -v -e utf-8 -o xhtml1 -x extra --noisy -f tmp.tmp ${1}
cat tmp.tmp >>${op}
echo """
</body>
</html>
""" >> $op

Note that I was redirecting output to the output file (the commented out line). This created the problem.
It would seem there is a difference between redirecting markdown's output and appending a finished file with cat and >>.
The replacement line solves that problem.
It would seem redirected output is unable to utilise an appropriate character set.

The input which shows this error:

3. ∴ henrys car has four doors. (given)

∴ (U+2234) is the character.

HTH

--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk |
From: Brian N. <bg...@gm...> - 2012-12-10 13:49:59
|
On Mon, Dec 10, 2012 at 1:34 AM, Dave Pawson <dav...@gm...> wrote:
> It would seem redirected output is unable to utilise an appropriate
> character set.

I've run into this before. It isn't a problem with Markdown, more of an issue with Python 2.7 and redirection to a pipe. See

http://stackoverflow.com/questions/4545661/unicodedecodeerror-when-redirecting-to-file

In particular the answer by Mark Tolonen has helped me in the past.

Regards,
BN |
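The failure described above can be sketched in a few lines. This is only an illustration, not Python-Markdown's code: it assumes Python 3 syntax, and the helper name `try_encode` is made up for the example. U+2234 has no ASCII representation, so the 'ascii' codec raises the same kind of UnicodeEncodeError Dave reported, while 'utf-8' encodes the character as three bytes.

```python
# Illustrative only: shows why the 'ascii' codec rejects U+2234.
THEREFORE = u"\u2234"  # the character from Dave's .md file


def try_encode(text, encoding):
    """Return the encoded bytes, or the error message if encoding fails.

    Hypothetical helper for this example, not part of any library.
    """
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)


if __name__ == "__main__":
    print(try_encode(THEREFORE, "ascii"))  # an error message string
    print(try_encode(THEREFORE, "utf-8"))  # a bytes object
```

When Python 2 writes unicode text to a pipe or redirected file, it falls back to the 'ascii' codec, which is the first case shown here.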
From: Dave P. <dav...@gm...> - 2012-12-10 14:58:26
|
On 10 December 2012 13:49, Brian Neal <bg...@gm...> wrote:
> I've run into this before. It isn't a problem with Markdown, more of an
> issue with Python 2.7 and redirection to a pipe. See
>
> http://stackoverflow.com/questions/4545661/unicodedecodeerror-when-redirecting-to-file
>
> In particular the answer by Mark Tolonen has helped me in the past.

"Python 3 defaults to 'utf8', but based on the OP's sample, he's using Python 2.X, which defaults to 'ascii'. – Mark Tolonen Jan 5 '11 at 18:49"

That explains it. Thanks Brian. Nice to know why.

I'm on Fedora, which uses Python heavily for scripts, so I'm reluctant to install 3 ahead of Fedora.
My solution works which is enough for me.

regards

--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk |
From: Brian N. <bg...@gm...> - 2012-12-10 15:16:35
|
On Mon, Dec 10, 2012 at 8:58 AM, Dave Pawson <dav...@gm...> wrote:
> I'm on Fedora, which uses Python heavily for scripts, so I'm reluctant
> to install 3 ahead of Fedora.
> My solution works which is enough for me.

You can also set the environment variable PYTHONIOENCODING if you find yourself needing to do a lot of redirection or piping.

Regards,
BN |
From: Dave P. <dav...@gm...> - 2012-12-10 15:50:15
|
On 10 December 2012 15:16, Brian Neal <bg...@gm...> wrote:
> You can also set the environment variable PYTHONIOENCODING if you find
> yourself needing to do a lot of redirection or piping.

Useful to know...
http://docs.python.org/2/using/cmdline.html?highlight=pythonioencoding#PYTHONIOENCODING

Presumably

export PYTHONIOENCODING=utf-8

I think that is worth adding to the user docs?
The values taken by the -e option are missing too.

regards

--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk |
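To see what PYTHONIOENCODING changes, here is a rough sketch that runs a child interpreter with its stdout piped, once per encoding. It assumes Python 3 for the parent process; `run_child` and `CHILD_CODE` are hypothetical names for this example. With `utf-8` the child writes the three UTF-8 bytes of U+2234; with `ascii` the child's print fails and it exits with a non-zero status.

```python
import os
import subprocess
import sys

# Child program: prints U+2234. The raw string keeps the backslash escape
# intact so that it is interpreted by the child, not by this process.
CHILD_CODE = r"print(u'\u2234')"


def run_child(io_encoding):
    """Run a child interpreter with stdout piped and PYTHONIOENCODING set.

    Returns (returncode, stdout_bytes). Hypothetical helper for this sketch.
    """
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, _err = proc.communicate()
    return proc.returncode, out


if __name__ == "__main__":
    print(run_child("utf-8"))
    print(run_child("ascii"))
```

Setting the variable in the shell, as Dave suggests, has the same effect on every Python process started from that shell.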
From: Brian N. <bg...@gm...> - 2012-12-10 15:48:30
|
On Mon, Dec 10, 2012 at 9:16 AM, Brian Neal <bg...@gm...> wrote:
> You can also set the environment variable PYTHONIOENCODING if you find
> yourself needing to do a lot of redirection or piping.

Having said that, I went back to the script that I wrote that was having this problem. I fixed it by using unicode everywhere but then picking an encoding on output, something like:

print my_string.encode('utf-8')

It does look to me like Markdown is trying to do something like this if you run it as a module. Perhaps Waylan can chime in here.

-BN |
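Brian's "unicode everywhere, encode on output" approach might look like the following on Python 3, where the bytes go to a binary stream rather than through `print`. This is a sketch under that assumption; `write_encoded` is a hypothetical name.

```python
import io
import sys


def write_encoded(text, stream, encoding="utf-8"):
    """Encode text explicitly and write the resulting bytes to a binary stream."""
    stream.write(text.encode(encoding))


if __name__ == "__main__":
    # An in-memory binary stream stands in for a file or pipe.
    buf = io.BytesIO()
    write_encoded(u"\u2234 henrys car has four doors.\n", buf)
    print(repr(buf.getvalue()))

    # sys.stdout.buffer is the binary layer under sys.stdout on Python 3.
    write_encoded(u"\u2234\n", sys.stdout.buffer)
```

Because the encoding is chosen explicitly, the result no longer depends on whether stdout is a terminal, a pipe, or a redirected file.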
From: Waylan L. <wa...@gm...> - 2012-12-10 16:19:29
|
On Mon, Dec 10, 2012 at 10:48 AM, Brian Neal <bg...@gm...> wrote:
> It does look to me like Markdown is trying to do something like this
> if you run it as a module. Perhaps Waylan can chime in here.

Yes, I have striven to have the code do the same thing regardless of what version of python is being used. Within the markdown.FromFile method, input is decoded to unicode (regardless of input source) and then forwarded to markdown, and the output is encoded (regardless of output method: file, stdout, etc) and written out as bytes (even in python3). In both instances (encoding and decoding) the character encoding used is the same user defined encoding (or defaults to utf8 if not defined).

There were a few edge cases where the code was failing to take the encoding into account (PYTHONIOENCODING would have made a difference), however, I don't think that is the case anymore. Although perhaps the fallback default should be set to PYTHONIOENCODING rather than hardcoded to utf8.

The relevant code is here:
https://github.com/waylan/Python-Markdown/blob/master/markdown/__init__.py#L324

--
----
\X/ /-\ `/ |_ /-\ |\|
Waylan Limberg |
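The decode, convert, encode flow described above can be sketched as follows. This is not the Python-Markdown source: `convert_file` and `fake_convert` are hypothetical names, and `fake_convert` is only a stand-in for the real Markdown-to-HTML step. The point is that both files are opened in binary mode, so the single user-chosen encoding is applied explicitly on the way in and on the way out.

```python
import os
import tempfile


def fake_convert(text):
    """Hypothetical stand-in for the Markdown-to-HTML conversion step."""
    return u"<p>" + text.strip() + u"</p>\n"


def convert_file(src_path, dst_path, encoding="utf-8", convert=fake_convert):
    """Decode input bytes, convert the text, encode the result, write bytes."""
    with open(src_path, "rb") as src:
        text = src.read().decode(encoding)
    html = convert(text)
    with open(dst_path, "wb") as dst:
        dst.write(html.encode(encoding))


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "in.md")
    dst = os.path.join(workdir, "out.html")
    with open(src, "wb") as f:
        f.write(u"\u2234 henrys car has four doors.".encode("utf-8"))
    convert_file(src, dst)
    with open(dst, "rb") as f:
        print(repr(f.read()))
```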
From: Dave P. <dav...@gm...> - 2012-12-10 16:30:16
|
I think the error I was getting shows that didn't happen in my case?

0x2234 as the character
Python 2.7
Emacs encoding set to utf-8

cmd line either defaulted or -e utf-8 (if that is right?)
markdown >> file.ext

I received an error.

HTH
Dave

--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk |
From: Waylan L. <wa...@gm...> - 2012-12-10 17:05:51
|
On Mon, Dec 10, 2012 at 11:30 AM, Dave Pawson <dav...@gm...> wrote:
> I think the error I was getting shows that didn't happen in my case?
> 0x2234 as the character
> Python 2.7
> Emacs encoding set to utf-8
>
> cmd line either defaulted or -e utf-8 (if that is right?)
> markdown >> file.ext
>
> I received an error.

And I can't seem to duplicate that error on my system with the latest code. Which makes me wonder if perhaps the problem is with your script, but I can't say for sure.

--
----
\X/ /-\ `/ |_ /-\ |\|
Waylan Limberg |
From: Dave P. <dav...@gm...> - 2012-12-10 17:29:22
|
On 10 December 2012 17:05, Waylan Limberg <wa...@gm...> wrote:
> Which makes me wonder if perhaps the problem is with your
> script, but I can't say for sure.

Script as in bash script?

#!/bin/bash
source ~/bin/dpFunctions.sh
# Run python markdown with following extensions
# abbr, def_list
# See /installation/python/pymarkdown

bn=`nameonly $1`
op=${bn}.html
rmif tmp.tmp

echo -e """
<!DOCTYPE html>
<html>
<head>
<title> $1 </title>
<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />
<link rel=\"stylesheet\" href=\"/styles/css/md.css\" type=\"text/css\" />

</head>
<body>""" >$op
python -m markdown -v -e utf-8 -o xhtml1 -x extra -x toc --noisy -f tmp.tmp ${1}
cat tmp.tmp >>${op}
echo """
</body>
</html>
""" >> $op

Now corrected.

If you run that more simply

echo -e "<html>" >op.html
python -m markdown -v -e utf-8 -o xhtml1 -x extra -x toc --noisy >>op.html
echo -e "</html>" >>op.html

with a md file containing 0x2234 you should see the error.

Running directly or with -f sidesteps this one.

HTH

--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk |
From: Waylan L. <wa...@gm...> - 2012-12-10 18:05:41
|
Nope, I still can't replicate the problem. Just to be clear, I am not using the latest release available on pypi, but the most recent code in the git repo on GitHub [1] (which contains a patch for this kind of issue). In other words, I believe the bug has been fixed, with the fix becoming available in the next release. If, however, you are using the latest from GitHub and still seeing the problem, all I can suggest is to set PYTHONIOENCODING (or avoid using pipes).

[1]: https://github.com/waylan/Python-Markdown

--
----
\X/ /-\ `/ |_ /-\ |\|
Waylan Limberg |
From: Dave P. <dav...@gm...> - 2012-12-10 18:18:33
|
On 10 December 2012 18:05, Waylan Limberg <wa...@gm...> wrote:
> Just to be clear, I am not
> using the latest release available on pypi, but the most recent code
> in the git repo on GitHub [1] (which contains a patch for this kind of
> issue).
>
> [1]: https://github.com/waylan/Python-Markdown

Is there a Pythonic way to install from github please?

regards

--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk |
From: Dave P. <dav...@gm...> - 2012-12-10 18:31:49
|
On 10 December 2012 18:05, Waylan Limberg <wa...@gm...> wrote:
> Just to be clear, I am not
> using the latest release available on pypi, but the most recent code
> in the git repo on GitHub [1] (which contains a patch for this kind of
> issue).
>
> [1]: https://github.com/waylan/Python-Markdown

Installed (the hard way!). Now handles utf-8 as default, including redirection.
Now working, shows

]$ python -m markdown --version
__main__.py 2.2.0

Thanks.

ps. Why is it called xhtml1 output, when there is no <html> and <head/> markup? It's clearly wrong.

regards

--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk |
From: Waylan L. <wa...@gm...> - 2012-12-10 19:27:58
|
On Mon, Dec 10, 2012 at 1:31 PM, Dave Pawson <dav...@gm...> wrote:
> Now working,

Glad to hear

> ps. Why is it called xhtml1 output, when there is no <html> and <head/> markup?
> It's clearly wrong.

Python-Markdown has two goals:

1. Implement markdown in Python as closely as reasonably possible to the original perl implementation.
2. Provide an API which allows any of the behavior to be overridden.

The answer to your question (which I believe is a follow-up to an earlier question you asked and I failed to respond to) relates to the first goal. Simply put, that is the way markdown.pl does it. It outputs xhtml style syntax (for example `<br />` rather than `<br>`) so we do too. Of course, unlike markdown.pl we also provide html style output as an option (and even allow the serializer to be overridden via python subclassing).

But that's not the only reason. Python-Markdown is a library first and a commandline script second (the commandline script is a light wrapper around the library). Most users are using the library from within python based web server backends (like Django) and pass the output to various template systems. An oversimplified `<html>` wrapper would be useless in such settings. Even when I use it from the commandline, I always want something more complex than I could justify including in a default implementation. In fact, whenever I use other tools which do wrap their output in `<html>` tags, I always find myself writing code to strip those wrappers and then passing the contents of the body to my own templates. Therefore, in Python-Markdown it is left for the user to implement however s/he wishes.

Of course, given goal #2, it wouldn't be too much trouble to write an extension which wrapped the output. Although, if you're using the commandline, a shell script certainly makes more sense.

--
----
\X/ /-\ `/ |_ /-\ |\|
Waylan Limberg |
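For anyone calling the library from Python rather than the shell, the wrapping step Dave does in bash could be sketched like this. It is only one possible approach, assuming Python 3; `wrap_html` and `PAGE` are hypothetical names, the template is a cut-down version of the one in Dave's script, and the body fragment is assumed to come from whatever produced the HTML (for example the converter's output).

```python
from html import escape
from string import Template

# Minimal page template, modelled loosely on Dave's bash script.
PAGE = Template(u"""<!DOCTYPE html>
<html>
<head>
<title>$title</title>
<meta http-equiv="Content-Type" content="text/html; charset=$charset" />
</head>
<body>
$body
</body>
</html>
""")


def wrap_html(body, title, charset="utf-8"):
    """Wrap an HTML body fragment in a complete document.

    The title is escaped; the body is inserted as-is because it is
    already HTML.
    """
    return PAGE.substitute(title=escape(title), charset=charset, body=body)


if __name__ == "__main__":
    page = wrap_html(u"<p>\u2234 henrys car has four doors.</p>", u"Example")
    # Encode explicitly so the result is safe to write to a pipe or file.
    print(repr(page.encode("utf-8")))
```

Keeping the wrapper outside the converter matches the design described above: the library returns a fragment, and each caller decides what surrounds it.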
From: Brian N. <bg...@gm...> - 2012-12-10 18:44:39
|
On Mon, Dec 10, 2012 at 12:18 PM, Dave Pawson <dav...@gm...> wrote:
> Is there a Pythonic way to install from github please?

$ pip install git+https://github.com/waylan/Python-Markdown.git

I highly recommend using pip + virtualenv.

http://www.pip-installer.org/en/latest/index.html
http://www.virtualenv.org/en/latest/

Best,
BN |
From: Dave P. <dav...@gm...> - 2012-12-10 19:10:50
|
On 10 December 2012 18:44, Brian Neal <bg...@gm...> wrote:
> $ pip install git+https://github.com/waylan/Python-Markdown.git
>
> I highly recommend using pip + virtualenv.
>
> http://www.pip-installer.org/en/latest/index.html
> http://www.virtualenv.org/en/latest/

OK pip installed. I now have the 2.3 version installed.... but seemingly only within this virtual environment?

$ . myenv/bin/activate
(myenv)[dpawson@homer python]$ python -m markdown --version
__main__.py 2.3.dev

But

[dpawson@homer ~]$ python -m markdown --version
__main__.py 2.2.0

So I guess I need to move pymarkdown from virtual to 'real'?

1. How to get rid of 2.2.0 please? Installed via easy_install
2. How to replace it with 2.3?

TIA

--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk |
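When two copies of a package are installed like this, it can help to check which one a given interpreter would actually import. The sketch below assumes Python 3 (it uses `importlib.util.find_spec`); `module_origin` is a hypothetical helper name. Run it once inside the virtualenv and once outside to compare the paths.

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file a top-level module would be imported from, or None.

    Hypothetical helper for this example; it does not import the module.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Inside an activated virtualenv, sys.prefix points at the environment.
    print(sys.prefix)
    print(module_origin("markdown"))
```

This only diagnoses which installation is picked up; removing or replacing the system-wide copy is a separate step.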