I need to perform simple text parsing and everything works fine except printing the output.
The code is the following:
    # -*- coding: utf-8 -*-
    from pyparsing import Word, OneOrMore

    inFilename = 'out1.txt'
    FIN = open(inFilename, 'r')
    TEXT = FIN.read()

    myDigits = '0123456789'
    eng_alphas = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    rus_alphas = 'йцукенгшщзхъфывапролджэячсмитьбюЙЦУКЕНГШЩЗХЪФЫВАПРОЛДЖЭЯЧСМИТЬБЮ'
    punctuation = '.,:;'
    myPrintables = myDigits + eng_alphas + rus_alphas + punctuation

    aWord = Word(myPrintables)
    someText = OneOrMore(aWord)
    outputText = someText.parseString(TEXT)

    print outputText
or here: https://github.com/evovch/Useful/blob/master/test2.py
I provide an input text file with one line:
восстановление короткоживущих частиц, включая очень редкие, по продуктам их распадов;
And get the following output:
['\xd0\xb2\xd0\xbe\xd1\x81\xd1\x81\xd1\x82\xd0\xb0\xd0\xbd\xd0\xbe\xd0\xb2\xd0\xbb\xd0\xb5\xd0\xbd\xd0\xb8\xd0\xb5', '\xd0\xba\xd0\xbe\xd1\x80\xd0\xbe\xd1\x82\xd0\xba\xd0\xbe\xd0\xb6\xd0\xb8\xd0\xb2\xd1\x83\xd1\x89\xd0\xb8\xd1\x85', '\xd1\x87\xd0\xb0\xd1\x81\xd1\x82\xd0\xb8\xd1\x86,', '\xd0\xb2\xd0\xba\xd0\xbb\xd1\x8e\xd1\x87\xd0\xb0\xd1\x8f', '\xd0\xbe\xd1\x87\xd0\xb5\xd0\xbd\xd1\x8c', '\xd1\x80\xd0\xb5\xd0\xb4\xd0\xba\xd0\xb8\xd0\xb5,', '\xd0\xbf\xd0\xbe', '\xd0\xbf\xd1\x80\xd0\xbe\xd0\xb4\xd1\x83\xd0\xba\xd1\x82\xd0\xb0\xd0\xbc', '\xd0\xb8\xd1\x85', '\xd1\x80\xd0\xb0\xd1\x81\xd0\xbf\xd0\xb0\xd0\xb4\xd0\xbe\xd0\xb2;']
How can I convert this into readable text? I experimented a lot with encode/decode but could not get a usable result.
It looks like you are on Python 2; the Unicode handling in Python 3 is much better. Can you try this?
    for wd in outputText:
        print(wd)
or
print(u' '.join(outputText))
It may be that you are getting this because you are printing the parse results directly, which will use Python's repr function to display the strings.
-- Paul
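The difference between printing the whole result and printing each word can be sketched as follows. This is a minimal illustration written for Python 3, where `bytes` objects stand in for the Python 2 byte strings seen in the output above; the two sample words and the variable names are assumptions chosen for the example, and pyparsing is left out to keep it self-contained.

```python
# -*- coding: utf-8 -*-
# Python 3 sketch: bytes objects stand in for Python 2 byte strings.
# The sample words are illustrative, taken from the end of the input line.
words = [w.encode('utf-8') for w in ['по', 'их']]

# Displaying a list applies repr() to every element, so non-ASCII
# bytes show up as \x.. escape sequences.
list_view = repr(words)
print(list_view)

# Decoding each element (or printing elements one by one in Python 2)
# yields the readable text instead.
decoded = [w.decode('utf-8') for w in words]
line = ' '.join(decoded)
print(line)
```

The escaped form and the readable form hold the same data; only the display differs.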
Thank you, Paul! The first solution works nicely, while in the second I had to leave out the 'u' prefix.
So:
    finalOutput = ' '.join(outputText)
    print finalOutput
This seems illogical, but OK... Do you think upgrading to Python 3 will help?
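A possible explanation for the 'u' prefix issue, offered as an assumption since the thread does not show the actual error: in Python 2 the parsed words here are byte strings, and joining them with a unicode separator makes Python decode them implicitly as ASCII, which fails on Cyrillic bytes with a UnicodeDecodeError. Python 3 never mixes the two types implicitly, which the sketch below illustrates; the variable names are hypothetical and the sample bytes are copied from the output above.

```python
# Python 3 sketch: str and bytes are never mixed implicitly.
# UTF-8 bytes for two of the words shown in the output above.
raw_words = [b'\xd0\xbf\xd0\xbe', b'\xd0\xb8\xd1\x85']

# Joining bytes with a str separator is rejected outright,
# instead of attempting a hidden ASCII decode as Python 2 does.
try:
    ' '.join(raw_words)
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Decode explicitly once, then work with text everywhere.
text_words = [w.decode('utf-8') for w in raw_words]
joined = ' '.join(text_words)

# In Python 3, repr() of a str keeps printable non-ASCII characters,
# so even a printed list stays readable.
list_repr = repr(text_words)
print(list_repr)
print(joined)
```

In Python 3, a file opened in text mode with an explicit `encoding='utf-8'` already yields `str`, so the decode step would normally happen at read time.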