Hi all,
I am a newbie in Python. I wrote a script that reads a text file (d:\subsitutions.txt) and searches and replaces its entries in all files in the target folder (d:\temp\a), but nothing is found because the search string has the Byte Order Mark in front of it.
The subsitutions.txt file is saved as UTF-8 with BOM and has the following structure:
an8 an7
My program currently looks like this; for now it only searches and prints the matching lines:
    # coding: UTF-8
    import os
    import sys

    console.write('Program Start !!\n')
    filePathSrc = u'D:\\TEMP\\a'
    subsitutionFile = u'D:\\subsitutions.txt'
    console.write(u'Source: ' + filePathSrc + '\n')
    for root, dirs, files in os.walk(filePathSrc):
        console.write('Searching ' + root + '\n')
        for fn in files:
            fileName = root + '\\' + fn
            console.write(u'fileName: ' + fileName + '\n')
            notepad.open(fileName.encode('utf-8'))
            # replace value in subsitution file, separate values with space
            # Format
            # A B
            with open(subsitutionFile) as f:
                for l in f:
                    if len(l) > 1:
                        s = l.split()
                        console.write('from:' + '"' + s[0] + '"' + '\t to:' + '"' + s[1] + '"' + '\n')
                        startPos = 0
                        while True:
                            pos = editor.findText(FINDOPTION.REGEXP, startPos, editor.getLength(), s[0])
                            if pos is None:
                                if startPos == 0:
                                    console.write(s[0] + ' not found !!\n')
                                break
                            else:
                                editor.gotoPos(pos[0])
                                console.write(str(editor.lineFromPosition(editor.getCurrentPos())) + '[' + str(editor.getCurrentPos()) + ']: ' + editor.getCurLine())
                                startPos = pos[0] + 1
                        #editor.replace(s[0], s[1])
                f.close()
            #notepad.save()
            notepad.close()
Result in the console:
Program Start !!
Source: D:\TEMP\a
Searching D:\TEMP\a
fileName: D:\TEMP\a\testing.txt
from:"an8" to:"an7"
an8 not found !!
The variable s[0] has \ufeff in front of an8.
The substitution file will eventually contain non-English characters (Chinese words), so I want to keep it in UTF-8 encoding.
Thank you very much for your help.
Chris
Why not convert your file to UTF-8 (without the BOM)?
Cheers
Claudia
You are right: when I convert it to UTF-8, the issue is solved. But I am wondering how to solve it in the program, so that it can handle different Unicode formats.
Well, then you need to find out which encoding has been used, which, by the way, cannot be done
in a 100% safe manner.
Npp uses chardet to identify the encoding; chardet is also available as a Python module.
If only UTF-8 with or without BOM is used, then you can use the codecs module and do
something like:
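The snippet that originally followed is not preserved in this copy of the thread. As a minimal sketch of the idea (not necessarily the code that was posted), Python's `utf-8-sig` codec decodes UTF-8 and silently drops a leading BOM if one is present. The function name `read_substitutions` is hypothetical; `io.open` is used here so the sketch runs on both Python 2 and 3, and `codecs.open(path, 'r', 'utf-8-sig')` accepts the same codec name.

```python
import io


def read_substitutions(path):
    """Return (search, replace) pairs from a UTF-8 file, with or without a BOM."""
    pairs = []
    # 'utf-8-sig' drops a leading BOM if present and otherwise behaves like 'utf-8'.
    with io.open(path, 'r', encoding='utf-8-sig') as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:  # skip blank or incomplete lines
                pairs.append((parts[0], parts[1]))
    return pairs
```

With this, the first entry comes back as `an8` rather than `\ufeffan8`. Under Python 2 the returned values are unicode objects, so they may need `.encode('utf-8')` before being passed to an API that expects byte strings.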
Cheers
Claudia
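For the more general case raised above (files saved in different Unicode formats), one possible approach is to look at the first bytes of the file for a known BOM. This is only a sketch under stated assumptions: the helper names `guess_encoding` and `read_text` are hypothetical, a file without any BOM simply falls back to a default (this is where a detector such as chardet would come in), and a BOM check is a heuristic rather than a guarantee.

```python
import codecs
import io

# Checked in order: the UTF-32 LE BOM starts with the UTF-16 LE BOM,
# so the UTF-32 entries must come first.
_BOMS = [
    (codecs.BOM_UTF32_LE, 'utf-32'),
    (codecs.BOM_UTF32_BE, 'utf-32'),
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_LE, 'utf-16'),
    (codecs.BOM_UTF16_BE, 'utf-16'),
]


def guess_encoding(path, default='utf-8'):
    """Return a codec name based on the file's BOM, or `default` if none is found."""
    with io.open(path, 'rb') as f:
        head = f.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


def read_text(path):
    """Read the whole file as text; the chosen codecs consume the BOM themselves."""
    with io.open(path, 'r', encoding=guess_encoding(path)) as f:
        return f.read()
```

The `utf-16` and `utf-32` codecs use the BOM to pick the byte order and then discard it, so the decoded text never starts with `\ufeff`.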