#546 sed 4.2.1 UTF8 with signature bug

Sed
open
GnuWin
Binaries (396)
5
2012-07-26
2011-12-11
Anonymous
No

Using sed 4.2.1 on Windows. Steps to reproduce: in Notepad, create a file with 3 lines, all starting with "test". Save the file as test-ansi.txt (by default it is saved as ANSI).
run: sed.exe -n "/^test.*$/p" test-ansi.txt
Observe all 3 lines are printed.
Open test-ansi.txt in Notepad again and save it as test-utf8bom.txt, selecting UTF8 as the encoding (saving as UTF8 in Notepad = UTF8 with signature/BOM).
run: sed.exe -n "/^test.*$/p" test-utf8bom.txt
Observe the first line is not printed! This problem is not limited to print; it also affects delete, substitute, and any other command that uses the start-of-line anchor (^) in its regex.
UTF8 without signature seems to work, so sed seems to have a problem with the UTF8 signature.
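A likely explanation, sketched in Python rather than sed: if the input is treated as raw bytes, the three signature bytes EF BB BF sit in front of "test" on the first line, so a pattern anchored with ^ cannot match there. The sample data and the helper name matching_lines are made up for illustration, and this only models the suspected behaviour; it is not sed's actual code.

```python
import re

BOM = b"\xef\xbb\xbf"  # UTF-8 signature, as written by Notepad's "UTF8" option

# Hypothetical file contents: three lines starting with "test", CRLF line ends.
ansi = b"test1\r\ntest2\r\ntest3\r\n"
utf8_bom = BOM + ansi

# Byte-level equivalent of the regex used in the report.
pattern = re.compile(rb"^test.*$")

def matching_lines(data):
    """Return the lines that match when the input is treated as raw bytes."""
    return [line for line in data.splitlines() if pattern.match(line)]

print(matching_lines(ansi))      # all three lines
print(matching_lines(utf8_bom))  # first line is missing: it starts with the BOM
```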

Another, similar problem:
run: sed.exe -n "/^test.*$/p" test-utf8bom.txt > outfile.txt
Observe outfile.txt is not UTF8 with signature. Signature is lost. Sed should preserve the signature.
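One possible reading is that the signature is lost because it belongs to the first line, which is the line that fails to match and is therefore never printed. The behaviour requested here (match as if the signature were not there, then keep it in the output) could look roughly like the Python sketch below. filter_preserving_bom is a hypothetical helper written for this illustration, not something sed provides.

```python
import codecs
import re

def filter_preserving_bom(data, pattern):
    """Keep lines matching `pattern`; re-attach a leading UTF-8 BOM if present."""
    had_bom = data.startswith(codecs.BOM_UTF8)
    # Strip the signature before matching so ^ sees the real start of line 1.
    body = data[len(codecs.BOM_UTF8):] if had_bom else data
    kept = [line for line in body.splitlines(keepends=True) if pattern.match(line)]
    out = b"".join(kept)
    # Write the signature back so the output encoding matches the input.
    return codecs.BOM_UTF8 + out if had_bom else out

# Hypothetical sample input: UTF-8 with signature, one non-matching line.
sample = codecs.BOM_UTF8 + b"test1\r\nother\r\ntest3\r\n"
result = filter_preserving_bom(sample, re.compile(rb"^test.*$"))
print(result)
```

Input without a signature passes through the same function unchanged apart from the filtering, so ANSI files and UTF8 files without BOM are not affected.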

You don't have to use Notepad, of course; any text editor that lets you select the encoding to save in will do. I have not tried sed on Linux, so I don't know if the bug is there as well.
