From: SourceForge.net <no...@so...> - 2006-08-09 20:00:38
Bugs item #1524181, was opened at 2006-07-18 01:47
Message generated for change (Comment added) made by ngc
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=100588&aid=1524181&group_id=588

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: editor core
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Ian Lewis (ian_lewis)
Assigned to: Nobody/Anonymous (nobody)
Summary: jEdit 4.3pre5 does not open files with right encoding

Initial Comment:
jEdit often opens files in the default encoding instead of opening them in the encoding saved in recent.xml. It also doesn't detect that the file is UTF-8 based on the UTF-8 magic numbers like it used to.

You can reproduce it as follows.

1.) Create a new file.
2.) Open the buffer options and set the encoding to UTF-8.
3.) Copy and paste some non-ASCII characters into the buffer, such as German umlauts.
4.) Save the buffer.
5.) Close jEdit.
6.) Verify that the recent.xml file has the correct encoding in it.
7.) Start up jEdit again.
8.) Right click on the file in the browser and look at the encoding. It is now the DEFAULT encoding rather than UTF-8.
9.) Open the file and see the characters as garbage. It fails to see that it is UTF-8 based on the magic characters as well.

This has caused a number of files to inadvertently get saved in the wrong encoding, because the files are read using the wrong encoding and then saved in that wrong encoding. This destroys many non-ASCII characters.

----------------------------------------------------------------------

Comment By: Skeeve (ngc)
Date: 2006-08-09 22:00

Message:
Logged In: YES
user_id=864970

I downloaded the file and it didn't open with UTF-8 when I switched my default encoding to MacRoman. When I copied the first 2 Japanese characters (from .ok=...) to the comment in the first line, it opened as UTF-8.
Then I concatenated all lines up to these first characters into one line and found that the Japanese characters appear at position 188 (or a bit later). Maybe jEdit doesn't check more than 128 characters to find the proper encoding? Just wild guessing...

----------------------------------------------------------------------

Comment By: Ian Lewis (ian_lewis)
Date: 2006-08-09 20:59

Message:
Logged In: YES
user_id=478898

I can reproduce this bug with these steps (I also added the recent.xml after running this scenario).

1.) Delete the recent.xml file.
2.) Open jEdit.
3.) Right click on the file in the file browser. Select encoding and see that it is set to the default encoding rather than UTF-8 (that by itself is a bug). Select UTF-8 as the encoding from the File Browser for this file.
4.) Open the file.
5.) Close the file.
6.) Close jEdit.
7.) Open jEdit. Right click on the file in the file browser and note the encoding (it's the default encoding). Don't select a new encoding.
8.) Open the file. Notice it's opened in the default encoding rather than UTF-8.

----------------------------------------------------------------------

Comment By: Marcelo Vanzin (vanza)
Date: 2006-07-19 09:31

Message:
Logged In: YES
user_id=75113

That's beside the point of the bug; jEdit will recognize the BOM if it's there, and should restore the file with whatever encoding the history file says it was last edited with. So until we see the recent.xml that causes the problem, no discussion here is gonna do any good.

----------------------------------------------------------------------

Comment By: Skeeve (ngc)
Date: 2006-07-19 07:51

Message:
Logged In: YES
user_id=864970

So the Unicode organization is wrong when they show the BOM for UTF-8?
http://www.unicode.org/unicode/faq/utf_bom.html#BOM
:-)

To be precise: UTF-8 doesn't *need* to have a BOM, but jEdit will know from it that the file is UTF-8. How else should it know?
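
One possible answer to "how else should it know?" is to look at the bytes themselves. The following is only a sketch of that idea, not jEdit's actual detection code: the class `Utf8Sniffer` and its methods are hypothetical names made up for illustration. It strictly decodes the content as UTF-8, and it also reports where the first non-ASCII byte sits, which is the number that would matter if a detector really did inspect only the first 128 characters (a guess from this thread, not a confirmed behaviour).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of jEdit.
class Utf8Sniffer {

    // True if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Offset of the first byte >= 0x80, or -1 if the data is pure ASCII.
    static int firstNonAsciiOffset(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] < 0) { // Java bytes are signed, so 0x80..0xFF are negative
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // 188 ASCII characters followed by two Japanese characters,
        // roughly mimicking the layout described in the comment above.
        char[] pad = new char[188];
        Arrays.fill(pad, '#');
        byte[] sample = (new String(pad) + "\u65e5\u672c").getBytes(StandardCharsets.UTF_8);

        System.out.println("valid UTF-8: " + isValidUtf8(sample));
        System.out.println("first non-ASCII byte at: " + firstNonAsciiOffset(sample));
    }
}
```

A pure-ASCII file is valid in both UTF-8 and most single-byte encodings, so a check like this only helps once a non-ASCII byte has been seen. A detector that stops reading before offset 188 would have nothing to go on for the sample above.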
----------------------------------------------------------------------

Comment By: Marcelo Vanzin (vanza)
Date: 2006-07-19 04:36

Message:
Logged In: YES
user_id=75113

UTF-8 doesn't have any BOM; UTF-8Y does.

As for the problem, I'd need a copy of the $HOME/.jedit/recent.xml file that causes the problem, otherwise, I can't reproduce it...

----------------------------------------------------------------------

Comment By: Skeeve (ngc)
Date: 2006-07-18 21:33

Message:
Logged In: YES
user_id=864970

That file does not contain any Byte Order Mark ( http://en.wikipedia.org/wiki/Byte_Order_Mark ) so jEdit can't see that the file is supposed to be UTF-8.

It should start with EF BB BF but starts with 23 20 4a

----------------------------------------------------------------------

Comment By: Marcelo Vanzin (vanza)
Date: 2006-07-18 08:24

Message:
Logged In: YES
user_id=75113

Hi Ian,

I can't reproduce the problem following your tests. By any chance, does the file *name* you're saving have any extra characters? I think the current code might mess things up in some cases if that happens...

----------------------------------------------------------------------

Comment By: Ian Lewis (ian_lewis)
Date: 2006-07-18 02:07

Message:
Logged In: YES
user_id=478898

It seems to depend on the file but happens every time with a particular file I've created using jEdit. I attached it below.

----------------------------------------------------------------------
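
The BOM check discussed in this thread (a file "should start with EF BB BF but starts with 23 20 4a") can be sketched in a few lines. This is a minimal illustration under my own assumptions, not the code jEdit uses; `BomCheck` and `hasUtf8Bom` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of jEdit.
class BomCheck {

    // The UTF-8 encoding of U+FEFF, i.e. the UTF-8 byte order mark.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // True if the given leading bytes of a file start with the UTF-8 BOM.
    static boolean hasUtf8Bom(byte[] head) {
        return head.length >= UTF8_BOM.length
                && Arrays.equals(Arrays.copyOf(head, UTF8_BOM.length), UTF8_BOM);
    }

    public static void main(String[] args) {
        byte[] reported = {0x23, 0x20, 0x4a}; // "# J", the bytes quoted in the thread
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x23, 0x20, 0x4a};

        System.out.println(hasUtf8Bom(reported)); // false
        System.out.println(hasUtf8Bom(withBom));  // true
    }
}
```

A file without a BOM can still be perfectly good UTF-8, so a negative result here says nothing about the encoding; it only means the editor has to rely on something else, such as the encoding remembered in recent.xml.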