[ jEdit-devel ] [ jedit-Bugs-1524181 ] jEdit 4.3pre5 does not open files with right encoding


Bugs item #1524181, was opened at 2006-07-18 01:47
Message generated for change (Comment added) made by ngc
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100588&aid=1524181&group_id=588

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: editor core
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Ian Lewis (ian_lewis)
Assigned to: Nobody/Anonymous (nobody)
Summary: jEdit 4.3pre5 does not open files with right encoding

Initial Comment:
jEdit often opens files in the default encoding instead
of opening them in the encoding saved in recent.xml. It
also doesn't detect that the file is UTF-8 based on the
UTF-8 magic numbers like it used to.

You can reproduce it as follows.

1.) Create a new file.
2.) Open the buffer options and set the encoding to UTF-8
3.) Copy and paste some non-ASCII characters into the
buffer, such as German umlauts.
4.) Save the buffer.
5.) Close jEdit.
6.) Verify that the recent.xml file has the correct
encoding in it.
7.) Start up jEdit again.
8.) Right-click the file in the file browser and look at
the encoding. It is now the DEFAULT encoding rather
than UTF-8.
9.) Open the file; the characters show up as garbage. It
also fails to detect UTF-8 from the magic characters.

This has caused a number of files to be inadvertently
saved in the wrong encoding: they are read using the wrong
encoding and then written back in that same wrong encoding,
which destroys many non-ASCII characters.
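A minimal Java sketch of the failure mode described above: the UTF-8 bytes of "ü" are decoded with a wrong charset and then written back with it. The charsets are assumptions chosen for illustration (the report does not say what the default encoding was), and `RoundTrip`/`openAndSave` are hypothetical names, not jEdit code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTrip {
    // Hypothetical helper: decode a file with the wrong charset (what the
    // buffer would show), then encode the buffer with that same charset
    // (what a save would write). Not jEdit's actual I/O code.
    static byte[] openAndSave(byte[] fileBytes, Charset wrong) {
        String buffer = new String(fileBytes, wrong); // bytes the charset can't map become U+FFFD
        return buffer.getBytes(wrong);                // U+FFFD is written as '?' in US-ASCII
    }

    public static void main(String[] args) {
        byte[] original = "\u00FC".getBytes(StandardCharsets.UTF_8); // "ü" is C3 BC in UTF-8

        // ISO-8859-1 maps every byte value, so the display is wrong but the
        // bytes survive a save in the same charset.
        byte[] latin1 = openAndSave(original, StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(latin1, original)); // true

        // US-ASCII cannot map C3 or BC, so the umlaut is gone after saving.
        byte[] ascii = openAndSave(original, StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // ??
    }
}
```

Whether characters are actually lost therefore depends on which charset the default happens to be and on whether the buffer is edited before saving; the sketch does not claim which case applied here.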

----------------------------------------------------------------------

Comment By: Skeeve (ngc)
Date: 2006-08-09 22:00

Message:
Logged In: YES 
user_id=864970

I downloaded the file and it didn't open with UTF-8 when I switched my default 
encoding to MacRoman.

When I copied the first 2 Japanese characters (from .ok=...) into the comment on
the first line, it opened as UTF-8.

Then I joined all the lines up to these first characters into one line and found
that the Japanese characters appear at position 188 (or a bit later). Maybe jEdit
doesn't check more than 128 characters to find the proper encoding?

Just wild guessing...
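The guess above can be illustrated with a small Java sketch. The 128-byte window is only the hypothesis from this comment, not a confirmed detail of jEdit's auto-detection, and `PrefixSniffer`/`looksLikeUtf8` are hypothetical names. The heuristic reports UTF-8 only when the examined prefix contains non-ASCII bytes that decode strictly, so a file whose first non-ASCII text starts around byte 188 gives it nothing to go on within 128 bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class PrefixSniffer {
    // Hypothetical heuristic: look at no more than `limit` bytes and report
    // UTF-8 only if that prefix holds non-ASCII bytes that decode strictly.
    static boolean looksLikeUtf8(byte[] data, int limit) {
        int n = Math.min(limit, data.length);
        // Don't cut a multi-byte sequence in half: back up to its lead byte.
        int steps = 0;
        while (n > 0 && n < data.length && (data[n] & 0xC0) == 0x80 && steps < 3) {
            n--;
            steps++;
        }
        boolean sawNonAscii = false;
        for (int i = 0; i < n; i++) {
            if (data[i] < 0) { // Java bytes are signed: negative means >= 0x80
                sawNonAscii = true;
                break;
            }
        }
        if (!sawNonAscii) {
            return false; // pure ASCII prefix: no evidence either way
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data, 0, n));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 188 ASCII bytes followed by two Japanese characters (U+65E5 U+672C).
        byte[] data = ("#".repeat(188) + "\u65E5\u672C").getBytes(StandardCharsets.UTF_8);
        System.out.println(looksLikeUtf8(data, 128));  // false
        System.out.println(looksLikeUtf8(data, 4096)); // true
    }
}
```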

----------------------------------------------------------------------

Comment By: Ian Lewis (ian_lewis)
Date: 2006-08-09 20:59

Message:
Logged In: YES 
user_id=478898

I can reproduce this bug with these steps (I also added the
recent.xml after running this scenario).

1.) Delete the recent.xml file.
2.) Open jEdit.
3.) Right click on the file in the file browser. Select
encoding and see that it is set to the default encoding
rather than UTF-8 (that by itself is a bug). Select UTF-8 as
the encoding from the File Browser for this file.
4.) Open the file.
5.) Close the file.
6.) Close jEdit.
7.) Open jEdit. Right click on the file in the file browser
and note the encoding (it's the default encoding). Don't
select a new encoding.
8.) Open the file. Notice it's opened in the default
encoding rather than UTF-8.

----------------------------------------------------------------------

Comment By: Marcelo Vanzin (vanza)
Date: 2006-07-19 09:31

Message:
Logged In: YES 
user_id=75113

That's beside the point of the bug; jEdit will recognize the
BOM if it's there, and should restore it with whatever
encoding the history file says it was last edited with. So
until we see the recent.xml that causes the problem no
discussion here is gonna do any good.
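A sketch of the order described in this comment (BOM first, then the encoding recorded in the history file, then the default), assuming a plain map stands in for recent.xml. The class, method, and file names are hypothetical, and "UTF-8Y" is used only because it is the name given in this thread for UTF-8 with a BOM; this is not jEdit's actual lookup code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class EncodingChooser {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Hypothetical precedence: a BOM wins, then the remembered encoding from
    // history (a Map standing in for recent.xml), then the default encoding.
    static String chooseEncoding(String path, byte[] head,
                                 Map<String, String> history, String defaultEncoding) {
        if (head.length >= UTF8_BOM.length
                && Arrays.equals(Arrays.copyOf(head, UTF8_BOM.length), UTF8_BOM)) {
            return "UTF-8Y";
        }
        String remembered = history.get(path);
        return remembered != null ? remembered : defaultEncoding;
    }

    public static void main(String[] args) {
        Map<String, String> history = new HashMap<>();
        history.put("example.txt", "UTF-8"); // hypothetical history entry
        byte[] noBom = {0x23, 0x20, 0x4A};   // "# J", no BOM
        System.out.println(chooseEncoding("example.txt", noBom, history, "MacRoman")); // UTF-8
        System.out.println(chooseEncoding("other.txt", noBom, history, "MacRoman"));   // MacRoman
    }
}
```

In these terms, the behaviour reported in this thread looks like the history step being skipped, so a file without a BOM falls through to the default.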

----------------------------------------------------------------------

Comment By: Skeeve (ngc)
Date: 2006-07-19 07:51

Message:
Logged In: YES 
user_id=864970

So the Unicode organization is wrong when they show the BOM
for UTF-8?
http://www.unicode.org/unicode/faq/utf_bom.html#BOM :-)

To be precise: UTF-8 doesn't *need* to have a BOM, but jEdit
will know from it that the file is UTF-8. How else should it
know?

----------------------------------------------------------------------

Comment By: Marcelo Vanzin (vanza)
Date: 2006-07-19 04:36

Message:
Logged In: YES 
user_id=75113

UTF-8 doesn't have any BOM; UTF-8Y does. As for the problem,
I'd need a copy of the $HOME/.jedit/recent.xml file that
causes the problem, otherwise, I can't reproduce it...

----------------------------------------------------------------------

Comment By: Skeeve (ngc)
Date: 2006-07-18 21:33

Message:
Logged In: YES 
user_id=864970

That file does not contain any Byte Order Mark
(http://en.wikipedia.org/wiki/Byte_Order_Mark), so jEdit
can't see that the file is supposed to be UTF-8.

It should start with EF BB BF but starts with 23 20 4A.
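A minimal Java sketch of the check described here, assuming the first bytes of the file are already in memory; `BomCheck` and its methods are illustrative names, not jEdit's. It also strips the mark before decoding, since Java's UTF-8 decoder keeps a leading U+FEFF as an ordinary character.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BomCheck {
    // EF BB BF is U+FEFF (the byte order mark) encoded as UTF-8.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= UTF8_BOM.length
                && Arrays.equals(Arrays.copyOf(data, UTF8_BOM.length), UTF8_BOM);
    }

    // Decode as UTF-8, skipping the BOM if present so it doesn't end up in the text.
    static String decodeUtf8StrippingBom(byte[] data) {
        int start = hasUtf8Bom(data) ? UTF8_BOM.length : 0;
        return new String(data, start, data.length - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] attachedFileStart = {0x23, 0x20, 0x4A}; // "# J", as in the attached file
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x23, 0x20, 0x4A};
        System.out.println(hasUtf8Bom(attachedFileStart));     // false
        System.out.println(hasUtf8Bom(withBom));               // true
        System.out.println(decodeUtf8StrippingBom(withBom));   // # J
    }
}
```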

----------------------------------------------------------------------

Comment By: Marcelo Vanzin (vanza)
Date: 2006-07-18 08:24

Message:
Logged In: YES 
user_id=75113

Hi Ian,

I can't reproduce the problem following your tests. By any
chance, does the file *name* you're saving have any extra
characters? I think the current code might mess things up in
some cases if that happens...

----------------------------------------------------------------------

Comment By: Ian Lewis (ian_lewis)
Date: 2006-07-18 02:07

Message:
Logged In: YES 
user_id=478898

It seems to depend on the file but happens every time with a
particular file I've created using jEdit. I attached it below.

----------------------------------------------------------------------


jEdit is a programmer's text editor written in Java.