From: Petr P. <Pri...@sk...> - 2007-01-31 09:54:27

Dalibor Petričević wrote:
> Petr Prikryl wrote:
> > [...]
> > I can explain the encoding to jEdit by adding
> > explicitly another similar line; so, I can use
> > [...]
> > # :encoding=windows-1250:
>
> Pardon my question, but are you sure that
> :encoding=windows-1250: causes jEdit to set the proper
> encoding when opening a file?

One can never be sure ;-). Double checking is always better. It seems
that you have found the bug. I created the file a.txt like this

----------------------------------------
:encoding=windows-1250:
Příšerně žluťoučký kůň úpěl ďábelské ódy
(i.e. "The quick brown fox..." for the Czech language, using
characters specific to the encoding.)
----------------------------------------

in another editor. Then I switched jEdit to use utf-8 as the default
encoding (this is important for showing the bug) and exited jEdit (not
running in background, invisible mode).

If I start jEdit and pass the file through a command line argument, it
loads the file and displays the content in the prescribed encoding
correctly.

However, when I open the file using Ctrl+O, the prescribed encoding is
ignored and the content is displayed assuming jEdit's default encoding
(here utf-8). When I then do File - Reload from the menu, the file is
reloaded and displayed correctly.

It seems that the File - Open implementation forgets to interpret the
explicit encoding prescription before displaying the content.

> This is something that I have wished for for years,
> but experience shows that it is not actually
> working. I THINK the process goes this way:
> jEdit first opens the file and then parses it.
> THEN jEdit can determine the encoding specified
> as you are doing it, and then it should reopen the file
> with the proper encoding and (re)show the characters
> correctly. It does not do that, as
> far as I can see. You actually have to specify
> the file encoding manually BEFORE opening the file.
> This is how the stuff works now.

As far as I can say, Dalibor's observation is true. I did not notice
the bug until now, because I use the FAR manager and usually open a
file in jEdit by pointing to the file and using the Ctrl+F4 shortcut.
It means that I pass the file to jEdit through the command line, and
it works correctly even if jEdit is already running. Also, I normally
have the default encoding of jEdit set to windows-1250, so the bug is
masked.

Could someone else confirm the bug? Would it be difficult to correct
it?

I am using jEdit 4.3pre9 with Java 1.5.0_10, with the options
-background -nogui -reuseview.

Thanks,
pepr
From: <dal...@is...> - 2007-01-31 10:13:41

Petr Prikryl wrote:
> Also, I normally have the default encoding
> of jEdit set to windows-1250, so the bug
> is masked. Could someone else confirm the bug?
> Would it be difficult to correct the bug?
>
> I am using jEdit 4.3pre9 with Java 1.5.0_10
> with options -background -nogui -reuseview

Sadly, I'm not sure it's a bug. It's just weird by design :-)

OT: here's something my wife makes me repeat, just to twist my tongue,
when she's bored (she is Czech on her father's side):

Tři sta třicet tři stříbrných stříkaček stříkalo přes tři sta
třiatřicet stříbrných střech
(roughly: "Three hundred and thirty-three silver fire engines sprayed
over three hundred and thirty-three silver roofs") ;-)

Cheers,
--
Dalibor Petričević
From: Slava P. <sl...@fa...> - 2007-01-31 15:37:44

On 31-Jan-07, at 4:53 AM, Petr Prikryl wrote:
> ----------------------------------------
> :encoding=windows-1250:

I wonder where people get the idea that this works. 'encoding' is not
a buffer-local property and it cannot be set in this way, and the
documentation does not mention it.

Don't do this.

Slava
From: Matthieu C. <cho...@gm...> - 2007-01-31 15:43:44

2007/1/31, Slava Pestov <sl...@fa...>:
>
> On 31-Jan-07, at 4:53 AM, Petr Prikryl wrote:
>
> > ----------------------------------------
> > :encoding=windows-1250:
>
> I wonder where people get the idea that this works. 'encoding' is not
> a buffer-local property and it cannot be set in this way, and the
> documentation does not mention it.
>
> Don't do this.
>
> Slava

Hi, it is not used to load the buffer, but if you put in an
:encoding=windows-1250:
line, jEdit will read it and change the encoding of the buffer (this
can be seen in the status bar).
In fact, why not read that to choose the encoding, like it is done for
the XML encoding detection?

Matthieu
From: Marcelo V. <va...@us...> - 2007-02-01 04:26:23

Matthieu Casanova wrote:
> In fact, why not read that to choose the encoding, like it is done
> for the XML encoding detection?

I might be repeating myself here, but the problem with using encoding
as a buffer-local property embedded in the buffer is the "chicken and
egg" problem: what encoding do you use to read the encoding string?

XML parsing is not a very good example. If you look at the parser code
in the JDK, it's really ugly. I had to fix it at my last job and I
still have nightmares about it. :-) Basically, what it does is read
the first few bytes, do a big "if then else", and check whether that
character is the "<" character in several different encodings. Then it
tries to parse using that encoding, and if that works, it uses the
encoding that the XML declaration defines.

This "works" for XML because the first character in an XML file
(except for whitespace) always has to be a "<". But even then it's
easy to get things wrong; try to parse an XML file encoded in UTF-16LE
using the 1.4.2 JDK parser and watch it blow up (1.5 works fine, BTW).

Trying to apply that to a file that doesn't have to respect any
structure is, to say the least, very, very difficult. Even if most of
the time you can get away with just treating everything as ASCII,
there are always exceptions (the multi-byte Unicode encodings being
examples of where treating things as ASCII would fail).

--
Marcelo Vanzin
va...@us...
"Life is too short to drink cheap beer"
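[Editor's note: the sniffing described above can be pictured with a small
sketch. This is an illustration of the idea only, not the JDK parser's
actual code; the class and method names are made up.]

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of XML-style encoding sniffing: look at the first
// bytes of the file and guess in which encoding family they would spell
// the "<" character (0x3C). Illustration only, not the JDK's real code.
public class XmlSniffSketch {
    static Charset guessFromFirstBytes(byte[] b) {
        if (b.length >= 2) {
            // In UTF-16LE "<" is the byte pair 3C 00; in UTF-16BE it is 00 3C.
            if (b[0] == 0x3C && b[1] == 0x00) return StandardCharsets.UTF_16LE;
            if (b[0] == 0x00 && b[1] == 0x3C) return StandardCharsets.UTF_16BE;
        }
        if (b.length >= 1 && b[0] == 0x3C) {
            // Could be UTF-8, ASCII, or any one-byte encoding; parse the
            // XML declaration with this provisional guess to find out.
            return StandardCharsets.UTF_8;
        }
        return null; // no "<" first: not a well-formed XML document anyway
    }

    public static void main(String[] args) {
        byte[] utf16le = {0x3C, 0x00, 0x3F, 0x00}; // "<?" in UTF-16LE
        System.out.println(guessFromFirstBytes(utf16le)); // prints UTF-16LE
    }
}
```

This also shows why UTF-16LE is an easy case to get wrong: there "<"
arrives as the byte pair 3C 00, so a detector that only checks for a
single 0x3C byte misses it.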
From: Slava P. <sl...@fa...> - 2007-02-01 05:33:08

You're exactly right. The best thing would be for people to gradually
transition to UTF-16 and UTF-8 and slowly phase out legacy encodings.

Slava

On 31-Jan-07, at 11:26 PM, Marcelo Vanzin wrote:
> I might be repeating myself here, but the problem with using encoding
> as a buffer-local property embedded in the buffer is the "chicken and
> egg" problem: what encoding do you use to read the encoding string?
>
> XML parsing is not a very good example. If you look at the parser code
> in the JDK, it's really ugly. I had to fix it at my last job and I
> still have nightmares about it. :-) Basically, what it does is read
> the first few bytes, do a big "if then else", and check whether that
> character is the "<" character in several different encodings. Then it
> tries to parse using that encoding, and if that works, it uses the
> encoding that the XML declaration defines.
>
> This "works" for XML because the first character in an XML file
> (except for whitespace) always has to be a "<". But even then it's
> easy to get things wrong; try to parse an XML file encoded in UTF-16LE
> using the 1.4.2 JDK parser and watch it blow up (1.5 works fine, BTW).
>
> Trying to apply that to a file that doesn't have to respect any
> structure is, to say the least, very, very difficult. Even if most of
> the time you can get away with just treating everything as ASCII,
> there are always exceptions (the multi-byte Unicode encodings being
> examples of where treating things as ASCII would fail).
From: <dal...@is...> - 2007-02-01 13:44:32

Slava Pestov wrote:
> You're exactly right. The best thing would be for people to gradually
> transition to UTF-16 and UTF-8 and slowly phase out legacy encodings.
>
> Slava
>
> On 31-Jan-07, at 11:26 PM, Marcelo Vanzin wrote:
>
>> I might be repeating myself here, but the problem with using encoding
>> as a buffer-local property embedded in the buffer is the "chicken and
>> egg" problem.

It does not really come down to chickens and eggs ...

:encoding=windows-1250:

This line does not have any "strange" characters, and it never should
have any. (A problem would emerge with encodings like Chinese or
something like that, and then feathers would fly ...)

Now, a file containing that line can be encoded in some one-byte or
multi-byte encoding. First, try to recognize the sequence ":encoding"
in any one-byte encoding; if you succeed, you have won: you have the
encoding. No chickens, no eggs, no flu. If you fail, try reading the
file as if it were multi-byte encoded (UTF-something) and proceed the
same way as described above. If the sequence is not found after all
the searches, it probably isn't in the file.

Yes, in the worst-case scenario you will parse the file several times,
and if the file is big, that might be a performance problem
(:encoding can be placed at the end of the file, so you would read the
whole file).

Now, if there were a config option "I want to use this", then the user
could enable the feature if he wanted to. No flame, no war.

I might be wrong about this, but I would love to have this feature.
Normally I would try doing it myself, but my Java knowledge and
experience is quite humble.

Tnx,
--
Dalibor Petricevic
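[Editor's note: Dalibor's multi-pass proposal could be sketched as
follows. The class name, the candidate list, and the exact property
syntax handled are assumptions for illustration, not jEdit code.]

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the multi-pass detection proposed above: decode the raw bytes
// with one trial charset after another and scan the result for a
// ":encoding=NAME:" declaration. Names here are illustrative only.
public class EncodingScanSketch {
    private static final Pattern PROP =
            Pattern.compile(":encoding=([A-Za-z0-9_\\-]+):");

    static String findDeclaredEncoding(byte[] fileBytes) {
        // One one-byte charset (ISO-8859-1 maps every byte to a character)
        // plus the common multi-byte Unicode encodings.
        String[] trial = {"ISO-8859-1", "UTF-8", "UTF-16LE", "UTF-16BE"};
        for (String name : trial) {
            String text = new String(fileBytes, Charset.forName(name));
            Matcher m = PROP.matcher(text);
            if (m.find()) return m.group(1); // first match wins
        }
        return null; // no declaration found under any trial decoding
    }

    public static void main(String[] args) {
        byte[] bytes = ":encoding=windows-1250:\nsome text\n"
                .getBytes(Charset.forName("windows-1250"));
        System.out.println(findDeclaredEncoding(bytes)); // prints windows-1250
    }
}
```

Because ISO-8859-1 maps every byte, the first one-byte pass already finds
an ASCII-spelled declaration in any ASCII-compatible encoding; the extra
passes only matter for encodings like UTF-16 that intersperse zero bytes.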
From: Matthieu C. <cho...@gm...> - 2007-02-01 08:05:24

2007/2/1, Marcelo Vanzin <va...@us...>:
>
> Matthieu Casanova wrote:
> > In fact, why not read that to choose the encoding, like it is done
> > for the XML encoding detection?
>
> I might be repeating myself here, but the problem with using encoding
> as a buffer-local property embedded in the buffer is the "chicken and
> egg" problem: what encoding do you use to read the encoding string?
>
> XML parsing is not a very good example. If you look at the parser code
> in the JDK, it's really ugly. I had to fix it at my last job and I
> still have nightmares about it. :-) Basically, what it does is read
> the first few bytes, do a big "if then else", and check whether that
> character is the "<" character in several different encodings. Then it
> tries to parse using that encoding, and if that works, it uses the
> encoding that the XML declaration defines.
>
> This "works" for XML because the first character in an XML file
> (except for whitespace) always has to be a "<". But even then it's
> easy to get things wrong; try to parse an XML file encoded in UTF-16LE
> using the 1.4.2 JDK parser and watch it blow up (1.5 works fine, BTW).
>
> Trying to apply that to a file that doesn't have to respect any
> structure is, to say the least, very, very difficult. Even if most of
> the time you can get away with just treating everything as ASCII,
> there are always exceptions (the multi-byte Unicode encodings being
> examples of where treating things as ASCII would fail).

Yes, that's right, but look at my example. My jEdit uses UTF-8 by
default, but sometimes I open a file encoded in ISO-8859-1, and some
accented characters are displayed as boxes (meaning the encoding was
not the right one). But the :encoding=ISO-8859-1: line was read
correctly, so it would have been possible to read it and switch to
that encoding. For your example, you're right, maybe it would not work
well every time, but I think it could help.
(And if it doesn't work with 1.4.2, we don't care, since jEdit now
requires Java 5 :)

And there is an important problem with encoding in jEdit: suppose
jEdit uses UTF-8 by default, and I open a file that contains this:

:encoding=someencoding:

The file will be loaded using UTF-8, because it is the default
encoding, but the status bar will show the encoding found in the file,
and that encoding will also be used to save the file. Nowhere can the
user see which encoding was actually used to load the file.

In fact, I think that in almost every case this encoding line would be
read correctly. I tried to read a UTF-16 file and a UTF-8 file using
the default encoding Cp1252; the UTF-16 file was detected by the magic
Unicode characters, and the UTF-8 file was not detected (some
characters were wrong), but the encoding=UTF-8 line was read fine.

So are there still examples where it fails?
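[Editor's note: the "magic Unicode characters" mentioned above are the
byte order mark (BOM). A minimal sketch of BOM-based detection follows;
it is illustrative only, not jEdit's actual detector.]

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch of BOM-based encoding detection: the byte order mark at the
// start of a file identifies UTF-16 and, when present, UTF-8.
public class BomSketch {
    static Charset fromBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF)
            return StandardCharsets.UTF_8;      // UTF-8 BOM: EF BB BF
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF)
            return StandardCharsets.UTF_16BE;   // UTF-16 BOM: FE FF
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE)
            return StandardCharsets.UTF_16LE;   // UTF-16 BOM: FF FE
        return null; // no BOM: a UTF-8 file without one is NOT caught here
    }

    public static void main(String[] args) {
        byte[] utf16 = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00}; // BOM + "A"
        System.out.println(fromBom(utf16)); // prints UTF-16LE
    }
}
```

This matches Matthieu's observation: a UTF-16 file usually starts with a
BOM and is detected, while a UTF-8 file often carries no BOM, so byte
inspection alone cannot identify it and the declared encoding=UTF-8 line
is the only remaining hint.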
From: Slava P. <sl...@fa...> - 2007-01-31 16:17:06

On 31-Jan-07, at 10:43 AM, Matthieu Casanova wrote:
> Hi, it is not used to load the buffer, but if you put in an
> :encoding=windows-1250:
> line, jEdit will read it and change the encoding of the buffer (this
> can be seen in the status bar).
> In fact, why not read that to choose the encoding, like it is done
> for the XML encoding detection?

Because buffer-local properties are only processed after the file is
loaded.

Slava
From: Matthieu C. <cho...@gm...> - 2007-01-31 16:20:15

2007/1/31, Slava Pestov <sl...@fa...>:
>
> On 31-Jan-07, at 10:43 AM, Matthieu Casanova wrote:
>
> > Hi, it is not used to load the buffer, but if you put in an
> > :encoding=windows-1250:
> > line, jEdit will read it and change the encoding of the buffer (this
> > can be seen in the status bar).
> > In fact, why not read that to choose the encoding, like it is done
> > for the XML encoding detection?
>
> Because buffer-local properties are only processed after the file is
> loaded.
>
> Slava

That's right, but I think the encoding property could be processed
during buffer loading, couldn't it? Of course, unlike the other
properties, it would only work if the property is set in the first ten
lines, and not in the last ten lines.

Matthieu
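
[Editor's note: Matthieu's restriction could look roughly like this. The
names are hypothetical and this is only a sketch of the idea: jEdit's
documented rule for buffer-local properties is the first or last ten
lines, but here only the first ten are scanned, decoded leniently as
ISO-8859-1 so the bytes can be inspected before the real encoding is
known.]

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch (illustrative names, not jEdit source): honor :encoding=...:
// only when it appears in the first ten lines of the file.
public class FirstLinesSketch {
    private static final Pattern PROP =
            Pattern.compile(":encoding=([A-Za-z0-9_\\-]+):");

    static String encodingFromFirstTenLines(byte[] fileBytes) {
        // ISO-8859-1 maps every byte, so decoding cannot fail even
        // though the real encoding is still unknown at this point.
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(fileBytes), StandardCharsets.ISO_8859_1))) {
            String line;
            for (int i = 0; i < 10 && (line = r.readLine()) != null; i++) {
                Matcher m = PROP.matcher(line);
                if (m.find()) return m.group(1);
            }
        } catch (IOException e) {
            // cannot happen for an in-memory stream
        }
        return null; // not declared early enough: fall back to the default
    }

    public static void main(String[] args) {
        byte[] bytes = "# :encoding=ISO-8859-1:\ntext\n"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(encodingFromFirstTenLines(bytes)); // prints ISO-8859-1
    }
}
```

The declared encoding found this way could then be used to re-decode the
buffer before it is first displayed, avoiding the reload Petr had to do
by hand.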