Re: [ jEdit-devel ] jEdit - file encodings

(Moving the discussion to jedit-devel with OK from Shlomy.)

Alan Ezust wrote:
> For #3, I know the default list of fallback encodings is empty for me
> at the moment. It's an experimental feature, but we should have it set
> by default to SOMETHING sensible. As for which ones are sensible, I
> think that k_satoda might have some ideas of good ones to put for the
> default installation.

We should be careful about which encodings we put into the list,
because the feature only takes effect when a decoding error occurs.
Some popular encodings like Windows-1252 accept a wide range of
arbitrary bytes, and thus hardly ever throw an encoding error.
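
To make "accepts a wide range of arbitrary bytes" concrete, here is a
minimal sketch that counts how many of the 256 possible one-byte inputs
a strict decoder accepts. It uses only java.nio.charset and is not
jEdit's own code; the class and method names are made up for this
illustration, and it assumes the JRE provides a "windows-1252" charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of jEdit.
final class SingleByteAcceptance {
    /** Counts how many of the 256 one-byte inputs decode without a reported error. */
    static int acceptedSingleBytes(Charset cs) {
        int accepted = 0;
        for (int b = 0; b < 256; b++) {
            try {
                // REPORT makes the decoder throw instead of silently substituting U+FFFD.
                cs.newDecoder()
                  .onMalformedInput(CodingErrorAction.REPORT)
                  .onUnmappableCharacter(CodingErrorAction.REPORT)
                  .decode(ByteBuffer.wrap(new byte[] { (byte) b }));
                accepted++;
            } catch (CharacterCodingException e) {
                // The strict decoder rejected this byte value.
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Assumption: the running JRE ships a "windows-1252" charset.
        System.out.println("windows-1252: "
                + acceptedSingleBytes(Charset.forName("windows-1252")));
        System.out.println("UTF-8:        "
                + acceptedSingleBytes(StandardCharsets.UTF_8));
    }
}
```

UTF-8 accepts only the 128 ASCII values as stand-alone bytes, while
Windows-1252 accepts nearly every value, so it has almost no chance to
report an error.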

For example, if Windows-1252 is in the list, jEdit will open almost
all files without error, but as garbled text if the actual encoding
is not Windows-1252. The user won't be able to tell what happened
because no error is shown. This is very bad for those who don't work
with Latin-script text.
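
The effect can be sketched as follows. This is only my reading of how
a fallback list behaves (try each encoding in order, keep the first one
that decodes without error), not jEdit's actual implementation; the
FallbackSketch class and its members are hypothetical. The sample bytes
are the Shift_JIS encoding of two hiragana characters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of a fallback list; not jEdit's real code.
final class FallbackSketch {
    /** A successful attempt: which charset won and the text it produced. */
    static final class Decoded {
        final Charset charset;
        final String text;

        Decoded(Charset charset, String text) {
            this.charset = charset;
            this.text = text;
        }
    }

    /** Returns the result of the first fallback that decodes without error. */
    static Optional<Decoded> decodeWithFallbacks(byte[] data, List<Charset> fallbacks) {
        for (Charset cs : fallbacks) {
            try {
                String text = cs.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(data))
                        .toString();
                return Optional.of(new Decoded(cs, text));
            } catch (CharacterCodingException e) {
                // This encoding reported an error; try the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // U+3042 U+3044 (hiragana "a", "i") encoded in Shift_JIS.
        byte[] sjis = { (byte) 0x82, (byte) 0xA0, (byte) 0x82, (byte) 0xA2 };
        List<Charset> fallbacks = Arrays.asList(
                StandardCharsets.UTF_8,
                Charset.forName("windows-1252")); // assumed to be available
        Optional<Decoded> result = decodeWithFallbacks(sjis, fallbacks);
        if (result.isPresent()) {
            System.out.println(result.get().charset + " -> " + result.get().text);
        } else {
            System.out.println("no fallback matched");
        }
    }
}
```

UTF-8 rejects these bytes, but Windows-1252 then accepts them and
returns punctuation and currency signs instead of the Japanese text.
No error reaches the user.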

UTF-8 is the strongest candidate, because it seems to reject most
byte sequences from other encodings, and it is not locale specific.
But it still carries some risk, because there may be an encoding I
don't know of that is ambiguous with UTF-8.
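
A small check of both halves of that claim, again using only
java.nio.charset (the Utf8Strictness class is a made-up name for this
illustration). The ISO-2022-JP sample is my own addition: as far as I
can tell, 7-bit escape-based encodings of that kind consist purely of
ASCII-range bytes, so a strict UTF-8 decoder cannot reject them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of jEdit.
final class Utf8Strictness {
    /** True if the bytes form well-formed UTF-8 according to a strict decoder. */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // U+3042 U+3044 encoded in Shift_JIS.
        byte[] sjis = { (byte) 0x82, (byte) 0xA0, (byte) 0x82, (byte) 0xA2 };
        // "cafe" with an acute accent on the e, encoded in ISO-8859-1.
        byte[] latin1 = "caf\u00E9".getBytes(StandardCharsets.ISO_8859_1);
        // U+3042 encoded in ISO-2022-JP: ESC $ B, two JIS bytes, ESC ( B.
        byte[] iso2022jp = { 0x1B, 0x24, 0x42, 0x24, 0x22, 0x1B, 0x28, 0x42 };

        System.out.println("Shift_JIS bytes valid as UTF-8?   " + isValidUtf8(sjis));
        System.out.println("ISO-8859-1 bytes valid as UTF-8?  " + isValidUtf8(latin1));
        System.out.println("ISO-2022-JP bytes valid as UTF-8? " + isValidUtf8(iso2022jp));
    }
}
```

The first two samples are rejected, which is the behaviour we want from
a fallback entry. The third is accepted, so a file in such an encoding
would open as UTF-8 with escape characters in the text and no error.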

Sharing locale-specific preferred lists sounds interesting. But I
think it will require some UI work, because setting it by default
(without UI) would be considered bad.

> for #1, that's another kind of encoding autodetector, and i know jedit
> is written to support additional ones, so it shouldn't be hard to add
> that too. Kazutoshi - do you have some time to implement that?

I wrote a plugin which provides an EncodingDetector implemented
with juniversalchardet (http://code.google.com/p/juniversalchardet/).
But I stopped working on it because I was satisfied with the fallback
encodings once I had a well-tuned list for them.

I think this plugin would give almost the same results as Mozilla
Firefox, because juniversalchardet is derived from Mozilla's detector.
I'd be happy if this helps someone. See the attached file if you are
interested. Please note that you have to download juniversalchardet.jar
and put it in place manually, and that the plugin was written for
juniversalchardet-1.01. It might have problems with the latest
juniversalchardet.

Alan, could you please (or should I) put the code into svn?
I'm not sure ...
  - whether the organization of the files is OK,
  - how to register juniversalchardet as a library plugin
    like jsch.jar or jruby.jar,
  - how the props should be set to use an external jar.

> On Wed, Apr 2, 2008 at 2:13 PM, Shlomy Reinstein <sre...@gm...> wrote:
>> Hi Alan,
>>
>>  I had a problem with some source file, which had a popular encoding of
>>  ISO-8859-1 or something like that. For some reason, this popular
>>  encoding (which, according to Matthieu, is normally used for source
>>  files), was not part of my "fallback encodings", nor was it recognized
>>  by one of the 2 default encoding detectors.
>>  This resulted in two problems:
>>  1. Opening this file in jEdit showed an error dialog about the encoding.
>>  2. Project-wide (or directory-tree) search using the Search dialog
>>  also opened this error dialog because it failed to search in that
>>  file.
>>
>>  I had no idea what encoding to use. In the end, I opened it in Mozilla
>>  FireFox (!!) and tried various options until I found one that worked,
>>  then used the same one successfully with jEdit.
>>  How does one normally have to deal with such an issue?
>>  I thought that it might be nice to:
>>  1. Provide an option in jEdit for files whose encoding cannot be
>>  determined, to iterate through all available encodings and try them
>>  one by one instead of showing the error dialog.
>>  2. Let the jEdit installer ask the user about the locale, and set up
>>  the detectors or fallback encodings accordingly.
>>  3. Add all popular encodings to the list of fallback encodings, or
>>  maybe enrich the detectors.
>>
>>  All in all, this was a very frustrating issue, and I'd like to save
>>  this from other users.
>>
>>  Shlomy

-- 
k_satoda
