#256 Character encoding error on load doesn't report line number

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2011-12-04
Created: 2008-12-03
Creator: Suresh Mani
Private: No

Dear JEdit Developers,

I would like to report an issue in the world's best editor.

When I load a file which has a character encoding issue, I get an I/O Error dialog box that shows the following message:

"The following I/O operation could not be completed:

c:\temp\test.txt
The file could not be loaded correctly (some data might be lost) with encoding "Cp1252".
(java.nio.charset.UnmappableCharacterException: Input length = 1)
Try selecting a different encoding.
It can be selected with the menu File->Reload with Encoding.
If you want it to be done automatically, add the candidates into "List of fallback encodings" in Encodings pane of Global Options."

But if I open my file "test.txt", which has a character encoding problem, in a different editor such as plain and simple Windows notepad.exe,
- then do "Select All"
- copy the entire contents of the file to clipboard
- then open a new file in JEdit
- then paste the contents of the clipboard in JEdit
- do save as in JEdit
- try saving it as c:\temp\test2.txt
- JEdit throws the following error:

The following I/O operation could not be completed:
c:\temp\test2.txt
Cannot save: java.nio.CharacterConversionException: Failed to encode the character 'some_junk_character_is_shown' (U+FFFD) at column 85 in line 456 with the encoding "Cp1252".

Do you see the difference? When opening a file that has a character encoding issue, JEdit does not show the line and column numbers. But when you try to do a "save as" on a file that has a character encoding issue, JEdit actually tells you the line and column number that contain the problematic character.

I would really appreciate it if you could make JEdit report the line and column numbers of problematic characters when loading files that have such character encoding issues.

Thanks a million
Suresh

Discussion

  • Kazutoshi Satoda

    One workaround is to search for U+FFFD manually after a file is loaded
    with an error. It can be done by using regex-search with "\uFFFD". But
    this is not a perfect solution, because U+FFFD can also be loaded
    legitimately from a file which was saved in UTF-x. FYI, please see the
    following links for U+FFFD.
    http://en.wikipedia.org/wiki/Replacement_character
    http://java.sun.com/j2se/1.5.0/docs/api/java/nio/charset/CharsetDecoder.html
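
The manual search described above can be sketched outside the editor as a scan that reports where each U+FFFD sits. This is only an illustration, not jEdit code: the class and method names are hypothetical, and it assumes the loaded text is available as a single String with "\n" line breaks.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplacementCharFinder {
    /** Returns "line:column" (both 1-based) for every U+FFFD in the text. */
    public static List<String> find(String text) {
        List<String> hits = new ArrayList<>();
        int line = 1;
        int column = 1;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\uFFFD') {
                hits.add(line + ":" + column);
            }
            if (c == '\n') {
                // A line break starts a new line at column 1.
                line++;
                column = 1;
            } else {
                column++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(find("ok\nbad \uFFFD here\n\uFFFD"));
    }
}
```

As noted above, a hit is not proof of a decoding failure, because a file saved in a Unicode encoding may legitimately contain U+FFFD.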

    It could be implemented if there was a way, which I couldn't find, to
    know how many characters were correctly decoded in Reader#read() method
    before throwing the exception.
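
For comparison, here is a sketch of what the position looks like when decoding bypasses Reader and drives java.nio.charset.CharsetDecoder directly. It assumes the whole file is already in memory as a byte array, which is not how a streaming Reader works, so it does not answer the Reader#read() question above. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecodeErrorLocator {
    /**
     * Returns "line:column" (1-based) of the first undecodable input,
     * or null if the bytes decode cleanly with the given charset.
     */
    public static String locate(byte[] data, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        // Sized so the decoder cannot overflow the output buffer.
        int capacity = (int) Math.ceil(data.length * (double) decoder.maxCharsPerByte()) + 1;
        CharBuffer out = CharBuffer.allocate(capacity);
        CoderResult result = decoder.decode(in, out, true);
        if (!result.isError()) {
            return null;
        }
        // Everything written to 'out' was decoded before the error,
        // so walking it yields the position of the offending input.
        out.flip();
        int line = 1;
        int column = 1;
        while (out.hasRemaining()) {
            if (out.get() == '\n') {
                line++;
                column = 1;
            } else {
                column++;
            }
        }
        return line + ":" + column;
    }

    public static void main(String[] args) {
        byte[] data = {'a', 'b', '\n', 'c', 'd', (byte) 0xFF, 'e'};
        System.out.println(locate(data, StandardCharsets.UTF_8));
    }
}
```

The sketch relies on the decoder stopping at the first error with the successfully decoded characters already in the output buffer. A Reader hides that buffer, which is the difficulty described above.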

    Writer.write() has the same issue, but I could implement the indication
    for saving with a simple loop because the source information is in our
    hands, not in an external stream and hidden buffers.
    http://jedit.svn.sourceforge.net/jedit/?rev=9494&view=rev
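
A loop of that kind might look like the sketch below. It is not the code from the revision linked above: the class and method names are hypothetical, and it assumes the buffer text is a String with "\n" line breaks. It uses CharsetEncoder.canEncode to test each code point before anything is written.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeErrorLocator {
    /**
     * Returns a message naming the first character that the charset cannot
     * encode, with its 1-based line and column, or null if all can be encoded.
     */
    public static String locate(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        int line = 1;
        int column = 1;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            int length = Character.charCount(codePoint);
            // Test the whole code point so surrogate pairs are checked together.
            if (!encoder.canEncode(text.subSequence(i, i + length))) {
                return String.format(
                    "Failed to encode the character U+%04X at column %d in line %d",
                    codePoint, column, line);
            }
            if (codePoint == '\n') {
                line++;
                column = 1;
            } else {
                column += length;
            }
            i += length;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(locate("abc\nde\uFFFDf", StandardCharsets.ISO_8859_1));
    }
}
```

Columns here count UTF-16 code units, so a character outside the Basic Multilingual Plane advances the column by two.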

     
  • Alan Ezust

    Alan Ezust - 2009-12-30
    • assigned_to: nobody --> k_satoda
    • status: open --> closed-fixed
     
  • Kazutoshi Satoda

    • assigned_to: k_satoda --> nobody
    • milestone: 101608 --> 101609
    • status: closed-fixed --> open
     
  • Kazutoshi Satoda

    The error always happens when loading a file with an incorrect
    encoding. That is expected behavior and therefore not the problem. The
    problem is that the error doesn't report the line number, as the summary
    says.

     
  • Anonymous

    Anonymous - 2009-12-31

    As far as I understand, this is an unsolvable problem, right? At best, the error dialog can show which character breaks each type of encoding, but since none of the encodings on the list could be used, there will be several such places, and probably the reason is that the file follows a different encoding.
    I guess tools like Notepad show the line of the error because they support a single encoding (or some fixed way to detect a single encoding) so they can provide an indication of where exactly it failed.

     
  • Anonymous

    Anonymous - 2009-12-31

    Sorry, take that back. I noticed that the error dialog specifies the specific encoding that failed.

     
  • Alan Ezust

    Alan Ezust - 2011-12-04
    • assigned_to: k_satoda --> nobody
    • status: pending-wont-fix --> open
     
  • Jarek Czekalski

    Jarek Czekalski - 2011-12-04

    Moving to feature requests. I think everyone agrees it is not a bug.

     
  • Jarek Czekalski

    Jarek Czekalski - 2011-12-04
    • summary: Character encoding error doesn't report line number --> Character encoding error on load doesn't report line number
    • milestone: 101609 -->
    • labels: 102668 -->
     
