charset dbf

Help
2013-10-17
2013-10-20
  • Eduardo Jones

    Eduardo Jones - 2013-10-17

    When the application loads String data with special characters (Spanish ñ or ó, for example), they are displayed incorrectly.

    I'm using this configuration:

    jdbc.url=jdbc:relique:csv:C:/BDs/company?fileExtension=.dbf&charset=ISO8859-1

    and the shown data is (example):

    VI¥A DEL MAR for VIÑA DEL MAR
    CON CàN for CON CÓN
    PE¥ABLANCA for PEÑABLANCA

    Which charset should I use?
    Am I using the configuration correctly?

    Thanks in advance

     

    Last edit: Eduardo Jones 2013-10-17
  • Simon Chenery

    Simon Chenery - 2013-10-17

    This is a bug in CsvJdbc. I logged it as #94, "Wrong strings read from DBF file with ISO-8859-1 extended chars"

     
  • Simon Chenery

    Simon Chenery - 2013-10-17

    Bug #94 is now fixed and will be included in the next CsvJdbc version.

    Note that the correct charset name is ISO-8859-1, not ISO8859-1.
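
    As a side check, the Java runtime can report how it resolves a given charset name. The sketch below is illustrative only: the class and method names are made up for this example, and whether CsvJdbc resolves names exactly the way `java.nio.charset.Charset` does is an assumption.

```java
import java.nio.charset.Charset;
import java.util.Optional;

// Hypothetical helper, not part of CsvJdbc.
class CharsetNames {
    // Returns the canonical name the JVM uses for the given charset name,
    // or an empty Optional if the name is unknown or syntactically invalid.
    static Optional<String> canonicalName(String name) {
        try {
            if (!Charset.isSupported(name)) {
                return Optional.empty();
            }
            return Optional.of(Charset.forName(name).name());
        } catch (IllegalArgumentException e) {
            // Thrown for names containing illegal characters.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Print how this JVM resolves the names mentioned in the thread.
        for (String name : new String[] {"ISO-8859-1", "ISO8859-1", "CP850"}) {
            System.out.println(name + " -> " + canonicalName(name).orElse("(not supported)"));
        }
    }
}
```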

     
  • Eduardo Jones

    Eduardo Jones - 2013-10-17

    Ok. Thanks.

     
  • Eduardo Jones

    Eduardo Jones - 2013-10-18

    I've tested the version published in git, but the problem persists.

     
  • Simon Chenery

    Simon Chenery - 2013-10-18

    Maybe your DBF file is UTF-8, UTF-16, or some other encoding?

    Can you please make a hex dump of one of the strings in your DBF file
    that is wrong in CsvJdbc and paste it into this Discussion.

    If you can also add the correct string value, that would be helpful.

    For example, in my file csvjdbc/src/testdata/hotel.dbf I see the
    string Córdoba as:

    0000500               C 363   r   d   o   b   a       H   o   t   e   l
             20  20  20  43  f3  72  64  6f  62  61  20  48  6f  74  65  6c
    

    This is ISO-8859-1 encoding. From http://en.wikipedia.org/wiki/ISO_8859-1
    I see that ó is 0xF3.

    I have a text file containing the same string:

    0000000   C 303 263   r   d   o   b   a  \n
             43  c3  b3  72  64  6f  62  61  0a
    

    From http://www.utf8-chartable.de/unicode-utf8-table.pl?utf8=hex
    I see that ó is 0xC3 0xB3 in UTF-8 encoding.
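
    The same byte values can be reproduced in plain Java, without a hex dump tool. This is a minimal sketch; the class name `HexDump` and the method `toHex` are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
class HexDump {
    // Encodes the text in the given charset and returns the bytes
    // as space-separated two-digit lowercase hex values.
    static String toHex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String cordoba = "C\u00f3rdoba"; // \u00f3 is the letter o with acute accent
        System.out.println(toHex(cordoba, StandardCharsets.ISO_8859_1)); // 43 f3 72 64 6f 62 61
        System.out.println(toHex(cordoba, StandardCharsets.UTF_8));      // 43 c3 b3 72 64 6f 62 61
    }
}
```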

     
  • Eduardo Jones

    Eduardo Jones - 2013-10-20

    The hex codes are:

    VI¥A DEL MAR: 56 49 a5 41 20 44 45 4c 20 4d 41 52
    CON CàN: 43 4f 4e 20 43 e0 4e

    Ñ = a5
    Ó = e0

    It's not UTF-8 or UTF-16, nor ISO-8859-1.

    How can I find out what the correct charset is?

     

    Last edit: Eduardo Jones 2013-10-20
  • Simon Chenery

    Simon Chenery - 2013-10-20

    Some Google searching for "n with tilde 0xA5 encoding" shows that this is probably charset CP850. See http://www.ascii-codes.com/cp850.html

    Try CP850 as the charset with CsvJdbc.
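
    The same lookup can be done programmatically by decoding the raw bytes with a few candidate charsets and keeping those that give the expected text. This is only a sketch: the class name, method name and candidate list are assumptions for illustration, and it relies on the JDK shipping the IBM437 and IBM850 charsets (unsupported ones are skipped).

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of CsvJdbc.
class CharsetProbe {
    // Illustrative candidate list; extend it as needed.
    static final String[] CANDIDATES = {"ISO-8859-1", "windows-1252", "UTF-8", "IBM437", "IBM850"};

    // Returns the canonical names of the candidate charsets
    // that decode the raw bytes to exactly the expected text.
    static List<String> matching(byte[] raw, String expected) {
        List<String> hits = new ArrayList<>();
        for (String name : CANDIDATES) {
            if (!Charset.isSupported(name)) {
                continue; // charset not available in this JVM
            }
            Charset cs = Charset.forName(name);
            if (new String(raw, cs).equals(expected)) {
                hits.add(cs.name());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Bytes taken from the hex codes posted in this thread.
        byte[] vina = {0x56, 0x49, (byte) 0xa5, 0x41, 0x20, 0x44, 0x45, 0x4c, 0x20, 0x4d, 0x41, 0x52};
        byte[] concon = {0x43, 0x4f, 0x4e, 0x20, 0x43, (byte) 0xe0, 0x4e};
        System.out.println(matching(vina, "VI\u00d1A DEL MAR"));
        System.out.println(matching(concon, "CON C\u00d3N"));
    }
}
```

    Checking more than one string helps, because several code pages can share the byte for Ñ while differing on the byte for Ó.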

     

    Last edit: Simon Chenery 2013-10-20
  • Eduardo Jones

    Eduardo Jones - 2013-10-20

    Thanks a lot, Simon. It works OK.

     
