charset dbf

Brought to you by: jackerm, mfrasca, sbrienen, simoc

Forum: Help

Creator: Eduardo Jones

Created: 2013-10-17

Updated: 2013-10-20

Eduardo Jones - 2013-10-17

When the application loads String data containing special characters (for example, the Spanish ñ or ó), the characters are displayed incorrectly.

I'm using this configuration:

jdbc.url=jdbc:relique:csv:C:/BDs/company?fileExtension=.dbf&charset=ISO8859-1

and the shown data is (example):

VI¥A DEL MAR for VIÑA DEL MAR
CON CàN for CON CÓN
PE¥ABLANCA for PEÑABLANCA

Which charset should I use?
Am I using the configuration correctly?

Thanks in advance

Last edit: Eduardo Jones 2013-10-17

Simon Chenery - 2013-10-17

This is a bug in CsvJdbc. I logged it as #94, "Wrong strings read from DBF file with ISO-8859-1 extended chars"

Simon Chenery - 2013-10-17

Bug #94 is now fixed and will be included in the next CsvJdbc version.

Note that the correct charset name is ISO-8859-1, not ISO8859-1.

Eduardo Jones - 2013-10-17

Ok. Thanks.

Eduardo Jones - 2013-10-18

I've tested the version published in git, but the problem persists.

Simon Chenery - 2013-10-18

Maybe your DBF file is UTF-8, UTF-16, or some other encoding?

Can you please make a hex dump of one of the strings in your DBF file
that CsvJdbc reads incorrectly, and paste it into this discussion?

If you can also add the correct string value, that would be helpful.

For example, in my file csvjdbc/src/testdata/hotel.dbf I see the
string Córdoba as:

0000500               C 363   r   d   o   b   a       H   o   t   e   l
         20  20  20  43  f3  72  64  6f  62  61  20  48  6f  74  65  6c

This is ISO-8859-1 encoding. From http://en.wikipedia.org/wiki/ISO_8859-1
I see that ó is 0xF3.

I have a text file containing the same string:

0000000   C 303 263   r   d   o   b   a  \n
         43  c3  b3  72  64  6f  62  61  0a

From http://www.utf8-chartable.de/unicode-utf8-table.pl?utf8=hex
I see that ó is 0xC3 0xB3 in UTF-8 encoding.
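
To make the comparison reproducible, here is a minimal Java sketch that encodes the same string both ways and prints the bytes as hex. The class name CordobaBytes and the helper hex are made up for this illustration; they are not part of CsvJdbc.

```java
import java.nio.charset.StandardCharsets;

public class CordobaBytes {
    // Formats bytes as lowercase two-digit hex values separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Córdoba", written with a Unicode escape so the result does not
        // depend on the encoding of this source file.
        String s = "C\u00f3rdoba";
        System.out.println(hex(s.getBytes(StandardCharsets.ISO_8859_1))); // 43 f3 72 64 6f 62 61
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));      // 43 c3 b3 72 64 6f 62 61
    }
}
```

A single byte f3 for ó points to ISO-8859-1, while the two-byte sequence c3 b3 points to UTF-8.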

Eduardo Jones - 2013-10-20

The hex codes are:

VI¥A DEL MAR: 56 49 a5 41 20 44 45 4c 20 4d 41 52
CON CàN: 43 4f 4e 20 43 e0 4e

Ñ = a5
Ó = e0

It's not UTF-8, UTF-16, or ISO-8859-1.

How can I find out which charset is correct?

Last edit: Eduardo Jones 2013-10-20

Simon Chenery - 2013-10-20

A Google search for "n with tilde 0xA5 encoding" suggests that this is probably charset CP850. See http://www.ascii-codes.com/cp850.html
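
One way to check a guess like this without a search engine is to decode the raw bytes from the hex dump with each candidate charset and see which one gives the expected text. The following is only a sketch: the class GuessCharset and the method decodeAll are hypothetical names, and it assumes the JVM ships the extended charsets, where this code page has the canonical name IBM850.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class GuessCharset {
    // Decodes the same raw bytes with every candidate charset this JVM supports.
    // Candidates that are not installed are silently skipped.
    static Map<String, String> decodeAll(byte[] raw, String... names) {
        Map<String, String> decoded = new LinkedHashMap<>();
        for (String name : names) {
            if (Charset.isSupported(name)) {
                decoded.put(name, new String(raw, Charset.forName(name)));
            }
        }
        return decoded;
    }

    public static void main(String[] args) {
        // Bytes taken from the hex dumps posted earlier in this thread.
        byte[] vina = {0x56, 0x49, (byte) 0xa5, 0x41, 0x20, 0x44,
                       0x45, 0x4c, 0x20, 0x4d, 0x41, 0x52};
        byte[] concon = {0x43, 0x4f, 0x4e, 0x20, 0x43, (byte) 0xe0, 0x4e};

        String[] candidates = {"ISO-8859-1", "IBM850"};
        decodeAll(vina, candidates).forEach((cs, text) -> System.out.println(cs + ": " + text));
        decodeAll(concon, candidates).forEach((cs, text) -> System.out.println(cs + ": " + text));
    }
}
```

With ISO-8859-1 the byte a5 decodes to ¥ and e0 to à, which matches the wrong output reported above; with IBM850 they decode to Ñ and Ó.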

Try CP850 as the charset with CsvJdbc.
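
For illustration, a rough sketch of what that change might look like from Java, reusing the directory and the fileExtension and charset URL parameters from the configuration posted at the top of this thread. The class DbfCharsetConfig and its url method are hypothetical helpers, and the connection attempt only succeeds if the CsvJdbc jar is on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class DbfCharsetConfig {
    // Builds a CsvJdbc URL for a directory of DBF files, using the same URL
    // parameters (fileExtension, charset) as the configuration in this thread.
    static String url(String directory, String charset) {
        return "jdbc:relique:csv:" + directory + "?fileExtension=.dbf&charset=" + charset;
    }

    public static void main(String[] args) {
        String jdbcUrl = url("C:/BDs/company", "CP850");
        System.out.println(jdbcUrl);

        // Assumption: the CsvJdbc driver jar is on the classpath and the
        // directory exists. Otherwise this reports the error and exits normally.
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            System.out.println("Connected: " + !conn.isClosed());
        } catch (SQLException e) {
            System.out.println("Could not connect: " + e.getMessage());
        }
    }
}
```

In a properties file, the equivalent would be the jdbc.url line shown earlier with charset=CP850 in place of charset=ISO8859-1.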

Last edit: Simon Chenery 2013-10-20

Eduardo Jones - 2013-10-20

Thanks a lot, Simon. It works OK.
