When the application loads String data containing special characters (Spanish ñ or ó, for example), they are displayed incorrectly.
I'm using this configuration:
jdbc.url=jdbc:relique:csv:C:/BDs/company?fileExtension=.dbf&charset=ISO8859-1
and the data is displayed like this (examples):
VI¥A DEL MAR for VIÑA DEL MAR
CON CàN for CON CÓN
PE¥ABLANCA for PEÑABLANCA
Which charset should I use? Am I using the configuration correctly?
Thanks in advance
This is a bug in CsvJdbc. I logged it as #94, "Wrong strings read from DBF file with ISO-8859-1 extended chars"
Bug #94 is now fixed and will be included in the next CsvJdbc version.
Note that the correct charset name is ISO-8859-1, not ISO8859-1.
Ok. Thanks.
I've tested the version published in git, but the problem persists.
Maybe your DBF file is UTF-8, UTF-16, or some other encoding?
Can you please make a hex dump of one of the strings in your DBF file that is wrong in CsvJdbc and paste it into this discussion?
If you can also add the correct string value, that would also be helpful.
For example, in my file csvjdbc/src/testdata/hotel.dbf I see the string Córdoba as:
0000500               C 363   r   d   o   b   a       H   o   t   e   l
         20  20  20  43  f3  72  64  6f  62  61  20  48  6f  74  65  6c
This is ISO-8859-1 encoding. From http://en.wikipedia.org/wiki/ISO_8859-1 I see that ó is 0xF3.
I have a text file containing the same string:
0000000   C 303 263   r   d   o   b   a  \n
         43  c3  b3  72  64  6f  62  61  0a
From http://www.utf8-chartable.de/unicode-utf8-table.pl?utf8=hex I see that ó is 0xC3 0xB3 in UTF-8 encoding.
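The two dumps above can be reproduced without a hex editor. The following is only a minimal sketch using the standard Java library; the class name EncodingBytes and the helper hex are made up for illustration and are not part of CsvJdbc.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytes {
    // Encodes the text in the given charset and returns the bytes
    // as lowercase hex pairs separated by single spaces.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Córdoba", written with a Unicode escape so that the encoding
        // of the source file itself does not affect the result.
        String word = "C\u00f3rdoba";
        System.out.println("ISO-8859-1: " + hex(word, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:      " + hex(word, StandardCharsets.UTF_8));
    }
}
```

The ISO-8859-1 line should show a single byte f3 for ó, and the UTF-8 line the two bytes c3 b3, matching the dumps above.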
The hex codes are:
VI¥A DEL MAR: 56 49 a5 41 20 44 45 4c 20 4d 41 52
CON CàN: 43 4f 4e 20 43 e0 4e
Ñ = a5
Ó = e0
It's not UTF-8 or UTF-16, nor ISO-8859-1.
How can I find out which charset is correct?
Some Google searching for "n with tilde 0xA5 encoding" suggests that this is probably charset CP850. See http://www.ascii-codes.com/cp850.html
Try CP850 as the charset with CsvJdbc.
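Instead of searching, the candidate charsets can also be tried directly on the bytes from the hex dump posted in this thread. This is a rough sketch with the standard Java library only. The class name CharsetProbe, the helper decode and the list of candidates are assumptions for illustration. IBM850 is, as far as I know, the canonical Java name for CP850, and whether it is available depends on the installed JDK, so the sketch checks before using it.

```java
import java.nio.charset.Charset;

public class CharsetProbe {
    // Bytes taken from the hex dump posted in this thread.
    static final byte[] VINA = {
        0x56, 0x49, (byte) 0xa5, 0x41, 0x20, 0x44,
        0x45, 0x4c, 0x20, 0x4d, 0x41, 0x52
    };
    static final byte[] CONCON = {
        0x43, 0x4f, 0x4e, 0x20, 0x43, (byte) 0xe0, 0x4e
    };

    // Decodes raw bytes with the named charset.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Expected values, written with Unicode escapes:
        // "VIÑA DEL MAR" and "CON CÓN".
        String expectedVina = "VI\u00d1A DEL MAR";
        String expectedConcon = "CON C\u00d3N";
        // Hypothetical candidate list; extend it as needed.
        String[] candidates = {"ISO-8859-1", "UTF-8", "windows-1252", "IBM437", "IBM850"};
        for (String name : candidates) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not available in this JDK");
                continue;
            }
            String vina = decode(VINA, name);
            String concon = decode(CONCON, name);
            // A candidate only counts if both strings come out right.
            boolean match = vina.equals(expectedVina) && concon.equals(expectedConcon);
            System.out.println(name + ": " + vina + " / " + concon + (match ? "  <- match" : ""));
        }
    }
}
```

Checking both strings matters because more than one charset can map 0xA5 to Ñ; the 0xE0 byte narrows the choice further. If the sketch confirms CP850, the JDBC URL from the first post would presumably use charset=CP850 in place of ISO8859-1.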
Thanks a lot, Simon. It works OK.
Last edit: Eduardo Jones 2013-10-17
Last edit: Eduardo Jones 2013-10-20
Last edit: Simon Chenery 2013-10-20