Originally created by: martinkr...@gmail.com
What steps will reproduce the problem?
1. Query some database table
2. Export data from query output as .csv
3. Settings for the .csv output are:
Field separator: ;
Line terminator: \r\n
What is the expected output? What do you see instead?
See attached .txt file with HEX view from UltraEdit.
The attached file shows a detail of a .csv export done by
HeidiSQL next to the same output as generated by MS Excel.
It seems that HeidiSQL is adding some illegal characters (hex 00) in between
valid characters.
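A later comment in this thread identifies the export as UTF-16-LE. A minimal Python sketch (Python used purely for illustration) of why 00-hex bytes appear between otherwise valid characters in a hex view:

```python
# ASCII text encoded as UTF-16-LE uses two bytes per character; for
# ASCII characters the high byte is 0x00, which shows up in a hex
# editor as "illegal" 00 bytes between the visible characters.
text = "abc;def"
utf16 = text.encode("utf-16-le")
print(utf16.hex(" "))  # 61 00 62 00 63 00 3b 00 64 00 65 00 66 00
```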
What version of the product are you using? On what operating system?
I'm using HeidiSQL rev 1603 at the moment.
Please provide any additional information below.
See attached file with comparison between HeidiSQL csv export and Excel csv
export of same data.
Originally posted by: spero.sh...@gmail.com
This seems to have started in rev 1549, rev 1548 and earlier looks to be unaffected.
Originally posted by: a...@anse.de
The attached file was saved as an ANSI file, while the original CSV output creates a
UTF-16LE file. Do not open multibyte-charset files with an ANSI-only editor, or the
file will look corrupted.
If we saved CSV files as ANSI, it would silently (!) convert critical characters
to their representation in the ANSI charset. That's what [r1549] fixes. Now, please
don't tell me Excel doesn't support Unicode files? What a shitty piece of software
is that?
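The silent conversion described above can be demonstrated with a short Python sketch (the sample text is made up); characters the single-byte target charset cannot represent are irreversibly replaced:

```python
# A single-byte "ANSI"/ASCII save silently replaces characters the
# target charset cannot represent -- the original data is gone for good.
text = "naïve – €5"
lossy = text.encode("ascii", errors="replace").decode("ascii")
print(lossy)  # na?ve ? ?5
```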
Labels: -Priority-Medium Severity-Problematic
Status: Accepted
Related
Commit: [r1549]
Originally posted by: a...@anse.de
At least Excel 2007 loads Unicode CSV files correctly (even if I leave the ANSI
option enabled while importing, which is funny, but anyway). So, do we have a problem
here at all? The original report says nothing about a real error here, does it?
Originally posted by: rosenfie...@gmail.com
There is not enough information in the original report to say whether there is a bug
or not. Specifically, there is no mention of whether the source data contains a NUL
character or not.
I'm closing this bug. To reopen, please provide an SQL export of the original data,
and we'd be happy to re-investigate. (Make an SQL export with version 3.2 also,
just so we're sure that it's pristine and unaffected by Unicode issues.)
...
And on a more theoretical note, I don't think there exists a CSV standard, so
precisely how NUL characters should be handled is probably not documented anywhere,
and will be handled differently by different applications.
Since HeidiSQL aims to be user friendly, I guess it could pop up a warning after
data export to tell the user that his/her data contains NUL characters, and that
some applications may be unable to handle this.
Perhaps such a dialog could even have a link to a help file or web page containing
advice on how to convert NUL to SPACE, for example with an SQL statement against the
original data or a hex editor against the exported file.
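The suggested NUL-to-SPACE cleanup could also be done in a few lines of script; a minimal Python sketch on decoded text (the sample row is made up):

```python
# Replace embedded NUL characters with spaces in a decoded CSV row,
# as suggested above, so downstream tools that choke on NUL can cope.
row = "42;some\x00data;ok"    # a NUL embedded in one field
clean = row.replace("\x00", " ")
print(clean)  # 42;some data;ok
```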
Status: NeedInfo
Originally posted by: martinkr...@gmail.com
Thanks for the feedback gents.
This problem was actually picked up when using data exported from HeidiSQL as .csv as
input for a Perl script. The Perl script is "blind" to ANSI / UTF format settings, as
it only uses split(';', $line) to break the data apart as needed
by the script.
Since HeidiSQL rev 1549 I have to open the HeidiSQL .csv export in Excel and save it
again as .csv to be able to use it in my Perl script. I can of course change the
script source to "fix" the .csv file before using it, but thought that since it
worked before there might be a new "bug" in HeidiSQL.
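A byte-oriented split like the one described breaks on UTF-16-LE input because the separator ';' is actually the byte pair 3B 00. A Python sketch of the effect (illustration only, not the original Perl):

```python
# Splitting UTF-16-LE bytes on the single byte b";" strands the
# separator's 00 high byte at the start of every following field.
line = "1;foo;bar".encode("utf-16-le")
parts = line.split(b";")
print(parts)  # [b'1\x00', b'\x00f\x00o\x00o\x00', b'\x00b\x00a\x00r\x00']
```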
I did a comparison between the same data as exported from HeidiSQL and Excel to try
and debug the problem myself. I could only see that the header of the file exported
by HeidiSQL contains hex FF FE, which Excel's doesn't, and also the hex 00 in between
other characters as mentioned before. Perhaps the problem actually lies with the FF
FE header part and not the 00 in between characters?
Attached the original data in SQL as well as .csv format (exported by HeidiSQL).
Originally posted by: rosenfie...@gmail.com
Ok, so we've got two issues:
1) The Perl script does not understand the UTF-16LE encoding of the export.
2) The Perl script chokes on NUL characters.
Neither of which seems to be a bug in HeidiSQL.
It's true that older versions of HeidiSQL used the local code page for exports
(often iso-8859-1), but that's not a viable solution going forward, since it cannot
express many international characters, and therefore did not work well for people
operating in multilingual environments.
My suggestion would be to fix the Perl script. Either open the file with the right
encoding layer (the export is UTF-16 with a BOM):
open(FH, "<:encoding(UTF-16)", "mydata.csv");
Or run an iconv/tr pair to do a lossy conversion to ASCII (or your local codepage)
and remove NUL characters:
# single-string system() calls, so the shell handles the redirection
system("iconv -f UTF-16LE -t ASCII//TRANSLIT mydata.csv > mydata2.csv");
system("tr -d '\\000' < mydata2.csv > mydata3.csv");
# (open mydata3.csv, which is now in ASCII and without the NULs found in the original data)
Hope this helps?