
#648 Export to CSV corrupt

Status: NeedInfo
Owner: nobody
Labels: None
Severity: Problematic
Type: Defect
Updated: 2008-08-01
Created: 2008-07-30
Creator: Anonymous
Private: No

Originally created by: martinkr...@gmail.com

What steps will reproduce the problem?
1. Query some database table
2. Export data from query output as .csv
3. Settings for the .csv output are:

Field separator: ;
Line terminator: \r\n

What is the expected output? What do you see instead?
See the attached .txt file with a HEX view from UltraEdit.
The file shown compares part of a .csv export done by HeidiSQL with the
same output as generated by MS Excel.
It seems that HeidiSQL is inserting illegal characters (hex 00) in between
valid characters.
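(Editorial illustration, not part of the original report: the interleaved 00 bytes described above are exactly what ASCII text looks like when encoded as UTF-16LE, where every character occupies two bytes. A minimal Python sketch:)

```python
# ASCII text encoded as UTF-16LE carries a 0x00 high byte after every
# character, which looks like "illegal characters" in a hex view.
line = "id;name"
ansi = line.encode("cp1252")      # one byte per character (an "ANSI" code page)
utf16 = line.encode("utf-16-le")  # two bytes per character: char + 00

print(ansi.hex(" "))   # 69 64 3b 6e 61 6d 65
print(utf16.hex(" "))  # 69 00 64 00 3b 00 6e 00 61 00 6d 00 65 00
```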

What version of the product are you using? On what operating system?
I'm using HeidiSQL rev 1603 at the moment.

Please provide any additional information below.
See attached file with comparison between HeidiSQL csv export and Excel csv
export of same data.

1 Attachment

Discussion

  • Anonymous

    Anonymous - 2008-07-31

    Originally posted by: spero.sh...@gmail.com

    This seems to have started in rev 1549, rev 1548 and earlier looks to be unaffected.

     
  • Anonymous

    Anonymous - 2008-07-31

    Originally posted by: a...@anse.de

    The attached file was saved as an ANSI file, while the original CSV output creates a
    UTF-16LE file. Do not open multibyte-charset files with an ANSI-only editor or the
    file will look corrupted.

    If we saved CSV files as ANSI, it would silently (!) convert critical characters
    to their representation in the ANSI charset. That's what [r1549] fixes. Now, please
    don't tell me Excel doesn't support Unicode files? What a shitty piece of software
    is that?
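    (Editorial sketch of the silent conversion described above; the sample text and code page are illustrative:)

```python
# Encoding Unicode text to an ANSI code page silently degrades characters
# the code page cannot represent -- the data loss that [r1549] avoids.
text = "Łódź"                                # contains characters outside cp1252
lossy = text.encode("cp1252", errors="replace")
print(lossy)                                 # b'?\xf3d?' -- 'Ł' and 'ź' are gone
```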

    Labels: -Priority-Medium Severity-Problematic
    Status: Accepted

     

    Related

    Commit: [r1549]

  • Anonymous

    Anonymous - 2008-07-31

    Originally posted by: a...@anse.de

    At least Excel 2007 loads Unicode CSV files correctly (even if I leave the ANSI
    option enabled while importing, which is funny, but anyway). So, do we have a problem
    here at all? The original report doesn't actually describe a real error, does it?

     
  • Anonymous

    Anonymous - 2008-08-01

    Originally posted by: rosenfie...@gmail.com

    There is not enough information in the original report to say whether there is a bug
    or not.  Specifically, there is no mention of whether the source data contains a NUL
    character or not.

    I'm closing this bug.  To reopen, please provide an SQL export of the original data,
    and we'd be happy to re-investigate.  (Make an SQL export with version 3.2 also,
    just so we're sure that it's pristine and unaffected by Unicode issues.)

    ...

    And on a more theoretical note, I don't think there is a formal CSV standard, so
    precisely how NUL characters should be handled is probably not documented anywhere,
    and will be handled differently by different applications.

    Since HeidiSQL aims to be user friendly, I guess it could pop up a warning after
    data export to tell the user that his/her data contains NUL characters, and that
    some applications may be unable to handle this.

    Perhaps such a dialog could even link to a help file or web page with advice on
    how to convert NUL to SPACE, for example with an SQL statement against the
    original data or a hex editor against the exported file.
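    (Such a cleanup step could be sketched like this; editorial example with a hypothetical helper, not part of HeidiSQL:)

```python
def replace_nuls(text: str) -> str:
    """Replace embedded NUL characters with spaces, as the dialog might advise."""
    return text.replace("\x00", " ")

print(replace_nuls("ab\x00cd"))  # ab cd
```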

    Status: NeedInfo

     
  • Anonymous

    Anonymous - 2008-08-01

    Originally posted by: martinkr...@gmail.com

    Thanks for the feedback gents.

    This problem was actually picked up when using data exported from HeidiSQL as .csv as
    input for a Perl script. The Perl script is "blind" to ANSI/UTF format settings, as
    it only uses split(';', $line) to break the data apart as needed by the script.
    Since Heidi rev 1549 I have to open the HeidiSQL .csv export in Excel and save it
    again as .csv to be able to use it in my Perl script. I can of course change the
    script source to "fix" the .csv file before using it, but since it worked before,
    I thought there might be a new "bug" in HeidiSQL.

    I did a comparison between the same data as exported from HeidiSQL and from Excel to
    try to debug the problem myself. I could only see that the header of the file exported
    by HeidiSQL contains hex FF FE, which Excel's doesn't, and also the hex 00 in between
    other characters as mentioned before. Perhaps the problem actually lies with the FF
    FE header and not the 00 bytes in between characters?
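    (An editorial note on the observation above: FF FE is the UTF-16LE byte-order mark, and the 00 bytes are the high bytes of each two-byte character, so both are two faces of the same encoding. A small Python sketch:)

```python
# ff fe is the UTF-16LE byte-order mark (BOM); the 00 bytes are the high
# bytes of each two-byte character, not separate stray characters.
raw = b"\xff\xfe" + "a;b".encode("utf-16-le")
print(raw.hex(" "))      # ff fe 61 00 3b 00 62 00

# A byte-wise split on ';' (what the Perl script effectively does)
# leaves the BOM and stray 00 bytes inside every field:
print(raw.split(b";"))   # [b'\xff\xfea\x00', b'\x00b\x00']
```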

    Attached the original data in SQL as well as .csv format (exported by HeidiSQL).

     
  • Anonymous

    Anonymous - 2008-08-01

    Originally posted by: rosenfie...@gmail.com

    Ok, so we've got two issues:

    1) The Perl script does not understand UTF-16 encoded input (the encoding HeidiSQL
    now exports).

    2) The Perl script chokes on NUL characters.

    Neither of which seems to be a bug in HeidiSQL.

    It's true that older versions of HeidiSQL used the local code page for exports
    (often iso-8859-1), but that's not a viable solution going forward, since it cannot
    express many international characters, and therefore did not work well for people
    operating in multilingual environments.

    My suggestion would be to fix the Perl script.  Either open the file with the right
    encoding (the export is UTF-16 with a BOM, which Perl's UTF-16 layer handles):

      open(FH, "<:encoding(UTF-16)", "mydata.csv") or die "open: $!";

    Or run an iconv/tr pair to do a lossy conversion to ASCII (or your local codepage)
    and remove NUL characters:

      system("iconv -f UTF-16LE -t ASCII --unicode-subst=' ' mydata.csv > mydata2.csv");
      system("tr -d '\\000' < mydata2.csv > mydata3.csv");
      # (open mydata3.csv, which is now ASCII and free of the NULs found in the original data)

    Hope this helps?
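
    (For readers following along in another language, the same fix can be sketched in
    Python; an editorial example, with the data and filenames illustrative: decode the
    UTF-16 export, which consumes the BOM, drop embedded NULs, then split on ';'.)

```python
import io

# Simulate a UTF-16 export: Python's "utf-16" codec writes a BOM, and the
# second row contains an embedded NUL as in the reported data.
raw = "col1;col2\r\nval\x001;val2\r\n".encode("utf-16")

with io.TextIOWrapper(io.BytesIO(raw), encoding="utf-16") as fh:  # BOM handled
    rows = [line.rstrip("\r\n").replace("\x00", " ").split(";") for line in fh]

print(rows)  # [['col1', 'col2'], ['val 1', 'val2']]
```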