Hm, that's too bad. Unfortunately, from a web search, it looks like there's no guaranteed way to determine the encoding of a file. So a dropdown might be the only solution; either that, or requiring files to be UTF-8 encoded.


> I have trouble with the ImportCSV function in the DataTransfer extension.
> ImportCSV works fine with an English CSV file but fails with a Chinese CSV
> file. The version of DataTransfer is the latest, 0.3.2. Both MySQL and the
> wiki-db are set to the UTF-8 character set.

I can confirm this. Importing the attached CSV file with the Data Transfer
extension creates the following page:

whereas it should have created

This looks like double UTF-8 encoding...
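
To illustrate what I mean by double encoding, here is a minimal sketch. It
assumes the mbstring extension is available; I use mb_convert_encoding() with
ISO-8859-1 as the source because that is the same conversion utf8_encode()
performs, and utf8_encode() is deprecated in recent PHP versions. The sample
character is my own choice, not taken from the attached file.

```php
<?php
// "中" (U+4E2D) is the three bytes E4 B8 AD in UTF-8.
$original = "\xE4\xB8\xAD";

// Treat every byte as an ISO-8859-1 character and encode it as UTF-8,
// which is what utf8_encode() does. The input is already UTF-8, so each
// of its bytes gets encoded a second time.
$double = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

echo bin2hex($original), "\n"; // e4b8ad
echo bin2hex($double), "\n";   // c3a4c2b8c2ad

// Reversing the conversion recovers the original bytes.
$restored = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
echo bin2hex($restored), "\n"; // e4b8ad
```

One Chinese character becomes three Latin-1 characters, which matches the
kind of garbage seen on the created page.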

After some digging, I found that Yaron's "fix" in version 0.3.1 is probably
the cause of this problem:

Changing the following in specials/DT_ImportCSV.php makes DataTransfer work
correctly with UTF-8 files:

@@ -102,8 +102,7 @@
                       // fix values in case the file wasn't UTF-8 encoded -
                       // hopefully the UTF-8 value will work across all
                       // database encodings
-                       $encoded_line = array_map('utf8_encode', $line);
-                       array_push($table, $encoded_line);
+                       array_push($table, $line);
               // check header line to make sure every term is in the
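
A possible middle ground, sketched below, would be to re-encode only those
values that are not already valid UTF-8. The helper name dtEnsureUtf8() is
hypothetical and not part of DataTransfer, and the sketch assumes that any
non-UTF-8 input is ISO-8859-1, which is the same assumption utf8_encode()
makes. It remains a heuristic: some ISO-8859-1 byte sequences happen to be
valid UTF-8 as well, so it cannot replace a real encoding choice.

```php
<?php
// Hypothetical helper, not part of DataTransfer.
// Leaves valid UTF-8 values untouched and converts everything else,
// assuming (as utf8_encode() does) that the source is ISO-8859-1.
function dtEnsureUtf8(array $line): array {
    return array_map(function ($value) {
        $value = (string) $value;
        if (mb_check_encoding($value, 'UTF-8')) {
            return $value; // already UTF-8, do not encode again
        }
        return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
    }, $line);
}

// Usage: the first cell is UTF-8 Chinese, the second is Latin-1 "café".
$row = dtEnsureUtf8(["\xE4\xB8\xAD", "caf\xE9"]);
echo bin2hex($row[0]), " ", bin2hex($row[1]), "\n";
```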

As I see it, DataTransfer either needs a dropdown to specify the encoding of
the file before pressing the Import button (which would then trigger a
conversion from the specified encoding to UTF-8 on the server), or
DataTransfer's documentation must state that all files have to use an
encoding that supports all languages, meaning UTF-8 in practice. UTF-16,
which is what Notepad on Windows writes when you choose "Unicode", would be
another option, but it would require conversion on the server, since the
server uses UTF-8.
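
The server-side conversion for the dropdown variant could look roughly like
this. It is only a sketch: dtConvertCsvToUtf8() and the list of allowed
encodings are hypothetical names of mine, not existing DataTransfer code, and
the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical allow-list for the dropdown (mbstring encoding names).
// Notepad's "Unicode" option writes UTF-16LE with a byte order mark.
const DT_ALLOWED_ENCODINGS = [
    'UTF-8', 'UTF-16LE', 'UTF-16BE', 'ISO-8859-1', 'Windows-1252',
];

// Hypothetical helper: converts the raw uploaded file contents to UTF-8.
function dtConvertCsvToUtf8(string $raw, string $sourceEncoding): string {
    if (!in_array($sourceEncoding, DT_ALLOWED_ENCODINGS, true)) {
        throw new InvalidArgumentException("Unsupported encoding: $sourceEncoding");
    }
    $utf8 = ($sourceEncoding === 'UTF-8')
        ? $raw
        : mb_convert_encoding($raw, 'UTF-8', $sourceEncoding);

    // Drop a leading byte order mark (U+FEFF, EF BB BF in UTF-8) if present.
    if (strncmp($utf8, "\xEF\xBB\xBF", 3) === 0) {
        $utf8 = substr($utf8, 3);
    }
    return $utf8;
}

// Usage: UTF-16LE bytes with BOM for "中", as Notepad would save them.
$converted = dtConvertCsvToUtf8("\xFF\xFE\x2D\x4E", 'UTF-16LE');
echo bin2hex($converted), "\n"; // e4b8ad
```

The conversion would have to run on the whole file before the CSV is parsed,
because in UTF-16 the separators and line breaks are two bytes each and a
byte-oriented CSV parser would not recognise them.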


