Hm, that's too bad... unfortunately, from a web search, it looks like there's no guaranteed way to determine the encoding of a file. So a dropdown might be the only solution; either that or requiring files to be in UTF-8 encoding.
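
To illustrate why detection cannot be guaranteed, here is a minimal Python sketch (not part of Data Transfer; the helper name `decodes_as` is made up for this example). The same two bytes decode without error under several encodings, so a successful decode alone does not tell you which one the author used:

```python
def decodes_as(raw: bytes, encoding: str) -> bool:
    """Return True if raw can be decoded with the given encoding without error."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# The bytes C3 A9 are "é" in UTF-8, but they are also two valid
# characters in ISO-8859-1 and Windows-1252.
sample = "é".encode("utf-8")
candidates = ["utf-8", "latin-1", "cp1252"]
plausible = [enc for enc in candidates if decodes_as(sample, enc)]
print(plausible)
```

A strict UTF-8 decode can still rule UTF-8 out (for example, the bytes FF FE are never valid UTF-8), but it can never confirm which single-byte encoding was intended.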

-Yaron


2009/7/17 Patrick Nagel <mail@patrick-nagel.net>
Hi,

On 2009-07-17 15:02, mingkai dong wrote:
> I have trouble with the ImportCSV function in the DataTransfer extension.
> ImportCSV works fine with an English CSV file but fails with a Chinese CSV file.
>
> The version of DataTransfer is the latest, 0.3.2. Both MySQL and the wiki DB
> are set to the UTF-8 character set.

I can confirm this. Importing the attached CSV file with the Data Transfer
extension creates the following page:

http://patrick-nagel.net/wiki/%C3%86%C2%B5%C2%8B%C3%A8%C2%AF%C2%95

whereas it should have created

http://patrick-nagel.net/wiki/%E6%B5%8B%E8%AF%95

This looks like double UTF-8 encoding...
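
The two URLs above can be reproduced with a short Python sketch (an illustration only, not the extension's PHP code). It assumes that the import treats each byte of the already-UTF-8 title as an ISO-8859-1 character and encodes it to UTF-8 again, which is what PHP's utf8_encode() does, and that the wiki then capitalizes the first letter of the page title; the second point is an assumption about the wiki's configuration. The function name `double_encode` is made up for this example.

```python
from urllib.parse import quote


def double_encode(text: str) -> str:
    """Reinterpret the UTF-8 bytes of text as ISO-8859-1 characters."""
    utf8_bytes = text.encode("utf-8")
    # Every byte becomes one Latin-1 character; encoding that string
    # to UTF-8 again yields the doubly encoded form.
    return utf8_bytes.decode("latin-1")


title = "测试"
garbled = double_encode(title)

# Assumption: the wiki upper-cases the first character of a page title,
# which turns "æ" (U+00E6) into "Æ" (U+00C6).
garbled_title = garbled[0].upper() + garbled[1:]

correct_url = quote(title)
garbled_url = quote(garbled_title)
print(correct_url)
print(garbled_url)
```

Under those assumptions, `correct_url` matches the path of the expected page and `garbled_url` matches the path of the page that was actually created.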

After some digging I found that Yaron's "fix" in version 0.3.1 is the likely
cause: utf8_encode() assumes its input is ISO-8859-1, so a file that is
already UTF-8 gets encoded a second time.

Changing the following in specials/DT_ImportCSV.php makes DataTransfer work
correctly with UTF-8 files:

@@ -102,8 +102,7 @@
                       // fix values in case the file wasn't UTF-8 encoded -
                       // hopefully the UTF-8 value will work across all
                       // database encodings
-                       $encoded_line = array_map('utf8_encode', $line);
-                       array_push($table, $encoded_line);
+                       array_push($table, $line);
               }
               fclose($csv_file);
               // check header line to make sure every term is in the

As I see it, DataTransfer either needs a dropdown to specify the file's
encoding before pressing the Import button (which would then trigger a
conversion from the specified encoding to UTF-8 on the server), or
DataTransfer's documentation must require that all files use an encoding
that supports all languages, which in practice means UTF-8. UTF-16, which is
what Notepad on Windows writes when you choose "Unicode", would be another
option, but it would require conversion on the server, since the server
uses UTF-8.
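
The server-side conversion behind such a dropdown could look roughly like the following Python sketch. It is an illustration under assumptions, not Data Transfer code: the function name `to_utf8` and the idea that the form passes the chosen encoding as a string are both hypothetical.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Convert raw file bytes from the user-selected encoding to UTF-8.

    Decoding is strict, so bytes that are invalid in the selected
    encoding raise UnicodeDecodeError instead of silently producing
    garbled page titles.
    """
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


# A UTF-16 file with a byte order mark; Python's "utf-16" codec reads
# the BOM and removes it while decoding.
utf16_file = "测试".encode("utf-16")
converted = to_utf8(utf16_file, "utf-16")
print(converted)
```

Failing loudly on a wrong selection seems preferable here, because a mismatched encoding would otherwise create wrongly named pages that have to be cleaned up by hand.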

Patrick

--
Key ID: 0x86E346D4            http://patrick-nagel.net/key.asc
Fingerprint: 7745 E1BE FA8B FBAD 76AB 2BFC C981 E686 86E3 46D4

_______________________________________________
Semediawiki-user mailing list
Semediawiki-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/semediawiki-user