Here's a description of a situation.
Recently, phpMyAdmin was automatically updated to
version 2.6.3-pl1 (patch level 1) which is the most stable
release as of this writing.
Immediately phpMyAdmin was not able to correctly
display data stored in MySQL tables encoded as Arabic
Windows-1256 (or cp-1256)
In the new home page of phpMyAdmin
(http://www.domain.com/phpMyAdmin/index.php) there
are:
a. A "Language" drop-dwon list which identifies the
language in which further phpMyAdmin pages will
be "rendered" by the browser. The options in this drop-
dwon list include many languages, unfortuantely all of
them in the xx-utf-8 format. This is a key issue here.
Earlier versions allowed non-unicode rendering
possibilities such as Arabic (win-1256) along with many
others. I don't understand why this was removed, forcing
users into using unicode only.
b. A "MySQL charset" with only the option of "UTF-8
Unicode (utf8)"
c. A "MySQL connection collation" and the options
range from choosing no collation at all to various
unicode and non-unicode collation.
I'd like to stress here that we'er tackling a "rendring
problem" i.e how the browser renders the characters
sent to it by the webserver, not the format in which the
data are actually stored in MySQL. This clear
distinction explains many things.
All the site pages that read data from or writes data into
MySQL display and store the characters correctly,
because they tell the browser explicitly how they are
encoded. This is achieved by this html line present in all
of the site pages:
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1256">
Now phpMyAdmin pages doesn't have such a line. In
fact a "view->source" of any phpMyAdmin page will
show a line such as this one:
<meta http-equiv="Content-Type" content="text/html;
charset=utf-8" />
with a subesquent
<link rel="stylesheet" type="text/css"
href="./css/phpmyadmin.css.php?lang=xx-utf-
8&js_frame=right" />
defining the style sheet to be used to render the page --
where xx-utf-8 corresponds to the chosen "Language"
originally set in the phpMyAdmin home page, e.g. en-utf-
8, ar-utf-8, etc.
The above explains everything. phpMyAdmin pages
WILL NOT BE ABLE TO DISPLAY ANY NON-ENGLISH
CHARACTERS RETRIEVED FROM MYSQL UNLESS
THEY WERE SAVED IN THE DATABASE AS UTF-
ENCODED ENTITIES. If unfortunately you store your
non-english data in anyway other than utf, phpMyAdmin
will display them as gebberish. Plain and simple.
Now how we can overcome this problem?
1. Ask phpMyAdmin development team to allow the
option of non-utf rendering from the "Language" drop-
down list in its home page. This was available in
previous versions of phpMyAdmin.
2. Ask the browser (here, Internet Explorer) to render
the phpMyAdmin pages with the encoding we want
using its function "view->encoding->xxxxx".
Unfortunately this is not attainable. The phpMyAdmin
development team intentionally chose to include
a "lang=xx-utf-8" in the http request header of each and
every phpMyAdmin page, where the value xx-utf-8 is the
one set at the very begining home page of phpMyAdmin
by your choice of the "Language"
Do this exercise:
Suppose you have a database called my_schema, and
a table in that database storing non-English content and
is called my_table. Now ...
a. Login to phpMyAdmin, choose whatever combination
of language and collation you want .. it doesn't really
matter
b. From the left-hand drop-down list chose
your "my_schema" database
c. A list of tables with "my_table" now should appear in
the left hand frame of the phpMyAdmin home page.
d. Now move your mouse over "my_table" (whcih is a
link), right click and choose "copy shortcut"
e. Start "Notepad" or your favorite plain-text editor, and
do a paste.
What you would get is something like this:
http://www.domain.com/phpMyAdmin/tbl_properties_stru
cture.php?lang=xx-utf-
8&server=1&collation_connection=xxxxxxx&db=my_sch
ema&table=my_table
As you can see, the http request explicitly states page
encoding with this parameter in the string "lang=xx-utf-
8" and "collation_connection=xxxxxx" exactly as set by
the first chosices in step #a. We are stuck with these
options all through the phpMyAdmin session.
f. Now click on the my_table link. You should now see
the table stucture in the right-hand frame of the
phpMyAdmin page.
g. Reach up to the "Browse" link, and repeat the
procedure in #d and #e
What you would get is something like this:
http://www.domain.com/phpMyAdmin/sql.php?lang=xx-
utf-
8&server=1&collation_connection=xxxxxxxx&db=my_sc
hema&table=my_table&goto=tbl_properties_structure.ph
p&back=tbl_properties_structure.php&sql_query=SELE
CT+%2A+FROM+%60my_table%60&pos=0
h. Click on the "Browse" link. You now see the first 30
records of your table, with gebberish replacing your non-
English conent of my_table records.
i. Now tell Internet Explorer to use another encoding by
using "view->encoding->xxxxx". Since this page is a
dynamically generated one, what IE will really do is
submit the http request to the server again exactly as
shown in #g resulting in the page dispalyed
with "lang=xx-utf-8" encoding, and effectively nullifying
the purpose of the request.
j. Now someone might say let's try this with FireFox.
Great. This will only help to solidfy the explanation of
this "bug". FireFox instead of resubmitting the http
request to the server will try to work on the locally
cached version first. Alas, in this situation FireFox has
a problem in translating this gebberish to win-1256
codes properly. It does translate this gebbberish into
Arabic, but with mismatched characters, resulting in a
meaningless content. In other cases, it might not have
this translation problem. So this might really work.
Now we come to the last option
3. Re-encode the content in the database into UTF-8.
This is really such an awkward solution. In order to
overcome a glitch/lack of feature in phpMyAdmin we're
going to destory a functioning site. Unless we really
intended to do this conversin of a non-English/non-
Unicode site and non-Enlgish/non-Unicode database
content into Unicode, but never had the time to do it,
this might be the correct time. Other than this, I don't
see -regardless of the means or the tricks needed to do
the actual conversion- how we can break the code of the
site pages that read and write into MySQL correctly, in
whatever encoding that was origianlly chosen when the
site was originally developed, only to get phpMyAdmin
to display the data correctly.
Conclusion: phpMyAdmin in its current version supports
only utf connection to MySQL and utf rendring of the
resulting pages. phpMyAdmin should allow rendering of
data encoded in ways other than xx-utf-8
Your comments are welcome.
moallam
Logged In: YES
user_id=326580
current stable is 2.6.4.PL.2 not 2.6.3.PL.1
can you please send a short dump of any db/table which
inherits this problem?
dev: shouldnt go all output from MySQL through
htmlentities()? and so it should always be viewable or
non'glitchy'?
Logged In: YES
user_id=192186
Sorry didn't read whole stuff, try to be more brief, but
after reading first half, it is obvious that you don't have
correct collation settings for your tables. Moving to
support requests.
Sebastian: I don't see any relevance to this, but yes,
everything should go through htmlentities (except output
from MIME transformations, which handle this as they need).
Logged In: YES
user_id=1363580
As per the request of cybot_tm a short dump of one of the
affected tables is attached
Sample dump, 100 records, data are Arabic win-1256
Logged In: YES
user_id=192186
You have set CHARSET=latin1 for that table, so you can't
expect that it can handle arabic character set. Please ready
MySQL documentation about charsets and collations.
Logged In: YES
user_id=1363580
Yes I know, and I've read the sections you're referring to --
the full length of chapter 10 in MySQL documentation.
The table sample sent was created before MySQL
implemented charsets and collations. The
default "latin1_swedish_ci" collation was added to all fields of
all tables, and all tables in all databases automatically upon
the latest MySQL migration.
Point is: Earlier versions of PMA handled the display of
contents of such tables gracefully when they didn't
mandate "xx-utf-8" for page encoding allowing the user to set
his own encoding using the browser's view->encoding and
when they didn't manade a MySQL charset as "UTF-8
Unicode (utf8)". This was true even after MySQL
implemented charsets and collations.
Now I see the PMA team wants to conform to unicode as the
standard for "page display" and "MySQL" server-wide
charset, leaving the handling of "foreign code pages"
to "connection collations" and the rendering to htmlentities().
But you have to cosider that many non-latin MySQL
databases were migrated to the newer version and their
tables and fields were defaulted to the totally
wrong "latin1_swedish_ci" collation.
Due to this "wrong default", trying to convert the "collation
type" of the field "name" on the sample table I sent you to
the correct "cp1256_gnereal_ci" will only result in converting
all those Arabic characters to "?". Try it. Here's the SQL:
ALTER TABLE `consultation` CHANGE `name` `name`
VARCHAR( 50 ) CHARACTER SET cp1256 COLLATE
cp1256_general_ci NOT NULL
And unless you have a backup, this "can not be undone".
This resulted in catastrophic data losses to many
webdevelopers with non-latin content databases. Ask me.
The same is true if you go about doing this conversion from
PMA. If you have database tables created before MySQL
implemented charsets and collations, and you wanted to
alter the "default collation attribute" to the correct type of
encoding used for the data in a field using PMA, this will
result in instructing mysql to "translate" the stored data to
the chosen collation type (e.g. cp1256_general_ci). Since
MySQL has the wrong information, to begin with, that these
data are default "latin1_swedish_ci", it will do a terrible job
trying to re-encode them to the chosen collation. And, when
it miserably fails, it will just insert a "?" in place of each and
every character it failed to re-encode!
Hence, the request to let PMA not be so strict about
unicode, and to allow, like earlier versions, setting the
rendering of the rendered content, whether hard coded or
dynamically read from MySQL to the user preferred
encoding. This will solve lots of problems, and I think is
a "users" right.
In the mean time, if you know of a "graceful" way that would
let me change the dreadful default
collation "latin1_swedich_ci" currently set for all the fields of
all my tables of all my non-latin databases WITHOUT
converting them to "???" please let me know.
Regards,
moallam
Logged In: YES
user_id=192186
Previous MySQL version didn't allow getting data in
specified charset, while upgrading to MySQL with charset
support you have to set correct charset on all tables (or
changing default if all are same).
About conversion: RTFM
http://dev.mysql.com/doc/refman/4.1/en/alter-table.html#id2777213,
it's all written there.
Logged In: YES
user_id=1363580
Hey Nijel,
Thanks a lot, you've made my day! After RTFMing, these two
commands did the conversion trick
ALTER TABLE `consultaion` CHANGE `name` `name` BLOB;
ALTER TABLE `consultaion` CHANGE `name` `name` TEXT
CHARACTER SET cp1256;
However, they need to be done on each and every column of
each table of the database. But no prob. Also, now
everything is correct in PMA, including inserting and updating
records. This is because PMA correctly handles connection
collation and charset. Unfortunately, the site pages now
display the database records as "???? ???" because these
php pages were not programmed with collation/charsets in
mind.
Now would you tell me where to RTFM about a "quick and
dirty" way to fix displaying/entering site data correctly, as
does PMA now? This is until I finally get around sometime to
recode everything into Unicode (I swear upon my scouts
honour, and am crossing my fingers).
I'm thinking mysql_query(" SET NAMES 'cp1256' "); after
establishing MySQL connection. Am I thinking correct?
Many thanks,
Mo
Logged In: YES
user_id=192186
Basically yes, see
http://dev.mysql.com/doc/refman/4.1/en/charset-connection.html
Logged In: YES
user_id=1312539
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).