Menu

#789 phpMyAdmin rendering glitch

closed
5
2014-08-12
2005-10-17
moallam
No

Here's a description of a situation.

Recently, phpMyAdmin was automatically updated to
version 2.6.3-pl1 (patch level 1) which is the most stable
release as of this writing.

Immediately phpMyAdmin was not able to correctly
display data stored in MySQL tables encoded as Arabic
Windows-1256 (or cp-1256)

In the new home page of phpMyAdmin
(http://www.domain.com/phpMyAdmin/index.php) there
are:

a. A "Language" drop-dwon list which identifies the
language in which further phpMyAdmin pages will
be "rendered" by the browser. The options in this drop-
dwon list include many languages, unfortuantely all of
them in the xx-utf-8 format. This is a key issue here.
Earlier versions allowed non-unicode rendering
possibilities such as Arabic (win-1256) along with many
others. I don't understand why this was removed, forcing
users into using unicode only.

b. A "MySQL charset" with only the option of "UTF-8
Unicode (utf8)"

c. A "MySQL connection collation" and the options
range from choosing no collation at all to various
unicode and non-unicode collation.

I'd like to stress here that we'er tackling a "rendring
problem" i.e how the browser renders the characters
sent to it by the webserver, not the format in which the
data are actually stored in MySQL. This clear
distinction explains many things.

All the site pages that read data from or writes data into
MySQL display and store the characters correctly,
because they tell the browser explicitly how they are
encoded. This is achieved by this html line present in all
of the site pages:

<meta http-equiv="Content-Type" content="text/html;
charset=windows-1256">

Now phpMyAdmin pages doesn't have such a line. In
fact a "view->source" of any phpMyAdmin page will
show a line such as this one:

<meta http-equiv="Content-Type" content="text/html;
charset=utf-8" />

with a subesquent

<link rel="stylesheet" type="text/css"
href="./css/phpmyadmin.css.php?lang=xx-utf-
8&amp;js_frame=right" />

defining the style sheet to be used to render the page --
where xx-utf-8 corresponds to the chosen "Language"
originally set in the phpMyAdmin home page, e.g. en-utf-
8, ar-utf-8, etc.

The above explains everything. phpMyAdmin pages
WILL NOT BE ABLE TO DISPLAY ANY NON-ENGLISH
CHARACTERS RETRIEVED FROM MYSQL UNLESS
THEY WERE SAVED IN THE DATABASE AS UTF-
ENCODED ENTITIES. If unfortunately you store your
non-english data in anyway other than utf, phpMyAdmin
will display them as gebberish. Plain and simple.

Now how we can overcome this problem?

1. Ask phpMyAdmin development team to allow the
option of non-utf rendering from the "Language" drop-
down list in its home page. This was available in
previous versions of phpMyAdmin.

2. Ask the browser (here, Internet Explorer) to render
the phpMyAdmin pages with the encoding we want
using its function "view->encoding->xxxxx".
Unfortunately this is not attainable. The phpMyAdmin
development team intentionally chose to include
a "lang=xx-utf-8" in the http request header of each and
every phpMyAdmin page, where the value xx-utf-8 is the
one set at the very begining home page of phpMyAdmin
by your choice of the "Language"

Do this exercise:
Suppose you have a database called my_schema, and
a table in that database storing non-English content and
is called my_table. Now ...

a. Login to phpMyAdmin, choose whatever combination
of language and collation you want .. it doesn't really
matter
b. From the left-hand drop-down list chose
your "my_schema" database
c. A list of tables with "my_table" now should appear in
the left hand frame of the phpMyAdmin home page.
d. Now move your mouse over "my_table" (whcih is a
link), right click and choose "copy shortcut"
e. Start "Notepad" or your favorite plain-text editor, and
do a paste.
What you would get is something like this:

http://www.domain.com/phpMyAdmin/tbl_properties_stru
cture.php?lang=xx-utf-
8&server=1&collation_connection=xxxxxxx&db=my_sch
ema&table=my_table

As you can see, the http request explicitly states page
encoding with this parameter in the string "lang=xx-utf-
8" and "collation_connection=xxxxxx" exactly as set by
the first chosices in step #a. We are stuck with these
options all through the phpMyAdmin session.

f. Now click on the my_table link. You should now see
the table stucture in the right-hand frame of the
phpMyAdmin page.

g. Reach up to the "Browse" link, and repeat the
procedure in #d and #e
What you would get is something like this:

http://www.domain.com/phpMyAdmin/sql.php?lang=xx-
utf-
8&server=1&collation_connection=xxxxxxxx&db=my_sc
hema&table=my_table&goto=tbl_properties_structure.ph
p&back=tbl_properties_structure.php&sql_query=SELE
CT+%2A+FROM+%60my_table%60&pos=0

h. Click on the "Browse" link. You now see the first 30
records of your table, with gebberish replacing your non-
English conent of my_table records.

i. Now tell Internet Explorer to use another encoding by
using "view->encoding->xxxxx". Since this page is a
dynamically generated one, what IE will really do is
submit the http request to the server again exactly as
shown in #g resulting in the page dispalyed
with "lang=xx-utf-8" encoding, and effectively nullifying
the purpose of the request.

j. Now someone might say let's try this with FireFox.
Great. This will only help to solidfy the explanation of
this "bug". FireFox instead of resubmitting the http
request to the server will try to work on the locally
cached version first. Alas, in this situation FireFox has
a problem in translating this gebberish to win-1256
codes properly. It does translate this gebbberish into
Arabic, but with mismatched characters, resulting in a
meaningless content. In other cases, it might not have
this translation problem. So this might really work.

Now we come to the last option

3. Re-encode the content in the database into UTF-8.
This is really such an awkward solution. In order to
overcome a glitch/lack of feature in phpMyAdmin we're
going to destory a functioning site. Unless we really
intended to do this conversin of a non-English/non-
Unicode site and non-Enlgish/non-Unicode database
content into Unicode, but never had the time to do it,
this might be the correct time. Other than this, I don't
see -regardless of the means or the tricks needed to do
the actual conversion- how we can break the code of the
site pages that read and write into MySQL correctly, in
whatever encoding that was origianlly chosen when the
site was originally developed, only to get phpMyAdmin
to display the data correctly.

Conclusion: phpMyAdmin in its current version supports
only utf connection to MySQL and utf rendring of the
resulting pages. phpMyAdmin should allow rendering of
data encoded in ways other than xx-utf-8

Your comments are welcome.
moallam

Discussion

  • moallam

    moallam - 2005-10-17
    • priority: 5 --> 9
     
  • Sebastian Mendel

    Logged In: YES
    user_id=326580

    current stable is 2.6.4.PL.2 not 2.6.3.PL.1

    can you please send a short dump of any db/table which
    inherits this problem?

    dev: shouldnt go all output from MySQL through
    htmlentities()? and so it should always be viewable or
    non'glitchy'?

     
  • Michal Čihař

    Michal Čihař - 2005-10-18

    Logged In: YES
    user_id=192186

    Sorry didn't read whole stuff, try to be more brief, but
    after reading first half, it is obvious that you don't have
    correct collation settings for your tables. Moving to
    support requests.

    Sebastian: I don't see any relevance to this, but yes,
    everything should go through htmlentities (except output
    from MIME transformations, which handle this as they need).

     
  • Michal Čihař

    Michal Čihař - 2005-10-18
    • priority: 9 --> 5
    • labels: 509105 --> Internationalization
    • milestone: 506930 -->
     
  • Michal Čihař

    Michal Čihař - 2005-10-18
    • assigned_to: nobody --> nijel
     
  • moallam

    moallam - 2005-10-18

    Logged In: YES
    user_id=1363580

    As per the request of cybot_tm a short dump of one of the
    affected tables is attached

     
  • moallam

    moallam - 2005-10-18

    Sample dump, 100 records, data are Arabic win-1256

     
  • Michal Čihař

    Michal Čihař - 2005-10-18

    Logged In: YES
    user_id=192186

    You have set CHARSET=latin1 for that table, so you can't
    expect that it can handle arabic character set. Please ready
    MySQL documentation about charsets and collations.

     
  • moallam

    moallam - 2005-10-18

    Logged In: YES
    user_id=1363580

    Yes I know, and I've read the sections you're referring to --
    the full length of chapter 10 in MySQL documentation.

    The table sample sent was created before MySQL
    implemented charsets and collations. The
    default "latin1_swedish_ci" collation was added to all fields of
    all tables, and all tables in all databases automatically upon
    the latest MySQL migration.

    Point is: Earlier versions of PMA handled the display of
    contents of such tables gracefully when they didn't
    mandate "xx-utf-8" for page encoding allowing the user to set
    his own encoding using the browser's view->encoding and
    when they didn't manade a MySQL charset as "UTF-8
    Unicode (utf8)". This was true even after MySQL
    implemented charsets and collations.

    Now I see the PMA team wants to conform to unicode as the
    standard for "page display" and "MySQL" server-wide
    charset, leaving the handling of "foreign code pages"
    to "connection collations" and the rendering to htmlentities().
    But you have to cosider that many non-latin MySQL
    databases were migrated to the newer version and their
    tables and fields were defaulted to the totally
    wrong "latin1_swedish_ci" collation.

    Due to this "wrong default", trying to convert the "collation
    type" of the field "name" on the sample table I sent you to
    the correct "cp1256_gnereal_ci" will only result in converting
    all those Arabic characters to "?". Try it. Here's the SQL:

    ALTER TABLE `consultation` CHANGE `name` `name`
    VARCHAR( 50 ) CHARACTER SET cp1256 COLLATE
    cp1256_general_ci NOT NULL

    And unless you have a backup, this "can not be undone".
    This resulted in catastrophic data losses to many
    webdevelopers with non-latin content databases. Ask me.

    The same is true if you go about doing this conversion from
    PMA. If you have database tables created before MySQL
    implemented charsets and collations, and you wanted to
    alter the "default collation attribute" to the correct type of
    encoding used for the data in a field using PMA, this will
    result in instructing mysql to "translate" the stored data to
    the chosen collation type (e.g. cp1256_general_ci). Since
    MySQL has the wrong information, to begin with, that these
    data are default "latin1_swedish_ci", it will do a terrible job
    trying to re-encode them to the chosen collation. And, when
    it miserably fails, it will just insert a "?" in place of each and
    every character it failed to re-encode!

    Hence, the request to let PMA not be so strict about
    unicode, and to allow, like earlier versions, setting the
    rendering of the rendered content, whether hard coded or
    dynamically read from MySQL to the user preferred
    encoding. This will solve lots of problems, and I think is
    a "users" right.

    In the mean time, if you know of a "graceful" way that would
    let me change the dreadful default
    collation "latin1_swedich_ci" currently set for all the fields of
    all my tables of all my non-latin databases WITHOUT
    converting them to "???" please let me know.

    Regards,
    moallam

     
  • Michal Čihař

    Michal Čihař - 2005-10-18

    Logged In: YES
    user_id=192186

    Previous MySQL version didn't allow getting data in
    specified charset, while upgrading to MySQL with charset
    support you have to set correct charset on all tables (or
    changing default if all are same).

    About conversion: RTFM
    http://dev.mysql.com/doc/refman/4.1/en/alter-table.html#id2777213,
    it's all written there.

     
  • moallam

    moallam - 2005-10-18

    Logged In: YES
    user_id=1363580

    Hey Nijel,

    Thanks a lot, you've made my day! After RTFMing, these two
    commands did the conversion trick

    ALTER TABLE `consultaion` CHANGE `name` `name` BLOB;
    ALTER TABLE `consultaion` CHANGE `name` `name` TEXT
    CHARACTER SET cp1256;

    However, they need to be done on each and every column of
    each table of the database. But no prob. Also, now
    everything is correct in PMA, including inserting and updating
    records. This is because PMA correctly handles connection
    collation and charset. Unfortunately, the site pages now
    display the database records as "???? ???" because these
    php pages were not programmed with collation/charsets in
    mind.

    Now would you tell me where to RTFM about a "quick and
    dirty" way to fix displaying/entering site data correctly, as
    does PMA now? This is until I finally get around sometime to
    recode everything into Unicode (I swear upon my scouts
    honour, and am crossing my fingers).
    I'm thinking mysql_query(" SET NAMES 'cp1256' "); after
    establishing MySQL connection. Am I thinking correct?

    Many thanks,
    Mo

     
  • Michal Čihař

    Michal Čihař - 2005-10-18
    • status: open --> pending
     
  • SourceForge Robot

    • status: pending --> closed
     
  • SourceForge Robot

    Logged In: YES
    user_id=1312539

    This Tracker item was closed automatically by the system. It was
    previously set to a Pending status, and the original submitter
    did not respond within 14 days (the time period specified by
    the administrator of this Tracker).

     
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.