I'm new to PHPLOT and I'm trying to learn how charsets work before I do further development.
I've enabled TTF fonts and I'm using the Verdana font bundled with Windows. According to "The Font Thing", the font has the euro (€) symbol at 0x80 and the currency (¤) symbol at 0xA4. The system runs the Spanish version of Windows. To sum up, my setup is ready for the Windows-1252 codepage.
However, with PHPLOT, 0xA4 gets rendered as "currency" (as expected) but 0x80 is a "character not found" square.
My test script is pretty simple. The file is encoded as ANSI, and I've tried several ways to provide the character: "\x80", chr(0x80), '€'...
What can the problem be?
My setup:
Windows Server 2003 R2
PHP 5.2.6
GD bundled (2.0.34 compatible)
FreeType Version 2.1.9
Verdana 2.43
There are two magic incantations that summon a Euro: "&#8364;" and "\xe2\x82\xac".
I can try to explain. If you already understand, you are probably better off not reading this...
On non-Unicode systems (some Linux, e.g.) the ISO-8859-1 character set has the "currency" symbol (looks like a square with 4 lines radiating from the corners) at position 0xa4. ISO-8859-1 pre-dates the whole Euro thing. If you want Euros, you have to use ISO-8859-15, which is an updated version of ISO-8859-1 and has the Euro at position 0xa4 instead.
On Unicode systems like Windows, the Euro is at U+20AC in Unicode fonts. Your Windows code page doesn't matter, as far as I know, when you use Unicode. Now PHP doesn't support Unicode directly yet. (It's coming in PHP6.) But it works with fonts if you manually encode the characters. You can do this in one of two ways: "&#xxxx;" where xxxx is the decimal Unicode value, for example 0x20AC = decimal 8364. Method 2 is to encode the value in UTF-8 which produces 1 or more bytes, which are then represented as hex characters in a PHP string. How you get from 0x20AC to the UTF-8 sequence "\xe2\x82\xac" is left as an exercise for the reader, but it's just a matter of plugging the bits into the right positions.
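For anyone who does want to do the exercise, here is a minimal sketch of the bit-plugging. The function name codepoint_to_utf8() is made up for this post (it is not part of PHP or PHPlot), and it does no validation of the input, so treat it as an illustration rather than a finished helper.

```php
<?php
// Hypothetical helper (not part of PHP or PHPlot): encode one Unicode
// code point as UTF-8 by packing its bits by hand. No validation is done
// (surrogates and out-of-range values are not rejected).
function codepoint_to_utf8($cp)
{
    if ($cp < 0x80) {
        // 1 byte:  0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// U+20AC falls in the 3-byte range:
// 0010 000010 101100 -> 1110[0010] 10[000010] 10[101100]
echo bin2hex(codepoint_to_utf8(0x20AC)), "\n"; // e282ac
```

The same function gives "\xc2\xa4" for the currency sign (U+00A4), which is why a bare 0xA4 byte is not valid UTF-8 on its own.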
Thanks for following up.
> '&#x20AC;'
Numeric character entities in hex. Probably best to avoid these, since I
don't think they work in other contexts (HTML, XHTML).
> html_entity_decode('&euro;', ENT_NOQUOTES, 'UTF-8')
That one is very useful. Self-explaining rather than the cryptic codes.
> chr(0xE2) . chr(0x82) . chr(0xAC)
Same as "\xe2\x82\xac"
> iconv('iso-8859-15', 'utf-8', chr(0xA4))
Could be useful if you have a whole string in iso-8859-15.
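A rough sketch of the whole-string case, in case it helps. The label text is invented for the example, and 'CP1252' is assumed to be a charset name your iconv build accepts (names can vary between iconv implementations).

```php
<?php
// A label stored as ISO-8859-15: byte 0xA4 is the Euro sign there.
// (The label text itself is just an example.)
$latin9 = "Precio: 25 \xA4";

// Convert the whole string to UTF-8 before using it as a TTF label.
$utf8 = iconv('ISO-8859-15', 'UTF-8', $latin9);
echo bin2hex(substr($utf8, -3)), "\n"; // e282ac

// Data in Windows-1252 has the Euro at 0x80 instead of 0xA4.
// 'CP1252' is an assumed charset name; check what your iconv supports.
$fromCp1252 = iconv('CP1252', 'UTF-8', "Precio: 25 \x80");
var_dump($utf8 === $fromCp1252); // bool(true)
```

Either way the result is the same UTF-8 string, so only the source charset name has to match how the data is actually stored.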
It seems the key is Unicode. However, I was not using the proper syntax in my previous tests. All these do work:
'&#8364;'
'&#x20AC;'
html_entity_decode('&euro;', ENT_NOQUOTES, 'UTF-8')
chr(0xE2) . chr(0x82) . chr(0xAC)
iconv('iso-8859-15', 'utf-8', chr(0xA4))
I've browsed the source code and the clue is in PHP's imagettftext() function, which is where the text gets rendered. Its manual page explains that it expects UTF-8 encoding and also accepts numeric HTML entities.
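As a quick sanity check that does not need GD or a font, the sketch below decodes each form to raw bytes and dumps them as hex. Note the assumption: imagettftext() decodes numeric entities by itself, so html_entity_decode() is used here only to make that step visible outside GD. As far as I can tell from the manual, named entities such as &euro; are not decoded by imagettftext() and do need html_entity_decode() first.

```php
<?php
// Every form from this thread, reduced to the bytes that end up as UTF-8.
$forms = array(
    'decimal entity' => html_entity_decode('&#8364;', ENT_NOQUOTES, 'UTF-8'),
    'hex entity'     => html_entity_decode('&#x20AC;', ENT_NOQUOTES, 'UTF-8'),
    'named entity'   => html_entity_decode('&euro;', ENT_NOQUOTES, 'UTF-8'),
    'chr() bytes'    => chr(0xE2) . chr(0x82) . chr(0xAC),
    'iconv'          => iconv('iso-8859-15', 'utf-8', chr(0xA4)),
);

// Print one line per form; each should show the same three bytes.
foreach ($forms as $name => $bytes) {
    printf("%-15s %s\n", $name, bin2hex($bytes));
}
```

If all five lines show e282ac, any of those strings should be safe to pass as a PHPlot label when TTF fonts are enabled.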