Encodings with Excelreader

Help
J White
2008-07-17
2013-05-02
  • J White
    J White
    2008-07-17

    Hi all,

    I have an Excel file with some spanish names in it. eg. penthouse in Puerto Banús.  So th 'u' of Banus is with an accent.  The Excel file itself was created from an XML file, which had its encoding set to ISO-8859-1.

    When my PHP program uses the Excelreader to import the data from the Excel file, it writes it to my MySQL database, to UTF-8 encoded fields.  This is because my database is based on Unicode, and UTF-8 was the best for this.

    Now, the issue is that when I display the data - it displays as penthouse in Puerto Ban?pan>.

    So, its not handling the conversion.

    I tried changing functions.cls.php with:

    //$data->setOutputEncoding('CP1251'); // jw 24/9/07
    //$data->setOutputEncoding('CP1252'); // jw 17/7/08
    $data->setOutputEncoding('UTF-8'); // jw 24/9/07

    Initially it was UTF-8 and this never used to give problems. I am now seeing the problem after importing this latest Excel file, based on an XML file with ISO-8859-1 encoding.
    I tried importing with CP1251 and CP1252 (I understand CP1252 is equivalent to ISO-8859-1), but this didnt make any difference.

    Does anyone have some idea what I can do?

    Regards, Jon

     
    • J White
      J White
      2008-09-01

      I have investigated further, and what I see is that the Excelreader expects the input file to be in the encoding UTF-16LE.

      This function is in the reader.php:

      [code]
      function _encodeUTF16($string){
              $result = $string;
              if ($this->_defaultEncoding){
                  switch ($this->_encoderFunction){
                      case 'iconv' :     $result = @iconv('UTF-16LE', $this->_defaultEncoding, $string);
                                      break;
                      case 'mb_convert_encoding' :     $result = mb_convert_encoding($string, $this->_defaultEncoding, 'UTF-16LE' );
                                      break;
                  }
              }
              return $result;
          }
      [/code]

      So, what if the input file to the Excel reader is not in UTF-16LE?  In my case, I took an XML file with ISO-8859-1 encoding and opened in Excel. So it remains in an ISO-8859-1 encoding.  To be able to convert this from ISO-8859-1 to UTF-8, should I change the above function so that it converts FROM ISO-8859-1 instead of UTF-16LE, and set the OutputEncoding call to UTF-8?

      Any pointers will be helpful, as this ExcelReader seems to lack any useful documentation whatsoever in this important area.

      Thanks, Jon

       
      • siranm
        siranm
        2008-09-01

        i've used the class to read latin1, save latin1 on the db, then show latin1 on the webpage

        are you sure that on your explorer you are displaying things in the correct encoding ?

        in firefox, on the menu view->character encoding->_yourencoding_... if changing it here "solves" the problem, then you might think on checking the charset headers you are sending on your webpage...

         
    • J White
      J White
      2008-09-06

      If you change the web browser encoding in IE, by setting View -
      Encoding - Western European (Windows) - and then look at the text, it displays correctly.

      What this suggests is that the encoding format stored, is ISO-8859-1.  Since this is Western Europe (Windows).

      The problem is, that I use UTF-8 encoding for my web pages, and in MySQL and this is the encoding I WANT to use.

      The issue is that one of my clients provided an XML file with ISO-8859-1 encoding, which I opened and saved in Excel, and then this was read in by Excelreader.

      So I need to find the way to tell Excelreader to read it in ISO-8859-1 format - but so far, nothing has worked.

      Ideas?

       
      • siranm
        siranm
        2008-09-06

        Can you convert the xml file before you open it with excel ? Use the iconv tool.... Does excel always save in ISO-8859-1 ?

        You can also to open the xml with another tool, like OpenOffice's Calc, which has the option to select in which character encoding you wish to save the xls...

        Hope this helps...

        siranm

         
    • J White
      J White
      2008-09-06

      Correction: I mean I want Excelreader to read it in ISO-8859-1, but to successfully convert it to UTF-8, so the accent characters in French, Spanish etc can be displayed in UTF-8 encoding on the web pages.

       
    • I've run across an XLS file sent by a client (in Excel 97+ format), and I wished to read it with UTF-8 encoding, but it always returned CP1252-encoded data despite setOutputEncoding.

      I've looked across the reader code, and I found out that $asciiEncoding was true, despite the data not being in ASCII (ASCII is supposed to be only 0x00 thru 0x7F). BTW, $optionFlags was 0 in all cells.

      Excel-97+ files created by gnumeric don't have that problem.

      I had to convert it from CP1252 to UTF-8 afterwards, but I bet that's not the answer, since it might happen with other encodings.

       

  • Anonymous
    2011-04-11

    Hello all,
    I have the same problem, with UTF-8 output. Tried to look at the reader.php code, and found that _encodeUTF16( ) never gets in action in my XLS.
    I've solved my problem by adding this line:

    $retstr = iconv ( "ISO-8859-1", $this->_defaultEncoding, $retstr );
    

    after line 561 (

    $retstr = ($asciiEncoding) ? $retstr : $this->_encodeUTF16($retstr);
    

    )
    or by using iconv() in my code, after getting the result from the class!

    Maybe I'm missing something, or this is a bug. Any help will be appreciated!