
#1065 Some non-ASCII characters cause empty writeHTML() output

Milestone: v1.0_(example)
Status: open
Owner: nobody
Labels: None
Priority: 1
Updated: 2015-10-09
Created: 2015-07-13
Creator: imTigger
Private: No

It can easily be reproduced by adding a Chinese word such as "電池" (a very common word meaning "battery") to example_001.php.
The whole HTML block disappears, no matter which font you use.

I debugged it myself and found that these characters break getHtmlDomArray().

After commenting out these lines in tcpdf.php, it works fine.
However, I am not sure whether that harms other functionality (maybe it messes up spaces?).

$html = preg_replace('/<([^\>\/]*)>[\s]/', '<\\1>&nbsp;', $html); // preserve some spaces
$html = preg_replace('/[\s]<\/([^\>]*)>/', '&nbsp;</\\1>', $html); // preserve some spaces
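To see what these two lines are meant to do, here they are wrapped in a small function and run on plain ASCII input. The function name preserveTagSpaces is hypothetical and not part of TCPDF; the two patterns and replacements are copied unchanged from the lines above.

```php
<?php
// Hypothetical wrapper (not a TCPDF function) around the two replacements
// quoted from tcpdf.php, so their effect can be checked in isolation.
function preserveTagSpaces(string $html): string
{
    // One whitespace character right after an opening tag becomes &nbsp;
    $html = preg_replace('/<([^\>\/]*)>[\s]/', '<\\1>&nbsp;', $html);
    // One whitespace character right before a closing tag becomes &nbsp;
    $html = preg_replace('/[\s]<\/([^\>]*)>/', '&nbsp;</\\1>', $html);
    return $html;
}

echo preserveTagSpaces('<p> hello </p>'), "\n"; // <p>&nbsp;hello&nbsp;</p>
```

On ASCII input the leading and trailing spaces inside the tag pair are turned into non-breaking spaces, which is what the "preserve some spaces" comments describe.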

Discussion

  • Mathieu Masseboeuf

    Maybe that regex should have the u modifier added in order to handle UTF-8 when it is detected?

     
  • imTigger

    imTigger - 2015-07-31

    I tried adding the u flag to the preg_replace calls, but it still does not work.

    It seems related to this PHP bug: https://bugs.php.net/bug.php?id=53823

     
  • Mathieu Masseboeuf

    Indeed, the patch has been committed.
    From reading the comments, it seems a temporary workaround would be to use + instead of * in that regex (with the u modifier): this space preservation is only needed before and after real tags (an empty <> does not matter), so it won't break anything.

    I was also wondering: performance-wise, wouldn't it be better to pass arrays to preg_replace to reduce it to a single call?

     
  • Mathieu Masseboeuf

    That PHP bug is fixed in PHP 5.6.9.
    In the meantime, using the u (Unicode) modifier with + instead of * works as expected.

     

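A sketch of the workaround suggested in this thread: the u modifier, + instead of * in the tag-name group, and a single preg_replace() call with pattern and replacement arrays. The function name preserveTagSpacesUtf8 and the fallback to the unmodified input are assumptions added for illustration, not TCPDF code, and whether this avoids https://bugs.php.net/bug.php?id=53823 on a given PHP version is not verified here.

```php
<?php
// Hypothetical sketch of the thread's workaround (not upstream TCPDF code):
// - u modifier: PCRE treats the subject as UTF-8 characters, not bytes;
// - + instead of *: the workaround reported in the thread for the PHP bug;
// - both patterns are passed to one preg_replace() call as arrays.
function preserveTagSpacesUtf8(string $html): string
{
    $patterns = [
        '/<([^\>\/]+)>\s/u',  // whitespace right after an opening tag
        '/\s<\/([^\>]+)>/u',  // whitespace right before a closing tag
    ];
    $replacements = [
        '<\\1>&nbsp;',
        '&nbsp;</\\1>',
    ];
    $result = preg_replace($patterns, $replacements, $html);
    // preg_replace() returns null on error (e.g. invalid UTF-8 with /u).
    // Assumed fallback: keep the input unchanged instead of returning nothing.
    return $result ?? $html;
}

echo preserveTagSpacesUtf8('<p> 電池 </p>'), "\n"; // <p>&nbsp;電池&nbsp;</p>
```

With array arguments, preg_replace() applies each pattern in order to the whole string, so the result matches two separate calls while compiling and scanning through a single function call.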
