TCPDF - PHP class for PDF / Bugs / #1065 Some Non-ASCII character causing writeHTML empty output

#1065 Some Non-ASCII character causing writeHTML empty output

Milestone: v1.0_(example)

Status: open

Owner: nobody

Labels: None

Priority: 1

Updated: 2015-10-09

Created: 2015-07-13

Creator: imTigger

Private: No

It can easily be reproduced by adding a Chinese word like "電池" (a very common word meaning "battery") to example_001.php.
The whole HTML block disappears no matter which font you use.

I debugged it myself and found that these characters cause getHtmlDomArray() to break.

After commenting out these lines in tcpdf.php, it works fine.
But I am not sure whether that harms other functionality (maybe it messes up spaces?).

$html = preg_replace('/<([^\>\/]*)>[\s]/', '<\\1>&nbsp;', $html); // preserve some spaces
$html = preg_replace('/[\s]<\/([^\>]*)>/', '&nbsp;</\\1>', $html); // preserve some spaces
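To narrow down where the output goes wrong, the two replacements can be run in isolation and the result checked for valid UTF-8. This is only a diagnostic sketch: inspectSpacePreservation() is a hypothetical helper, not part of TCPDF, and what it reports depends on the PHP version and environment, so no particular output is claimed here.

```php
<?php
// Hypothetical diagnostic helper (not part of TCPDF).
// Runs the two replacements quoted in this ticket and reports whether
// the result is still valid UTF-8, plus a hex dump for byte-level comparison.
function inspectSpacePreservation($html)
{
    $step = preg_replace('/<([^\>\/]*)>[\s]/', '<\\1>&nbsp;', $html);
    $step = preg_replace('/[\s]<\/([^\>]*)>/', '&nbsp;</\\1>', (string) $step);
    $step = (string) $step;

    return array(
        'output'     => $step,
        // An empty pattern with the u modifier only matches valid UTF-8 subjects.
        'valid_utf8' => preg_match('//u', $step) === 1,
        'hex'        => bin2hex($step),
    );
}

print_r(inspectSpacePreservation('<p>電池</p>'));
```

If valid_utf8 comes back false, or the hex dump differs from the input where no whitespace was present, the replacements have altered bytes inside a multi-byte character.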

Discussion

Mathieu Masseboeuf - 2015-07-30

Maybe that regex should have the u modifier added in order to handle UTF-8 when it's detected?

imTigger - 2015-07-31

I tried adding the u flag to preg_match, but it is still not working.

It seems related to this PHP bug: https://bugs.php.net/bug.php?id=53823

Mathieu Masseboeuf - 2015-07-31

Indeed, the patch has been committed.
From reading the comments, it seems a temporary workaround would be to use + instead of * in that regex (with the u modifier): this space preservation is required before and after tags (you don't care about <>, so it won't break things).
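A minimal sketch of that workaround, assuming the two replacements quoted in the ticket are the only lines that change. preserveSpacesUnicode() is a hypothetical wrapper used for illustration, not a TCPDF method; the null checks are an addition, since preg_replace() returns null on error (for example when the u modifier meets invalid UTF-8).

```php
<?php
// Hypothetical wrapper (not part of TCPDF): the two replacements from the
// ticket, with the u modifier added and * changed to + in the tag-name class.
function preserveSpacesUnicode($html)
{
    $out = preg_replace('/<([^\>\/]+)>[\s]/u', '<\\1>&nbsp;', $html);
    if ($out === null) {
        return $html; // regex error, e.g. invalid UTF-8 input: leave untouched
    }

    $out2 = preg_replace('/[\s]<\/([^\>]+)>/u', '&nbsp;</\\1>', $out);
    return $out2 === null ? $out : $out2;
}

echo preserveSpacesUnicode('<p> 電池 </p>'), "\n";
```

With +, an empty tag such as <> is no longer matched, which is the case described above as not mattering.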

I was wondering, performance-wise, wouldn't it be better to pass an array to preg_replace to reduce it to a single call?
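For reference, the single-call form could look like the sketch below. preg_replace() accepts arrays of patterns and replacements and applies them in array order, so the result should match the two separate calls. preserveSpacesSingleCall() is a hypothetical name, the patterns are copied unchanged from the ticket, and whether this is measurably faster has not been tested here.

```php
<?php
// Hypothetical wrapper (not part of TCPDF): both space-preserving
// replacements from the ticket in one preg_replace() call.
function preserveSpacesSingleCall($html)
{
    $patterns = array(
        '/<([^\>\/]*)>[\s]/',   // whitespace right after an opening tag
        '/[\s]<\/([^\>]*)>/',   // whitespace right before a closing tag
    );
    $replacements = array(
        '<\\1>&nbsp;',
        '&nbsp;</\\1>',
    );

    // Patterns are applied one after another, each on the previous result.
    return preg_replace($patterns, $replacements, $html);
}

echo preserveSpacesSingleCall('<b> bold </b>'), "\n";
```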

Mathieu Masseboeuf - 2015-10-09

That PHP bug is fixed in 5.6.9.
In the meantime, using the u (Unicode) modifier with + (instead of *) works as expected.
