Some non-ASCII characters cause writeHTML() to produce empty output
PHP class for PDF
Brought to you by: nicolaasuni
It can easily be reproduced by adding a Chinese word such as "電池" (a very common word meaning "battery") to example_001.php.
The whole HTML block disappears, no matter which font you use.
I debugged it myself and found that these characters can cause getHtmlDomArray() to break.
After commenting out the following lines in tcpdf.php, it works fine, but I am not sure whether that harms other functionality (maybe it messes up spacing?):
$html = preg_replace('/<([^\>\/]*)>[\s]/', '<\\1> ', $html); // preserve some spaces
$html = preg_replace('/[\s]<\/([^\>]*)>/', ' </\\1>', $html); // preserve some spaces
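To see what those two lines actually do, here is a small sketch that runs the same two patterns inside a hypothetical helper, preserveTagSpaces() (not part of TCPDF). Each call replaces a single whitespace character right after an opening tag, or right before a closing tag, with a plain space. It also checks the return value: preg_replace() returns null on a PCRE error (with the u modifier, for example, on an invalid UTF-8 subject), and passing that null further along would leave nothing to render. This is only a debugging aid under those assumptions, not a fix.

```php
<?php
// Hypothetical helper (not TCPDF code): the two "preserve some spaces"
// replacements, with an explicit check for PCRE failure after each call.
function preserveTagSpaces(string $html): string
{
    // One whitespace char right after an opening tag becomes a plain space.
    $out = preg_replace('/<([^\>\/]*)>[\s]/', '<\\1> ', $html);
    if ($out === null) {
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }
    // One whitespace char right before a closing tag becomes a plain space.
    $out = preg_replace('/[\s]<\/([^\>]*)>/', ' </\\1>', $out);
    if ($out === null) {
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }
    return $out;
}

var_dump(preserveTagSpaces("<p>\nbattery\t</p>"));
```

A self-closing tag such as `<br/>` is left alone, because the first pattern excludes `/` inside the tag.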
Maybe those regexes should have the u modifier added, so that they handle UTF-8 when it is detected?
I tried adding the u flag to the preg_replace() calls; it still does not work.
It seems related to this PHP bug: https://bugs.php.net/bug.php?id=53823
Indeed, the patch has been committed.
From reading the bug comments, a temporary workaround seems to be using + instead of * in those regexes (together with the u modifier). The space preservation is only needed before and after real tags, and nobody cares about an empty <> tag, so it won't break anything.
I was wondering, performance-wise, wouldn't it be better to pass arrays to preg_replace() to reduce this to a single call?
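A minimal sketch of that workaround, assuming the input is valid UTF-8: both patterns get the u modifier and + instead of *, so the tag content must be non-empty. The test string and the null check are my own additions for illustration.

```php
<?php
// Work-around from the bug comments: u modifier plus + instead of *.
// The tag content must now be non-empty, so an empty "<>" is ignored.
$html = "<p>\n電池\n</p>";

$out = preg_replace('/<([^\>\/]+)>[\s]/u', '<\\1> ', $html);
if ($out !== null) {
    $out = preg_replace('/[\s]<\/([^\>]+)>/u', ' </\\1>', $out);
}
if ($out === null) {
    // preg_replace() returns null on a PCRE error, e.g. invalid UTF-8 input.
    exit('PCRE error code ' . preg_last_error() . "\n");
}

var_dump($out);
```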
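For reference, a sketch of the single-call version. When the pattern and replacement arguments are arrays, preg_replace() applies the patterns in order, each one to the result of the previous, so this should behave the same as the two separate calls. As far as I know PHP still makes one pass over the string per pattern, so the saving is mostly the extra function call; whether that is measurable would need benchmarking.

```php
<?php
// Both rules in one preg_replace() call; patterns are applied in order,
// each to the output of the previous one.
$patterns = [
    '/<([^\>\/]+)>[\s]/u', // whitespace right after an opening tag
    '/[\s]<\/([^\>]+)>/u', // whitespace right before a closing tag
];
$replacements = [
    '<\\1> ',
    ' </\\1>',
];

$html = "<b>\t電池\n</b>";
$out = preg_replace($patterns, $replacements, $html);
if ($out === null) {
    // null signals a PCRE error (e.g. invalid UTF-8 with the u modifier).
    exit('PCRE error code ' . preg_last_error() . "\n");
}

var_dump($out);
```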
That PHP bug is fixed in PHP 5.6.9.
In the meantime, using the u (Unicode) modifier with + instead of * works as expected.