Thread: [htmltmpl] HTML::Template support for Unicode
From: Erik v. K. <ek...@xs...> - 2007-07-19 19:44:38
This is to propose for inclusion in HTML::Template a patch that adds Unicode support.

The 2.8 release of HTML::Template opens templates as raw files. That means every byte is interpreted as an individual character. If a parameter contains wide characters (katakana, or accented Latin characters for example), then the bytes from the template are made to match the wide characters by translating the bytes to Unicode. This is done by interpreting the bytes as Latin-1 characters.

If the template file happens to contain UTF-8 already, this breaks: the bytes making up a UTF-8 character are fed to the Latin-1 => Unicode transformation, and you end up with characters that are encoded twice.

There are some ways to handle this situation:

** Demand that parameters supplied to template processing don't contain wide characters. All parameters must have been processed by Encode::encode before template expansion. This is inconvenient, especially if the parameters are used in more than one place.

** Supply a filter subroutine to the template that will do UTF-8 decoding after the template file has been read, as follows:

    my $tmpl = HTML::Template->new(
        filename => 'test.tmpl',
        filter   => sub {
            my $ref = shift;
            ${$ref} = Encode::decode_utf8(${$ref});
        });

   This works, but is a bit ad hoc: it was not immediately obvious to me that this filter is an opportunity to make Unicode work.

** Add a feature to HTML::Template to specify the encoding of template files.

I cooked up a patch that adopts the last approach by adding an optional "encoding" argument to HTML::Template->new(), like so:

    my $t = HTML::Template->new(
        filename => 'file.tmpl',
        encoding => ':encoding(UTF-8)');

The specified encoding is used not only for the template itself, but also for any templates included from within the template. Possible values for encoding are defined in perlio(3perl). This also works fine with templates encoded in character sets other than Unicode or Latin-1.
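The double-encoding effect described above can be reproduced without HTML::Template at all. The following is a minimal sketch using only the core Encode module; the variable names and the sample strings are illustrative and do not come from the patch.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "café" as stored in a UTF-8 file: the "é" is the two bytes 0xC3 0xA9.
my $template_bytes = "caf\xC3\xA9";   # what a raw read of the template yields
my $param          = "\x{30AB}";      # a wide character (katakana KA)

# Joining the byte string with a wide-character string makes Perl treat
# each template byte as a Latin-1 character, so 0xC3 and 0xA9 become two
# separate characters and are each encoded again on output.
my $broken = encode('UTF-8', $template_bytes . $param);

# Decoding the template first (which is what the filter above does)
# keeps "é" as a single character.
my $text  = decode('UTF-8', $template_bytes);
my $fixed = encode('UTF-8', $text . $param);

print unpack('H*', $broken), "\n";   # 636166c383c2a9e382ab
print unpack('H*', $fixed),  "\n";   # 636166c3a9e382ab
```

In the first hex dump the original pair c3 a9 has turned into c3 83 c2 a9, which is the "encoded twice" symptom; in the second it survives unchanged.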
The attached patch was made against 2.8, but applies to 2.9 with a small offset. For now, a larger version is at:

    http://www.xs4all.nl/~ekonijn/html-template-unicode.patch

(The larger version contains tests with a number of non-ASCII characters, so it is tricky to send reliably over a mailing list.)

Regards,
Erik

diff -urN org/libhtml-template-perl-2.8/Template.pm new/libhtml-template-perl-2.8/Template.pm
--- org/libhtml-template-perl-2.8/Template.pm	2007-07-05 21:40:40.000000000 +0200
+++ new/libhtml-template-perl-2.8/Template.pm	2007-07-09 17:27:11.000000000 +0200
@@ -885,6 +885,15 @@
 HTML::Template will apply the specified escaping to all variables unless they
 declare a different escape in the template.
 
+=item *
+
+encoding - Set this to the name of a perlio layer to be used when
+doing open() on the template or an included template; default is ":bytes".
+As an example, to read a template containing unicode:
+
+  my $template = HTML::Template->new(filename => 'zap.tmpl',
+                                     encoding => ':utf8');
+
 =back
 
 =back 4
@@ -949,6 +958,7 @@
                vanguard_compatibility_mode => 0,
                associate => [],
                path => [],
+               encoding => ':bytes',
                strict => 1,
                loop_context_vars => 0,
                max_includes => 10,
@@ -1635,8 +1645,9 @@
       $options->{filepath} = $filepath;
     }
 
+    my $encoding = $options->{encoding};
     confess("HTML::Template->new() : Cannot open included file $options->{filename} : $!")
-      unless defined(open(TEMPLATE, $filepath));
+      unless defined(open(TEMPLATE, "<$encoding", $filepath));
     $self->{mtime} = $self->_mtime($filepath);
 
     # read into scalar, note the mtime for the record
@@ -2240,8 +2251,9 @@
         }
         die "HTML::Template->new() : Cannot open included file $filename : file not found."
           unless defined($filepath);
+        my $encoding = $options->{encoding};
         die "HTML::Template->new() : Cannot open included file $filename : $!"
-          unless defined(open(TEMPLATE, $filepath));
+          unless defined(open(TEMPLATE, "<$encoding", $filepath));
 
         # read into the array
         my $included_template = "";
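The patch works by placing the "encoding" value directly after the "<" in the mode argument of a three-argument open(). As a rough illustration of what such a layer changes, here is a sketch using only core modules; the helper slurp_with_layer and the temporary file are assumptions made for the example and are not part of the patch.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a tiny "template" containing the UTF-8 encoding of "café" (5 bytes).
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "caf\xC3\xA9";
close($out) or die "close failed: $!";

# Read a whole file through the given PerlIO layer, in the same way the
# patched code builds its open() mode string ("<" followed by the layer).
sub slurp_with_layer {
    my ($layer, $file) = @_;
    open(my $in, "<$layer", $file) or die "Cannot open $file: $!";
    local $/;                        # slurp mode: read everything at once
    my $content = <$in>;
    close($in);
    return $content;
}

my $raw     = slurp_with_layer(':raw', $path);
my $decoded = slurp_with_layer(':encoding(UTF-8)', $path);

print length($raw), "\n";       # 5: one "character" per byte
print length($decoded), "\n";   # 4: "é" is a single character
```

With the decoding layer in place the template text is already a character string when it meets wide-character parameters, so no Latin-1 upgrade of raw bytes takes place.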
From: Sven N. <sve...@sv...> - 2007-07-20 07:58:42
Erik van Konijnenburg wrote:
> This is to propose for inclusion in HTML::Template a patch that
> adds unicode support.

> If the template file happens to contain Unicode already, this breaks:
> the bytes making up an UTF-8 character are fed to the Latin => unicode
> transformation, and you end up with characters that are encoded twice.

I have seen this problem "in the wild", too.

> There are some ways to handle this situation:

There is another way that is not quite as inconvenient: you can pass a
filehandle to the constructor:

    open(my $fh, '<:utf8', 'template-file');
    my $t = HTML::Template->new(filehandle => $fh);

However, I agree that the option your patch adds is quite convenient.
Care to add some tests for this problem, too?

-Sven
From: Erik v. K. <ek...@xs...> - 2007-07-20 21:33:14
On Fri, 2007-07-20 at 09:58 +0200, Sven Neuhaus wrote:
> Erik van Konijnenburg wrote:
> > This is to propose for inclusion in HTML::Template a patch that
> > adds unicode support.
>
> > If the template file happens to contain Unicode already, this breaks:
> > the bytes making up an UTF-8 character are fed to the Latin => unicode
> > transformation, and you end up with characters that are encoded twice.
>
> I have seen this problem "in the wild", too.
>
> > There are some ways to handle this situation:
>
> There is another way that is not quite as inconvenient: You can pass a
> filehandle to the constructor:
>
> open(my $fh, '<:utf8', 'template-file');
> my $t = HTML::Template->new(filehandle => $fh);

Yep, that would work, provided you don't need include files.

> However, I agree that the option your patch adds is quite convenient.

Thanks :-)

> Care to add some tests for this problem, too?

Sure, apply

    http://www.xs4all.nl/~ekonijn/html-template-unicode.patch

and have a look at t/04charset.t; this contains tests of katakana, Devanagari and Cyrillic, provided both in UTF-8 and Latin-5.

Non-ASCII in a patch might be tricky; if the patch won't apply cleanly, let me know and I'll post a tarball.

> -Sven

Regards,
Erik
From: Sven N. <sve...@sv...> - 2007-07-21 16:15:47
Erik van Konijnenburg wrote:
> On Fri, 2007-07-20 at 09:58 +0200, Sven Neuhaus wrote:
>> Erik van Konijnenburg wrote:
>>> This is to propose for inclusion in HTML::Template a patch that
>>> adds unicode support.
>>> If the template file happens to contain Unicode already, this breaks:
>>> the bytes making up an UTF-8 character are fed to the Latin => unicode
>>> transformation, and you end up with characters that are encoded twice.
>> I have seen this problem "in the wild", too.
>>
>>> There are some ways to handle this situation:
>> There is another way that is not quite as inconvenient: You can pass a
>> filehandle to the constructor:
>>
>> open(my $fh, '<:utf8', 'template-file');
>> my $t = HTML::Template->new(filehandle => $fh);
> Yep, that would work, provided you don't need include files.

Now that you mention it, I guess the TMPL_INCLUDE tag would need an
attribute "encoding" to allow mixed setups. Or is that more of a
hypothetical scenario?

Thanks,
-Sven
From: Erik v. K. <ek...@xs...> - 2007-07-22 09:22:31
On Sat, 2007-07-21 at 18:15 +0200, Sven Neuhaus wrote:
> Erik van Konijnenburg wrote:
> > On Fri, 2007-07-20 at 09:58 +0200, Sven Neuhaus wrote:
> >> Erik van Konijnenburg wrote:
> >>> This is to propose for inclusion in HTML::Template a patch that
> >>> adds unicode support.
> >>> If the template file happens to contain Unicode already, this breaks:
> >>> the bytes making up an UTF-8 character are fed to the Latin => unicode
> >>> transformation, and you end up with characters that are encoded twice.
> >> I have seen this problem "in the wild", too.
> >>
> >>> There are some ways to handle this situation:
> >> There is another way that is not quite as inconvenient: You can pass a
> >> filehandle to the constructor:
> >>
> >> open(my $fh, '<:utf8', 'template-file');
> >> my $t = HTML::Template->new(filehandle => $fh);
> > Yep, that would work, provided you don't need include files.
>
> Now that you mention it, I guess the TMPL_INCLUDE tag would need an
> attribute "encoding" to allow mixed setups. Or is that more of a
> hypothetical scenario?

Hmm, good point. The proposed patch allows nested includes in, say, Unicode or Latin-5, but does not allow you to include a Latin-5 template from within a Unicode template.

An "encoding" attribute for TMPL_INCLUDE would make that possible, but it would be necessary to define what happens with deeply nested TMPL_INCLUDEs, where only some of them have an encoding attribute. Obviously such constructs are error-prone in practice and would require quite a few different test cases in the suite.

Without bug reports from users who actually try to run such a setup, I would declare the issue hypothetical and avoid the added complexity.

> Thanks,
> -Sven

Regards,
Erik
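To make the open question about nested includes concrete, here is one possible rule written out as a sketch. It is purely hypothetical: neither the patch nor HTML::Template defines an "encoding" attribute for TMPL_INCLUDE, and the function name resolve_encoding is invented for this example. The assumed rule is that an include without its own attribute inherits the encoding of the nearest enclosing template that set one, falling back to the constructor-level value.

```perl
use strict;
use warnings;

# Hypothetical resolution rule for nested includes.
#   $default : the constructor-level "encoding" value
#   @chain   : "encoding" attribute of each include from outermost to
#              innermost, with undef where the attribute is absent
sub resolve_encoding {
    my ($default, @chain) = @_;
    my $encoding = $default;
    for my $attr (@chain) {
        # The most recent explicit attribute wins for everything nested below it.
        $encoding = $attr if defined $attr;
    }
    return $encoding;
}

# A UTF-8 top-level template includes a file with no attribute, which
# includes a Latin-5 (ISO-8859-9) file, which includes one more file
# that again has no attribute.
my $enc = resolve_encoding(':encoding(UTF-8)',
                           undef, ':encoding(iso-8859-9)', undef);
print "$enc\n";   # :encoding(iso-8859-9)
```

Even this small rule needs tests for every combination of present and absent attributes along the chain, which illustrates the added complexity mentioned above.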
From: Chris H. <chr...@bb...> - 2007-08-01 15:45:46
Hi all,

I'm looking for a nice tutorial on HTML::Template authoring, aimed at a client-side developer, without any references to Perl techniques (or Perl references...).

All the ones I can find have explanations of the tags and loops, but mixed up with confusing talk about arrays etc., which ideally our HTML developer doesn't need to know about.

Can anyone point me in the right direction?

Many thanks in advance,
Chris
From: Karen <kar...@gm...> - 2007-08-01 16:09:31
On 8/1/07, Chris Henden <chr...@bb...> wrote:
> I'm looking for a nice tutorial on HTML::Template authoring, aimed at
> a client-side developer, without any references to perl techniques
> (or perl references...).

There's some documentation along those lines in the Python "port", though you'll have to allow for differences in the implementation. But it's a start:

    http://htmltmpl.sourceforge.net/lang.html
From: Chris H. <chr...@bb...> - 2007-08-01 16:53:17
Thanks to all for offering suggestions, they've been very useful.

Chris

On 1 Aug 2007, at 17:22, Philip Tellis wrote:
> On 01/08/07, Chris Henden <chr...@bb...> wrote:
>> Hi all,
>> I'm looking for a nice tutorial on HTML::Template authoring, aimed at
>> a client-side developer, without any references to perl techniques
>> (or perl references...).
>
> I have some docs on the java port here:
> http://html-tmpl-java.sourceforge.net/howto.shtml
>
> If you ignore the java code, the HTML code is identical to what you'd
> use in any of the other language ports.
>
> HTH,
>
> Philip