Re: [q-lang-users] More Unicode queries.
From: John C. <co...@cc...> - 2008-01-18 00:52:58
Albert Graef scripsit:

> I've attached my Q script. It expects the w3centities.ent file in the
> current dir, output is written to w3centities.c. Could be interesting to
> compare the two scripts, if you're willing to share your Perl solution.

Sure. Note that & and < must be special-cased, because the definition of
an entity may not contain an explicit & or <.

#!/usr/bin/perl -w

# Process W3 .ent file into tssl style
# Sample input:
# <!ENTITY AElig "&#x000C6;" ><!--LATIN CAPITAL LETTER AE -->
# <entity name='AElig' codepoint='00C6'/>

use strict;

while (<>) {
    chomp;
    my ($entity, $name, $string) = split;
    next unless defined($entity);
    next unless $entity eq "<!ENTITY";        # reject cruft
    next if $name eq "%";                     # sample declaration
    next unless length($string) == 11;        # reject non-singletons
    my $codepoint = substr($string, 4, 5);
    $codepoint = substr($codepoint, 1, 4) if substr($codepoint, 0, 1) eq "0";
    $codepoint = "0026" if $name eq "amp";
    $codepoint = "003C" if $name eq "lt";
    print " <entity name='$name' codepoint='$codepoint'/>\n";
}

--
A mosquito cried out in his pain,       John Cowan
"A chemist has poisoned my brain!"      http://www.ccil.org/~cowan
The cause of his sorrow                 co...@cc...
Was para-dichloro-
Diphenyltrichloroethane.                (aka DDT)
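For comparison, here is a minimal sketch (not part of the script above, and not the Q script) of a variant that avoids the two special cases by decoding the character reference instead of slicing it by position. It assumes, as the XML 1.0 spec does for its own predefined entities, that amp and lt are declared with doubly escaped references such as "&#38;#38;" and "&#38;#60;", while other single-character entities are written like "&#x000C6;". The helper name entity_codepoint and the sample lines are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of the original script: turn the quoted
# replacement text of one .ent declaration into a codepoint string in the
# same format the script prints (4 hex digits, or 5 beyond the BMP).
# Returns undef for anything that is not a single character reference.
sub entity_codepoint {
    my ($string) = @_;
    $string =~ s/^"(.*)"$/$1/;    # strip the surrounding quotes
    $string =~ s/^&#38;#/&#/;     # undo the extra escaping used for & and <
    my $cp;
    if ($string =~ /^&#x([0-9A-Fa-f]+);$/) {
        $cp = hex $1;             # hexadecimal reference, e.g. &#x000C6;
    } elsif ($string =~ /^&#([0-9]+);$/) {
        $cp = $1;                 # decimal reference, e.g. &#38;
    } else {
        return;                   # multi-character or unrecognized
    }
    return sprintf('%04X', $cp);
}

# Demo on a few hypothetical declarations; split ' ' ignores the exact
# whitespace, and the trailing comment fields are simply not used.
my @sample = (
    '<!ENTITY AElig "&#x000C6;" ><!--LATIN CAPITAL LETTER AE -->',
    '<!ENTITY amp "&#38;#38;" >',
    '<!ENTITY lt "&#38;#60;" >',
);
for my $line (@sample) {
    my (undef, $name, $string) = split ' ', $line;
    my $cp = entity_codepoint($string);
    next unless defined $cp;
    print " <entity name='$name' codepoint='$cp'/>\n";
}
```

For these three declarations it should print the same lines the original script would. Matching on the reference syntax instead of on length($string) == 11 also means a multi-character entity is rejected because it is not a single reference, not because of its length.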