Re: [q-lang-users] More Unicode queries.
From: John C. <co...@cc...> - 2008-01-17 09:35:55
Albert Graef scripsit:

> BTW, John, thanks for spotting this. That W3C draft just came out,
> what a lucky coincidence. ;-)

Indeed. Someone's blog pointed me to it, I'm not sure who, and then I
incorporated it into the latest release of my TagSoup parser, a SAX
parser written in Java that processes arbitrary HTML rather than XML.
(plug: see http://tagsoup.info ).

> If you happen to keep an eye on this, it would be nice if you could
> let me know when the draft gets revised, so that the support in Q can
> be updated accordingly.

I'll let you know, as I'll be updating TagSoup as well.

> (I wrote a little Q script to generate the C code in src/w3centities.c
> automatically from the .ent file, which makes this easy. The script
> isn't included in the sources right now, but if anyone wants to have
> it, just let me know.)

Just what I did, except that being in a hurry I wrote it in Perl.

> Rob Hubbard wrote:
> > I'd strip the historical duplicates.
>
> I left them in. The full list of names is just some 15KB now, not a
> big deal even on embedded devices nowadays.
>
> > I think its okay for an entity to have more than one character.
>
> I only included the single-char entities for now. This simplifies the
> implementation, and is also consistent with the other escapes which
> all represent single Unicode characters. If this is a problem then
> please let me know.

I made the same decisions.

--
John Cowan   http://www.ccil.org/~cowan   co...@cc...
Please leave your values at the front desk.
        --sign in Paris hotel
Check your assumptions. In fact, check your assumptions at the door.
        --Cordelia Vorkosigan
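
For readers curious about the generator step discussed in this message (turning the .ent file into a C table), here is a rough sketch in Python. It is neither Albert's Q script nor the Perl version mentioned above. It assumes the declarations look like `<!ENTITY name "&#xHHHH;" >`, and the function names and the C struct layout are made up for illustration; the real src/w3centities.c may look quite different. It keeps historical duplicate names and skips entities that expand to more than one character, matching the decisions described in the message.

```python
import re

# Assumed declaration shape: <!ENTITY name "&#xHHHH;..." >
# (declarations written in any other form are simply not matched)
ENTITY_DECL = re.compile(
    r'<!ENTITY\s+([A-Za-z][A-Za-z0-9.]*)\s+"((?:&#x[0-9A-Fa-f]+;)+)"\s*>'
)
CHAR_REF = re.compile(r'&#x([0-9A-Fa-f]+);')


def parse_entities(ent_text):
    """Return sorted (name, code point) pairs for single-character entities."""
    table = {}
    for name, value in ENTITY_DECL.findall(ent_text):
        code_points = [int(h, 16) for h in CHAR_REF.findall(value)]
        if len(code_points) == 1:  # skip multi-character entities
            table[name] = code_points[0]
    # Keyed by name, so several names for the same character all survive.
    return sorted(table.items())


def emit_c_table(pairs):
    """Render the pairs as a C array (hypothetical layout, for illustration)."""
    lines = [
        "static const struct { const char *name; unsigned long c; } entities[] = {"
    ]
    for name, cp in pairs:
        lines.append('  { "%s", 0x%04X },' % (name, cp))
    lines.append("};")
    return "\n".join(lines)
```

Calling `emit_c_table(parse_entities(text))` on the contents of the .ent file yields the table source. For ASCII names, Python's sort order agrees with `strcmp`, so the output could be searched with `bsearch` on the C side, assuming that is how the table is consumed.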