Re: [q-lang-users] More Unicode queries.

Albert Graef scripsit:

> BTW, John, thanks for spotting this. That W3C draft just came out,
> what a lucky coincidence. ;-)

Indeed.  Someone's blog pointed me to it (I'm not sure whose), and I
then incorporated it into the latest release of my TagSoup parser, a SAX
parser written in Java that processes arbitrary HTML rather than XML
(plug: see http://tagsoup.info ).

> If you happen to keep an eye on this, it would be nice if you could
> let me know when the draft gets revised, so that the support in Q can
> be updated accordingly.

I'll let you know, as I'll be updating TagSoup as well.

> (I wrote a little Q script to generate the C code in src/w3centities.c
> automatically from the .ent file, which makes this easy. The script
> isn't included in the sources right now, but if anyone wants to have
> it, just let me know.)

Just what I did, except that being in a hurry I wrote it in Perl.
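
For illustration only, a generator of this kind might look like the
Python sketch below. It is neither the Q script nor the Perl one
mentioned above. The declaration shape it expects
(<!ENTITY name "&#xHHHH;" >), the C struct layout, and the names
parse_entities, emit_c_table, and SAMPLE are all assumptions made for
the example. It keeps only entities that expand to a single character
and sorts them by name.

```python
import re

# Assumed declaration shape: <!ENTITY name "&#xHHHH;" >
# (one or more hexadecimal character references inside the quotes).
ENTITY_RE = re.compile(r'<!ENTITY\s+(\S+)\s+"((?:&#x[0-9A-Fa-f]+;)+)"\s*>')
CHARREF_RE = re.compile(r'&#x([0-9A-Fa-f]+);')


def parse_entities(text):
    """Return sorted (name, code point) pairs for single-character entities."""
    table = []
    for name, value in ENTITY_RE.findall(text):
        code_points = [int(h, 16) for h in CHARREF_RE.findall(value)]
        if len(code_points) == 1:  # skip multi-character entities
            table.append((name, code_points[0]))
    return sorted(table)


def emit_c_table(entries):
    """Render the pairs as a C array initializer (hypothetical layout)."""
    lines = ["static const struct { const char *name; unsigned long c; } "
             "entities[] = {"]
    for name, cp in entries:
        lines.append('  { "%s", 0x%04X },' % (name, cp))
    lines.append("};")
    return "\n".join(lines)


# Made-up sample input in the assumed format, not a real .ent file.
SAMPLE = '''
<!ENTITY Aacute "&#x000C1;" >
<!ENTITY bne "&#x0003D;&#x020E5;" >
<!ENTITY nbsp "&#x000A0;" >
'''

if __name__ == "__main__":
    print(emit_c_table(parse_entities(SAMPLE)))
```

Sorting by name keeps the generated table suitable for a binary
search on the C side, should the consumer want one.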

> Rob Hubbard wrote:
> > I'd strip the historical duplicates.
> 
> I left them in. The full list of names is just some 15KB now, not a
> big deal even on embedded devices nowadays.
> 
> > I think it's okay for an entity to have more than one character.
> 
> I only included the single-char entities for now. This simplifies the
> implementation, and is also consistent with the other escapes which
> all represent single Unicode characters. If this is a problem then
> please let me know.

I made the same decisions.

-- 
John Cowan        http://www.ccil.org/~cowan          co...@cc...
Please leave your values                Check your assumptions.  In fact,
   at the front desk.                      check your assumptions at the door.
     --sign in Paris hotel                   --Cordelia Vorkosigan