Re: [Sbcl-devel] GSoC 2013 Mentoring - Unicode


Tom Emerson <tre...@gm...> writes:

> On Tue, Apr 16, 2013 at 3:13 PM, Christophe Rhodes <cs...@ca...> wrote:
>
>> Great!  Welcome.  I have a work-in-progress local branch which
>> stores more of the necessary information about code points from
>> UnicodeData.txt and implements normalization, and might hand that branch
>> over to a student (or use it as introductory reading material) for the
>> SoC, if a sufficiently interested one shows up.  (If not, well, I'll
>> keep on working on it in my own slow way -- but would also be open to
>> direct collaboration)
>
> I'd certainly be interested in working on it with you: having all the
> normalization forms supported efficiently has been on my personal task list
> for a while.

OK; I've got round to pushing a branch (volatile, could be rebased at
any instant, at least if I can work out how) to github.  It supports
NFD and NFKD semi-efficiently; there is some low-hanging fruit for
improving them (for example, precomputing the recursive decomposition at
build-time rather than decomposing recursively at run-time; also doing a
first pass that just checks codepoints).
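
To make the two optimizations concrete, here is a minimal sketch in
Python, not taken from the branch. It uses the standard unicodedata
module as a stand-in for UnicodeData.txt; the names raw_decomposition,
decompose_recursively, build_full_table and definitely_nfd are
hypothetical. It does not handle Hangul syllables (which decompose
algorithmically) or canonical reordering of combining marks, so it
covers only the table-lookup part of NFD/NFKD.

```python
import unicodedata

def raw_decomposition(cp, compat=False):
    """One-step decomposition mapping of code point cp, or None."""
    field = unicodedata.decomposition(chr(cp))
    if not field:
        return None
    parts = field.split()
    if parts[0].startswith("<"):
        # A tag such as <compat> marks a compatibility mapping,
        # which only NFKD (compat=True) applies.
        if not compat:
            return None
        parts = parts[1:]
    return [int(p, 16) for p in parts]

def decompose_recursively(cp, compat=False):
    """Run-time approach: expand mappings until none applies."""
    mapping = raw_decomposition(cp, compat)
    if mapping is None:
        return [cp]
    out = []
    for c in mapping:
        out.extend(decompose_recursively(c, compat))
    return out

def build_full_table(compat=False, code_points=range(0x110000)):
    """Build-time approach: store the fully expanded mapping once."""
    return {cp: tuple(decompose_recursively(cp, compat))
            for cp in code_points
            if raw_decomposition(cp, compat) is not None}

def definitely_nfd(string, table):
    """Conservative first pass: True only if no code point can change.

    False means "do the full work", not "the string is unnormalized".
    """
    for ch in string:
        cp = ord(ch)
        if cp in table:                   # has a decomposition mapping
            return False
        if 0xAC00 <= cp <= 0xD7A3:        # Hangul syllable (algorithmic)
            return False
        if unicodedata.combining(ch):     # mark that might need reordering
            return False
    return True
```

With such a table, decomposing a code point at run-time is a single
lookup, and strings made only of plain starters skip the table
entirely.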

It does not yet support NFC or NFKC; I'm still contemplating coming up
with a viciously clever indexing scheme for the primary composition
lookup (hashing pairs of build-time allocated integers somehow to look up
compositions in a table).  However, the NFD/NFKD support is tested using
the Unicode normalization test vectors, so improvements to it can be
made and tested with a reasonable amount of confidence.
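
One possible shape for such a composition index, as a hedged sketch in
Python rather than anything from the branch: each character that occurs
as the first or second element of a primary composite gets a small
integer at build time, and the pair of integers is folded into a single
key. The names canonical_pair, build_composition_index and compose_pair
are hypothetical. The sketch uses Python's own NFC as an oracle to skip
composition exclusions, which a real build would read from the Unicode
data files instead, and it ignores Hangul, whose compositions are
algorithmic.

```python
import unicodedata

def canonical_pair(cp):
    """(first, second) if cp has a two-element canonical mapping."""
    field = unicodedata.decomposition(chr(cp))
    if not field or field.startswith("<"):
        return None
    parts = [int(p, 16) for p in field.split()]
    return tuple(parts) if len(parts) == 2 else None

def build_composition_index(code_points=range(0x110000)):
    first_ids, second_ids, pairs = {}, {}, []
    for cp in code_points:
        pair = canonical_pair(cp)
        if pair is None:
            continue
        first, second = pair
        # Assumption: use the library's NFC as an oracle, so that
        # excluded composites never enter the table.
        if unicodedata.normalize("NFC", chr(first) + chr(second)) != chr(cp):
            continue
        # Allocate small integers in order of first appearance.
        first_ids.setdefault(first, len(first_ids))
        second_ids.setdefault(second, len(second_ids))
        pairs.append((first, second, cp))
    width = len(second_ids)
    # Fold each pair of small integers into one integer key.
    table = {first_ids[f] * width + second_ids[s]: cp
             for f, s, cp in pairs}
    return first_ids, second_ids, table

def compose_pair(index, first, second):
    """Primary composite of (first, second), or None."""
    first_ids, second_ids, table = index
    i = first_ids.get(first)
    j = second_ids.get(second)
    if i is None or j is None:
        return None
    return table.get(i * len(second_ids) + j)
```

Because the keys are dense small integers, the final dictionary could
be replaced by a flat array or a perfect hash; the dictionary here just
keeps the sketch short.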

The tree's at
<https://github.com/csrhodes/sbcl/tree/unicode-improvements>.  Let me
know how/whether it works for you.

Best,

Christophe
