Joseph Reagle - 2009-05-16

I want to define a word (a bibtex key) that might include u'ć'. The following doesn't seem to work:

<code>
ident_chars = "-_'" + alphanums + alphas8bit + u'ć'
</code>

I think the corresponding hex for that character is \xc4\x87 (its UTF-8 encoding), and pyparsing seems to match only the first byte. In any case, I'm confused: how do I refer to accented/Unicode characters beyond alphas8bit? Or, alternatively, how do I take all Unicode characters less a few others, something like this?

<code>
ident_chars = unicode_chars - '{}, '
</code>
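
To make the "everything less a few characters" idea concrete, here is a minimal sketch of how such a character string might be built with the standard library alone. It assumes Python 3 (where every `str` is Unicode, so no `u''` prefix is needed). The names `build_ident_chars`, `EXCLUDED`, and the `limit` cutoff are hypothetical and only for illustration; they are not part of pyparsing.

```python
import unicodedata

# Characters that must never appear in a key (taken from the question above).
EXCLUDED = set("{}, ")


def build_ident_chars(extra="-_'", limit=0x3000):
    """Return letters and digits below `limit`, plus `extra`, minus EXCLUDED.

    `limit` is an arbitrary cutoff (an assumption) that keeps the string small.
    """
    chars = []
    for codepoint in range(limit):
        ch = chr(codepoint)
        # General categories starting with "L" are letters, "N" are numbers.
        if unicodedata.category(ch)[0] in "LN":
            chars.append(ch)
    chars.extend(extra)
    return "".join(c for c in chars if c not in EXCLUDED)


ident_chars = build_ident_chars()
```

The resulting string is plain text, so it could presumably be handed to pyparsing as `Word(ident_chars)` in place of the concatenation in the first snippet. Whether that resolves the first-byte matching described above likely depends on the string being real Unicode text rather than UTF-8 encoded bytes.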