From: Donal K. F. <don...@ma...> - 2008-04-28 12:03:13
Lars Hellström wrote:
> Just verifying that I understood this correctly, because your
> objections sounded really strange on a first read-through. Is it true that:
> 1. It's not possible to iterate over the String intRep like this,
>    unless one reimplements it to have a separate refcount, because
>    shimmering could cause it to go away.
> 2. The same problem exists for bytearray intReps.

Correct on both counts.

> 3. It would be possible to iterate over the string representation
>    *bytes instead, because that cannot go away unless the value is
>    modified. (An earlier remark "index into buf? Module unicode char
>    widths, etc." by Larry gave me the impression he intended to iterate
>    over UTF-8 encoded data.)

No. The problem is actually that the current StringRep code is written to
assume that there is a single bytes member that it refers to; it contains
fields that contain additional information about the bytes (e.g. whether
there are any multibyte characters in it). The bytearray doesn't have this
problem, true, but I'm not convinced that just adjusting that would be a
good value investment (it might be possible to justify it with additional
functionality, such as supporting mapped files, but that's a lot more work).

I still don't see why everyone's so set on tackling this. It feels to me
like optimizing too early. Memory allocation isn't *that* expensive!

Donal.
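
The shimmering hazard confirmed in points 1 and 2 can be sketched with a toy model. Everything below (ToyValue, toy_get_string_rep, toy_get_int and the other names) is hypothetical and is not the Tcl C API; the sketch assumes only that a value caches a single internal representation, and that asking for a different type discards the cached one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: hypothetical names, NOT the real Tcl C API. */
typedef enum { REP_NONE, REP_STRING, REP_INT } RepKind;

typedef struct {
    char   *bytes;     /* canonical string form of the value */
    RepKind kind;      /* which internal rep is cached right now */
    void   *intRep;    /* the cached internal rep, owned by the value */
    int     freeCount; /* how many internal reps have been discarded */
} ToyValue;

/* Portable replacement for strdup(). */
char *toy_dup(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    assert(p != NULL);
    memcpy(p, s, n);
    return p;
}

void toy_init(ToyValue *v, const char *s) {
    v->bytes = toy_dup(s);
    v->kind = REP_NONE;
    v->intRep = NULL;
    v->freeCount = 0;
}

/* Discard whatever internal rep is cached. */
void toy_drop_rep(ToyValue *v) {
    if (v->intRep != NULL) {
        free(v->intRep);
        v->intRep = NULL;
        v->freeCount++;
    }
    v->kind = REP_NONE;
}

/* Return the cached "string" internal rep, building it if needed. */
const char *toy_get_string_rep(ToyValue *v) {
    if (v->kind != REP_STRING) {
        toy_drop_rep(v);
        v->intRep = toy_dup(v->bytes);
        v->kind = REP_STRING;
    }
    return v->intRep;
}

/* Return the value as an integer. If another rep is cached, this
 * "shimmers": the old rep is freed and replaced by an integer rep. */
long toy_get_int(ToyValue *v) {
    if (v->kind != REP_INT) {
        long *n;
        toy_drop_rep(v);
        n = malloc(sizeof *n);
        assert(n != NULL);
        *n = strtol(v->bytes, NULL, 10);
        v->intRep = n;
        v->kind = REP_INT;
    }
    return *(long *)v->intRep;
}

void toy_free(ToyValue *v) {
    toy_drop_rep(v);
    free(v->bytes);
    v->bytes = NULL;
}
```

In this model, a pointer obtained from toy_get_string_rep() is dangling as soon as any code calls toy_get_int() on the same value, so an iterator that holds such a pointer would have to give the internal rep its own reference count or work on a copy.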