From: Pascal J. B. <pj...@in...> - 2009-11-28 18:42:21
On Nov 28, 2009, at 3:34 PM, Fred Cohen wrote:

> On Nov 28, 2009, at 4:09 AM, Pascal J. Bourguignon wrote:
>
>> ...
>> Search about unicode combining characters for example.
>> http://en.wikipedia.org/wiki/Combining_character
>>
>> A single character may need several unicode code points to be
>> represented.  So already, you cannot use a scalar to represent
>> unicode characters in general.  Then if you want to store the code
>> points in a vector, that means that the index of the vector doesn't
>> necessarily match the index in the string, so string indexing may
>> not be O(1).
>>
>> Of course, the problem is even more complex when you try to fit the
>> unicode model into the lisp model of characters and strings.
>
> Right - but this has nothing to do with the complexity of characters
> in lisp.  There are a virtually unlimited number of different ways
> to encode information.
>
>>> [...] I just want the things contained within the strings to be
>>> bytes (values 0-255) and not altered by any processing except the
>>> specific calls to alter them (no hidden automatic changes please).
>>
>> You need to implement your own type to do that; don't use lisp
>> character and string, they're more sophisticated.  As mentioned,
>> you can still have "strings" in source files compiled to your own
>> type of strings, with reader macros.
>
> That's what I was asking about - where can I read about the
> sophistication of strings in lisp?
>
> ...

In the source code of the implementations!  Or, if you don't want to
dig there, just compare CHAR_MAX (in C) with CHAR-CODE-LIMIT (in CL)
and infer the consequences.

Compare:

    int i = 42;
    char c = i;

vs.

    (let ((i 42)
          (c (make-array 1 :element-type 'character
                           :initial-element #\a)))
      (setf (aref c 0) i))   ; signals a TYPE-ERROR (in safe code):
                             ; 42 is not a character; you would have
                             ; to write (code-char i)

and think about what it involves.  Hint: to begin with, a character
in CL will take 32 bits while a char in C takes 8 bits, therefore any
string manipulation will already be four times slower than in C.

Of course, when doing I/O, it also means converting the internal
unicode code points to the external encoding.  In C, if you input
utf-8 bytes, you keep them as they are (and of course you don't try
to take the nth character of a string, since that'd be O(n)).  In
Lisp, there will be decoding and encoding on each I/O through a
character stream (see the P.S. below for how to avoid that with byte
streams).

> All the RAM will not be enough for many of the things I ultimately
> need to do.  When you work with multi-terabyte disks, and you are
> trying to do analysis linking things across the entire disk to other
> such things, you need multiple terabytes of memory - which I
> obviously need to do with file systems.  I just want a way to not
> have to write everything for file systems, RAM, databases, etc. just
> to get better performance at smaller sizes.  Since everybody
> presumably will run into these limits on their programs at some
> point, I was thinking that a virtual RAM that extends to disk
> storage (and ultimately multiple disks across infrastructure) built
> into lisp would be easier (for me) than doing it myself... and for
> everyone who ultimately will have to do it themselves as their
> problems grow in size.

Yes, you will probably be disk bound anyway, so clisp will be as good
as any other implementation.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com
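
P.S. To make the combining-character point quoted above concrete,
here is a small illustration at the REPL; it assumes an
implementation whose characters are unicode code points (clisp,
SBCL, ...):

    ;; "é" as one precomposed code point, vs. the same user-perceived
    ;; character as two code points (e + COMBINING ACUTE ACCENT):
    (let ((precomposed (string (code-char #xE9)))
          (decomposed  (coerce (list #\e (code-char #x301)) 'string)))
      (list (length precomposed)                ; => 1
            (length decomposed)                 ; => 2
            (string= precomposed decomposed)))  ; => NIL

LENGTH and AREF count code points, not user-perceived characters,
which is why indexing by character (rather than by code point) may
not be O(1).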
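
P.P.S. A minimal sketch of the reader-macro idea mentioned above: a
#"..." syntax that reads like a string literal but yields a vector of
(unsigned-byte 8).  The #" dispatch character, the function name, and
the assumption that every character in the literal has a char-code
below 256 are my own choices, nothing standard:

    (defun sharp-string-reader (stream subchar arg)
      (declare (ignore subchar arg))
      ;; The opening double-quote has already been consumed; reuse
      ;; the standard string reader to collect the characters up to
      ;; the closing double-quote, then map them to their codes.
      (let ((string (funcall (get-macro-character #\") stream #\")))
        (map '(vector (unsigned-byte 8)) #'char-code string)))

    (set-dispatch-macro-character #\# #\" #'sharp-string-reader)

    ;; #"abc" now reads as #(97 98 99).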
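
P.P.P.S. And on "bytes (values 0-255) and not altered by any
processing": the portable way to get exactly that is to bypass
characters altogether and open files with :element-type
'(unsigned-byte 8).  A minimal sketch in standard CL, nothing
implementation-specific; no encoding or decoding is involved, so the
octets come back exactly as stored:

    (defun read-file-octets (pathname)
      "Return the contents of PATHNAME as a vector of raw octets."
      (with-open-file (in pathname :element-type '(unsigned-byte 8))
        (let ((octets (make-array (file-length in)
                                  :element-type '(unsigned-byte 8))))
          (read-sequence octets in)
          octets)))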