From: Vebjorn L. <ljo...@lj...> - 2008-09-25 15:10:18
|
Hi, I would like to make an mmapped file appear as a simple-array of (unsigned-byte 8). In SBCL, I can make this happen by putting an appropriate header before the mmaped file in memory and then calling sb-kernel:%make-lisp-obj on the address where the header starts. Is there something similar in CLISP? Thanks, Vebjorn |
From: Sam S. <sd...@gn...> - 2008-09-25 15:30:00
|
Vebjorn Ljosa wrote: > I would like to make an mmapped file appear as a simple-array of > (unsigned-byte 8). In SBCL, I can make this happen by putting an > appropriate header before the mmaped file in memory and then calling > sb-kernel:%make-lisp-obj on the address where the header starts. Is > there something similar in CLISP? no, and not likely to be implementable because CLISP has a copying GC. (actually Vladimir has introduced "pinned" objects not movable by GC for MT) why do you want to do this? |
From: Vebjorn L. <ljo...@lj...> - 2008-09-25 16:35:34
|
Sam Steingold <sd...@gn...> writes: > no, and not likely to be implementable because CLISP has a copying GC. > (actually Vladimir has introduced "pinned" objects not movable by GC > for MT) I was hoping it would be possible to have Lisp objects outside Lisp's heap (do you call this "foreign space" in CLISP?) and therefore out of the reach of the GC. > why do you want to do this? I need random access to a large number of cytometric measurements stored in a file. I would like to avoid reading the entire file into memory. It would be convenient to simply index a Lisp array and leave it to the memory manager to decide which chunks of the file to read and when, and mmapping would accomplish that. I noticed just now that CLISP's array dimension limit is only 2^24 - 1, so the question of mmapping is moot, as I'll have to read chunks into smaller arrays anyway. Thanks for responding. Vebjorn |
From: Pascal J. B. <pj...@in...> - 2008-09-25 19:36:27
|
Vebjorn Ljosa writes: > Sam Steingold <sd...@gn...> writes: > > > no, and not likely to be implementable because CLISP has a copying GC. > > (actually Vladimir has introduced "pinned" objects not movable by GC > > for MT) > > I was hoping it would be possible to have Lisp objects outside Lisp's > heap (do you call this "foreign space" in CLISP?) and therefore out of > the reach of the GC. > > > why do you want to do this? > > I need random access to a large number of cytometric measurements stored > in a file. I would like to avoid reading the entire file into memory. > It would be convenient to simply index a Lisp array and leave it to the > memory manager to decide which chunks of the file to read and when, and > mmapping would accomplish that. > > I noticed just now that CLISP's array dimension limit is only 2^24 - 1, > so the question of mmapping is moot, as I'll have to read chunks into > smaller arrays anyway. So you don't need to mmap any lisp object. I assume your cytometric measurements are no lisp object. Just mmap them and access the data structure thru CFFI (or FFI). Alternatively, you would need just a PEEK and a POKE (well, I guess only a PEEK in this application). http://darcs.informatimago.com/public/lisp/clisp/raw-memory.lisp http://darcs.informatimago.com/public/lisp/clisp/raw-memory.c http://darcs.informatimago.com/public/lisp/clisp/Makefile There is little difference between (aref *mem* i) and (peek-uint8 (+ *mem* i)). (And you can always wrap peek-uint8 in some mref to add bounds checking). -- __Pascal Bourguignon__ http://www.informatimago.com/ HEALTH WARNING: Care should be taken when lifting this product, since its mass, and thus its weight, is dependent on its velocity relative to the user. |
From: Sam S. <sd...@gn...> - 2008-09-26 13:58:28
|
Pascal J. Bourguignon wrote: > > Alternatively, you would need just a PEEK and a POKE (well, I guess > only a PEEK in this application). > > http://darcs.informatimago.com/public/lisp/clisp/raw-memory.lisp > http://darcs.informatimago.com/public/lisp/clisp/raw-memory.c actually, http://darcs.informatimago.com/public/lisp/clisp/raw-memory-lib.c because of ffi forms in raw-memory.lisp. actually, raw-memory-lib.c is so small and simple that it could be folded into raw-memory.lisp using c-lines. (it is not because you support other lisp implementations too) > http://darcs.informatimago.com/public/lisp/clisp/Makefile > > There is little difference between (aref *mem* i) and (peek-uint8 (+ *mem* i)). > (And you can always wrap peek-uint8 in some mref to add bounds checking). |
From: Hoehle, Joerg-C. <Joe...@t-...> - 2008-09-26 16:14:11
|
Hi, >> Alternatively, you would need just a PEEK and a POKE (well, I guess >> only a PEEK in this application). >> >> http://darcs.informatimago.com/public/lisp/clisp/raw-memory.lisp >> http://darcs.informatimago.com/public/lisp/clisp/raw-memory.c > >actually, >http://darcs.informatimago.com/public/lisp/clisp/raw-memory-lib.c >because of ffi forms in raw-memory.lisp. >actually, raw-memory-lib.c is so small and simple that it >could be folded into raw-memory.lisp using c-lines. Actually, you don't need any C functions for something as trivial. 1. You could play C-like tricks using WITH-C-VAR to pointer and CAST. 2. You could use MEMORY-AS. That said, using a foreign function is fastest because you typically avoid PARSE-C-TYPE at run-time (and you avoid producing a garbage #<FOREIGN-ADDRESS> obejcts too). Still, you can manage to write peek and poke using the "remember parse-c-type values at compile-time"-trick I mentioned in this list moons ago, which would then be as fast as a function call. MEMORY-AS is just another word for PEEK, and (SETF MEMORY-AS) is POKE - except of course, they take a foreign-pointers, not an integer as address. Using compiler macros, you could write PEEK&POKE based on MEMORY-AS which would not even perform a CASE or table-based dispatch. (define-compiler-macro peek-uint32 (integer-address) `(memory-as (unsigned-foreign-address ,integer-address)) (parse-c-type 'uint32)) The compiler-macro existing for PARSE-C-TYPE should eliminate its invocation. GC-cost: one #<FOREIGN-ADDRESS> per run-time invocation. Note that using MEMORY-AS, you get vector conversion for free: Much faster than the DO loop Pascal implements in his PEEK and POKE functions. I believe Pascal's code predates the introduction of MEMORY-AS to CLISP's FFI. Use a type of `(array ,c-type ,(length obj)) as in (memory-as <address> (parse-c-type `(array uint32 ,(* 17 42)))) (setf (memory-as ... (parse-c-type '(c-array uint8 4))) #(1 2 3 4)) That said, PEEK & POKE in CLISP is still an order of a magnitude slower in CLISP than in anything that compiles to native code -- obviously. Regards, Jorg Hohle. |
From: Sam S. <sd...@gn...> - 2008-09-26 16:30:03
|
Hoehle, Joerg-Cyril wrote: > Actually, you don't need any C functions for something as trivial. > 1. You could play C-like tricks using WITH-C-VAR to pointer and CAST. > 2. You could use MEMORY-AS. can you optimize this code in clisp/modules/matlab/matlab.lisp? ;; get/set an individial array element ;; real (c-lines "double mx_aref_r (const mxArray *array_ptr, int i, int j, int n) { return mxGetPr(array_ptr)[i+n*j]; }~%") (def-call-out mx-aref-r (:return-type double-float) (:name "mx_aref_r") (:arguments (array_ptr (c-pointer mxArray)) (i int) (j int) (n int))) (c-lines "void set_mx_aref_r (const mxArray *array_ptr, int i, int j, int n, double val) { mxGetPr(array_ptr)[i+n*j] = val; }~%") (ffi:def-call-out set_mx_aref_r (:return-type nil) (:arguments (array_ptr (c-pointer mxArray)) (i int) (j int) (n int) (val double-float))) (defsetf mx-aref-r set_mx_aref_r) ;; imaginary (c-lines "double mx_aref_i (const mxArray *array_ptr, int i, int j, int n) { return mxGetPr(array_ptr)[i+n*j]; }~%") (def-call-out mx-aref-i (:return-type double-float) (:name "mx_aref_i") (:arguments (array_ptr (c-pointer mxArray)) (i int) (j int) (n int))) (c-lines "void set_mx_aref_i (const mxArray *array_ptr, int i, int j, int n, double val) { mxGetPr(array_ptr)[i+n*j] = val; }~%") (ffi:def-call-out set_mx_aref_i (:return-type nil) (:arguments (array_ptr (c-pointer mxArray)) (i int) (j int) (n int) (val double-float))) (defsetf mx-aref-i set_mx_aref_i) |
From: Hoehle, Joerg-C. <Joe...@t-...> - 2008-09-26 17:03:16
|
Sam Steingold asked: >can you optimize this code in clisp/modules/matlab/matlab.lisp? >;; get/set an individial array element As I said, I don't think there's a lot of optimization opportunities when accessing a single element. Invoking a #<foreign-function> from DEF-CALL-OUT is quite fast (except for all this foreign-call and callback setup, compared to MEMORY-AS) and unlike Pascal's peek&poke, you don't have run-time overhead from CASE of other table-based dispatch in the code snipped you posted. >(c-lines "double mx_aref_r (const mxArray *array_ptr, int i, int j, int n) { > return mxGetPr(array_ptr)[i+n*j]; }~%") >(def-call-out mx-aref-r (:return-type double-float) (:name "mx_aref_r") > (:arguments (array_ptr (c-pointer mxArray)) > (i int) (j int) (n int))) You can only win large when you design an API which can operate on blocks or arrays. E.g. IF Matlab's matrices or arrays were laid out exactly like C's, then you could use (MEMORY-AS pointer `(c-array double-float ,d1 ,d2)) to convert efficiently between both worlds. However the "mxGetPtr" abstraction hides the true format of the layout. >[i+n*j] Looks more like Fortran's format, so no 1:1 conversion of the whole matrix in one go. CL is row-major(-aref), like C. Regards, Jörg. |
From: Hoehle, Joerg-C. <Joe...@t-...> - 2008-09-29 15:44:58
|
Sam Steingold asked: >can you optimize this code in clisp/modules/matlab/matlab.lisp? There's one thing that I noticed there: low-level transformation: use MEMORY-AS (defun matfile-content (mf) "return the vector of strings naming variables in MATfile" (multiple-value-bind (buf num) (matGetDir mf) (unwind-protect (memory-as buf (parse-c-type `(c-array c-string ,num))) (mxFree buf)))) instead of (with-c-var (names `(c-ptr (c-array c-string ,num))) (setf (cast names 'c-pointer) buf) names) Benefit: with-c-var has overhead similar to foreign-funcall + callback, memory-as is more lightweight. Before the addition of MEMORY-AS, the WITH-C-VAR+CAST trick was indeed one I recommended. Beside that, since matlab seems to use Fortran-style ordering of arrays, and CL follows C usage, I see little opportunity for block operations. Well, maybe there are fast transpositions tricks, so that (memory-as `(c-array double-flaot ,x ,y)) can still be used? That might be faster than a nested loop, depending on matrix size and speed of the 2 transpositions. Regards, Jörg Höhle. |
From: Hoehle, Joerg-C. <Joe...@t-...> - 2008-09-26 16:28:55
|
Vebjorn Ljosa wrote: >I noticed just now that CLISP's array dimension limit is only 2^24 - 1, >so the question of mmapping is moot, as I'll have to read chunks into >smaller arrays anyway. That's no reason. You can still use mmap and do something similar to windowing. To make a decision as to your implementation path, you'll have to think about what's important to you: o raw speed? o portabity? o ease of use? o understandability of code? Then you can go for mixtures of what's been proposed in this and the other thread, o Use your own primitive access functions; o Use block-based accessors, instead of element by elt.; o Use CLtL sequence functions using CLISP's DEFSEQ extension, delegating to MEM-READ (note that CLISP's sequence functions are overly "generic" and hence not quite fast). I once wrote the ~50-100 lines required for providing an FFI-based-array <-> Lisp sequence integration via DEFSEQ. The one open issue I remember was the unclear semantics of copying such an array: What is SUBSEQ on a mmap'ed array (or any FFI array, for that matter)? Regards, Jorg. |
From: Sam S. <sd...@gn...> - 2008-09-26 16:46:34
|
Hoehle, Joerg-Cyril wrote: > > I once wrote the ~50-100 lines required for providing an FFI-based-array > <-> Lisp sequence integration via DEFSEQ. sweet! why is it not in the CVS? > The one open issue I remember was the unclear semantics of copying such > an array: What is SUBSEQ on a mmap'ed array (or any FFI array, for that > matter)? a displaced array? |
From: Hoehle, Joerg-C. <Joe...@t-...> - 2008-09-26 17:48:03
|
Hi, >> I once wrote the ~50-100 lines required for providing an >FFI-based-array >> <-> Lisp sequence integration via DEFSEQ. >sweet! >why is it not in the CVS? Precisely because of the one open issue below. Didn't I post it to the list, moons ago? Somehow I hate releasing half-baked stuff, so it tends to pile up on my HD. >> The one open issue I remember was the unclear semantics of >> copying such >> an array: What is SUBSEQ on a mmap'ed array (or any FFI >> array, for that matter)? >a displaced array? That would be legitimate with the foreign array, but it's not compliant with CLHS. "subseq always allocates a new sequence for a result; it never shares storage with an old sequence." Alas, a shared=displaced array is the only sensible thing to do in the presence of the FFI, so it seems to me. That's why I considered the issue open, and probably never published the code. The thing I seem to remember is that CLISP's (via DEFSEQ) expects the array-copying to work for some of the sequence functions, so the FFI<->DEFSEQ was incomplete. I'll dig into my archives over the week-end. Regards, Jörg. |
From: Sam S. <sd...@gn...> - 2008-09-26 18:02:01
|
Hoehle, Joerg-Cyril wrote: > Hi, > >>> I once wrote the ~50-100 lines required for providing an >> FFI-based-array >>> <-> Lisp sequence integration via DEFSEQ. >> sweet! >> why is it not in the CVS? > Precisely because of the one open issue below. > Didn't I post it to the list, moons ago? "I have no recollections of that." > Somehow I hate releasing half-baked stuff, so it tends to pile up on my HD. "release early, release often". >>> The one open issue I remember was the unclear semantics of >>> copying such >>> an array: What is SUBSEQ on a mmap'ed array (or any FFI >>> array, for that matter)? >> a displaced array? > That would be legitimate with the foreign array, but it's not compliant with CLHS. oh yeah... an alternative would be malloc+finialize/free. I think this is actually preferable because it would be compatible with the invariant subseq = make-seq + copy-seq > Alas, a shared=displaced array is the only sensible > thing to do in the presence of the FFI, so it seems to me. > That's why I considered the issue open, and probably never published the code. you can make the alternatives controllable by a global switch. > The thing I seem to remember is that CLISP's (via DEFSEQ) > expects the array-copying to work for some of the sequence > functions, so the FFI<->DEFSEQ was incomplete. |
From: Hoehle, Joerg-C. <Joe...@t-...> - 2008-09-29 14:53:38
|
Hi, >> I once wrote the ~50-100 lines required for providing an >FFI-based-array >> <-> Lisp sequence integration via DEFSEQ. >sweet! Well, FWIW, here's my 4 year old unfinished code. My summary of the situation back then was that sequences types are not the best match for foreign arrays. Alas, CLISP does not offer an extension to array types (nor any other CL so it seems). An important implementation issue is what data structure to choose to hold the state needed by the defseq iterator. I choose a foreign-variable of type (c-array x l) because that embeds both the needed pointer and limits; ffi:element performs boundary checking and I don't need to cope with element size. Other choices may be more performant. Only benchmarking would tell. (defstruct (foreign-array (:copier nil)) (pointer nil;; :type ffi:foreign-variable )) (setq bar (ffi:allocate-shallow '(ffi:c-array ffi:uint16 5))) (setq foo (make-foreign-array :pointer bar)) (defun unimplemented (&rest args) (cerror "Retry" "Unimplemented -- args: ~S" args)) (defconstant +ffi-array-seq+ (vector 'foreign-array #'unimplemented ;init #'unimplemented ;upd #'unimplemented;sys::vector-endtest #'unimplemented;sys::vector-fe-init #'unimplemented;sys::vector-fe-upd #'unimplemented;sys::vector-fe-endtest #'unimplemented;char #'unimplemented;sys::store #'identity;copy pointer #'(lambda(seq)(ffi::%sizeof(ffi::foreign-type seq)));sys::vector-length #'unimplemented;make-string #'unimplemented;char #'unimplemented;sys::store #'unimplemented;sys::vector-init-start #'unimplemented;sys::vector-fe-init-end )) (assert (= (length +ffi-array-seq+) 16)) (sys::%defseq +ffi-array-seq+) Back in 2004, there was a bug in CLISP with using own structures and the %DEFSEQ functions. IIRC, I reported it, and Sam Steingold fixed it. (sys::sequencep foo) (setf (svref +ffi-array-seq+ 10) (lambda(seq)(ffi::%sizeof(ffi::foreign-type (foreign-array-pointer seq))))) (setf (svref +ffi-array-seq+ 10) ;length (lambda(seq) (svref (ffi::foreign-type (foreign-array-pointer seq)) 2))) ;(length foo) (setf (svref +ffi-array-seq+ 12) ;seq-elt (lambda (seq index) (ffi:foreign-value (ffi::%element (foreign-array-pointer seq) index)))) ;(elt foo 5) (setf (svref +ffi-array-seq+ 13) ;seq-set-elt (lambda (seq index value) (setf (ffi:foreign-value (ffi::%element (foreign-array-pointer seq) index)) value))) ;(setf (elt foo 0) 3) ;(elt foo 0) ;(elt foo 5) ;; seq-pointer: just use Index ;; TODO better: increment offset of #<foreign-address> ;; but a) foreign-offset is not accessible from Lisp ;; and b) end-test would then need more context (setf (svref +ffi-array-seq+ 14) ;seq-init-start (lambda (seq index) index)) (setf (svref +ffi-array-seq+ 1) ;seq-init (lambda (seq) 0)) (setf (svref +ffi-array-seq+ 2) ;seq-upd (lambda (seq pointer) (1+ pointer))) (setf (svref +ffi-array-seq+ 3) ;seq-endtest (lambda (seq pointer) (= pointer (length seq)))) (setf (svref +ffi-array-seq+ 7) ;seq-access (lambda (seq pointer) (ffi:foreign-value (ffi::%element (foreign-array-pointer seq) pointer)))) ;(position 0 foo :start 2) (setf (svref +ffi-array-seq+ 8) ;seq-access-set (lambda (seq pointer value) (setf (ffi:foreign-value (ffi::%element (foreign-array-pointer seq) pointer)) value))) ;(fill foo 4 :start 2 :end 4) ;(ffi::foreign-value bar) Well, even if this code is of no general use, it can serve as regression test for the DEFSEQ API. Regards, Jörg Höhle |
From: Vladimir T. <vtz...@gm...> - 2008-09-25 17:11:09
|
Hi, On Sep 25, 2008, at 6:29 PM, Sam Steingold wrote: > Vebjorn Ljosa wrote: >> I would like to make an mmapped file appear as a simple-array of >> (unsigned-byte 8). In SBCL, I can make this happen by putting an >> appropriate header before the mmaped file in memory and then calling >> sb-kernel:%make-lisp-obj on the address where the header starts. Is >> there something similar in CLISP? > > no, and not likely to be implementable because CLISP has a copying GC. > (actually Vladimir has introduced "pinned" objects not movable by > GC for MT) > why do you want to do this? > Pinned object will not help since there is no way to mmap file directly into lisp heap. Since the lisp heap itself is mmap-ed (except in SPVW_PAGES) - the only way to implement memory mapped files is to have the mapping outside the heap. However the memory range at which the files can be mmap-ed should not interfere with lisp heap (that may grow and potentially overlap other mmap-ed regions). Probably the best way to go is with FFI. You may look at Pascal Bourguignon's code: http://informatimago.com/develop/lisp/index.html (COM.INFORMATIMAGO.CLISP.RAW-MEMORY, COM.INFORMATIMAGO.COMMON- LISP.MEMORY). Vladimir |
From: <don...@is...> - 2008-09-25 16:49:47
|
Vebjorn Ljosa writes: > I noticed just now that CLISP's array dimension limit is only 2^24 - 1, > so the question of mmapping is moot, as I'll have to read chunks into > smaller arrays anyway. This limit is larger on 64 bit machines, so maybe not as moot as you thought. I've recently asked how hard it would be to increase the limit but I didn't see any replies. I'd very much like to see it increased. My view is that an array should be allowed to be close to the size of VM. I still think you can do what you want via FFI. You probably can't treat the file as a lisp array, but I expect you'll be able to write a similar accessing function taking a byte index and returning a byte. |
From: Sam S. <sd...@gn...> - 2008-09-25 17:03:38
|
Don Cohen wrote: > Vebjorn Ljosa writes: > > > I noticed just now that CLISP's array dimension limit is only 2^24 - 1, > > so the question of mmapping is moot, as I'll have to read chunks into > > smaller arrays anyway. > This limit is larger on 64 bit machines, so maybe not as moot as you > thought. I've recently asked how hard it would be to increase the > limit but I didn't see any replies. I'd very much like to see it Bruno should be able to elucidate this. (as well an a sudden decrease in string size a few years ago). > increased. My view is that an array should be allowed to be close to > the size of VM. this is non-trivial since array limit constants are constrained to be fixnums and clisp fixnums are 24 bits on 32-bit platforms and 48 bits on 64-bit ones. (I don't see any reason for them to be less than 56 bits there though). > I still think you can do what you want via FFI. > You probably can't treat the file as a lisp array, but I expect you'll > be able to write a similar accessing function taking a byte index and > returning a byte. you might want to take a look at the matlab interface where I have to access C arrays from lisp. |
From: Gabriel D. R. <gd...@in...> - 2008-09-27 17:49:40
|
On Thu, Sep 25, 2008 at 12:03 PM, Sam Steingold <sd...@gn...> wrote: >> I still think you can do what you want via FFI. >> You probably can't treat the file as a lisp array, but I expect you'll >> be able to write a similar accessing function taking a byte index and >> returning a byte. > > you might want to take a look at the matlab interface where I have to access C > arrays from lisp. I have similar needs for OpenAxiom, where I need to have essentially a pointer to a large array of numbers (byte, float, double) passed to C functions and a guarantee that the object does not move while control flow is in C. Out of the four Lisps I use to build OpenAxiom (GCL, SBCL, ECL, CLisp), only CLisp does not seem to allow that with its FFI interface. That tends to make CLisp a suboptimal choice for OpenAxiom. It would be good if there were a way to do this with only the FFI interface. -- Gaby |
From: Bruno H. <br...@cl...> - 2008-09-25 20:15:28
|
> > I've recently asked how hard it would be to increase the > > limit but I didn't see any replies. I'd very much like to see it > > (as well an a sudden decrease in string size a few years ago). The maximum size for arrays and strings is driven by the desire to have only two header words per instance. It is a compromise. One could also choose to have all arrays and all strings with three header words per instance. This would increase the memory size by a certain percentage and the run time of all programs by twice this percentage (according to Michael Monagan's rule that when the memory use of a program increases by a factor X, its use of time increases by a factor X^2). So, assuming that not penalizing the _average_ program is important, we have two header words. One is needed by the GC. The other one has to combine the length and the following bits (see lispbibl.d for reference): Arrays: - 8 bits are used for the type of array. Probably less bits would be sufficient, but as Sam mentioned, ARRAY-DIMENSION-LIMIT must be a fixnum. and MOST-POSITIVE-FIXNUM is 2^24-1. 24 = oint_data_len. Strings: - 8 bits are used for the type of array, as above. Additionally, 2 bits are used to implement the automatic extension of width of a string: Strings exist in 8-bits-per-element, 16-bits-per-element, 32-bits-per-element flavours, and when a character is stored in a string of narrower width, the width is extended automatically. This process consumes 2 bits. > > increased. My view is that an array should be allowed to be close to > > the size of VM. You can write an abstraction of very long arrays, say in terms of an array of arrays, in about 50 lines of code. With clisp's extensible sequences, you can can even use the normal sequence functions on it. Refer to SYS::%DEFSEQ. Bruno |
From: Bruno H. <br...@cl...> - 2008-09-26 14:24:20
|
Sam wrote: > but this begs a suggestion: just like there are 3 kinds of strings by element > type, there should be 2 kinds of arrays (and strings) by storage size. > i.e., 2-word-header small arrays and strings and 3-word-header big arrays In my opinion, the 3 kinds of strings by element type were a strong requirement (combining Unicode support with low memory use), whereas multi-megabyte arrays and strings were not a strong requirement. If your opinion varies, feel free to add strings and vectors that have 3 header words. Bruno |
From: Sam S. <sd...@gn...> - 2008-09-26 14:46:06
|
Bruno Haible wrote: > Sam wrote: >> but this begs a suggestion: just like there are 3 kinds of strings by element >> type, there should be 2 kinds of arrays (and strings) by storage size. >> i.e., 2-word-header small arrays and strings and 3-word-header big arrays > > In my opinion, the 3 kinds of strings by element type were a strong requirement > (combining Unicode support with low memory use), whereas multi-megabyte arrays > and strings were not a strong requirement. If your opinion varies, feel free > to add strings and vectors that have 3 header words. do we have a spare flag bit for strings and arrays? |
From: <don...@is...> - 2008-09-25 23:02:06
|
Bruno Haible writes: > The maximum size for arrays and strings is driven by the desire to have only > two header words per instance. It is a compromise. One could also choose to > have all arrays and all strings with three header words per instance. This > would increase the memory size by a certain percentage and the run time of > all programs by twice this percentage (according to Michael Monagan's > rule that when the memory use of a program increases by a factor X, its > use of time increases by a factor X^2). I'm not familar with that rule. What ever became of time/space tradeoffs? Certainly there are lots of small arrays (especially strings), and it would be nice to be able to represent them with low overhead. In fact, I suggest that for strings that are short enough, it would make sense to have a representation analogous to fixnum, i.e., one other 8 bit type code for short string. By the time you get to large arrays, the difference between 2 and 3 header words doesn't matter. So it would make sense to have another type for large arrays. > but as Sam mentioned, ARRAY-DIMENSION-LIMIT must be a fixnum. and > MOST-POSITIVE-FIXNUM is 2^24-1. 24 = oint_data_len. Even worse, array total size must be a fixnum. I wonder why the writers of the spec made that decision. (Any ideas?) It seems absurd to me right now. Perhaps there could be yet another type for BIG Arrays. I know, I should implement that type in CLOS. I'm tempted to do so. If someone out there beats me to it I won't object. > You can write an abstraction of very long arrays, say in terms of > an array of arrays, in about 50 lines of code. With clisp's > extensible sequences, you can can even use the normal sequence > functions on it. Refer to SYS::%DEFSEQ. I see, it might have been worth reading the rest of the message before typing the last few sentences. I guess I can't change functions like aref without violating the spec. So I'd have to change my code to use the new functions. Does anyone out there already have a very large array package? |
From: Sam S. <sd...@gn...> - 2008-09-26 14:00:33
|
Don Cohen wrote: > Bruno Haible writes: > > > but as Sam mentioned, ARRAY-DIMENSION-LIMIT must be a fixnum. and > > MOST-POSITIVE-FIXNUM is 2^24-1. 24 = oint_data_len. > Even worse, array total size must be a fixnum. > I wonder why the writers of the spec made that decision. (Any ideas?) > It seems absurd to me right now. the rationale is pretty clear: iteration over arrays should not cons by itself, so all array index computations must be done in the fixnum realm. I think this is even spelled out in an issue write-up. |
From: Bruno H. <br...@cl...> - 2008-09-26 00:56:03
|
Don Cohen wrote: > Even worse, array total size must be a fixnum. > I wonder why the writers of the spec made that decision. (Any ideas?) A loop like (dotimes (i (length array)) ...) should not cons a bignum at every iteration. That would be a performance killer in all implementations. That's the primary reason why fixnums exist in the first place. > What ever became of time/space tradeoffs? You have a space/time tradeoff when you have two different algorithms, like fuzzy search of a short pattern in a long text via linear search or via an index. Monagan's rule is valid when you don't change your algorithm, when you just let it consume more memory. > By the time you get to large arrays, the difference between 2 and 3 > header words doesn't matter. So it would make sense to have another > type for large arrays. This would still have a performance overhead on every access of the small arrays. Namely, at every AREF the code would have to distinuish the two cases. It's not impossible to do, it's just a bit hard to implement it in such a way that the cost for the many small arrays is low. > Perhaps there could be yet another type for BIG Arrays. If you want very big arrays with clisp on 32-bit machines, you can always compile clisp with -DWIDE. It will be definitely slower, but has the big arrays. Bruno |
From: <don...@is...> - 2008-09-26 17:39:53
|
Bruno Haible writes: > A loop like (dotimes (i (length array)) ...) > should not cons a bignum at every iteration. ... ok, so the bigarray package should preferably stick to fixnums. That explains the maximum index, but not total size. > > By the time you get to large arrays, the difference between 2 and 3 > > header words doesn't matter. So it would make sense to have another > > type for large arrays. > This would still have a performance overhead on every access of the small > arrays. Namely, at every AREF the code would have to distinuish the two > cases. It's not impossible to do, it's just a bit hard to implement it > in such a way that the cost for the many small arrays is low. There are already so many different cases for arrays that I assume you're dispatching on some type field, so there should be no additional cost. > If you want very big arrays with clisp on 32-bit machines, you can always > compile clisp with -DWIDE. It will be definitely slower, but has the big > arrays. Thanks for reminding me - I used to do that. I should probably still do that, e.g., for my lisp based servers running on 32 bit machines. I guess that would get me from 4M strings to 16M ? Now I have a 64 bit machine and am having trouble building on it. Also still trying to figure out how to run the 32 bit applications ... |