From: <don...@is...> - 2006-02-27 21:18:24
|
I find myself for the first time in possession of a 64-bit machine (an AMD Athlon eMachines, would you believe) and thought I'd try using some of the expanded address space. I've installed Fedora Core 4 with 10G of swap space. (I assume this is really used for "paging" rather than "swapping".) I have 512M of RAM.

The first thing I tried was making large strings. One of my complaints about recent clisp versions was that the limit was too small: I think it's currently about 2MB, whereas it used to be about 8MB (for 32-bit machines). So I was happy to see that the limit on the 64-bit machine was much higher: 64M. (Aside - I prefer to think of lisp as not imposing such artificial limits. Wouldn't it be more reasonable to allow any variable-size datum to be as large as the largest chunk of VM the OS is willing to allocate?)

The result below shows one minor bug in TIME: evidently it's not prepared for numbers of bytes that don't fit in 32 bits. My focus here is performance issues related to paging and GC.

[1]> (lisp-implementation-version)
"2.38 (2006-01-24) (built on hammer1.fedora.redhat.com)"
[2]> *features*
(:READLINE :REGEXP :SYSCALLS :I18N :LOOP :COMPILER :CLOS :MOP :CLISP :ANSI-CL
 :COMMON-LISP :LISP=CL :INTERPRETER :SOCKETS :GENERIC-STREAMS
 :LOGICAL-PATHNAMES :SCREEN :FFI :GETTEXT :UNICODE :BASE-CHAR=CHARACTER
 :PC386 :UNIX)
[3]> (setf s nil)
NIL
[4]> (loop (time (length (push (make-string 64000000) s)))
           (room)
           (run-shell-command "cat /proc/meminfo |grep swap"))

The cat|grep command above didn't work as intended; the output I wanted uses an upper-case S in Swap.

Real time: 0.00558 sec.
Run time: 0.008001 sec.
Space: 64001776 Bytes
GC: 2, GC time: 0.008001 sec.
Bytes permanently allocated: 159712
Bytes currently in use: 67109592
Bytes available until next GC: 772688
Real time: 1.864564 sec.
Run time: 0.508031 sec.
Space: 64001776 Bytes
GC: 2, GC time: 0.508031 sec.
Bytes permanently allocated: 159712
Bytes currently in use: 131099760
Bytes available until next GC: 16771626
...

I attempt to save space and cognitive effort by replacing the output as illustrated above with a more concise table below.

real    run    gc  gc time  MB used  MB left
.0056   0.008  2   0.008      64        0
1.865   0.508  2   0.508     125       15
3.227   0.472  2   0.472     186       31
2.029   0.32   2   0.32      247       46
2.095   0.384  1   0.384     308        0
.0058   0.004  1   0.004     369        0
10.97   0.552  1   0.552     430        0
15.84   1.312  1   1.296     491       46
.1198   0.008  1   0.008     552       46
3.088   0.592  1   0.592     613       46
20.88   1.32   1   1.316     674       92
.0001   0.0    0   0.0       735       31
9.566   0.656  1   0.656     796       92
.0001   0.0    0   0.0       857       31
18.17   1.476  1   1.476     918       92
.0001   0.0    0   0.0       979       31
46.52   2.248  1   2.248    1040      183
.1286   0.0    0   0.0      1101      122
.0001   0.0    0   0.0      1162       61
.0001   0.0    0   0.0      1223        0
31.43   1.48   1   1.48     1284      183
.0001   0.0    0   0.0      1345      122
.0001   0.0    0   0.0      1406       61
.0001   0.0    0   0.0      1467        0
27.53   2.716  1   2.712    1528      183
.0001   0.0    0   0.0      1589      122
.0001   0.0    0   0.0      1650       61
.0001   0.0    0   0.0      1711        0
235.3   5.216  1   5.216    1772      366
.0001   0.0    0   0.0      1834      305
.0001   0.0    0   0.0      1895      244
.0001   0.0    0   0.0      1956      183
.0001   0.0    0   0.0      2017      122
.0001   0.0    0   0.0      2078       61
.0001   0.0    0   0.0      2139        0
72.81   2.928  1   2.928    2200      366
.0001   0.0    0   0.0      2261      305
.0001   0.0    0   0.0      2322      244
.0001   0.0    0   0.0      2383      183
.0001   0.0    0   0.0      2444      122
.0001   0.0    0   0.0      2505       61
.0001   0.0    0   0.0      2566        0
32.62   3.524  1   3.524    2627      366
.0001   0.0    0   0.0      2688      305
.0001   0.0    0   0.0      2749      244
.0001   0.0    0   0.0      2810      183
.0001   0.0    0   0.0      2871      122
.0001   0.0    0   0.0      2932       61
.0001   0.0    0   0.0      2993        0
230.5   16.98  1   16.98    3054      687
.0001   0.0    0   0.0      3115      626
.0001   0.0    0   0.0      3176      565
.0001   0.0    0   0.0      3237      504
.0001   0.0    0   0.0      3298      443
.0001   0.0    0   0.0      3359      382
.0001   0.0    0   0.0      3420      321
.0001   0.0    0   0.0      3482      260
.0001   0.0    0   0.0      3543      199
.0001   0.0    0   0.0      3604      137
.0001   0.0    0   0.0      3665       76
.0001   0.0    0   0.0      3726       15
49.01   7.208  1   7.208    3787      687
.0207   0.0    0   0.0      3848      626
.0001   0.0    0   0.0      3909      565
.0001   0.0    0   0.0      3970      504
.0001   0.0    0   0.0      4031      443
.0001   0.0    0   0.0      4092      382

At this point we hit a bug.
Run time: 0.0 sec.
Space: 64001776 Bytes
Bytes permanently allocated: 159712
Bytes currently in use: 4291152312
Bytes available until next GC: 400722650
*** - SYSTEM::DELTA4: negative difference: [255 13651352] > [3 10544264]
The following restarts are available:
ABORT :R1 ABORT
Break 1 [5]> where
<1> #<SPECIAL-OPERATOR MULTIPLE-VALUE-CALL>
EVAL frame for form (MULTIPLE-VALUE-CALL #'SYSTEM::%TIME (SYSTEM::%%TIME)
 #:G3622 #:G3623 #:G3624 #:G3625 #:G3626 #:G3627 #:G3628 #:G3629 #:G3630)
Break 1 [5]> up
<1> #<SPECIAL-OPERATOR UNWIND-PROTECT>
EVAL frame for form (UNWIND-PROTECT (LENGTH (PUSH (MAKE-STRING 64000000) S))
 (MULTIPLE-VALUE-CALL #'SYSTEM::%TIME (SYSTEM::%%TIME) #:G3622 #:G3623
  #:G3624 #:G3625 #:G3626 #:G3627 #:G3628 #:G3629 #:G3630))
Break 1 [5]> return
Values: nil
Bytes permanently allocated: 159712
Bytes currently in use: 4355591872
...

... and now the rest of the data summarized:

real    run    gc  gc time  MB used  MB left
.0001   0.0    0   0.0      4214      259
.0001   0.0    0   0.0      4275      198
.0001   0.0    0   0.0      4336      137
.0008   0.0    0   0.0      4397       76
.0001   0.0    0   0.0      4459       15
47.6    10.1   1   10.07    4519      687
.0001   0.0    0   0.0      4580      626
.0001   0.0    0   0.0      4641      565
.0001   0.0    0   0.0      4702      504
.0001   0.0    0   0.0      4763      443
.0001   0.0    0   0.0      4824      382
.0001   0.0    0   0.0      4885      321
.0001   0.0    0   0.0      4946      260
.0001   0.0    0   0.0      5007      199
.0001   0.0    0   0.0      5068      137
.0001   0.0    0   0.0      5130       76
.0001   0.0    0   0.0      5191       15
2139.   83.69  1   83.69    5251     1236
.1932   0.0    0   0.0      5313     1175
.0001   0.0    0   0.0      5374     1114
.0001   0.0    0   0.0      5435     1053
.0001   0.004  0   0.0      5496      992
.0001   0.0    0   0.0      5557      931
.0001   0.0    0   0.0      5618      870
.0001   0.0    0   0.0      5679      809
.0001   0.0    0   0.0      5740      748
.0001   0.0    0   0.0      5801      687
.0001   0.0    0   0.0      5862      626
.0001   0.0    0   0.0      5923      565
.0001   0.0    0   0.0      5984      504
.0001   0.0    0   0.0      6045      443
.0001   0.0    0   0.0      6106      382
.0001   0.0    0   0.0      6167      321
.0001   0.0    0   0.0      6228      259
.0001   0.0    0   0.0      6289      198
.0001   0.0    0   0.0      6350      137
.0001   0.0    0   0.0      6411       76
.0001   0.0    0   0.0      6472       15
152.1   26.31  1   26.31    6533     1236
.0338   0.0    0   0.0      6594     1175
.0001   0.0    0   0.0      6655     1114
.0001   0.0    0   0.0      6716     1053
.0001   0.0    0   0.0      6777      992
.0001   0.0    0   0.0      6838      931
.0001   0.0    0   0.0      6899      870
.0001   0.0    0   0.0      6961      809
.0001   0.0    0   0.0      7022      748
.0001   0.0    0   0.0      7083      687
.0001   0.0    0   0.0      7144      626
.0001   0.0    0   0.0      7205      565
.0001   0.0    0   0.0      7266      504
.0001   0.0    0   0.0      7327      443
.0001   0.0    0   0.0      7388      382
.0001   0.0    0   0.0      7449      321
.0001   0.0    0   0.0      7510      259
.0001   0.0    0   0.0      7571      198
.0001   0.0    0   0.0      7632      137
.0001   0.0    0   0.0      7693       76
.0001   0.0    0   0.0      7754       15

[Here we run out of space.]

The result is in some respects what I expected. If there's enough space available, the allocation takes almost no time. Otherwise there's a GC and more space is allocated. Since the non-GC cases are relatively uninteresting, I collect below the GC cases.
real    run    gc  gc time  MB used  MB left  notes
.0056   0.008  2   0.008      64        0     ====
1.865   0.508  2   0.508     125       15     2 gc's
3.227   0.472  2   0.472     186       31
2.029   0.32   2   0.32      247       46     ====
2.095   0.384  1   0.384     308        0     VM<RAM
.0058   0.004  1   0.004     369        0
10.97   0.552  1   0.552     430        0
15.84   1.312  1   1.296     491       46     ====
.1198   0.008  1   0.008     552       46     real/run
3.088   0.592  1   0.592     613       46     5
20.88   1.32   1   1.316     674       92     16
9.566   0.656  1   0.656     796       92     15
18.17   1.476  1   1.476     918       92     12
46.52   2.248  1   2.248    1040      183     21
31.43   1.48   1   1.48     1284      183     21
27.53   2.716  1   2.712    1528      183     10
235.3   5.216  1   5.216    1772      366     45
72.81   2.928  1   2.928    2200      366     25
32.62   3.524  1   3.524    2627      366     9
230.5   16.98  1   16.98    3054      687     14
49.01   7.208  1   7.208    3787      687     7
47.6    10.1   1   10.07    4519      687     5
2139.   83.69  1   83.69    5251     1236     26
152.1   26.31  1   26.31    6533     1236     6

One interesting point is that the first few allocations cause 2 GCs. I suppose the first one allocates less space than is needed and it takes another to allocate the rest. It seems this could easily be avoided by telling the GC how much space you're trying to allocate and having it allocate enough to satisfy the request.

It seems clear that what is reported as GC time is GC "run" time as opposed to GC "real" time. I'm not sure how accurate these run times are (what exactly they measure) in cases that involve a lot of paging. It's possible that some other processes were competing for the disk during some of the GCs, but I think this would have been the exception.

It's not surprising that the times are generally increasing. It is interesting that they are not monotonic; rather, they seem to go in a cycle of one large time followed by a few smaller ones. I may be hallucinating, but I imagine this might be related to generational GC, with the smaller times looking only at more recently allocated data, and after a few of those a GC that looks at more data.

Actually, I was somewhat surprised that GC took a major amount of time in this example at all.
The entire run allocated about 100 strings, and I expected GC to have to read only the pages containing the headers of those objects. The times above suggest that instead it is reading in and scanning every page. Does that make sense? What is there to look for in the middle of a string? I also wonder how many times a given page is read and scanned; in this case I don't see why it should be more than once. But 2000 sec seems much longer than should be needed to read 5GB. |
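[Editor's note: a plausible mechanism for the SYSTEM::DELTA4 error above, sketched in Python. This is illustrative arithmetic only, not clisp's actual accounting code: it assumes TIME keeps its byte counts in 32-bit quantities, which matches the observation that the failure occurred just as "Bytes currently in use" approached 2^32.]

```python
# Sketch (not clisp's actual implementation): if TIME tracks allocated
# bytes in a counter that wraps at 2^32, then once total allocation
# passes 4 GiB the "after" snapshot is numerically smaller than the
# "before" snapshot, and the subtraction goes negative.

MASK32 = (1 << 32) - 1  # a counter stored in 32 bits wraps modulo 2^32

def delta32(start, end):
    """Difference of two snapshots of a counter truncated to 32 bits."""
    return (end & MASK32) - (start & MASK32)

start = 4291152312          # bytes in use before the form, just under 2^32
end = start + 64001776      # after allocating one more 64MB string

print(delta32(start, end))  # negative: the truncated counter wrapped
```

This is the same symptom the transcript shows: a "negative difference" reported exactly when the in-use byte count crossed the 32-bit boundary.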
From: Stephen C. <s1...@me...> - 2006-02-28 06:54:58
|
On Mon, 2006-02-27 at 13:18 -0800, Don Cohen wrote:
> (Aside - I prefer to think of lisp as not imposing such artificial
> limits. Wouldn't it be more reasonable to allow any variable size
> datum to be as large as the largest chunk of VM the OS is willing to
> allocate?)

Unfortunately, this issue is made a little more difficult by ANSI. Read http://www.lisp.org/HyperSpec/Issues/iss014-writeup.html (ARRAY-DIMENSION-LIMIT-IMPLICATIONS) for details; suffice it to say, array-total-size-limit et al. must be fixnums.

I remember someone saying on clisp-list (or maybe c.l.lisp) a while ago (sorry, can't find the particular message) that lots of places in clisp code depend on fixnums fitting into 24 bits, and changing that would be a lot of work.

--
Stephen Compall
http://scompall.nocandysw.com/blog
|
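[Editor's note: the fixnum constraint can be made concrete with a little arithmetic. This is an illustrative sketch; the tag-bit counts below are assumptions for the example, not clisp's actual object representation.]

```python
# Illustrative only: on a word_bits machine where tag_bits of each word
# are reserved for the type tag and sign, the largest positive fixnum is
# 2^(word_bits - tag_bits - 1) - 1. Since ANSI requires
# ARRAY-TOTAL-SIZE-LIMIT to be a fixnum, this also bounds the largest
# string the implementation may promise to support.

def max_fixnum(word_bits, tag_bits):
    """Largest positive fixnum given word width and reserved bits."""
    return 2 ** (word_bits - tag_bits - 1) - 1

# A 32-bit layout with 7 reserved bits leaves 24 bits of magnitude,
# matching the "fixnums fitting into 24 bits" dependence mentioned above.
print(max_fixnum(32, 7))   # 16777215, i.e. about 16M
print(max_fixnum(64, 15))  # a hypothetical wider layout: 2^48 - 1
```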
From: <don...@is...> - 2006-02-28 07:39:10
|
Stephen Compall writes:
> On Mon, 2006-02-27 at 13:18 -0800, Don Cohen wrote:
> > (Aside - I prefer to think of lisp as not imposing such artificial
> > limits. Wouldn't it be more reasonable to allow any variable size
> > datum to be as large as the largest chunk of VM the OS is willing to
> > allocate?)
>
> Unfortunately, this issue is made a little more difficult by ANSI. Read
> http://www.lisp.org/HyperSpec/Issues/iss014-writeup.html
> (ARRAY-DIMENSION-LIMIT-IMPLICATIONS) for details; suffice to say,
> array-total-size-limit et al must be fixnums.

Yes, I noticed that. It seems unreasonable to me. (My ideas of what lisp is or should be actually predate Common Lisp.) But I notice that fixnums on AMD64 are 48 bits, which conveniently is also the max size of VM the machine supports.
|
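[Editor's note: the 48-bit coincidence noted above checks out arithmetically. The fixnum width is as reported in the message; the 48-bit virtual address width is a property of AMD64 implementations of that era. A minimal sketch:]

```python
# A 48-bit fixnum index can reach any byte of a 48-bit virtual address
# space, so the fixnum limit and the VM limit coincide on this hardware.

FIXNUM_BITS = 48          # fixnum width reported for clisp on AMD64
VA_BITS = 48              # AMD64 virtual address width

max_index = 2 ** FIXNUM_BITS - 1
va_space = 2 ** VA_BITS   # total virtual address space in bytes

print(va_space // 2 ** 40)        # 256 (TiB of virtual address space)
print(max_index >= va_space - 1)  # True: fixnums index the whole space
```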