From: Sam S. <sd...@gn...> - 2008-10-30 17:59:24
|
Vladimir, here is what I get on x86_64: gcc -I/home/ssteingold/src/top/include -Igllib -W -Wswitch -Wcomment -Wpointer-arith -Wimplicit -Wreturn-type -Wmissing-declarations -Wno-sign-compare -falign-functions=4 -pthread -g -DDEBUG_OS_ERROR -DDEBUG_SPVW -DDEBUG_BYTECODE -DSAFETY=3 -DUNICODE -DMULTITHREAD -DPOSIX_THREADS -DDYNAMIC_FFI -I. -c spvw.c In file included from ../src/spvw.d:991: ../src/spvw_circ.d: In function 'mlb_add': ../src/spvw_circ.d:342: warning: large integer implicitly truncated to unsigned type /tmp/cc6QZA3S.s: Assembler messages: /tmp/cc6QZA3S.s:4326: Error: Incorrect register `%ebx' used with `q' suffix make: *** [spvw.o] Error 1 you can get access to a x86_64 machine using http://gcc.gnu.org/wiki/CompileFarm Thanks Sam |
From: Vladimir T. <vtz...@gm...> - 2008-10-30 21:10:21
|
Hi Sam, On Oct 30, 2008, at 7:59 PM, Sam Steingold wrote: > Vladimir, > here is what I get on x86_64: > > gcc -I/home/ssteingold/src/top/include -Igllib -W -Wswitch - > Wcomment -Wpointer-arith -Wimplicit -Wreturn-type -Wmissing- > declarations -Wno-sign-compare -falign-functions=4 -pthread -g - > DDEBUG_OS_ERROR -DDEBUG_SPVW -DDEBUG_BYTECODE -DSAFETY=3 -DUNICODE - > DMULTITHREAD -DPOSIX_THREADS -DDYNAMIC_FFI -I. -c spvw.c > In file included from ../src/spvw.d:991: > ../src/spvw_circ.d: In function 'mlb_add': > ../src/spvw_circ.d:342: warning: large integer implicitly truncated > to unsigned type > /tmp/cc6QZA3S.s: Assembler messages: > /tmp/cc6QZA3S.s:4326: Error: Incorrect register `%ebx' used with > `q' suffix > make: *** [spvw.o] Error 1 It's a problem in the inline assembly (the assembler complains, gcc things everything is fine) in spinlock implementation. I am wondering whether the I30836 version of spinlock will not work fine on AMD64 as well. Not sure. Hope tomorrow to get hands on x86_64 and figure it out. btw: there is POSIXOLD_THREADS that are not entirely implemented. From comments: POSIX_THREADS POSIX.1c pthread_* POSIXOLD_THREADS POSIX.1c draft 4 pthread_* Should we support them (any OS with this version os pthreads)? Vladimir |
From: Sam S. <sd...@gn...> - 2008-10-31 14:20:52
|
Hi Vladimir Vladimir Tzankov wrote: > > btw: there is POSIXOLD_THREADS that are not entirely implemented. > From comments: > POSIX_THREADS POSIX.1c pthread_* > POSIXOLD_THREADS POSIX.1c draft 4 pthread_* > > Should we support them (any OS with this version os pthreads)? I think all the various threads flavors should be supported based on the gnulib thread module (and locking will come from gnulib/modules/lock). to avoid extra work later, we might as well pull those modules from gnulib now. actually, gnulib also provides spinlocks - maybe we should use them instead of ours? Bruno, WDYT? Sam. |
From: Vladimir T. <vtz...@gm...> - 2008-10-31 10:59:50
|
On Oct 30, 2008, at 7:59 PM, Sam Steingold wrote: > Vladimir, > here is what I get on x86_64: > > gcc -I/home/ssteingold/src/top/include -Igllib -W -Wswitch - > Wcomment -Wpointer-arith -Wimplicit -Wreturn-type -Wmissing- > declarations -Wno-sign-compare -falign-functions=4 -pthread -g - > DDEBUG_OS_ERROR -DDEBUG_SPVW -DDEBUG_BYTECODE -DSAFETY=3 -DUNICODE - > DMULTITHREAD -DPOSIX_THREADS -DDYNAMIC_FFI -I. -c spvw.c > In file included from ../src/spvw.d:991: > ../src/spvw_circ.d: In function 'mlb_add': > ../src/spvw_circ.d:342: warning: large integer implicitly truncated > to unsigned type > /tmp/cc6QZA3S.s: Assembler messages: > /tmp/cc6QZA3S.s:4326: Error: Incorrect register `%ebx' used with > `q' suffix > make: *** [spvw.o] Error 1 It's fixed in the [threads1] branch (however the whole build is not tested). Also SPVW_PAGES build is fixed (was broken in GC). Vladimir |
From: Sam S. <sd...@gn...> - 2008-10-31 14:22:35
|
Vladimir Tzankov wrote: > On Oct 30, 2008, at 7:59 PM, Sam Steingold wrote: >> >> ../src/spvw_circ.d: In function 'mlb_add': >> ../src/spvw_circ.d:342: warning: large integer implicitly truncated >> to unsigned type >> /tmp/cc6QZA3S.s: Assembler messages: >> /tmp/cc6QZA3S.s:4326: Error: Incorrect register `%ebx' used with >> `q' suffix >> make: *** [spvw.o] Error 1 > > It's fixed in the [threads1] branch (however the whole build is not > tested). thanks, the error is now gone, but the warning is still there: In file included from ../src/spvw.d:991: ../src/spvw_circ.d: In function 'mlb_add': ../src/spvw_circ.d:342: warning: large integer implicitly truncated to unsigned type looks scary... |
From: Sam S. <sd...@gn...> - 2008-10-31 14:33:57
|
Sam Steingold wrote: > Vladimir Tzankov wrote: >> On Oct 30, 2008, at 7:59 PM, Sam Steingold wrote: >>> ../src/spvw_circ.d: In function 'mlb_add': >>> ../src/spvw_circ.d:342: warning: large integer implicitly truncated >>> to unsigned type >>> /tmp/cc6QZA3S.s: Assembler messages: >>> /tmp/cc6QZA3S.s:4326: Error: Incorrect register `%ebx' used with >>> `q' suffix >>> make: *** [spvw.o] Error 1 >> It's fixed in the [threads1] branch (however the whole build is not >> tested). > > thanks, the error is now gone, but the warning is still there: > > In file included from ../src/spvw.d:991: > ../src/spvw_circ.d: In function 'mlb_add': > ../src/spvw_circ.d:342: warning: large integer implicitly truncated to unsigned > type > > looks scary... > and may be causing a very early crash Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 1090521408 (LWP 4114)] 0x0000000000417f4c in mlb_add (bitmap=0x40ffe210, obj= {one_o = 18014453590891552}) at ../src/spvw_circ.d:354 354 *p4 = (uintL****)room; room += bit(mlbs3)*sizeof(uintL***); (gdb) p room $5 = 0x41f5207c0 <Address 0x41f5207c0 out of bounds> (gdb) list 349 var char* room = (char*)bitmap->base+bitmap->used_size; 350 *p6 = (uintL******)room; room += bit(mlbs5)*sizeof(uintL*****); 351 var uintL****** p5 = &(*p6)[(addr >> mlb5) & (bit(mlbs5)-1)]; 352 *p5 = (uintL*****)room; room += bit(mlbs4)*sizeof(uintL****); 353 var uintL***** p4 = &(*p5)[(addr >> mlb4) & (bit(mlbs4)-1)]; 354 *p4 = (uintL****)room; room += bit(mlbs3)*sizeof(uintL***); 355 var uintL**** p3 = &(*p4)[(addr >> mlb3) & (bit(mlbs3)-1)]; 356 *p3 = (uintL***)room; room += bit(mlbs2)*sizeof(uintL**); 357 var uintL*** p2 = &(*p3)[(addr >> mlb2) & (bit(mlbs2)-1)]; 358 *p2 = (uintL**)room; room += bit(mlbs1)*sizeof(uintL*); (gdb) where #0 0x0000000000417f4c in mlb_add (bitmap=0x40ffe210, obj= {one_o = 18014453590891552}) at ../src/spvw_circ.d:354 #1 0x0000000000418f2c in subst_circ_mark (ptr=0x2aaaaacd1198, env=0x40ffe210) at ../src/spvw_circ.d:1390 #2 0x0000000000419005 in subst_circ (ptr=0x2aaaaacd1198, alist= {one_o = 18014453590888624}) at ../src/spvw_circ.d:1423 #3 0x00000000004f6317 in make_references (obj={one_o = 18014453590891552}) at ../src/io.d:2214 #4 0x00000000004f6e41 in read_top (stream_=0x2aaaaacd1100, whitespace_p= {one_o = 1125899916511488}) at ../src/io.d:2255 #5 0x00000000004f747d in stream_read (stream_=0x2aaaaacd1100, recursive_p= {one_o = 1125899916511488}, whitespace_p={one_o = 1125899916511488}) at ../src/io.d:2282 #6 0x00000000005a5ab8 in C_load () at ../src/debug.d:598 #7 0x000000000044215b in eval_subr (fun={one_o = 281474986332304}) at ../src/eval.d:3579 #8 0x000000000043e865 in eval1 (form={one_o = 18014453590951152}) at ../src/eval.d:3071 #9 0x000000000043e057 in eval (form={one_o = 18014453590951152}) at ../src/eval.d:2953 #10 0x000000000047116f in C_and () at ../src/control.d:2464 #11 0x000000000043f6f5 in eval_fsubr (fun={one_o = 3377713888287280}, args= {one_o = 18014453590951120}) at ../src/eval.d:3250 #12 0x000000000043e971 in eval1 (form={one_o = 18014453590951168}) at ../src/eval.d:3088 #13 0x000000000043e057 in eval (form={one_o = 18014453590951168}) at ../src/eval.d:2953 #14 0x00000000005a3bc8 in C_read_eval_print () at ../src/debug.d:409 #15 0x000000000044c657 in funcall_subr (fun={one_o = 281474986332192}, args_on_stack=1) at ../src/eval.d:5215 #16 0x000000000044af41 in funcall (fun={one_o = 281474986332192}, args_on_stack=1) at ../src/eval.d:4848 #17 0x00000000005a462a in driver () at ../src/debug.d:490 #18 0x00000000004274fa in main_actions (p=0x9582e0) at ../src/spvw.d:3661 #19 0x00000000004248e7 in mt_main_actions (param=0x1f517010) at ../src/spvw.d:3679 #20 0x00000033c8e062f7 in start_thread () from /lib64/libpthread.so.0 #21 0x00000033c82ce85d in clone () from /lib64/libc.so.6 (gdb) |
From: Sam S. <sd...@gn...> - 2008-11-04 01:12:24
Attachments:
spvw_circ_uintM.diff
|
Sam Steingold wrote: >>>> ../src/spvw_circ.d: In function 'mlb_add': >>>> ../src/spvw_circ.d:342: warning: large integer implicitly truncated >>>> to unsigned type the attached patch removed the warning, but the loading init still fails: gcc -W -Wswitch -Wcomment -Wpointer-arith -Wimplicit -Wreturn-type -Wmissing-declarations -Wno-sign-compare -Wno-format-nonliteral -falign-functions=4 -pthread -g -DDEBUG_OS_ERROR -DDEBUG_SPVW -DDEBUG_BYTECODE -DSAFETY=3 -DUNICODE -DMULTITHREAD -DPOSIX_THREADS -DDYNAMIC_FFI -I. -x none spvw.o spvwtabf.o spvwtabs.o spvwtabo.o eval.o control.o encoding.o pathname.o stream.o socket.o io.o funarg.o array.o hashtabl.o list.o package.o record.o weak.o sequence.o charstrg.o debug.o error.o misc.o time.o predtype.o symbol.o lisparit.o i18n.o foreign.o unixaux.o zthread.o built.o gllib/uniwidth/width.o gllib/uniname/uniname.o gllib/localcharset.o modules.o -lreadline -lncurses -ldl /home/ssteingold/src/top/lib/libavcall.a /home/ssteingold/src/top/lib/libcallback.a -lsigsegv -o lisp.run ./lisp.run -B . -N locale -E UTF-8 -Epathname 1:1 -Emisc 1:1 -norc -m 2MW -lp ../src/ -x '(and (load "../src/init.lisp") (sys::%saveinitmem) (ext::exit)) (ext::exit t)' STACK depth: 262062 [0x2aaaaaed0e00 0x2aaaaacd1090] i i i i i i i ooooo o ooooooo ooooo ooooo I I I I I I I 8 8 8 8 8 o 8 8 I \ `+' / I 8 8 8 8 8 8 \ `-+-' / 8 8 8 ooooo 8oooo `-__|__-' 8 8 8 8 8 | 8 o 8 8 o 8 8 ------+------ ooooo 8oooooo ooo8ooo ooooo 8 Welcome to GNU CLISP 2.47+ (2008-10-24) <http://clisp.cons.org/> Copyright (c) Bruno Haible, Michael Stoll 1992, 1993 Copyright (c) Bruno Haible, Marcus Daniels 1994-1997 Copyright (c) Bruno Haible, Pierpaolo Bernardi, Sam Steingold 1998 Copyright (c) Bruno Haible, Sam Steingold 1999-2000 Copyright (c) Sam Steingold, Bruno Haible 2001-2008 Type :h and hit Enter for context help. *** - READ: no entry for #<ADDRESS #x000100938920> from (%PUTD 'CHECK-REDEFINITION (FUNCTION CHECK-REDEFINITION (LAMBDA (OBJECT CALLER WHAT) (WHEN (AND (SYMBOLP OBJECT) (NOT (EQ CALLER 'DEFINE-SETF-EXPANDER)) (NOT (EQUAL CALLER '(SETF FIND-CLASS)))) (CHECK-SPECIAL-OPERATOR CALLER OBJECT)) (LET ((CUR-FILE *CURRENT-SOURCE-FILE*) (OLD-FILE (IF (AND (NOT (OR (EQ CALLER 'DEFINE-SETF-EXPANDER) (EQ CALLER 'DEFSETF))) (SUBR-INFO OBJECT)) "C" (CDR (GET-FILE-DOC OBJECT CALLER))))) (WHEN (CONSP OLD-FILE) (SETQ OLD-FILE (CAR OLD-FILE))) (UNLESS (OR *SUPPRESS-CHECK-REDEFINITION* (EQUALP OLD-FILE CUR-FILE) (AND (PATHNAMEP OLD-FILE) (PATHNAMEP CUR-FILE) (EQUAL (PATHNAME-NAME OLD-FILE) (PATHNAME-NAME CUR-FILE)))) (CHECK-PACKAGE-LOCK CALLER (COND ((ATOM OBJECT) (SYMBOL-PACKAGE OBJECT)) ((FUNCTION-NAME-P OBJECT) (SYMBOL-PACKAGE (SECOND OBJECT))) ((MAPCAR #'(LAMBDA (OBJ) (LET ((OO (IF (ATOM OBJ) OBJ (SECOND OBJ)))) (WHEN (SYMBOLP OO) (SYMBOL-PACKAGE OO)))) OBJECT))) OBJECT) (WHEN WHAT (WARN (TEXT "~A: redefining ~A ~S in ~A, was defined in ~A") CALLER WHAT OBJECT (OR CUR-FILE "top-level") (OR OLD-FILE #<READ-LABEL 1>)))) (SET-FILE-DOC OBJECT CALLER (AND CUR-FILE (LIST CUR-FILE *CURRENT-SOURCE-LINE-1* *CURRENT-SOURCE-LINE-2*))))))) in *READ-REFERENCE-TABLE* = ((#<READ-LABEL 1> . "top-level")) Bye. make: *** [interpreted.mem] Error 1 Vladimir, did you get access to the gcc compile farm? Sam. |
From: Vladimir T. <vtz...@gm...> - 2008-11-03 20:37:09
|
Hi Sam, On Nov 3, 2008, at 6:48 PM, Sam Steingold wrote: > Sam Steingold wrote: >>>>> ../src/spvw_circ.d: In function 'mlb_add': >>>>> ../src/spvw_circ.d:342: warning: large integer implicitly >>>>> truncated to unsigned type > > the attached patch removed the warning, but the loading init still > fails: > > ..................... > > Vladimir, did you get access to the gcc compile farm? Yes I have ubuntu running on amd64. I tried similar things as in the patch but even if it works - it is not good. I am not sure that fixing these warnings will help. The multi-level bitmap (hashset) used for detection of circularities consumes (will consumes) enormous amounts of memory on 64 bit architectures. For example it wants to malloc 0x400004190 bytes of memory (bit(31)*8 + more for the other levels) during bootstrap. It's not acceptable on any invocation of printer to execute such thing. I think about "alternative" implementation - which will at least reduce the amount of memory required. For example. While get_circ_mark() runs - no GC may happen (since it does not allocate anything and does not call blocking system calls - i.e. the thread cannot be suspended). So we may be sure that the "current" heap range will not change (I mean we can find the lowest and highest possible address for gcv_object_t - s). In this way we can effectively reduce the address space from 64 bit to something smaller. This will help but I do not like it. Especially with SPVW_PURE the address space will not be so small. Still thinking :). Vladimir |
From: Sam S. <sd...@gn...> - 2008-11-04 17:55:06
|
Vladimir Tzankov wrote: > For example it wants to malloc 0x400004190 bytes of memory (bit(31)*8 > + more for the other levels) during bootstrap. It's not acceptable on > any invocation of printer to execute such thing. > > Still thinking :). could it be that increasing the number of levels (from 6 to 10, to match the number of levels for the first 32 bits) is the solution? |
From: Vladimir T. <vtz...@gm...> - 2008-11-05 13:02:02
|
Hi Sam, On Nov 4, 2008, at 7:55 PM, Sam Steingold wrote: > Vladimir Tzankov wrote: >> For example it wants to malloc 0x400004190 bytes of memory (bit(31) >> *8 + more for the other levels) during bootstrap. It's not >> acceptable on any invocation of printer to execute such thing. >> Still thinking :). > > could it be that increasing the number of levels (from 6 to 10, to > match the number of levels for the first 32 bits) is the solution? I do not think this will help. As read the code - mlb_add() will always try to allocate bitmap for all possible addresses (correct me if I am wrong). Also I do not like the malloc() calls. I am looking now on the "old" implementation that uses gc marks. get_circ_mark() / get_circ_unmark() / subst_circ_mark() / subst_circ_unmark() cannot be interrupted by the GC (they do not do allocation or perform GC_SAFE calls). So it is safe to use marking from GC point of view. If the mark is per thread - than there will be no problems with different threads executing simultaneously these functions. Here some thoughts. Correct me if I am wrong. Currently every object on the heap start with GCSelf (32 or 64 bits) which is used only during GC (with exception of Symbols that use it for flags in some builds). Can we use these bits for marking per thread combined with a semaphore (with count - the number of available bits)? In this way we can allow concurrent circular detection for up to XX threads (XX = 32/64 minus bits reserved for symbol flags). What do you think? Vlado |
From: Sam S. <sd...@gn...> - 2008-11-05 14:36:22
|
Hi Vladimir, Vladimir Tzankov wrote: > On Nov 4, 2008, at 7:55 PM, Sam Steingold wrote: >> Vladimir Tzankov wrote: >>> For example it wants to malloc 0x400004190 bytes of memory (bit(31) >>> *8 + more for the other levels) during bootstrap. It's not >>> acceptable on any invocation of printer to execute such thing. >>> Still thinking :). >> could it be that increasing the number of levels (from 6 to 10, to >> match the number of levels for the first 32 bits) is the solution? > > I do not think this will help. As read the code - mlb_add() will > always try to allocate bitmap for all possible addresses (correct me > if I am wrong). > > Also I do not like the malloc() calls. I am looking now on the "old" > implementation that uses gc marks. > get_circ_mark() / get_circ_unmark() / subst_circ_mark() / > subst_circ_unmark() cannot be interrupted by the GC (they do not do > allocation or perform GC_SAFE calls). So it is safe to use marking > from GC point of view. > If the mark is per thread - than there will be no problems with > different threads executing simultaneously these functions. > > Here some thoughts. Correct me if I am wrong. > > Currently every object on the heap start with GCSelf (32 or 64 bits) > which is used only during GC (with exception of Symbols that use it > for flags in some builds). > Can we use these bits for marking per thread combined with a > semaphore (with count - the number of available bits)? I see no problem off hand, but I have not thought about it carefully yet. Bruno, WDYT? > In this way we can allow concurrent circular detection for up to XX > threads (XX = 32/64 minus bits reserved for symbol flags). I don't think this is necessary since circularity detection happens only in READ which has to lock all the packages anyway (I don't think we can afford acquiring a lock for each INTERN). |
From: Vladimir T. <vtz...@gm...> - 2008-11-12 11:42:39
|
Hi Sam, > Vladimir Tzankov wrote: >> On Nov 4, 2008, at 7:55 PM, Sam Steingold wrote: >>> Vladimir Tzankov wrote: >>>> For example it wants to malloc 0x400004190 bytes of memory (bit >>>> (31) *8 + more for the other levels) during bootstrap. It's >>>> not acceptable on any invocation of printer to execute such >>>> thing. >>>> Still thinking :). >>> could it be that increasing the number of levels (from 6 to 10, >>> to match the number of levels for the first 32 bits) is the >>> solution? >> I do not think this will help. As read the code - mlb_add() will >> always try to allocate bitmap for all possible addresses (correct >> me if I am wrong). >> Also I do not like the malloc() calls. I am looking now on the >> "old" implementation that uses gc marks. >> get_circ_mark() / get_circ_unmark() / subst_circ_mark() / >> subst_circ_unmark() cannot be interrupted by the GC (they do not >> do allocation or perform GC_SAFE calls). So it is safe to use >> marking from GC point of view. >> If the mark is per thread - than there will be no problems with >> different threads executing simultaneously these functions. >> Here some thoughts. Correct me if I am wrong. >> Currently every object on the heap start with GCSelf (32 or 64 >> bits) which is used only during GC (with exception of Symbols >> that use it for flags in some builds). >> Can we use these bits for marking per thread combined with a >> semaphore (with count - the number of available bits)? I was wrong twice in the above mail. Increasing the number of levels will usually reduce the size of allocated bitmap. However it will not guarantee small allocation size - depending on the object checked - it still can try to allocate enormous amounts of memory. For the GCSelf pointer - I was again wrong - only the varobjects have it. The objects in the cons heap do not start with GCSelf. So both ways are not good. The simplest way to go is to use the marking circularity detection and allow just a single thread to execute within it. Do not like it. We can use also hash table (most implementations do it in this way). Vladimir |
From: Sam S. <sd...@gn...> - 2008-11-12 14:46:27
|
Hi Vladimir, Vladimir Tzankov wrote: > > The simplest way to go is to use the marking circularity detection > and allow just a single thread to execute within it. Do not like it. well, circularity detection in _input_ is already de-facto single-threaded because READ must lock all packages. making _output_ single-threaded is certainly not a good idea. > We can use also hash table (most implementations do it in this way). yes, it would be nice to experiment with HTs and see how much we lose (or win!) with them. Sam. |
From: <don...@is...> - 2008-11-06 17:27:55
|
Thanks for renaming this thread to give those of us who are not quite as far into the issue as you some idea of what it's all about. I gather that the problem is that - there's some program for testing whether an object is circular - it was using an algorithm that starts out by allocating a bit map proportional to the size of VM - this seems too expensive for cases where there is a large VM I hope it's not too much trouble to provide a little more info for those of us who are curious about more detail. - What does this have to do with multithreading? Wasn't the problem noticed due to a build problem in MT code? - Was this code really trying to allocate a bit for every object in the possible VM of the machine? Why not just for the currently allocated memory? (1/64 of 2^64 seems unreasonable, but 1/64 of memory that you are using might not be.) - When is this function used, anyway? - I'd think that most such tests would involve small objects. Why not start with a hash table (isn't this how circle-print works?) and, possibly when that table becomes bigger than the bitmap would have been, then switch representation? (BTW, I seem to recall that there is an algorithm for circularity testing that uses very little memory.) |
From: <don...@is...> - 2008-11-06 20:55:32
|
Vladimir Tzankov writes: > The circularity check cannot be interrupted by GC - there are no so > called "safe points" in it. > So if a GC is requested while circularity check is in progress - the > GC will wait till circ check finishes in order to be able to suspend > this thread. I begin to understand - gc waits for all other threads to stop. But in that case it's terrible to have arbitrary amounts of computation between safe points, as seems to be the case here. > (if we want - "safe for GC" points may introduced in circularity > detection of course - but currently there are none). And when you do introduce them, you have to (in the current MT version with the bitmaps) start over. > > If you're willing to lock out GC while you compute circularities, > > then it seems you should be willing to lock out other circularity > > checks, and so you can use the GC bits again. But at this point So wouldn't it have been a lot easier to treat the circle check similar to gc (stop everything else) than to write a new program? Assuming you want to start with the easiest path to correct code. > > I don't understand why you don't at least start with a hash table. > > When that gets too big it becomes more reasonable to lock out all > > other circularity checks and GCs. > > We'll start - next week :) - just too busy these days. > I think we should not use any dynamic memory allocation. > Something like this: > http://article.gmane.org/gmane.lisp.clisp.devel/19419 I didn't quite follow that - was it suggesting that something like gc mark bits be used, but that more than one be allocated so that more than one thread (but still a limited number) could compute circularity at once? I'm still having trouble understanding why this circularity check is used at all. Perhaps you're saying that it's an implementation choice made under very different circumstances than those that apply to the present. I'd have thought that it would be more worth while to change that choice than to try to make it work in MT. Of course, you're the one doing the work, so I'm only providing free advice (worth every cent), based on much less understanding of the code than I imagine you have. |
From: Vladimir T. <vtz...@gm...> - 2008-11-06 21:14:47
|
Hi don. On Nov 6, 2008, at 10:55 PM, Don Cohen wrote: > Vladimir Tzankov writes: > >> The circularity check cannot be interrupted by GC - there are no so >> called "safe points" in it. >> So if a GC is requested while circularity check is in progress - the >> GC will wait till circ check finishes in order to be able to suspend >> this thread. > > I begin to understand - gc waits for all other threads to stop. > But in that case it's terrible to have arbitrary amounts of > computation between safe points, as seems to be the case here. Yes - such places (with possible arbitrary amount of computation) should be found and safe points inserted in them. This will be the "tuning" phase of MT build I suppose. >> (if we want - "safe for GC" points may introduced in circularity >> detection of course - but currently there are none). > > And when you do introduce them, you have to (in the current MT version > with the bitmaps) start over. I would prefer not to use the current MT version. >>> If you're willing to lock out GC while you compute circularities, >>> then it seems you should be willing to lock out other circularity >>> checks, and so you can use the GC bits again. But at this point > > So wouldn't it have been a lot easier to treat the circle check > similar to gc (stop everything else) than to write a new program? > Assuming you want to start with the easiest path to correct code. Yes this is the simplest approach - "two-liner" but somehow do not feel comfortable with it. >>> I don't understand why you don't at least start with a hash table. >>> When that gets too big it becomes more reasonable to lock out all >>> other circularity checks and GCs. >> >> We'll start - next week :) - just too busy these days. >> I think we should not use any dynamic memory allocation. >> Something like this: >> http://article.gmane.org/gmane.lisp.clisp.devel/19419 > > I didn't quite follow that - was it suggesting that something like gc > mark bits be used, but that more than one be allocated so that more > than one thread (but still a limited number) could compute circularity > at once? Yes. Currently single bit is used for marking and while the GC is not running we have a whole bunch of "available bits" - no need to allocate anything. The number of concurrent threads executing in circularity detection will be limited (to almost 32 or 64) which seems reasonable. > I'm still having trouble understanding why this circularity check > is used at all. Perhaps you're saying that it's an implementation > choice made under very different circumstances than those that apply > to the present. > I'd have thought that it would be more worth while to change that > choice than to try to make it work in MT. Do not know - Sam? > Of course, you're the one doing the work, so I'm only providing free > advice (worth every cent), based on much less understanding of the > code than I imagine you have. Ideas/advices/suggestions are always welcomed (I also have good understanding just on small parts of the runtime) Vladimir |
From: Vladimir T. <vtz...@gm...> - 2008-11-16 13:52:28
|
Sam, On Nov 16, 2008, at 2:17 AM, Sam Steingold wrote: >> * Vladimir Tzankov <igmnaxbi@tznvy.pbz> [2008-11-15 01:17:44 +0200]: >> Just committed (threads1 branch) spvw_circ.d which implements >> circularity detection with LISP hash table (used as hashset). >> The hash table is :fasthash-eq (but there is no big difference in >> performance if :stablehash-eq is used instead). > > implementing actual hash sets might improve things somewhat... > https://sourceforge.net/tracker2/? > func=detail&aid=1057882&group_id=1355&atid=351355 > > how about we use the plain gc mark bit for now, as a palliation? > I mean, block all other threads if we are doing i/o? > this, of course, sucks, but at least we will be able to build MT on > 64 bits. I committed (in threads1) the updated circularity detection. So far that's what we have: 1. GC marks - working in single and multi thread builds. In MT just a single thread is allowed at a time to execute circ detection (and GC is blocked)). 2. Hash table (used as hashset) - working but it's slow. 3. MLB - it is fast but does not work on 64 bit and uses malloc (on 64 bit tries to malloc huge sizes). (1) and (2) work on 64 bit platforms. By default in MT builds - GC marks with mutex lock are used. Vladimir |
From: Sam S. <sd...@gn...> - 2008-11-25 20:21:47
|
Vladimir, Vladimir Tzankov wrote: > On Nov 16, 2008, at 2:17 AM, Sam Steingold wrote: > > I committed (in threads1) the updated circularity detection. So far > that's what we have: > > 1. GC marks - working in single and multi thread builds. In MT just a > single thread is allowed at a time to execute circ detection (and GC > is blocked)). > 2. Hash table (used as hashset) - working but it's slow. > 3. MLB - it is fast but does not work on 64 bit and uses malloc (on > 64 bit tries to malloc huge sizes). > > (1) and (2) work on 64 bit platforms. > By default in MT builds - GC marks with mutex lock are used. did you consider AVL trees? we already have them in avl.d. we can use the stable hash code as the sorting key. Sam. |
From: Vladimir T. <vtz...@gm...> - 2008-11-27 07:23:00
|
On Nov 25, 2008, at 10:21 PM, Sam Steingold wrote: > Vladimir, > > Vladimir Tzankov wrote: >> On Nov 16, 2008, at 2:17 AM, Sam Steingold wrote: >> I committed (in threads1) the updated circularity detection. So >> far that's what we have: >> 1. GC marks - working in single and multi thread builds. In MT >> just a single thread is allowed at a time to execute circ >> detection (and GC is blocked)). >> 2. Hash table (used as hashset) - working but it's slow. >> 3. MLB - it is fast but does not work on 64 bit and uses malloc >> (on 64 bit tries to malloc huge sizes). >> (1) and (2) work on 64 bit platforms. >> By default in MT builds - GC marks with mutex lock are used. (1) is not good for MT. So on 64 bit MT the only option currently is Hash table (until better hashset/sparse bit array is implemented). > > did you consider AVL trees? > we already have them in avl.d. > we can use the stable hash code as the sorting key. I did not. The thing that I do not like in it for this purpose is usage of malloc() (for every NODE). Vladimir |
From: Sam S. <sd...@gn...> - 2008-11-06 17:52:08
|
Don Cohen wrote: > > - What does this have to do with multithreading? bitmaps are used only with MT |
From: <don...@is...> - 2008-11-06 19:27:09
|
Sam Steingold writes: > Don Cohen wrote: > > > > - What does this have to do with multithreading? > bitmaps are used only with MT This is weird! I now see that single thread was using GC mark bits, and I can see that you can't share those bits in the case where you have to compute circularities in two threads at the same time. But what happens if you have to GC in one thread while doing the circularity check in another? Then your bit maps no longer work. So, without reading the code further, it seems that you either have to delay GC (and all threads that want to do it, very likely all of them if you run out of space) until the check is done, or be prepared to abandon and restart the circularity check after the GC finishes. If you're willing to lock out GC while you compute circularities, then it seems you should be willing to lock out other circularity checks, and so you can use the GC bits again. But at this point I don't understand why you don't at least start with a hash table. When that gets too big it becomes more reasonable to lock out all other circularity checks and GCs. The file also says that this check is used in both print and read. I don't understand why it should be used by either. In print, if *print-circle* is nil then you don't have to worry about circularities at all, and if it's t then I'd expect you to create a hash table of pointers as you print. In read, it seems all you have to do is recognize the circularity markers as they arrive, store the new ones and look up the old ones. And all of this should have no impact on GC. If this is already analyzed and explained somewhere, please send a link. |
From: Vladimir T. <vtz...@gm...> - 2008-11-06 20:23:07
|
Hi Don, On Nov 6, 2008, at 9:26 PM, Don Cohen wrote: > Sam Steingold writes: >> Don Cohen wrote: >>> >>> - What does this have to do with multithreading? >> bitmaps are used only with MT > This is weird! > I now see that single thread was using GC mark bits, > and I can see that you can't share those bits in the case where > you have to compute circularities in two threads at the same time. > But what happens if you have to GC in one thread while doing the > circularity check in another? Then your bit maps no longer work. > So, without reading the code further, it seems that you either have > to delay GC (and all threads that want to do it, very likely all of > them if you run out of space) until the check is done, or be prepared > to abandon and restart the circularity check after the GC finishes. The circularity check cannot be interrupted by GC - there are no so called "safe points" in it. So if a GC is requested while circularity check is in progress - the GC will wait till circ check finishes in order to be able to suspend this thread. (if we want - "safe for GC" points may introduced in circularity detection of course - but currently there are none). > If you're willing to lock out GC while you compute circularities, > then it seems you should be willing to lock out other circularity > checks, and so you can use the GC bits again. But at this point > I don't understand why you don't at least start with a hash table. > When that gets too big it becomes more reasonable to lock out all > other circularity checks and GCs. We'll start - next week :) - just too busy these days. I think we should not use any dynamic memory allocation. Something like this: http://article.gmane.org/gmane.lisp.clisp.devel/19419 > The file also says that this check is used in both print and read. > I don't understand why it should be used by either. In print, if > *print-circle* is nil then you don't have to worry about circularities > at all, and if it's t then I'd expect you to create a hash table of > pointers as you print. Yes - print is the "difficult" case. > In read, it seems all you have to do is recognize the circularity > markers as they arrive, store the new ones and look up the old ones. > And all of this should have no impact on GC. > You are right here. This involves changes to the reader (which I have still not touched). Vladimir |
From: Sam S. <sd...@gn...> - 2008-11-06 21:59:40
|
Don Cohen wrote: > I'm still having trouble understanding why this circularity check > is used at all. when *print-circle* is non-NIL, every object is printed as either #N=... or #N#, thus you have to know which objects have already been printed and under which number, i.e., you need a map: t -> int when reading, you need to decode all the #N# objects, so you need a map: int -> t except that the destination object might not exist yet. e.g., imagine you are reading #1=#(1 2 #1# 3 4). when you read "#1=#(" you know this is a vector, but you do not know the length, so you postpone allocating it until you read all the elements and know the length. then you come to #1# and ... oops ... you have to replace it with the vector you have not allocated yet. therefore what we do is read the above as #(1 2 #<read label 1> 3 4) and, when we done reading, we can the object for read labels and do the substitutions. |
From: <don...@is...> - 2008-11-06 22:26:24
|
Sam Steingold writes: > Don Cohen wrote: > > I'm still having trouble understanding why this circularity check > > is used at all. > > when *print-circle* is non-NIL, every object is printed as either #N=... or > #N#, thus you have to know which objects have already been printed and under > which number, i.e., you need a map: t -> int Is there a requirement that you use no more #N's than necessary? If you're allowed to print #1=(#2=A) then you can build the map as you print. I admit that if you want to print (A) instead then you have to do this before you start. But I don't see any reason not to just walk the object in lisp code building a table. This must be only marginally more expensive than using the gc bits. And probably a lot cheaper than allocating a bit map to represent all of VM. And it can be interrupted by GC. > when reading, you need to decode all the #N# objects, so you need a > map: int -> t except that the destination object might not exist > yet. e.g., imagine you are reading #1=#(1 2 #1# 3 4). when you > read "#1=#(" you know this is a vector, but you do not know the > length, so you postpone allocating it until you read all the Is there any requirement that the read object be some particular type of vector? Couldn't you allocate an adjustable array and use vector-push-extend ? I suppose you could always come back and fix it later (as you evidently do now) if you wish to use non adjustable arrays. In any case, neither your solution nor mine requires a call to a function that finds the (non-immediate) objects that appear more than once in an object. |
From: Vladimir T. <vtz...@gm...> - 2008-11-14 23:17:51
|
Sam, On Nov 12, 2008, at 4:46 PM, Sam Steingold wrote: > >> We can use also hash table (most implementations do it in this way). > > yes, it would be nice to experiment with HTs and see how much we > lose (or win!) with them. Just committed (threads1 branch) spvw_circ.d which implements circularity detection with LISP hash table (used as hashset). The hash table is :fasthash-eq (but there is no big difference in performance if :stablehash-eq is used instead). The overall performance is much worse than with MLB (bitmaps) or gc marks. On overall I see it about 40-50% slower when running 'make check- tests' (on 32 bit). "Easily" can be added more hashset options. The most important feature of each implementation is whether it may GC. I am looking for alternative hashset implementation. Judy arrays (http://judy.sourceforge.net/) look interesting (however they also "malloc"). btw: DEBUG_GCSAFETY does not work with GENERATIONAL_GC (at least in MT build). Not that I need it - but have tried it while testing. Vladimir |