From: Sam S. <sd...@gn...> - 2008-11-25 15:00:43
|
Now check-tests-parallel fails in interpret_bytecode:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1124084032 (LWP 1695)]
0x000000000045702c in interpret_bytecode_ (closure=
    {one_o = 16164979579800633476}, codeptr=0x346b2f8c0,
    byteptr=0x346b2f8fe "ch\002\001@\002\026\002H\031\004\031\004")
    at ../src/eval.d:7025
7025      codeptr = TheSbvector(TheCclosure(closure)->clos_codevec);
(gdb) p closure
$2 = {one_o = 16164979579800633476}
(gdb) up
#1  0x000000000044ee4a in funcall_closure (closure={one_o = 2533288861432472},
    args_on_stack=2) at ../src/eval.d:5618
5618      interpret_bytecode(closure,codevec,CCV_START_NONKEY); /* process Bytecode starting at Byte 8 */
(gdb) p closure
$3 = {one_o = 2533288861432472}
(gdb) xout closure
#<COMPILED-FUNCTION SYS::INDEFINITE-SUBCLASSP>{one_o = 2533288861432472}
(gdb) down
#0  0x000000000045702c in interpret_bytecode_ (closure=
    {one_o = 16164979579800633476}, codeptr=0x346b2f8c0,
    byteptr=0x346b2f8fe "ch\002\001@\002\026\002H\031\004\031\004")
    at ../src/eval.d:7025
7025      codeptr = TheSbvector(TheCclosure(closure)->clos_codevec);
(gdb) p closure
$4 = {one_o = 16164979579800633476}
(gdb) p closureptr
$5 = (gcv_object_t *) 0x551bd8
(gdb) p *closureptr
$6 = {one_o = 16164979579800633476}

Looks like closureptr is not restored properly by

  popSP(closureptr = (gcv_object_t*) );
|
From: Vladimir T. <vtz...@gm...> - 2008-11-25 16:20:03
|
On Nov 25, 2008, at 5:00 PM, Sam Steingold wrote:
> Now check-tests-parallel fails in interpret_bytecode:
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 1124084032 (LWP 1695)]
> 0x000000000045702c in interpret_bytecode_ (closure=
>     {one_o = 16164979579800633476}, codeptr=0x346b2f8c0,
>     byteptr=0x346b2f8fe "ch\002\001@\002\026\002H\031\004\031\004")
>     at ../src/eval.d:7025
> 7025      codeptr = TheSbvector(TheCclosure(closure)->clos_codevec);
> (gdb) p closure
> $2 = {one_o = 16164979579800633476}
> (gdb) up
> #1  0x000000000044ee4a in funcall_closure (closure={one_o = 2533288861432472},
>     args_on_stack=2) at ../src/eval.d:5618
> 5618      interpret_bytecode(closure,codevec,CCV_START_NONKEY); /* process Bytecode starting at Byte 8 */
> (gdb) p closure
> $3 = {one_o = 2533288861432472}
> (gdb) xout closure
> #<COMPILED-FUNCTION SYS::INDEFINITE-SUBCLASSP>{one_o = 2533288861432472}
> (gdb) down
> #0  0x000000000045702c in interpret_bytecode_ (closure=
>     {one_o = 16164979579800633476}, codeptr=0x346b2f8c0,
>     byteptr=0x346b2f8fe "ch\002\001@\002\026\002H\031\004\031\004")
>     at ../src/eval.d:7025
> 7025      codeptr = TheSbvector(TheCclosure(closure)->clos_codevec);
> (gdb) p closure
> $4 = {one_o = 16164979579800633476}
> (gdb) p closureptr
> $5 = (gcv_object_t *) 0x551bd8
> (gdb) p *closureptr
> $6 = {one_o = 16164979579800633476}
>
> looks like closureptr is not restored properly by
>
>   popSP(closureptr = (gcv_object_t*) );

On which test does this happen? How deep is the C call stack? Most of the SIGSEGVs I have encountered in interpret_bytecode were caused by C stack overflow - but I have increased it to 16 MB in check-tests-parallel. Also, in most of the cases it happens on entering interpret_bytecode.

Vladimir
|
From: Sam S. <sd...@gn...> - 2008-11-25 17:12:40
|
Vladimir Tzankov wrote:
> On which test does this happen?

dunno. what difference does it make?
On a different occasion I got a deadlock (nothing happens, loadavg=2, i.e., apparently, two threads are spinning). The tests now get fairly far, while generating quite a few failures. After the crash I see these:

   0 Nov 25 11:33 iofkts.erg
   0 Nov 25 11:33 lambda.erg
   0 Nov 25 11:33 lists151.erg
   0 Nov 25 11:35 alltest.erg
   0 Nov 25 11:35 characters.erg
   0 Nov 25 11:35 clos.erg
 870 Nov 25 11:35 backquot.erg
   0 Nov 25 11:36 encoding.erg
3002 Nov 25 11:36 eval20.erg
   0 Nov 25 11:36 ext-clisp.erg

> How deep is the C call stack?

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1107302720 (LWP 5168)]
0x000000000045702c in interpret_bytecode_ (closure={one_o = 202623543511880},
    codeptr=0x3456b0730,
    byteptr=0x3456b07dc "\026\006H\033���\033B�\033��\205\a�k\r��3\003\025��h\a")
    at ../src/eval.d:7025
7025      codeptr = TheSbvector(TheCclosure(closure)->clos_codevec);
(gdb) where
#0  0x000000000045702c in interpret_bytecode_ (closure=
    {one_o = 202623543511880}, codeptr=0x3456b0730,
    byteptr=0x3456b07dc "\026\006H\033���\033B�\033��\205\a�k\r��3\003\025��h\a")
    at ../src/eval.d:7025
#1  0x000000000044ee4a in funcall_closure (closure={one_o = 2533288840017992},
    args_on_stack=2) at ../src/eval.d:5618
#2  0x000000000044bbd9 in funcall (fun={one_o = 2533288840017992},
    args_on_stack=2) at ../src/eval.d:4850
#3  0x00000000004550b9 in interpret_bytecode_ (closure=
    {one_o = 2533288840034328}, codeptr=0x3456c7030,
    byteptr=0x3456c70c8 "") at ../src/eval.d:6833
#4  0x000000000044b624 in apply_closure (closure={one_o = 2533288840034328},
    args_on_stack=0, args={one_o = 1125899916528448}) at ../src/eval.d:4783
#5  0x0000000000446716 in apply (fun={one_o = 2533288840034328},
    args_on_stack=0, other_args={one_o = 18014453559950672})
    at ../src/eval.d:4004
#6  0x000000000045544a in interpret_bytecode_ (closure=
    {one_o = 2533288840056976}, codeptr=0x3456ca728,
    byteptr=0x3456ca745 "") at ../src/eval.d:6854
#7  0x000000000044ee4a in funcall_closure (closure={one_o = 2533288840056976},
    args_on_stack=0) at ../src/eval.d:5618
#8  0x000000000044bbd9 in funcall (fun={one_o = 2533288840056976},
    args_on_stack=0) at ../src/eval.d:4850
#9  0x0000000000648da1 in thread_stub (arg=0x165d78c0) at ../src/zthread.d:142
#10 0x00000033c8e062f7 in start_thread () from /lib64/libpthread.so.0
#11 0x00000033c82ce85d in clone () from /lib64/libc.so.6
(gdb)

What I see now is:

*** - FRESH-LINE: extending the vector by 1 elements makes it too long

Program exited with code 01.
(gdb)
|
From: Vladimir T. <vtz...@gm...> - 2008-11-25 17:34:37
|
On Nov 25, 2008, at 7:12 PM, Sam Steingold wrote:
> Vladimir Tzankov wrote:
>> On which test does this happen?
>
> dunno.
> what difference does it make?

Some tests require a (very) deep stack - but apparently that is not the case here.

> on a different occasion I got a deadlock (nothing happens,
> loadavg=2, i.e., apparently, two threads are spinning).

I observed deadlocks too and traced them to symbol insertion in package.d:newinsert() (actually to string_hashcode()), but have not gone further.
|
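The hang described here - a thread spinning forever inside symbol-table code - is characteristic of unsynchronized insertion into a chained hash table: two racing writers can splice a cycle into a bucket chain, after which any lookup or rehash walking that bucket never terminates. A minimal sketch of the failure mode (hypothetical `chain_contains`/`chain_is_cyclic` helpers, not CLISP's actual symtab code):

```c
#include <assert.h>
#include <stddef.h>

struct node { int key; struct node *next; };

/* Naive chain lookup: spins forever if the chain contains a cycle. */
int chain_contains(const struct node *head, int key) {
    for (; head != NULL; head = head->next)
        if (head->key == key) return 1;
    return 0;
}

/* Floyd cycle check: returns 1 if the chain is corrupted (cyclic). */
int chain_is_cyclic(const struct node *head) {
    const struct node *slow = head, *fast = head;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast) return 1;
    }
    return 0;
}
```

With a healthy two-node chain `a -> b`, lookups terminate; after simulating a racy splice (`b.next = &a`), `chain_is_cyclic` reports the corruption, and `chain_contains` for a missing key would loop forever - exactly a non-consing infinite loop that a stop-the-world GC can never wait out.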
From: Vladimir T. <vtz...@gm...> - 2008-11-25 16:47:25
|
Hi,

I have currently encountered the following problems:

1. There is a problem in the GC (pinned objects) - I am working on it now and have almost succeeded in reproducing it consistently (it seems related to SSTRINGS).

2. Stream bindings get messed up (especially the broadcasted *standard-output*). I am not sure whether the bindings themselves go wrong or something else does.

3. SSTRINGS (particularly the O() ones that are used by the printer) cause SIGSEGV when used in a few threads simultaneously - but maybe this is related to the GC problem above.

Vladimir
|
From: Vladimir T. <vtz...@gm...> - 2008-11-26 13:56:26
|
Sam,

On Nov 25, 2008, at 7:12 PM, Sam Steingold wrote:
> Vladimir Tzankov wrote:
>> On which test does this happen?
>
> dunno.
> what difference does it make?
> on a different occasion I got a deadlock (nothing happens,
> loadavg=2, i.e., apparently, two threads are spinning).

All the deadlocks I observed are due to some package operations (rehash_symtab, symtab_find, etc.) - the package internals are not in a consistent state. Usually a thread wants to perform GC and waits for all other threads to reach safe points - but, as it happens, there is a thread in an infinite non-consing loop (rehash_symtab), which hangs the whole process.

As you wrote in doc/multithread.txt, the packages should be lockable. There should be some global locks as well (on O(all_packages) at least).

The places that should not be interrupted in the single-threaded build are "protected" by break_sem_2. In MT I removed all the break_sems, because threads can be interrupted only at "safe points". I am wondering whether locking just the places where break_sem_2 is used is enough, or whether we have to lock at a coarser granularity (the whole intern(), or just the part around make_present(), for example)? It looks to me that the coarser granularity is needed.

Vladimir
|
From: Sam S. <sd...@gn...> - 2008-11-26 14:11:23
|
Vladimir,

Vladimir Tzankov wrote:
> All the deadlocks I observed are due to some package operations
> (rehash_symtab, symtab_find, etc) - the internals are not in good
> state. Usually a thread wants to perform GC and waits all other
> threads to reach safe points - but as it happens - there is a thread
> in an infinite non-consing loop (rehash_symtab) causing hang of the
> whole process.
>
> As you wrote in doc/multithread.txt - the packages should be
> lockable. There should be some global locks as well (O(all_packages)
> at least).
>
> I am wondering whether the places where break_sem_2 are used are
> enough to be protected or we have to lock in a bigger granularity
> (the whole intern() or just the part around make_present() for
> example)? It looks to me that a higher granularity is needed.

Yes, I am in favor of locking packages in intern, import, export, et al. I also think that packages should come with an internal mutex, and MUTEX-LOCK should operate on them too.

Sam.
|
From: <don...@is...> - 2008-11-26 18:45:11
|
Vladimir Tzankov writes:
> All the deadlocks I observed are due to some package operations
> (rehash_symtab, symtab_find, etc) - the internals are not in good
> state. Usually a thread wants to perform GC and waits all other
> threads to reach safe points - but as it happens - there is a thread
> in an infinite non-consing loop (rehash_symtab) causing hang of the
> whole process.

Either I don't understand something (please enlighten me) or this violates my expectations in several ways.

First, rehashing should not be an infinite loop, right? Or is rehash_symtab doing something other than rehashing a package?

But infinite (or long-running) threads seem to me a more general problem. Clearly it would be desirable to be able to GC even when some thread really is in a non-consing infinite loop.

I thought this problem was already solved in concept: all running code must "occasionally" reach a state that GC can understand and, in that state, check whether there is some reason to suspend. Ideally "occasionally" should be guaranteed to be less than some quantifiable not-too-large time limit. I thought that part of the MT conversion was to make that actually be the case. Is that not so?

My understanding was that:
- Interpreted lisp code reduces to solving the problem for the interpreter (one or more of the cases below).
- Compiled code (which in clisp is interpreted by a byte code interpreter) reduces to solving the problem for the byte code interpreter. This seems easy - you could check between instructions, and make sure that a single byte code instruction cannot run arbitrarily long.

This leaves:
- C code that's part of the clisp implementation. This is where you have to examine many cases and make sure that there are no arbitrarily long computations without checks.
- Foreign code. I thought you had a solution here, but I'm not sure what it is. Was this supposed to be allowed to run, but not allowed to use any data affected by GC?

BTW, it occurs to me that the "check" for suspend might be done at almost zero cost by making use of some of those interesting hardware features that have been used for GC. Has this been done or considered? I don't know what the cost of the check is otherwise, but perhaps such a solution would encourage implementors to check more often and thus reduce the delay limit.
|
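The between-instructions check proposed above can be sketched as a polling safepoint: the collector raises a flag, and the interpreter tests it once per dispatched opcode, so the suspend delay is bounded by the longest single instruction. All names here (`safepoint_poll`, `gc_suspend_requested`) are illustrative assumptions, not CLISP's actual mechanism:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical safepoint polling sketch: the collector raises
   gc_suspend_requested; each thread polls it at well-defined
   points and parks until the collection finishes. */
static atomic_int gc_suspend_requested = 0;
static atomic_int threads_parked = 0;

static void safepoint_poll(void) {
    if (atomic_load(&gc_suspend_requested)) {
        /* all live objects must already be in GC-visible locations */
        atomic_fetch_add(&threads_parked, 1);
        while (atomic_load(&gc_suspend_requested))
            ;                      /* in reality: block on a condvar */
        atomic_fetch_sub(&threads_parked, 1);
    }
}

/* Toy bytecode loop: one poll per instruction bounds the GC's
   wait to the duration of the longest single instruction. */
int run(const unsigned char *code, int n) {
    int acc = 0;
    for (int i = 0; i < n; i++) {
        safepoint_poll();
        switch (code[i]) {
            case 0: acc++; break;  /* INC */
            case 1: acc--; break;  /* DEC */
        }
    }
    return acc;
}
```

The "almost zero cost" observation holds because the poll is a single relaxed load on the fast path; hardware-assisted variants (e.g. a guard page the thread touches, trapping only when the collector protects it) remove even that load.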
From: Vladimir T. <vtz...@gm...> - 2008-11-27 08:18:58
|
Hi,

On Nov 26, 2008, at 8:45 PM, Don Cohen wrote:
> First, rehashing should not be an infinite loop, right?

Sure it should not, and I doubt you will find anybody who disagrees :).

> Or is rehash_symtab doing something other than rehashing a package?

Symtabs in packages are not lisp hash tables. Problems with packages (when modified concurrently) were expected, and it's time to fix them. There will probably be problems with other lisp objects (including hash tables) when they are used concurrently without locking. We are going to address possible deadlocks/crashes in these cases.

> This seems easy - you could check between instructions,
> and make sure that a single byte code instruction cannot run
> arbitrarily long.

Currently the GC can suspend a thread only in two cases: when the thread allocates a heap object, or when the thread is in a (possibly) blocking system (foreign) call. For the latter, all such possibly blocking foreign calls should be found and "marked" (currently, stream and file system operations are mostly marked).

As you write below, checking whether the GC is waiting for a thread to be suspended has almost zero cost - it is not a problem to add checks in the (bytecode) interpreter if needed. However, depending on where these checks are put, additional actions may be needed - at least putting all objects in use into GC-visible locations.

> This leaves
> - c code that's part of the clisp implementation
> This is where you have to examine many cases and make sure that
> there are no arbitrarily long computations without checks.

The problem with packages is of this kind (among others). It happens because the packages' internal data becomes inconsistent due to concurrent modifications. There are other such places which will be addressed (btw: I do not know all of them).

> - foreign code
> I thought you had a solution here, but I'm not sure what it is.
> Was this supposed to be allowed to run, but not allowed to use
> any data affected by GC ?

As I wrote above, foreign calls that may block should be "marked" (enclosed in GC-safe regions). If a pointer into the heap is passed to foreign code, the heap object should be "pinned" - so the GC (if it happens) is not going to move it. I see here a problem (possible crash) when another thread destructively modifies a pinned object (e.g. an adjustable array) while it is passed to a foreign call.

Vladimir
|
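The "marking" of a blocking call described above can be sketched as follows: the thread declares itself GC-safe before entering the syscall and revokes that on return, so the collector counts it as already suspended without actually stopping it. This is only an illustration of the idea under invented names (`BEGIN_BLOCKING_CALL` etc.), not CLISP's real macros:

```c
#include <assert.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical GC-safe region sketch.  While in_gc_safe_region
   is set, the collector treats this thread as parked at a safe
   point and does not wait for it before collecting. */
static _Thread_local atomic_int in_gc_safe_region = 0;

#define BEGIN_BLOCKING_CALL() atomic_store(&in_gc_safe_region, 1)
#define END_BLOCKING_CALL()   do {                                  \
        atomic_store(&in_gc_safe_region, 0);                        \
        /* in reality: if a collection is still in progress,   */   \
        /* park here before touching the heap again            */   \
    } while (0)

/* A blocking read wrapped in a GC-safe region: the GC may run
   concurrently while this thread sits in the syscall. */
ssize_t gc_safe_read(int fd, void *buf, size_t n) {
    BEGIN_BLOCKING_CALL();
    ssize_t r = read(fd, buf, n);   /* possibly blocking syscall */
    END_BLOCKING_CALL();
    return r;
}
```

Note that `buf` must not be (or must be a pinned) heap object, since a collection may move unpinned objects while the syscall is in flight - which is exactly the pinning requirement discussed in this thread.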
From: <don...@is...> - 2008-11-27 09:44:44
|
Vladimir Tzankov writes:

I don't want to waste too much of the time you could be spending actually getting this stuff done, so I'll try to stick to the important part:

Am I correct that the plan is that GC will be able to proceed without indefinite delay in all cases? The paragraph below indicates that this is not currently the case:

> Currently the GC can suspend threads only in two cases. When thread
> allocates heap object or when the thread is in (possibly) blocking
> system (foreign) call. For the latter - all such possibly blocking
> foreign calls should be found and "marked" (currently - streams and
> file system operations are mostly marked).

But the current case may not be (I hope is not) the goal. Is the goal, as I hope, that:
(1) all lisp code will be guaranteed to check within a short time, and
(2) foreign code need not be stopped for GC to run?
Which implies that GC will never be delayed very long.

> As you write below - checking whether GC waits on thread to be
> suspended is almost zero cost - it is not a problem to add checks in
> (bytecode) interpreter if needed. However depending where these
> checks are put additional actions may be needed - at least putting
> all objects in use to GC visible locations.

I was imagining that the state at the end of every byte code instruction was understood by GC, but I suppose you could stop in some place where this was not the case, but where some other process could recognize and fix the problem. You might then view the extra process as an early phase of GC.

> As I wrote above the foreign calls that may block should be
> "marked" (enclosed in GC safe regions). If pointer in the heap is
> passed to foreign code - the heap object should be "pinned" - so GC
> (if happens) is not going to move it.

This seems to be imposing restrictions on code that can block, whereas I want to instead impose restrictions on sections of code that access data that is accessed by GC.

It seems fair to impose requirements on foreign code. I suggest the requirements be:
(1) All access to GC-sensitive data must be done inside a marked section.
(2) All marked sections are required to last a short time (in particular, if you want GC not to be delayed more than 1 sec, you must ensure that your marked sections last less than 1 sec). Perhaps this is related to your remark about blocking? You should not block in a marked section, since that might violate this rule.
(3) Of course you must expect that a GC might occur between your marked sections - so you can't save any pointers to GC-sensitive data from one marked section to the next.

We have to specify what this GC-sensitive data is. I presume at least the heap, but also some data on the stack and in registers? In particular, any foreign code that does not access the heap should be able to run in parallel with GC.
|
From: Vladimir T. <vtz...@gm...> - 2008-11-28 07:00:53
|
Hi Don,

On Nov 27, 2008, at 11:44 AM, Don Cohen wrote:
> Am I correct that the plan is that gc will be able to proceed without
> indefinite delay in all cases?

Yes - correct.

> But the current case may not be (I hope is not) the goal.
> Is the goal, as I hope? That is,
> (1) all lisp code will be guaranteed to check within a short time
> (2) foreign code need not be stopped for GC to run
> Which implies that GC will never be delayed very long.

The current state is just a first approximation of the goal. There are a few places (that I know of) that require adding checks for GC suspend requests.

> It seems fair to impose requirements on foreign code.
> I suggest the requirements be:
> (1) all access to GC sensitive data must be done inside a marked
> section

How does foreign code know it is accessing the clisp heap (and why should it care)? (Probably I do not understand what you mean.)

> (2) all marked sections are required to last a short time
> (in particular, if you want GC not to be delayed more than 1 sec
> you must ensure that your marked sections last less than 1 sec)
> Perhaps this is related to your remark about blocking? You should
> not block in a marked section since it then might violate this
> rule.
> (3) of course you must expect that a gc might occur between your
> marked sections - so you can't save any pointers to GC sensitive
> data from one marked section to the next.
>
> In particular, any foreign code that does not access the heap should
> be able to run in parallel with GC.

I do not understand all of the above, but here is what we have and why. Foreign calls that may block do not stop GC from occurring (even if they access the heap). These are (at least) all file operations and sockets. We have (use) two options for parameters/results:

(1) Never pass a pointer to a heap object to a foreign call - so parameters/results have to be copied explicitly (and memory for them allocated, of course).

(2) Introduce pinned objects - so they stay at the same place in the heap during the blocking foreign call.

The first option is expensive for large objects - especially with "bulk i/o" - so we use the second, pinned objects, in most of the I/O operations. Before calling into the foreign call the object is pinned, and after the return it is unpinned. Meanwhile the GC is not blocked by the calling thread (the thread makes itself appear safe for GC for as long as it is in the foreign call).

Vladimir
|
From: <don...@is...> - 2008-11-28 12:09:34
|
Vladimir Tzankov writes:
> > It seems fair to impose requirements on foreign code.
> > I suggest the requirements be:
> > (1) all access to GC sensitive data must be done inside a marked
> > section
> how does a foreign code knows it is accessing clisp heap (and why
> care)? (probably I do not understand what you mean)

The implementers have to provide enough documentation to allow those who write FFs to access lisp data. I want that doc to include these rules. It's not that the FF "knows" it is reading lisp data; the programmer knows that he intends to do so. I would argue that code that picks a random memory address and reads the data there, without claiming (marking) that it is reading lisp data, may indeed be "correct", even if that data is on the lisp heap, as long as the programmer does not expect that data to have any properties related to lisp.

The mark I have in mind effectively means to block GC while in the marked code. Perhaps that's not the meaning of the marking you had in mind?

> > In particular, any foreign code that does not access the heap should
> > be able to run in parallel with GC.

I hope it makes more sense if I say that foreign code should only access lisp data from inside a construct like (withoutGC ...), that GC must wait for all foreign code to be outside of such a construct, and, of course, that such a construct cannot be entered while GC is in progress.

> I do not understand all of the above but here is what we have and why.
> Foreign calls that may block do not stop GC of occurring (even if
> they access heap).

This seems inadequate, since their heap accesses may not make any sense during GC.

> These are (at least) all file operations and sockets. We have (use)
> two options for parameters/results:
> (1) - never pass pointer to heap object to foreign call - so
> parameters/results should be copied explicitly (and memory for them
> allocated of course).

This would qualify as not accessing lisp data, other than the copying, which has to be done in (withoutGC ...).

> (2) - introduce pinned objects - so they stay at the same place in
> the heap during the blocking foreign call.

This is another approach, but it needs another spec. In particular, if a pinned object contains pointers to lisp objects, then GC at least has to be able to read those pointers to follow them, and the foreign code had better not be changing them during GC.

> The first option is expensive for large objects - especially with
> "bulk i/o" so we use the second - pinned objects - in most of the I/O
> operation.

In this case I'd argue that the memory affected by the IO has no pointers to lisp objects, so GC does not care about it. Of course, if that memory is allocated as part of an array of bytes, then the array itself is something that GC could move, so pinning that array makes sense. But I'm not convinced that this is a better solution overall than copying the IO buffers into/out of lisp memory during a (withoutGC ...). That copy obviously has a cost, but so does pinning objects. I'm not worried about the cost of actually doing the pin, but rather the cost in added GC complexity and the cost in loss of the memory locality that GC is trying to provide.

I'm less concerned with IO operations that are part of the lisp implementation (and that you're going to fix) than with foreign code that normal lisp users/programmers will write. Are you planning to allow users to pin lisp objects? If so, I suggest that you still need a spec as mentioned above, and some additional mechanism to control interaction with GC.
|
From: Hoehle, Joerg-C. <Joe...@t-...> - 2008-11-28 15:06:18
|
Hi,

I'm not following the discussion closely at all, and wanted to make sure the following aspects are not neglected.

If there will be pinnable objects, is there some concept of nesting? Consider the following scenario, involving FFI callbacks:

- pin object
- call FFI function
- FFI function might access heap via pinned object
- FFI function calls back into Lisp
- Lisp is entered, so from a dynamic point of view, no execution currently occurs in foreign land!
- GC might occur - what happens?
- Lisp callback exits back to foreign world
- foreign code is still legally allowed to access heap via the pinned object
- foreign code exits back to Lisp
- Lisp object is unpinned

Don Cohen wrote:
> The mark I have in mind effectively means to block GC while in the
> marked code.

The above scenario is precisely meant to challenge this statement. Within the callback, you're back in Lisp, obviously not "while in the marked code" (from a program counter point of view).

The above scenario is not something devious. It is the rule for all event-based GUI applications driven by a foreign loop.

Another issue I can think of: please do not assume there will be few pinned objects. Once the thing is available, people will start to use it, and automated wrappers (think CFFI, Verrazano etc.) may cause it to be used a lot and automatically. Others may want to pin objects for a very long time (e.g. an SSH byte buffer/vector usable from both Lisp and C).

Do you have plans for automated unpinning based on dynamic execution (a la UNWIND-PROTECT, so unpinning is guaranteed as soon as the stack is unwound), or do you plan to provide it via distinct pin/unpin calls, so there could be pin leaks, i.e. objects where unpinning is forgotten, e.g. by a non-local transfer of control in the case of exceptions?

Regards,
Jorg Hohle
|
From: Vladimir T. <vtz...@gm...> - 2008-11-28 16:04:18
|
Hi,

On Nov 28, 2008, at 5:07 PM, Hoehle, Joerg-Cyril wrote:
> If there will be pinnable objects, is there some concept of nesting?

Yes - pinned objects can currently be nested. Every thread keeps a list of pinned objects. The list itself is on the C stack, and there are UNWIND-PROTECT frames on the lisp stack for automatic unpinning when the stacks are unwound.

> Consider the following scenario, involving FFI callbacks:
>
> - pin object
> - call FFI function
> - FFI function might access heap via pinned object
> - FFI function calls back into Lisp
> - Lisp is entered, so from a dynamic point of view, no execution
>   currently occurs in foreign land!
> - GC might occur, what happens?
> - Lisp callback exits back to foreign world
> - foreign code still legally allowed to access heap from pinned object
> - foreign code exits back to Lisp
> - Lisp object is unpinned.

I have still not looked at the FFI with respect to thread support - there are at least a few issues that have to be discussed (for example, a callback from a thread unknown to clisp).

The nesting of pinned objects can be seen in MT SIGINT handling (when the execution is in a system i/o call, for example). On SIGINT a thread is interrupted with (CERROR "CTRL-C pressed.."). At that time the object involved in the i/o is still pinned. At another SIGINT we will (probably) have another pinned object, and so on. That's the way I tested the unpinning during stack unwinding.

> Another issue I can think of: Please do not assume there will be few
> pinned objects. Once the thing is available, people will start to use
> it, and automated wrappers (think CFFI, Verrazano etc.) may cause it to
> be used a lot and automatically.
>
> Others may want to pin objects for a very long time (e.g. a SSH byte
> buffer /vector useable from both Lisp and C).

I am not sure that we should provide user functions for pinning/unpinning of heap objects. Maybe a form WITH-PINNED-OBJECT (like WITH-OPEN-FILE), and only for some kinds of vectors?

> Do you have plans for an automated unpinning based on dynamic execution
> (a la unwind-protect, so unpinning is guaranteed as soon as the stack is
> unwound) or do you plan to provide it via distinct pin/unpin calls, so
> there could be pin-leaks, i.e. objects where unpinning is forgotten,
> e.g. by non-local transfer of control in the case of exceptions?

It is available (the UNWIND-PROTECT-based unpinning described above).

Vladimir
|
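The nesting described here - pin records living in C stack frames, unpinned automatically when the stack unwinds - can be sketched as a stack-linked list whose head is restored to a saved value on unwind, the C-level analogue of an UNWIND-PROTECT frame. Names are illustrative, not CLISP's actual pin machinery:

```c
#include <assert.h>
#include <stddef.h>

/* Nesting sketch: each pin record lives in its own C stack frame
   and links to the previous list head.  Restoring a saved head
   (as a non-local exit would) drops every pin made since that
   point, so nested pins unwind correctly in LIFO order. */
struct pin { const void *obj; struct pin *next; };
static struct pin *pin_head = NULL;   /* per-thread in real code */

/* Pin an object; returns the previous head, to be restored on
   exit (normal or non-local) from the enclosing dynamic extent. */
struct pin *pin_object(struct pin *rec, const void *obj) {
    rec->obj = obj;
    rec->next = pin_head;
    pin_head = rec;
    return rec->next;
}

/* The UNWIND-PROTECT cleanup: restore the saved list head. */
void unwind_pins_to(struct pin *saved) { pin_head = saved; }

int pin_depth(void) {
    int n = 0;
    for (const struct pin *p = pin_head; p != NULL; p = p->next) n++;
    return n;
}
```

A WITH-PINNED-OBJECT form would expand into exactly this pattern: save the head, pin, run the body, and restore the head in the cleanup - so even a SIGINT-driven (CERROR ...) that pins further objects unwinds cleanly, as in the SIGINT scenario above.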
From: Sam S. <sd...@gn...> - 2008-11-28 16:19:30
|
Vladimir Tzankov wrote:
> I have still not looked on the FFI for thread support - there are at
> least few issues that have to be discussed (for example callback from
> an unknown to clisp thread).
>
> That's the way I tested the unpinning during stacks unwinding.

One place you might take a look at is modules/bindings/glibc/test.tst, which has a "signal handling examples" section much reviled by Jorg.
|
From: <don...@is...> - 2008-11-28 22:01:14
|
Hoehle, Joerg-Cyril writes:
> If there will be pinnable objects, is there some concept of nesting?
> Consider the following scenario, involving FFI callbacks:
> - pin object
> - call FFI function
> - FFI function might access heap via pinned object
> - FFI function calls back into Lisp
> ...
> Don Cohen wrote:
> > The mark I have in mind effectively means to block GC while in the
> > marked code.
> The above scenario is precisely meant to challenge this statement.
> Within the callback, you're back in Lisp, obviously not "while in the
> marked code" (from a program counter point of view).

I had in mind that the (withoutGC ...) construct was not related to the program counter, but rather to some state of the thread. I would argue that it makes no sense to call back into lisp while preventing GC, unless you're sure that your lisp call won't require GC. This should either be considered a programmer error (results undefined) or, if you want to be nice about it, generate an error if and when the callback tries to GC, or even generate an error on the callback if the caller is in a (withoutGC ...).

> The above scenario is not something devious. It is the rule for all
> event based GUI applications driven by a foreign loop.

I don't understand why the foreign loop needs to pin any lisp data, or for that matter why it needs to block GC, other than when it is copying data between lisp and foreign space - which should NOT include any lisp callbacks.

> Another issue I can think of: Please do not assume there will be few
> pinned objects. Once the thing is available, people will start to use
> it, and automated wrappers (think CFFI, Verrazano etc.) may cause it to
> be used a lot and automatically.
> Others may want to pin objects for a very long time (e.g. a SSH byte
> buffer /vector useable from both Lisp and C).

I agree with both of the above, and view both as reasons that pinning is undesirable. What's the evidence that the other solution is worse, namely

  (withoutGC ... copy between lisp data and foreign data ...)

? Isn't a memory copy pretty cheap? I'm thinking in terms of copying byte vectors, typically on the order of a page or a few pages.

Vladimir Tzankov writes:
> I am not sure that we should provide user functions for pinning/
> unpinning of heap objects.
> May be a form WITH-PINNED-OBJECT (like WITH-OPEN-FILE) and only for
> some kind of vectors?

If this is only used internally and not available to lisp programmers, then I see only minor problems. It still seems to complicate GC, and it still prevents GC from localizing data. Also, I'd hope to find some doc in the code to clarify what is required, both to help maintainers satisfy those requirements and to help other C programmers write new code using similar facilities. It all seems like a lot of trouble for what I would expect to be a rather small benefit.
|