From: Martin C. <cra...@co...> - 2009-02-02 16:39:49
Attachments:
bad-sbcl-200801.diff
|
I apologize for giving feedback so late, there's a bit of a backlog here. I didn't find any discussion of the following patch. It hurts performance badly, a straight 4% for my employer's overall mix. Given that we only do a certain amount of foreign calls this seems to slow down actual calls by an order of magnitude. On top of that, it conses. That call to cons in invoke-with-saved-fp-and-pc is a real killer. I hope I didn't overlook a previous discussion, but can we re-hash why this is required? Since it is labeled as a hack, I assume that a better solution is possible but hasn't been implemented yet? Maybe we can introduce a switch so that the extended marking is only done when debug == 3? revision 1.4135 date: 2008/10/20 12:00:53; author: melisgl; state: Exp; lines: +1 -1 1.0.21.32: hack around truncated backtraces with lost frames On :C-STACK-IS-THE-CONTROL-STACK platforms when calling an alien function stash the current frame pointer and return address away so that no matter how the alien stack frames are laid out the debugger can find its way back to lisp land. Martin -- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Martin Cracauer <cra...@co...> http://www.cons.org/cracauer/ FreeBSD - where you want to go, today. http://www.freebsd.org/ |
From: Gábor M. <me...@re...> - 2009-02-02 17:03:53
|
On Lunes 02 Febrero 2009, Martin Cracauer wrote: > I apologize for giving feedback so late, there's a bit of a backlog > here. > > I didn't find any discussion of the following patch. It hurts > performance badly, a straight 4% for my employer's overall mix. > Given that we only do a certain amount of foreign calls this seems to > slow down actual calls by an order of magnitude. On top of that, it > conses. That call to cons in invoke-with-saved-fp-and-pc is a real > killer. Interesting, do you have code that shows this difference? > > I hope I didn't overlook a previous discussion, but can we re-hash > why this is required? Since it is labeled as a hack, I assume that a > better solution is possible but hasn't been implemented yet? Because the frame parsing logic sometimes gets stuck in alien frames and you get a truncated backtrace. > Maybe we can introduce a switch so that the extended marking is only > done when debug == 3? If it turns out to be a real bottleneck, then it would make sense to do that or switch it off when speed is greater than debug. Cheers, Gabor |
From: Gábor M. <me...@re...> - 2009-02-02 17:11:44
|
On Lunes 02 Febrero 2009, Martin Cracauer wrote: > I apologize for giving feedback so late, there's a bit of a backlog > here. > > I didn't find any discussion of the following patch. It hurts > performance badly, a straight 4% for my employer's overall mix. > Given that we only do a certain amount of foreign calls this seems to > slow down actual calls by an order of magnitude. On top of that, it > conses. That call to cons in invoke-with-saved-fp-and-pc is a real > killer. I have measured the difference on getpid to be 30%. That's really much, I'll have a look at how to speed it up. But an order of magnitude? |
From: Martin C. <cra...@co...> - 2009-02-02 17:46:26
|
Gbor Melis wrote on Mon, Feb 02, 2009 at 06:11:26PM +0100: > On Lunes 02 Febrero 2009, Martin Cracauer wrote: > > I apologize for giving feedback so late, there's a bit of a backlog > > here. > > > > I didn't find any discussion of the following patch. It hurts > > performance badly, a straight 4% for my employer's overall mix. > > Given that we only do a certain amount of foreign calls this seems to > > slow down actual calls by an order of magnitude. On top of that, it > > conses. That call to cons in invoke-with-saved-fp-and-pc is a real > > killer. > > I have measured the difference on getpid to be 30%. That's really much, > I'll have a look at how to speed it up. But an order of magnitude? Sorry, that was just a knee-jerk assessment. It might be in the 30% range now that I think about it. Do you see an increase in consing from your isolated experiment? I think the cons cell is supposed to be stack-allocated, but apparently that doesn't happen. Martin -- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Martin Cracauer <cra...@co...> http://www.cons.org/cracauer/ FreeBSD - where you want to go, today. http://www.freebsd.org/ |
From: Gábor M. <me...@re...> - 2009-02-02 17:53:42
|
On Lunes 02 Febrero 2009, Martin Cracauer wrote: > I apologize for giving feedback so late, there's a bit of a backlog > here. > > I didn't find any discussion of the following patch. It hurts > performance badly, a straight 4% for my employer's overall mix. > Given that we only do a certain amount of foreign calls this seems to > slow down actual calls by an order of magnitude. On top of that, it > conses. That call to cons in invoke-with-saved-fp-and-pc is a real > killer. > > I hope I didn't overlook a previous discussion, but can we re-hash > why this is required? Since it is labeled as a hack, I assume that a > better solution is possible but hasn't been implemented yet? > > Maybe we can introduce a switch so that the extended marking is only > done when debug == 3? It's actually not the consing but accessing and binding that special what hurts. The patch below makes this machinery conditional on (<= speed debug) which seems reasonable to me from a correctness first point of view. diff --git a/src/compiler/aliencomp.lisp b/src/compiler/aliencomp.lisp index 3635cb4..2f34fc5 100644 --- a/src/compiler/aliencomp.lisp +++ b/src/compiler/aliencomp.lisp @@ -640,7 +640,7 @@ (int-sap (get-lisp-obj-address (car x))) fp) (return (values (car x) (cdr x))))))) -(deftransform alien-funcall ((function &rest args) * * :important t) +(deftransform alien-funcall ((function &rest args) * * :node node :important t) (let ((type (lvar-type function))) (unless (alien-type-type-p type) (give-up-ir1-transform "can't tell function type at compile time")) @@ -698,7 +698,8 @@ ;; to it later regardless of how the foreign stack looks ;; like. #!+:c-stack-is-control-stack - (setf body `(invoke-with-saved-fp-and-pc (lambda () ,body))) + (when (policy node (<= speed debug)) + (setf body `(invoke-with-saved-fp-and-pc (lambda () ,body)))) (/noshow "returning from DEFTRANSFORM ALIEN-FUNCALL" (params) body) `(lambda (function ,@(params)) ,body))))))) |
From: Martin C. <cra...@co...> - 2009-02-02 18:08:05
|
Gbor Melis wrote on Mon, Feb 02, 2009 at 06:53:25PM +0100: > On Lunes 02 Febrero 2009, Martin Cracauer wrote: > > I apologize for giving feedback so late, there's a bit of a backlog > > here. > > > > I didn't find any discussion of the following patch. It hurts > > performance badly, a straight 4% for my employer's overall mix. > > Given that we only do a certain amount of foreign calls this seems to > > slow down actual calls by an order of magnitude. On top of that, it > > conses. That call to cons in invoke-with-saved-fp-and-pc is a real > > killer. > > > > I hope I didn't overlook a previous discussion, but can we re-hash > > why this is required? Since it is labeled as a hack, I assume that a > > better solution is possible but hasn't been implemented yet? > > > > Maybe we can introduce a switch so that the extended marking is only > > done when debug == 3? > > It's actually not the consing but accessing and binding that special > what > hurts. But that would break if a piece of C code would ever call back into Lisp and the Lisp code spawns a new thread, right? > The patch below makes this machinery conditional on > (<= speed debug) which seems reasonable to me from a correctness > first > point of view. I'll try that but I'm certain it'll do what I want (I "fixed" it by just killing that line manually). I'm still interested why you labeled this as a hack. Can you think of a better solution? Martin > diff --git a/src/compiler/aliencomp.lisp b/src/compiler/aliencomp.lisp > index 3635cb4..2f34fc5 100644 > --- a/src/compiler/aliencomp.lisp > +++ b/src/compiler/aliencomp.lisp > @@ -640,7 +640,7 @@ > (int-sap (get-lisp-obj-address (car x))) fp) > (return (values (car x) (cdr x))))))) > > -(deftransform alien-funcall ((function &rest args) * * :important t) > +(deftransform alien-funcall ((function &rest args) * * :node node :important t) > (let ((type (lvar-type function))) > (unless (alien-type-type-p type) > (give-up-ir1-transform "can't tell function type at compile time")) > @@ -698,7 +698,8 @@ > ;; to it later regardless of how the foreign stack looks > ;; like. > #!+:c-stack-is-control-stack > - (setf body `(invoke-with-saved-fp-and-pc (lambda () ,body))) > + (when (policy node (<= speed debug)) > + (setf body `(invoke-with-saved-fp-and-pc (lambda () ,body)))) > (/noshow "returning from DEFTRANSFORM ALIEN-FUNCALL" (params) bod y) > `(lambda (function ,@(params)) > ,body))))))) -- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Martin Cracauer <cra...@co...> http://www.cons.org/cracauer/ FreeBSD - where you want to go, today. http://www.freebsd.org/ |
From: James Y K. <fo...@fu...> - 2009-02-02 19:44:21
|
On Feb 2, 2009, at 1:08 PM, Martin Cracauer wrote: > I'm still interested why you labeled this as a hack. Can you think of > a better solution? A better solution is to be able to understand the layout of the stack frames made by the C code, so you can backtrace through them normally. Unfortunately, on platforms without a frame pointer, that's non- trivial, as it requires that you be able to interpret the dwarf unwind information stored in all the eh_frame sections of libraries you have loaded. But that is the right thing to do, really. James |
From: Gábor M. <me...@re...> - 2009-02-02 19:43:49
|
On Lunes 02 Febrero 2009, Martin Cracauer wrote: > Gbor Melis wrote on Mon, Feb 02, 2009 at 06:53:25PM +0100: > > On Lunes 02 Febrero 2009, Martin Cracauer wrote: > > > I apologize for giving feedback so late, there's a bit of a > > > backlog here. > > > > > > I didn't find any discussion of the following patch. It hurts > > > performance badly, a straight 4% for my employer's overall mix. > > > Given that we only do a certain amount of foreign calls this > > > seems to slow down actual calls by an order of magnitude. On top > > > of that, it conses. That call to cons in > > > invoke-with-saved-fp-and-pc is a real killer. > > > > > > I hope I didn't overlook a previous discussion, but can we > > > re-hash why this is required? Since it is labeled as a hack, I > > > assume that a better solution is possible but hasn't been > > > implemented yet? > > > > > > Maybe we can introduce a switch so that the extended marking is > > > only done when debug == 3? > > > > It's actually not the consing but accessing and binding that > > special what > > hurts. > > But that would break if a piece of C code would ever call back into > Lisp and the Lisp code spawns a new thread, right? New threads don't inherit specials, if that's what you mean. > > The patch below makes this machinery conditional on > > (<= speed debug) which seems reasonable to me from a correctness > > first > > point of view. > > I'll try that but I'm certain it'll do what I want (I "fixed" it by > just killing that line manually). > > I'm still interested why you labeled this as a hack. Can you think > of a better solution? I labelled it as a hack because it may be possible to do a better job than what x86-call-context does at present. I'm not sure if it can ever be bulletproof. |
From: Martin C. <cra...@co...> - 2009-02-02 20:31:47
|
Gbor Melis wrote on Mon, Feb 02, 2009 at 08:43:30PM +0100: > On Lunes 02 Febrero 2009, Martin Cracauer wrote: > > Gbor Melis wrote on Mon, Feb 02, 2009 at 06:53:25PM +0100: > > > On Lunes 02 Febrero 2009, Martin Cracauer wrote: > > > > I apologize for giving feedback so late, there's a bit of a > > > > backlog here. > > > > > > > > I didn't find any discussion of the following patch. It hurts > > > > performance badly, a straight 4% for my employer's overall mix. > > > > Given that we only do a certain amount of foreign calls this > > > > seems to slow down actual calls by an order of magnitude. On top > > > > of that, it conses. That call to cons in > > > > invoke-with-saved-fp-and-pc is a real killer. > > > > > > > > I hope I didn't overlook a previous discussion, but can we > > > > re-hash why this is required? Since it is labeled as a hack, I > > > > assume that a better solution is possible but hasn't been > > > > implemented yet? > > > > > > > > Maybe we can introduce a switch so that the extended marking is > > > > only done when debug == 3? > > > > > > It's actually not the consing but accessing and binding that > > > special what > > > hurts. > > > > But that would break if a piece of C code would ever call back into > > Lisp and the Lisp code spawns a new thread, right? > > New threads don't inherit specials, if that's what you mean. Yes, so if you create threads (in Lisp or C) with the binding in effect it'll be lost. Martin -- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Martin Cracauer <cra...@co...> http://www.cons.org/cracauer/ FreeBSD - where you want to go, today. http://www.freebsd.org/ |
From: Gábor M. <me...@re...> - 2009-02-02 20:50:35
|
On Lunes 02 Febrero 2009, Martin Cracauer wrote: > > > > It's actually not the consing but accessing and binding that > > > > special what > > > > hurts. > > > > > > But that would break if a piece of C code would ever call back > > > into Lisp and the Lisp code spawns a new thread, right? > > > > New threads don't inherit specials, if that's what you mean. > > Yes, so if you create threads (in Lisp or C) with the binding in > effect it'll be lost. As it should be. The bindings of *saved-fp-and-pc* belong to the current thread. New threads don't inherit bindings of specials so when one is started it has an empty *saved-fp-and-pc* which matches the state of its control stack. Threads in which it is allowed to call into lisp are always started from lisp. |
From: Martin C. <cra...@co...> - 2009-02-03 20:52:39
|
Thanks, Gbor. I have verified that the patch does solve the problem for fast > debug (as expected). If there are no objections I'd like that committed, I think it's the right thing to do. However, I am still unclear on the question of why this is consing so much. A simple getpid(2) wrapper does not cons. But here is before and after of a more complex task: Evaluation took: 19.800 seconds of real time 19.796991 seconds of total run time (19.301066 user, 0.495925 system) [ Run times consist of 0.649 seconds GC time, and 19.148 seconds non-GC time. ] 99.98% CPU 59,251,648,329 processor cycles 852,574,944 bytes consed [patched] 18.694 seconds of real time 18.688159 seconds of total run time (18.556179 user, 0.131980 system) [ Run times consist of 0.460 seconds GC time, and 18.229 seconds non-GC time. ] 99.97% CPU 55,938,726,639 processor cycles 467,941,168 bytes consed It seems that the dynamic-extend for the cons cell is not coming through for some/many alien calls. Martin Gbor Melis wrote on Mon, Feb 02, 2009 at 06:53:25PM +0100: > > diff --git a/src/compiler/aliencomp.lisp b/src/compiler/aliencomp.lisp > index 3635cb4..2f34fc5 100644 > --- a/src/compiler/aliencomp.lisp > +++ b/src/compiler/aliencomp.lisp > @@ -640,7 +640,7 @@ > (int-sap (get-lisp-obj-address (car x))) fp) > (return (values (car x) (cdr x))))))) > > -(deftransform alien-funcall ((function &rest args) * * :important t) > +(deftransform alien-funcall ((function &rest args) * * :node node :important t) > (let ((type (lvar-type function))) > (unless (alien-type-type-p type) > (give-up-ir1-transform "can't tell function type at compile time")) > @@ -698,7 +698,8 @@ > ;; to it later regardless of how the foreign stack looks > ;; like. > #!+:c-stack-is-control-stack > - (setf body `(invoke-with-saved-fp-and-pc (lambda () ,body))) > + (when (policy node (<= speed debug)) > + (setf body `(invoke-with-saved-fp-and-pc (lambda () ,body)))) > (/noshow "returning from DEFTRANSFORM ALIEN-FUNCALL" (params) body) > `(lambda (function ,@(params)) > ,body))))))) > > ------------------------------------------------------------------------------ > This SF.net email is sponsored by: > SourcForge Community > SourceForge wants to tell your story. > http://p.sf.net/sfu/sf-spreadtheword > _______________________________________________ > Sbcl-devel mailing list > Sbc...@li... > https://lists.sourceforge.net/lists/listinfo/sbcl-devel -- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Martin Cracauer <cra...@co...> http://www.cons.org/cracauer/ FreeBSD - where you want to go, today. http://www.freebsd.org/ |
From: Gábor M. <me...@re...> - 2009-02-04 17:43:59
Attachments:
invoke-with-saved-fp-and-pc.patch
|
On Martes 03 Febrero 2009, Martin Cracauer wrote: > Thanks, Gbor. > > I have verified that the patch does solve the problem for fast > > debug (as expected). If there are no objections I'd like that > committed, I think it's the right thing to do. > > However, I am still unclear on the question of why this is consing so > much. A simple getpid(2) wrapper does not cons. A closer looks reveals that is really the second value returned by %caller-frame-and-pc that's causing the consing. With the attached patch I get: * (time (loop repeat 10000000 do (sb-posix:getpid))) Evaluation took: 0.438 seconds of real time 0.440027 seconds of total run time (0.440027 user, 0.000000 system) 100.46% CPU 947,342,942 processor cycles 0 bytes consed I think this is pretty much indistuinguishable from the orignal. Cheers, Gabor |
From: Martin C. <cra...@co...> - 2009-02-04 23:18:02
|
Gbor Melis wrote on Wed, Feb 04, 2009 at 06:43:41PM +0100: > On Martes 03 Febrero 2009, Martin Cracauer wrote: > > Thanks, Gbor. > > > > I have verified that the patch does solve the problem for fast > > > debug (as expected). If there are no objections I'd like that > > committed, I think it's the right thing to do. > > > > However, I am still unclear on the question of why this is consing so > > much. A simple getpid(2) wrapper does not cons. > > A closer looks reveals that is really the second value returned > by %caller-frame-and-pc that's causing the consing. With the attached > patch I get: > > * (time (loop repeat 10000000 do (sb-posix:getpid))) > > Evaluation took: > 0.438 seconds of real time > 0.440027 seconds of total run time (0.440027 user, 0.000000 system) > 100.46% CPU > 947,342,942 processor cycles > 0 bytes consed > > I think this is pretty much indistuinguishable from the orignal. This fixes the consing but it is still slower: 18.834 seconds of real time 18.829137 seconds of total run time (18.699157 user, 0.129980 system) [ Run times consist of 0.432 seconds GC time, and 18.398 seconds non-GC time. ] 99.97% CPU 56,361,004,605 processor cycles 473,948,080 bytes consed [with previous patch, turn it off when debug < speed] 18.694 seconds of real time 18.688159 seconds of total run time (18.556179 user, 0.131980 system) [ Run times consist of 0.460 seconds GC time, and 18.229 seconds non-GC time. ] 99.97% CPU 55,938,726,639 processor cycles 467,941,168 bytes consed That's 1% slower, but this benchmark is a complex mix of things, only a fraction of it alien-heavy. So (IMHO) even in the updated form this fix should only be active for debug >= speed. I'll be out of email for a couple days. Martin -- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Martin Cracauer <cra...@co...> http://www.cons.org/cracauer/ FreeBSD - where you want to go, today. http://www.freebsd.org/ |
From: Gábor M. <me...@re...> - 2009-02-05 08:33:39
|
On Jueves 05 Febrero 2009, Martin Cracauer wrote: > Gbor Melis wrote on Wed, Feb 04, 2009 at 06:43:41PM +0100: > > On Martes 03 Febrero 2009, Martin Cracauer wrote: > > > Thanks, Gbor. > > > > > > I have verified that the patch does solve the problem for fast > > > > debug (as expected). If there are no objections I'd like that > > > committed, I think it's the right thing to do. > > > > > > However, I am still unclear on the question of why this is > > > consing so much. A simple getpid(2) wrapper does not cons. > > > > A closer looks reveals that is really the second value returned > > by %caller-frame-and-pc that's causing the consing. With the > > attached patch I get: > > > > * (time (loop repeat 10000000 do (sb-posix:getpid))) > > > > Evaluation took: > > 0.438 seconds of real time > > 0.440027 seconds of total run time (0.440027 user, 0.000000 > > system) 100.46% CPU > > 947,342,942 processor cycles > > 0 bytes consed > > > > I think this is pretty much indistuinguishable from the orignal. > > This fixes the consing but it is still slower: > > 18.834 seconds of real time > 18.829137 seconds of total run time (18.699157 user, 0.129980 > system) [ Run times consist of 0.432 seconds GC time, and 18.398 > seconds non-GC time. ] 99.97% CPU > 56,361,004,605 processor cycles > 473,948,080 bytes consed > > [with previous patch, turn it off when debug < speed] > 18.694 seconds of real time > 18.688159 seconds of total run time (18.556179 user, 0.131980 > system) [ Run times consist of 0.460 seconds GC time, and 18.229 > seconds non-GC time. ] 99.97% CPU > 55,938,726,639 processor cycles > 467,941,168 bytes consed > > That's 1% slower, but this benchmark is a complex mix of things, only > a fraction of it alien-heavy. So (IMHO) even in the updated form > this fix should only be active for debug >= speed. My unscientific microbenchmark (time (loop repeat most-positive-fixnum do (sb-posix:getpid))) shows that it's down to being 6% slower with the previous patch which may indeed explain 1% in a complex benchmark if it makes a lot of calls to aliens that take an insignificant time themselves. Anyway, I'm committing both patches together. > I'll be out of email for a couple days. > > Martin |