From: Martin C. <cra...@co...> - 2010-11-09 18:22:53
This slows down compilation of my toy by roughly a factor of 100.
Can't tell more exactly, not going to wait for a week :-)

Compilation in fast mode, safety == 0.

Martin

Nikodemus Siivola wrote on Sun, Nov 07, 2010 at 01:14:41AM +0000:
> Update of /cvsroot/sbcl/sbcl
> In directory sfp-cvsdas-3.v30.ch3.sourceforge.com:/tmp/cvs-serv15230
>
> Modified Files:
>         NEWS version.lisp-expr
> Log Message:
> 1.0.44.1: more conservative CONCATENATE open-coding
>
> Don't fully open code for long strings, no matter what policy:
> constraint-propagation will go seriously nonlinear.
>
> Also optimize the open-coded form a bit. Use
>
>   (SETF (AREF .STRING. (TRULY-THE INDEX (+ .POS. <constant>))) <char>)
>   ...repeat...
>   (INCF .POS. <constant>)
>
> instead of
>
>   (SETF (AREF .STRING. .POS.) <char>)
>   (INCF .POS.)
>   ...repeat...
>
> Smaller code, easier on the constraint propagation, and a tiny
> bit faster too.
>
>
> Index: NEWS
> ===================================================================
> RCS file: /cvsroot/sbcl/sbcl/NEWS,v
> retrieving revision 1.1849
> retrieving revision 1.1850
> diff -u -d -r1.1849 -r1.1850
> --- NEWS	6 Nov 2010 03:29:03 -0000	1.1849
> +++ NEWS	7 Nov 2010 01:14:39 -0000	1.1850
> @@ -1,4 +1,8 @@
>  ;;;; -*- coding: utf-8; fill-column: 78 -*-
> +changes relative to sbcl-1.0.44:
> +  * bug fix: when SPEED > SPACE, compiling CONCATENATE 'STRING with constant
> +    long string arguments slowed the compiler down to a crawl.
> +
>  changes in sbcl-1.0.44 relative to sbcl-1.0.43:
>    * enhancement: RUN-PROGRAM accepts :EXTERNAL-FORMAT argument to select the
>      external-format for its :INPUT, :OUTPUT, and :ERROR :STREAMs.
> Index: version.lisp-expr
> ===================================================================
> RCS file: /cvsroot/sbcl/sbcl/version.lisp-expr,v
> retrieving revision 1.5110
> retrieving revision 1.5111
> diff -u -d -r1.5110 -r1.5111
> --- version.lisp-expr	6 Nov 2010 03:29:03 -0000	1.5110
> +++ version.lisp-expr	7 Nov 2010 01:14:39 -0000	1.5111
> @@ -17,4 +17,4 @@
>  ;;; checkins which aren't released. (And occasionally for internal
>  ;;; versions, especially for internal versions off the main CVS
>  ;;; branch, it gets hairier, e.g. "0.pre7.14.flaky4.13".)
> -"1.0.44"
> +"1.0.44.1"
>
> _______________________________________________
> Sbcl-commits mailing list
> Sbc...@li...
> https://lists.sourceforge.net/lists/listinfo/sbcl-commits

-- 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <cra...@co...>   http://www.cons.org/cracauer/
From: Nikodemus S. <nik...@ra...> - 2010-11-10 07:44:28
On 9 November 2010 20:07, Martin Cracauer <cra...@co...> wrote:

> This slows down compilation of my toy by roughly a factor of 100.
> Can't tell more exactly, not going to wait for a week :-)

Interesting! It had the opposite effect for me -- I tried to compile
Screamer with (SPEED 3) and pretty much hung on compiling a DEFUN with

  (concatenate 'string header <long-constant>)

Is the slowdown all over the place, or is there a specific top-level
form where it grinds down? If so, can I see it?

Cheers,

 -- Nikodemus
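[For readers without the Screamer source at hand, a minimal stand-in for the kind of form described above might look like the following. HEADER, the 1000-character constant, and the use of MAKE-STRING are all placeholders for illustration, not the actual Screamer code:]

```lisp
;; Hypothetical reproduction sketch -- HEADER and the 1000-character
;; constant are stand-ins, not the actual Screamer source.
(defparameter *long-constant* (make-string 1000 :initial-element #\x))

(time
 (compile nil
          `(lambda (header)
             (declare (optimize (speed 3))
                      (simple-string header))
             ;; With a long constant argument, the pre-1.0.44.1 transform
             ;; fully open-coded the copy, and constraint propagation over
             ;; the unrolled SETFs went nonlinear.
             (concatenate 'string header ,*long-constant*))))
```

[On an affected SBCL, the compile time of such a form should grow nonlinearly as the length of the constant increases.]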
From: Martin C. <cra...@co...> - 2010-11-10 21:20:50
Nikodemus Siivola wrote on Wed, Nov 10, 2010 at 09:44:20AM +0200:
> On 9 November 2010 20:07, Martin Cracauer <cra...@co...> wrote:
>
>> This slows down compilation of my toy by roughly a factor of 100.
>> Can't tell more exactly, not going to wait for a week :-)
>
> Interesting!
>
> It had the opposite effect for me -- I tried to compile Screamer with
> (SPEED 3) and pretty much hung on compiling a DEFUN with
>
>   (concatenate 'string header <long-constant>)
>
> Is the slowdown all over the place, or is there a specific top-level
> form where it grinds down? If so, can I see it?

It isn't a specific place, I'm afraid. I was hoping for that, but the
slowdown eventually makes it through all the files I was consciously
observing.

Chances are it is big functions. The files that stayed longest are
our engine-in-a-defun items.

Martin
From: Nikodemus S. <nik...@ra...> - 2010-11-12 11:26:02
On 10 November 2010 23:20, Martin Cracauer <cra...@co...> wrote:
> It isn't a specific place I'm afraid. I was hoping for that but it
> eventually makes it through all files I was consciously observing it
> at.
>
> Chances are it is big functions. The files that stayed longest are
> our engine-in-a-defun items.

Hm. I just verified that

  ;; Corresponds roughly to the new CONCATENATE open-coding
  (time
   (compile nil
            `(lambda (dest base)
               (declare (simple-vector dest)
                        (sb-int:index base))
               (macrolet ((copy (n)
                            `(progn
                               ,@(loop for i from 0 below n
                                       collect
                                       `(setf (aref dest
                                                    (truly-the sb-int:index
                                                               (+ base ,i)))
                                              #\x)))))
                 (copy 1000)
                 dest))))

indeed compiles faster than

  ;; Corresponds to the old CONCATENATE open-coding
  (time
   (compile nil
            `(lambda (dest base)
               (declare (type (simple-array character (*)) dest)
                        (sb-int:index base))
               (macrolet ((copy (n)
                            `(let ((pos base))
                               (declare (sb-int:index pos))
                               ,@(loop repeat n
                                       collect
                                       `(progn
                                          (setf (aref dest pos) #\x)
                                          (incf pos))))))
                 (copy 1000)
                 dest))))

so the source of the slowdown is non-obvious to me.

I'll write a bit of code to instrument what CONCATENATE is doing, and
send it to you.

Might also be informative to compile the big files with SB-SPROF
profiling and compare the reports for new and old CONCATENATE
transforms.

(Apropos: this is really a shame, because I actually expected _both_
1.0.44.1 and 1.0.44.2 to speed your compiles up a bit...)

Cheers,

 -- Nikodemus
From: Nikodemus S. <nik...@ra...> - 2010-11-15 11:09:46
Attachments:
concatenate-inst.lisp
On 12 November 2010 13:25, Nikodemus Siivola <nik...@ra...> wrote:

> so the source of the slowdown is non-obvious to me.

I've done some more testing, but still haven't been able to cook up a
case that is slower to compile with the new version.

> I'll write a bit of code to instrument what CONCATENATE is doing, and
> send it to you.
>
> Might also be informative to compile the big files with SB-SPROF
> profiling and compare the reports for new and old CONCATENATE
> transforms.

So, if you have the time...

Load the attached concatenate-inst.lisp from an init or build-config
file. At the top of the files experiencing the most notable slowdowns:

  (eval-when (:compile-toplevel)
    (sb-sprof:start-profiling))

At the bottom of the same files:

  (eval-when (:compile-toplevel)
    (sb-sprof:stop-profiling)
    (with-open-file (f (format nil "/tmp/sb-sprof-~A.txt"
                               sb-c::*concatenate-version*)
                       :direction :output
                       :if-does-not-exist :create
                       :if-exists :append)
      (format f "~&FROM: ~S~%" *compile-file-pathname*)
      (sb-sprof:report :stream f))
    (sb-sprof:reset))

1. Build your toy with 1.0.44.1. The build time should now be mostly in
   line with pre-1.0.44.1 -- modulo instrumentation overhead. If it
   isn't, then that's a clue of sorts, I guess...

2. Change the DEFPARAMETER in concatenate-inst.lisp to :NEW, and rebuild
   your toy with the same SBCL. The build time should now be in line
   with unpatched 1.0.44.1.

Send me /tmp/concatenate-NEW.txt, /tmp/concatenate-CLASSIC.txt,
/tmp/sb-sprof-NEW.txt, and /tmp/sb-sprof-CLASSIC.txt.

Cheers,

 -- Nikodemus
From: Martin C. <cra...@co...> - 2010-11-15 21:56:34
Nikodemus Siivola wrote on Mon, Nov 15, 2010 at 01:09:36PM +0200:
> So, if you have the time...
>
> Load attached concatenate-inst.lisp from an init or build-config file.

I already tried profiling it, but it blew up in several places,
including heap exhaustion :-)

Martin
From: Martin C. <cra...@co...> - 2010-11-15 23:33:19
Nikodemus Siivola wrote on Mon, Nov 15, 2010 at 01:09:36PM +0200:
> So, if you have the time...
>
> Load attached concatenate-inst.lisp from an init or build-config file.

The problem is reasonably profiling-resistant. So far, every file that
slows down big time has blown up the Lisp with heap exhaustion.

In normal files, where 1.0.44.2 is only about 50% slower than 1.0.41,
the profile is not really clear, but in many sub-profiles I see
(FLET #:BODY-FUN-[GETHASH3]221) calls where there were none in 1.0.41.
The top of that looks like this:

     1   0.9    SB-FASL::STRING-CHECK-TABLE [398]
     1   0.9    PPRINT-DISPATCH [377]
     1   0.9    (LABELS SB-C::MARK-2BLOCK) [371]
     1   0.9    SB-FASL::DUMP-NON-IMMEDIATE-OBJECT [360]
     1   0.9    SB-C::MAYBE-INSTRUMENT-PROGN-LIKE [13]
     1   0.9
     5   4.5  SB-IMPL::GETHASH3 [64]
     4   3.6    (FLET #:BODY-FUN-[GETHASH3]221) [1]

Note that the sample frequency is deficient, since these files compile
so much faster than the big stuff that it's heavily reduced.

I have reduced my sample frequency by a bazillion and hope that it
makes it through one of the big files overnight.

Martin
From: Nikodemus S. <nik...@ra...> - 2010-11-16 08:49:30
On 16 November 2010 01:33, Martin Cracauer <cra...@co...> wrote:

> I have reduced my sample frequency by a bazillion and hope that it
> makes it through one of the big files overnight.

Reducing the :MAX-SAMPLES and :MAX-DEPTH numbers should help if it
doesn't.

I'm guessing that you have some deeply nested inline expansions there
(you bump *INLINE-EXPANSION-LIMIT* pretty high, right?), so :MAX-DEPTH
in particular might help.

Cheers,

 -- Nikodemus
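[For reference, a sketch of bounding the profiler as suggested above. The keyword names are taken from the discussion; whether the sb-sprof of this era accepts exactly these keywords, and the particular values, are assumptions:]

```lisp
;; Sketch of capping the profiler's work, assuming START-PROFILING
;; accepts the keywords named in the discussion above; the values
;; here are arbitrary examples.
(require :sb-sprof)

(sb-sprof:start-profiling :sample-interval 0.01
                          :max-samples 10000  ; cap on recorded traces
                          :max-depth 50)      ; truncate deeply nested stacks
;; ... compile the suspect files ...
(sb-sprof:stop-profiling)
(sb-sprof:report :type :flat)
```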
From: Martin C. <cra...@co...> - 2010-11-16 16:01:43
Nikodemus Siivola wrote on Tue, Nov 16, 2010 at 10:49:22AM +0200:
> Reducing the number of :MAX-SAMPLES and :MAX-DEPTH should help if it
> doesn't.
>
> I'm guessing that you have some deeply nested inline-expansions there
> (you bump *INLINE-EXPANSION-LIMIT* pretty high, right?), so :MAX-DEPTH
> in particular might help.

I'm sorry, but the run I started overnight, with sample-interval 0.01
(no typo), also blew up.

I'll give it one more try, but at some point I'd like to open
negotiations about backing out your change. The increase in compile
time is so large that debugging it turns into a nasty time sink.
:MAX-SAMPLES expands on its own, so I'll try :MAX-DEPTH.

The profiler also obviously doesn't work right here: profiling around
the same block that you have a (time ...) around comes up with the
profiler taking a lot fewer samples than expected:

  Evaluation took:
    1.390 seconds of real time
    1.388789 seconds of total run time (1.276806 user, 0.111983 system)
    [ Run times consist of 0.865 seconds GC time, and 0.524 seconds non-GC time. ]
    99.93% CPU
    775 forms interpreted
    925 lambdas converted
    4,159,278,000 processor cycles
    77,819,440 bytes consed

  Number of samples:   53
  Sample interval:     0.01 seconds
  Total sampling time: 0.53 seconds
  Number of cycles:    0
  Sampled threads:

Or another:

  Evaluation took:
    0.823 seconds of real time
    0.822875 seconds of total run time (0.725890 user, 0.096985 system)
    [ Run times consist of 0.666 seconds GC time, and 0.157 seconds non-GC time. ]
    100.00% CPU
    245 forms interpreted
    191 lambdas converted
    2,462,996,664 processor cycles
    25,763,632 bytes consed

  Number of samples:   17
  Sample interval:     0.01 seconds
  Total sampling time: 0.17 seconds
  Number of cycles:    0

Also, it obviously shouldn't blow up with a sample interval of 0.01 at
all. There's something wonky going on.

Martin
From: Christophe R. <cs...@ca...> - 2010-11-16 16:16:47
Martin Cracauer <cra...@co...> writes:

> I'll give it one more try but at some point I'd like to open
> negotiations about backing out your change. The increase in compile
> time is so large that debugging it turns into a nasty time sink.
> :max-samples expands on it's own, so I'll try max-depth.

Have we established what routine is taking up so much time? It should
be "straightforward" to emulate statistical profiling by simply
interrupting the compile, noting where the program is, and resuming;
if there's some hugely exciting new bottleneck, chances are that this
will find it, no?

Is it possible that the new form of the concatenate expansion allows
Python to try to do some analysis where previously it was giving up,
but it touches some hideous nonlinearity in constraint generation,
type expansion or sset? Might this only happen with certain compile
settings (e.g. speed 3 or debug 3)?

> The profiler also obviously doesn't work right here, profiling around the
> same block that you have a (time ...) around comes up with the
> profiler taking a lot less samples than expected:

> [ Run times consist of 0.865 seconds GC time, and 0.524 seconds non-GC time. ]
> Total sampling time: 0.53 seconds

> [ Run times consist of 0.666 seconds GC time, and 0.157 seconds non-GC time. ]
> Total sampling time: 0.17 seconds

The sampling times are within shooting distance of the non-GC times;
that doesn't completely surprise me (though I'm not sure that I would
have predicted it beforehand).

Cheers,

Christophe
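[The back-of-the-envelope behind this observation can be checked against the two TIME reports quoted earlier: if samples land only outside GC, the expected sample count is roughly the non-GC run time divided by the sample interval. The figures below are copied from those reports; the no-sampling-during-GC premise is an assumption:]

```lisp
;; Expected samples ~ non-GC run time / sample interval, using the
;; numbers from the two TIME reports above (assuming the profiler
;; takes no samples during GC).
(let ((interval 0.01))
  ;; First report: 1.388789s run time, 0.865s of it GC -> ~52; 53 observed.
  (print (/ (- 1.388789 0.865) interval))
  ;; Second report: 0.822875s run time, 0.666s GC -> ~16; 17 observed.
  (print (/ (- 0.822875 0.666) interval)))
```

[Both expected counts fall within one sample of what the profiler actually reported, consistent with sampling being suppressed during GC.]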
From: Nikodemus S. <nik...@ra...> - 2010-11-16 17:13:02
On 16 November 2010 18:01, Martin Cracauer <cra...@co...> wrote:

> I'll give it one more try but at some point I'd like to open
> negotiations about backing out your change. The increase in compile
> time is so large that debugging it turns into a nasty time sink.
> :max-samples expands on it's own, so I'll try max-depth.

The sample vector expands on its own, but new samples aren't recorded
after samples-trace-count > samples-max-samples. At least they should
not be.

Just to check: did you try with :CLASSIC using concatenate-inst.lisp?
I assume that didn't blow up. If it did, then something very funky is
going on.

I would be deeply surprised if it's the *CONCATENATE-OPEN-CODE-LIMIT*
that's responsible instead of the AREF-with-offset/INCF change.

Assuming it is the AREF-with-offset/INCF change, and we cannot
identify the source of the slowdown you're seeing, I can back out the
change. While it gives a small improvement in all my test cases for
both compile time and performance, I don't consider it critical.
Unsurprisingly, I would really like to identify the underlying issue,
though, since it may well help sort out some other compiler
performance issues as well -- or at least help us understand them.

If it is the *CONCATENATE-OPEN-CODE-LIMIT* that is guilty... I'd
_really_ like to identify the issue, because that limit is necessary
for simple-but-realistic cases to compile without a nonlinear
explosion in compile time as the length of string constants increases.

Cheers,

 -- Nikodemus