Re: [Sbcl-devel] on an unsettling Lisp versus Python 'competition' or why isn't Lisp doing better here...?

From: Ingvar <in...@he...> - 2007-12-05 19:02:30

James Y Knight writes:
> On Dec 4, 2007, at 8:35 PM, Brian Downing wrote:
[ SNIP ]
> > I would personally really like to see BASE-CHAR/STRING be able to hold
> > iso-8859-1.  I've had several cases where I wanted to parse "plain ascii
> > with other random binary garbage I didn't care about", and doing that
> > in SBCL as it stands without taking the unicode hit is pretty painful.
> 
> I like that it only holds ASCII, in that it forces the programmer to  
> think, and use strings for what they're meant for: text. If you want  
> to store random bytes, you ought to be using a byte array. And if you  
> want to store text, it's pretty rare that you actually really only  
> want LATIN-1 (or if you do want that, it's pretty rare that you  
> *should* want that)

For me, essentially every text-processing task I've done in the last 20
years fits quite well inside "just LATIN-1", and frequently the source data is
encoded in Latin-1 anyway.

That said, I can certainly see the logic of "BASE-CHAR can only take ASCII",
and that probably covers 90% of the times I use strings anyway (for "random
binary data" I tend to use arrays of (UNSIGNED-BYTE 8)).

//INgvar

Re: [Sbcl-devel] on an unsettling Lisp versus Python 'competition' or why isn't Lisp doing better here...?

From: Stefan L. <lan...@gm...> - 2007-11-22 11:02:01

On Donnerstag, 22. November 2007, Bruno Daniel wrote:
> Juho Snellman writes:
> > As for the slowness issue, since everybody seems to be making
> > wild guesses about the reasons without actually doing any
> > profiling, I'll throw in one more.
>
> Here's the result SBCL's statistical profiler shows on my 64 bit
> machine (The statistics are reproducible up to fluctuations of
> about 1%):

Since SUBSEQ shows up very high on the list:
I've improved one or two entries on the Great Computer
Language Shootout (they are gone; alioth lost a few days'
data around that time) and found along the way that some
sequence functions, including SUBSEQ, can be improved by an
order of magnitude or more for simple array types.

In my experience this is the main reason why direct
ports of Ruby/Python code perform badly under SBCL.

Stefan

Re: [Sbcl-devel] on an unsettling Lisp versus Python 'competition' or why isn't Lisp doing better here...?

From: Attila L. <att...@gm...> - 2007-11-22 14:19:30

Attachments: split-sequence.lisp sbcl.diff

hi,

i've optimized it a bit, and i think it's now on par with the python speed,
if not faster. two things are attached to the mail: one is the
optimized split-sequence and the other is the diff to sbcl HEAD. this
latter diff contains the previously posted base-char changes to
read-line and an optimization to string-trim. (the big change in
stream.lisp is only the make-result-string macrolet and the (declare
(type index len index)) type declaration).

i was working with a smaller version of the file to keep it in the
disk cache and avoid swapping. it went down from 9.927 to 2.597.

the final form of the defun:

(defun parse-text (&optional (filename "/home/ati/fake-data.txt"))
  (declare (optimize speed (debug 0))
           (inline split-sequence:split-sequence
                   string-trim))
  (with-open-file (in filename
                      :element-type 'base-char
                      :external-format :ascii
                      :direction :input
                      :if-does-not-exist :error)
    (let ((ht (make-hash-table :test 'equal)))
      (loop
         for line of-type simple-base-string = (read-line in nil)
         while line do
           (let ((fields (split-sequence:split-sequence #\~ line)))
             (when (= (length fields) 3)
               (let ((id (string-trim " " (the simple-base-string
                                                (first fields))))
                     (attribute (string-trim " " (the simple-base-string
                                                       (second fields))))
                     (value (string-trim " " (the simple-base-string
                                                   (third fields)))))
                 (when (not (gethash id ht))
                   (setf (gethash id ht) (make-hash-table :test 'equal)))
                 (let ((fields-ht (gethash id ht)))
                   (setf (gethash attribute fields-ht) value))))))
      ;;(print (hash-table-count ht))
      (values))))


i've recorded the interesting steps:

CL-USER> (progn
           (sb-ext:gc :full t)
           (time (parse-text)))

*** unoptimized:

Evaluation took:
  9.927 seconds of real time
  9.056566 seconds of user run time
  0.856054 seconds of system run time
  [Run times include 2.352 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  1,633,159,856 bytes consed.




*** after applying the patch for read-line to return base-string's:

difference is below measuring error (but it's important for the
split-sequence inlining)




*** after adding (declare (optimize speed (debug 0)))

difference is below measuring error




*** after optimizing and inlining split-sequence (it even had apply
#'position calls!)

Evaluation took:
  4.916 seconds of real time
  4.504282 seconds of user run time
  0.408026 seconds of system run time
  [Run times include 1.044 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  796,540,592 bytes consed.
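
A minimal sketch of the kind of specialization this step describes, assuming the goal is a splitter that calls POSITION directly on a declared string type instead of going through APPLY. SPLIT-ON-CHAR is a hypothetical name; the attached split-sequence.lisp is not reproduced here and may well look different.

```lisp
;; Sketch only: SPLIT-ON-CHAR is a hypothetical function, not the code from
;; the attached split-sequence.lisp.
(defun split-on-char (delimiter line)
  "Return a list of fresh substrings of LINE separated by DELIMITER."
  (declare (optimize speed)
           (type character delimiter)
           (type simple-string line))
  (let ((fields '())
        (start 0)
        (end (length line)))
    (declare (type fixnum start end))
    (loop
      ;; POSITION is called directly with fixed keyword arguments, so the
      ;; compiler can see the declared sequence type at the call site.
      (let ((pos (or (position delimiter line :start start) end)))
        (declare (type fixnum pos))
        (push (subseq line start pos) fields)
        (when (= pos end)
          (return (nreverse fields)))
        (setf start (1+ pos))))))
```

Like split-sequence with default arguments, this keeps empty fields, so a line ending in the delimiter yields a trailing empty string.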




*** after adding optimization to string-trim that avoids calling
subseq if there's nothing to be trimmed (note: this is explicitly
allowed by the spec):

Evaluation took:
  3.372 seconds of real time
  2.940184 seconds of user run time
  0.428027 seconds of system run time
  [Run times include 1.048 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  728,696,000 bytes consed.
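
The idea behind this step can be sketched in portable code: find the trim bounds first, and only call SUBSEQ when they differ from the bounds of the whole string. TRIM-SPACES is a hypothetical helper written for illustration; it is not the change contained in sbcl.diff.

```lisp
;; Sketch only: TRIM-SPACES is a hypothetical helper, not the patch itself.
(defun trim-spaces (string)
  "Trim spaces from both ends of STRING. Returns STRING itself, without
copying, when there is nothing to trim."
  (declare (type simple-string string))
  (flet ((keep-p (c) (char/= c #\Space)))
    (let* ((len (length string))
           ;; Index of the first character to keep, or LEN if all spaces.
           (start (or (position-if #'keep-p string) len))
           ;; One past the last character to keep.
           (end (if (= start len)
                    len
                    (1+ (position-if #'keep-p string :from-end t)))))
      (if (and (= start 0) (= end len))
          string                     ; nothing to trim: no SUBSEQ, no consing
          (subseq string start end)))))
```

Callers that later mutate the result need to keep in mind that it may be the original string rather than a copy.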




*** after inlining string-trim:

time difference is below measuring error, but consing went down a little.

Evaluation took:
  3.28 seconds of real time
  2.864179 seconds of user run time
  0.396024 seconds of system run time
  [Run times include 1.024 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  718,985,664 bytes consed.




*** annotating (the simple-base-string ...) for the args of the
string-trim calls:

Evaluation took:
  3.107 seconds of real time
  2.740172 seconds of user run time
  0.360022 seconds of system run time
  [Run times include 1.048 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  718,925,760 bytes consed.





*** at this point the profiling looks like this (sorry, it's
unreadable without fixed font, but plain text mails are preferred):

Rank Name                                                    Self% Cumul% Total%
1    READ-LINE                                               39.34  53.08  39.34
     Callers
       PARSE-TEXT                                                   53.08
     Calls
       SB-INT:FAST-READ-CHAR-REFILL                                  9.00
       SB-KERNEL:%SHRINK-VECTOR                                      3.32
       SB-KERNEL:UB32-BASH-COPY                                      1.42
2    (FLET #:CLEANUP-FUN-[CALL-WITHOUT-INTERRUPTS]24)        24.64  24.64  63.98
3    SB-IMPL::FD-STREAM-READ-N-CHARACTERS/ASCII               7.58   9.00  71.56
     Callers
       SB-INT:FAST-READ-CHAR-REFILL                                  9.00
     Calls
       SB-IMPL::REFILL-INPUT-BUFFER                                  1.42
4    PARSE-TEXT                                               6.16  93.84  77.73
     Callers
       NIL                                                          93.84
     Calls
       READ-LINE                                                    53.08
       SB-IMPL::GETHASH3                                             5.21
       SB-VM::GENERIC-+                                              4.74
       MAKE-HASH-TABLE                                               3.32
       SB-KERNEL:UB8-BASH-COPY                                       2.84
       SB-KERNEL:%PUTHASH                                            0.95
5    SB-VM::GENERIC-+                                         4.74   4.74  82.46
6    SB-KERNEL:%SHRINK-VECTOR                                 3.32   3.32  85.78
7    SB-KERNEL:UB8-BASH-COPY                                  2.84   2.84  88.63
8    SB-KERNEL:%SP-STRING-COMPARE                             2.37   2.37  91.00
9    MAKE-HASH-TABLE                                          2.37   3.32  93.36
10   SB-KERNEL:STRING=*                                       1.42   3.79  94.79
11   SB-KERNEL:UB32-BASH-COPY                                 1.42   1.42  96.21
12   (FLET #:BODY-FUN-[GETHASH3]1076)                         0.95   5.21  97.16
13   (FLET SB-IMPL::TRICK)                                    0.47   0.47  97.63
14   SB-IMPL::REFILL-INPUT-BUFFER                             0.47   1.42  98.10
15   SB-IMPL::CEIL-POWER-OF-TWO                               0.47   0.47  98.58
16   (FLET #:CLEANUP-FUN-[%PUTHASH]1409)                      0.47   0.47  99.05
17   SB-KERNEL:%PUTHASH                                       0.47   0.95  99.53
18   SB-IMPL::%MAKE-HASH-TABLE                                0.47   0.47 100.00
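
Reports like the one above come from SBCL's statistical profiler. A hedged sketch of how such a run might be set up with the sb-sprof contrib follows; BUSY-WORK and PROFILE-BUSY-WORK are hypothetical stand-ins for PARSE-TEXT and its profiling run, and the report layout differs between SBCL versions.

```lisp
;; SBCL-specific sketch. BUSY-WORK and PROFILE-BUSY-WORK are hypothetical.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :sb-sprof))

(defun busy-work (n)
  "Fill a hash table so the profiler has something to sample."
  (let ((ht (make-hash-table :test 'equal)))
    (dotimes (i n)
      (setf (gethash (princ-to-string i) ht) i))
    (hash-table-count ht)))

(defun profile-busy-work (&optional (n 200000))
  "Run BUSY-WORK under the statistical profiler and print a report."
  ;; :REPORT :GRAPH asks for a call-graph report; :REPORT :FLAT prints
  ;; only the flat per-function table.
  (sb-sprof:with-profiling (:max-samples 40000 :report :graph)
    (busy-work n)))
```

Calling (profile-busy-work) at the REPL prints the report to standard output. Short runs collect few samples, so the percentages are only meaningful for workloads that run for a while.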





*** after raising +ansi-stream-in-buffer-length+ to 4096 and recompiling sbcl

note that consing dropped drastically, but due to the problem with
REPLACE that Christoph mentioned it got slower. hopefully Juho has
something to say here.

Evaluation took:
  3.59 seconds of real time
  3.272205 seconds of user run time
  0.316019 seconds of system run time
  [Run times include 0.528 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  260,260,240 bytes consed.

Rank Name                                                    Self% Cumul% Total%
1    REPLACE                                                 16.42  49.25  16.42
     Callers
       READ-LINE                                                    49.25
       REPLACE                                                       0.37
     Calls
       SB-IMPL::OPTIMIZED-DATA-VECTOR-SET                           10.07
       SB-KERNEL:HAIRY-DATA-VECTOR-SET                               9.70
       SB-KERNEL:HAIRY-DATA-VECTOR-REF                               7.84
       SB-IMPL::OPTIMIZED-DATA-VECTOR-REF                            2.61
       SB-IMPL::OPTIMIZED-DATA-VECTOR-REF                            2.24
       LENGTH                                                        0.37
       REPLACE                                                       0.37
2    SB-IMPL::FD-STREAM-READ-N-CHARACTERS/ASCII              11.57  16.42  27.99
     Callers
       SB-INT:FAST-READ-CHAR-REFILL                                 16.42
     Calls
       SB-IMPL::REFILL-INPUT-BUFFER                                  4.85
3    (FLET #:CLEANUP-FUN-[CALL-WITHOUT-INTERRUPTS]24)        10.82  10.82  38.81
4    READ-LINE                                               10.07  76.12  48.88
     Callers
       PARSE-TEXT                                                   76.12
     Calls
       REPLACE                                                      49.25
       SB-INT:FAST-READ-CHAR-REFILL                                 16.42
       SB-KERNEL:%SHRINK-VECTOR                                      0.37
5    SB-IMPL::OPTIMIZED-DATA-VECTOR-SET                      10.07  10.07  58.96
     Callers
       REPLACE                                                      10.07
6    SB-KERNEL:HAIRY-DATA-VECTOR-SET                          9.70   9.70  68.66
7    PARSE-TEXT                                               7.84  95.90  76.49
8    SB-KERNEL:HAIRY-DATA-VECTOR-REF                          7.84   7.84  84.33
9    SB-IMPL::OPTIMIZED-DATA-VECTOR-REF                       2.61   2.61  86.94
10   SB-IMPL::OPTIMIZED-DATA-VECTOR-REF                       2.24   2.24  89.18
11   SB-KERNEL:UB8-BASH-COPY                                  1.87   1.87  91.04
12   SB-KERNEL:STRING=*                                       1.49   2.24  92.54
13   SB-KERNEL:%PUTHASH                                       0.75   1.12  93.28
14   SB-KERNEL:%SP-STRING-COMPARE                             0.75   0.75  94.03




*** after getting rid of the REPLACE problem with an ugly hack in
ansi-stream-read-line that helps propagate the type to REPLACE:

Evaluation took:
  2.597 seconds of real time
  2.356147 seconds of user run time
  0.240015 seconds of system run time
  [Run times include 0.532 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  260,262,768 bytes consed.

Rank Name                                                    Self% Cumul% Total%
1    PARSE-TEXT                                              16.40  96.83  16.40
2    SB-IMPL::FD-STREAM-READ-N-CHARACTERS/ASCII              14.81  18.52  31.22
     Callers
       SB-INT:FAST-READ-CHAR-REFILL                                 18.52
     Calls
       SB-IMPL::REFILL-INPUT-BUFFER                                  3.70
3    READ-LINE                                               13.76  61.90  44.97
     Callers
       PARSE-TEXT                                                   61.90
     Calls
       REPLACE                                                      29.63
       SB-INT:FAST-READ-CHAR-REFILL                                 18.52
4    (FLET #:CLEANUP-FUN-[CALL-WITHOUT-INTERRUPTS]24)        13.76  13.76  58.73
     Callers
       SB-UNIX::CALL-WITHOUT-INTERRUPTS                             13.76
5    REPLACE                                                  8.47  29.63  67.20
     Callers
       READ-LINE                                                    29.63
       REPLACE                                                       0.53
     Calls
       SB-KERNEL:HAIRY-DATA-VECTOR-REF                               7.41
       SB-KERNEL:HAIRY-DATA-VECTOR-SET                               5.82
       SB-IMPL::OPTIMIZED-DATA-VECTOR-SET                            3.70
       SB-IMPL::OPTIMIZED-DATA-VECTOR-REF                            3.70
       REPLACE                                                       0.53
       LENGTH                                                        0.53
6    SB-KERNEL:HAIRY-DATA-VECTOR-REF                          7.41   7.41  74.60
7    SB-KERNEL:HAIRY-DATA-VECTOR-SET                          5.82   5.82  80.42
8    SB-KERNEL:UB8-BASH-COPY                                  4.23   4.23  84.66
9    SB-IMPL::OPTIMIZED-DATA-VECTOR-REF                       3.70   3.70  88.36
10   SB-IMPL::OPTIMIZED-DATA-VECTOR-SET                       3.70   3.70  92.06
11   (FLET #:BODY-FUN-[%PUTHASH]1355)                         1.59   2.12  93.65
12   SB-KERNEL:%SP-STRING-COMPARE                             1.06   1.06  94.71
13   (FLET #:BODY-FUN-[GETHASH3]1076)                         1.06   2.65  95.77
14   LENGTH                                                   0.53   0.53  96.30
15   SB-KERNEL::%MAKE-INSTANCE-WITH-LAYOUT                    0.53   0.53  96.83
16   SB-IMPL::REFILL-INPUT-BUFFER                             0.53   3.70  97.35
17   SB-KERNEL:%UNARY-TRUNCATE                                0.53   0.53  97.88
18   SB-THREAD::MAKE-SPINLOCK                                 0.53   0.53  98.41
19   (FLET SB-IMPL::TRICK)                                    0.53   0.53  98.94
20   SB-KERNEL:%SXHASH-SIMPLE-STRING                          0.53   1.06  99.47




*** and an interesting last piece: using (ppcre:split "~" line
:sharedp t), adding a (declare (type simple-base-string
target-string)) to split and avoiding apply #'subseq, it's hardly
slower:

Evaluation took:
  3.028 seconds of real time
  2.772173 seconds of user run time
  0.256016 seconds of system run time
  [Run times include 0.556 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  269,862,768 bytes consed.
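
What structure sharing buys can be illustrated without CL-PPCRE: a substring that is displaced to the original line needs no copy of the characters. SHARED-SUBSTRING below is a hypothetical helper for illustration only, and no claim is made that CL-PPCRE implements :sharedp this way.

```lisp
;; Sketch only: SHARED-SUBSTRING is a hypothetical helper illustrating
;; structure sharing through displaced arrays.
(defun shared-substring (string start end)
  "Return a string that views STRING from START to END without copying."
  (make-array (- end start)
              :element-type (array-element-type string)
              :displaced-to string
              :displaced-index-offset start))

(let* ((line (copy-seq "id ~ attr ~ value"))
       (field (shared-substring line 0 2)))
  ;; A write to LINE is visible through FIELD, because both use the
  ;; same underlying storage.
  (setf (char line 0) #\I)
  field)
```

The trade-off is that a displaced string is not a simple string in SBCL, so declarations such as (the simple-base-string ...) no longer apply to the fields, and the whole line stays alive for as long as any field is referenced.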

-- 
 attila

Re: [Sbcl-devel] on an unsettling Lisp versus Python 'competition' or why isn't Lisp doing better here...?

From: James Y K. <fo...@fu...> - 2007-12-05 18:03:48

On Dec 4, 2007, at 8:35 PM, Brian Downing wrote:

> On Wed, Dec 05, 2007 at 03:25:07AM +0200, Juho Snellman wrote:
>> (There's also the slight problem that some of the speed increase of
>> this hack comes from completely ignoring external formats. This would
>> be ok if base-strings were defined to contain iso-8859-1, but they're
>> actually defined to contain just ascii. Ok for a quick hack, less good
>> for something included with SBCL. There's always the option of
>> defining base-char to map to 8859-1 instead, but that's the discussion
>> we've had before. I haven't measured whether this is an effect that
>> actually matters.)
>
> I would personally really like to see BASE-CHAR/STRING be able to hold
> iso-8859-1.  I've had several cases where I wanted to parse "plain ascii
> with other random binary garbage I didn't care about", and doing that
> in SBCL as it stands without taking the unicode hit is pretty painful.

I like that it only holds ASCII, in that it forces the programmer to  
think, and use strings for what they're meant for: text. If you want  
to store random bytes, you ought to be using a byte array. And if you  
want to store text, it's pretty rare that you actually really only  
want LATIN-1 (or if you do want that, it's pretty rare that you  
*should* want that)

James

Re: [Sbcl-devel] on an unsettling Lisp versus Python 'competition' or why isn't Lisp doing better here...?

From: Brian D. <bd-...@la...> - 2007-12-05 18:50:21

On Wed, Dec 05, 2007 at 01:03:22PM -0500, James Y Knight wrote:
> I like that it only holds ASCII, in that it forces the programmer to  
> think, and use strings for what they're meant for: text. If you want  
> to store random bytes, you ought to be using a byte array. And if you  
> want to store text, it's pretty rare that you actually really only  
> want LATIN-1 (or if you do want that, it's pretty rare that you  
> *should* want that)

I think this is a decent theoretical design.  Unfortunately, I think it
pretty much sucks from a practical standpoint.

Let me use my earlier example.  I was parsing RCS files.  RCS is a
mostly-text-based grammar with some binary sections (i.e. the actual
file data).  There is no way to know where the binary parts are without
parsing the text.

For performance and memory-consumption reasons, I could not deal with
using 4x the memory to load an RCS file into an SBCL unicode string.
And since the file can contain high-byte characters, I couldn't load it
into a base-string as they exist today.  Further, I couldn't load the
file a "byte at a time", reading the right kind of element (character
or byte), as that was very slow.

So, I wound up slurping the whole thing into a byte array, as you
recommend above.  I think this sucks.  Here's why:

I had to rewrite my lexer to work on a byte array, not a string.  This
means:

  * The code is very clumsy, and nothing like what you'd expect from
    string operations.

  * I couldn't use any of the nice built-in CL string functions.

  * I couldn't use any of the nice external CL libraries that deal in
    strings.  (*cough* CL-PPCRE *cough*)

The fact is the whole project would have been a lot easier had I been
able to reason about the file as a string, efficiently.

Frankly, it's very common to have arbitrary binary data with large amounts
of ASCII text.  Making this a pain in the ass to deal with efficiently
in SBCL doesn't seem like a good practical decision.

Add more interesting games, like mmapping a file into memory and sticking
an SBCL base-string header in front of it, and I think you really want
to be able to handle 8-bit strings.
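
For readers who have not had to do this, a rough sketch of what the byte-array route looks like in practice: once the file is in an (UNSIGNED-BYTE 8) vector, every textual token has to be decoded by hand before any string function can touch it. ASCII-SLICE and the OCTETS type are hypothetical names for illustration; this is not the RCS lexer discussed above.

```lisp
;; Sketch only: OCTETS and ASCII-SLICE are hypothetical names.
(deftype octets () '(simple-array (unsigned-byte 8) (*)))

(defun ascii-slice (buffer start end)
  "Decode the octets of BUFFER from START below END as ASCII into a fresh
base-string. Signals an error on any octet above 127."
  (declare (type octets buffer))
  (let ((result (make-string (- end start) :element-type 'base-char)))
    (loop for i from start below end
          for j from 0
          for octet = (aref buffer i)
          do (when (> octet 127)
               (error "Non-ASCII octet ~D at index ~D" octet i))
             (setf (char result j) (code-char octet)))
    result))

;; Four text octets, a space, then one "binary" octet.
(let ((buffer (make-array 6 :element-type '(unsigned-byte 8)
                            :initial-contents '(104 101 97 100 32 255))))
  (ascii-slice buffer 0 4))   ; decodes only the leading text
```

Each decoded token is a copy, which is exactly the extra work and consing that a string type able to hold all 8-bit values would avoid.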

-bcd

Re: [Sbcl-devel] on an unsettling Lisp versus Python 'competition' or why isn't Lisp doing better here...?

From: Nathan F. <fr...@gm...> - 2007-11-22 22:34:07

On Nov 22, 2007 6:01 AM, Stefan Lang <lan...@gm...> wrote:
> Since SUBSEQ shows up very high on the list:
> I've improved one or two entries on the Great Computer
> Language Shootout (they are gone, alioth lost a few days
> data around that time) and on the way encountered that some
> sequence functions, including SUBSEQ, can be improved by an
> order of magnitude or more for simple array types.
>
> In my experience this is the main reason why direct
> ports from Ruby/Python code perform bad under SBCL.

For better or for worse, SBCL's philosophy is that the general
sequence functions are, well, general--with all the type dispatching
and such that implies.  If you want speed, then you need to declare
types at the call site.  This is in contrast to many "scripting"
languages, where the library functions are going to be pretty speedy.
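
A small sketch of what "declare types at the call site" can look like; FIRST-FIELD is a hypothetical function, and how much faster the declared version runs depends on the SBCL version and the transforms it has for each sequence function.

```lisp
;; Sketch only: FIRST-FIELD is a hypothetical function for illustration.
(defun first-field (line)
  "Return the part of LINE before the first tilde, or all of LINE."
  (declare (optimize speed)
           (type simple-string line))
  ;; With LINE declared SIMPLE-STRING, the compiler knows the sequence type
  ;; at the POSITION and SUBSEQ call sites and has a chance to open-code
  ;; them instead of dispatching on the type at run time.
  (subseq line 0 (position #\~ line)))
```

Without the declaration the same body is still correct; it simply goes through the fully general sequence functions.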

Thinking out loud: would it be worthwhile to specialize things like
SUBSEQ--particularly on simple arrays--in the generic library code,
similar to the way Juho optimized array access?  Doing this sort of
optimization for REPLACE/MISMATCH/etc. would probably require too much
space and handling all the keyword arguments in functions like FIND or
POSITION might be tricky, but the benefits to scripting-language-esque
code might be worth it.

Failing that, we might try being a little more careful in optimizing
things like string functions--I know that the core function for string
comparison can be sped up quite a bit by doing type dispatch on the
type of strings prior to actually doing the comparison to eliminate
jumps in the inner loops, for example.
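
The dispatch-before-the-loop idea might be sketched as follows: the string types are examined once, and each branch then runs a loop compiled for that type. MY-STRING= is a hypothetical function written for illustration and is not SBCL's string comparison code.

```lisp
;; Sketch only: MY-STRING= is hypothetical, not SBCL's STRING=*.
(defun my-string= (a b)
  "Compare two simple strings for equality, dispatching on type once."
  (declare (type simple-string a b))
  (macrolet ((compare-as (type)
               ;; Expands into a copy of the loop in which both strings
               ;; are declared to be of TYPE.
               `(let ((a a) (b b))
                  (declare (type ,type a b))
                  (and (= (length a) (length b))
                       (loop for i of-type fixnum from 0 below (length a)
                             always (char= (schar a i) (schar b i)))))))
    (etypecase a
      (simple-base-string
       (if (typep b 'simple-base-string)
           (compare-as simple-base-string)  ; both base: specialized loop
           (compare-as simple-string)))     ; mixed: general loop
      (simple-string (compare-as simple-string)))))
```

The cost is code size: every specialized branch is another copy of the loop, which is the space concern raised above for functions with many argument combinations.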

-Nathan

Re: [Sbcl-devel] on an unsettling Lisp versus Python 'competition' or why isn't Lisp doing better here...?

From: Nikodemus S. <nik...@ra...> - 2007-11-23 12:08:41

On Nov 22, 2007 10:34 PM, Nathan Froyd <fr...@gm...> wrote:

> Thinking out loud: would it be worthwhile to specialize things like
> SUBSEQ--particularly on simple arrays--in the generic library code,
> similar to the way Juho optimized array access?  Doing this sort of
> optimization for REPLACE/MISMATCH/etc. would probably require too much
> space and handling all the keyword arguments in functions like FIND or
> POSITION might be tricky, but the benefits to scripting-language-esque
> code might be worth it.

If we convert first to %REPLACE &co with required arguments only, then
the keyword handling doesn't take any space...
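
A rough sketch of that split, using hypothetical names MY-REPLACE and %MY-REPLACE rather than SBCL's actual internals: the keyword front end defaults the bounds once, and the worker takes required arguments only, so any type-specialized copies of the worker carry no keyword parsing.

```lisp
;; Sketch only: MY-REPLACE and %MY-REPLACE are hypothetical names.
(defun %my-replace (target source start1 end1 start2 end2)
  "Worker with required arguments only. Copies until either range is
exhausted. Overlapping ranges in the same sequence are not handled."
  (loop for i from start1 below end1
        for j from start2 below end2
        do (setf (aref target i) (aref source j)))
  target)

(declaim (inline my-replace))
(defun my-replace (target source &key (start1 0) end1 (start2 0) end2)
  "Thin keyword front end: default the bounds, then call the worker."
  (%my-replace target source
               start1 (or end1 (length target))
               start2 (or end2 (length source))))
```

Because the front end is declared inline, a call such as (my-replace v #(9 8) :start1 1) can be reduced at compile time to a direct call to the worker.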

Cheers,

 -- Nikodemus

Re: [Sbcl-devel] on an unsettling Lisp versus Python 'competition' or why isn't Lisp doing better here...?

From: Gary K. <gw...@me...> - 2007-11-25 20:33:31

Hi Attila,

Thanks for all this work. Interesting notes.

On Nov 22, 2007, at 9:19 AM, Attila Lendvai wrote:

> hi,
>
> i've optimized it a bit which i think is in par with the python speed
> if not faster now. two things are attached to the mail: one is the
> optimized split-sequence and the other is the diff to sbcl HEAD. this
> latter diff contains the previously posted base-char changes to
> read-line and an optimization to string-trim. (the big change in
> stream.lisp is only the make-result-string macrolet and the (declare
> (type index len index)) type declaration).
>
> i was working with a smaller version of the file to keep it in the
> disk cache and avoid swapping. it went down from 9.927 to 2.597.
>
> the final form of the defun:
>
> (defun parse-text (&optional (filename "/home/ati/fake-data.txt"))
>   (declare (optimize speed (debug 0))
>            (inline split-sequence:split-sequence
>                    string-trim))
>   (with-open-file (in filename
>                       :element-type 'base-char
>                       :external-format :ascii
>                       :direction :input
>                       :if-does-not-exist :error)
>     (let ((ht (make-hash-table :test 'equal)))
>       (loop
>          for line of-type simple-base-string = (read-line in nil)
>          while line do
>            (let ((fields (split-sequence:split-sequence #\~ line)))
>              (when (= (length fields) 3)
>                (let ((id (string-trim " " (the simple-base-string
> (first fields))))
>                      (attribute (string-trim " " (the
> simple-base-string (second fields))))
>                      (value (string-trim " " (the simple-base-string
> (third fields)))))
>                  (when (not (gethash id ht))
>                    (setf (gethash id ht) (make-hash-table :test  
> 'equal)))
>                  (let ((fields-ht (gethash id ht)))
>                    (setf (gethash attribute fields-ht) value))))))
>       ;;(print (hash-table-count ht))
>       (values))))
>
>
> i've recorded the interesting steps:
>
> CL-USER> (progn
>            (sb-ext:gc :full t)
>            (time (parse-text)))
>
> *** unoptimized:
>
> Evaluation took:
>   9.927 seconds of real time
>   9.056566 seconds of user run time
>   0.856054 seconds of system run time
>   [Run times include 2.352 seconds GC run time.]
>   0 calls to %EVAL
>   0 page faults and
>   1,633,159,856 bytes consed.
>
>
>
>
> *** after applying the patch for read-line to return base-string's:
>
> difference is below measuring error (but it's important for the
> split-sequence inlining)
>
>
>
>
> *** after adding (declare (optimize speed (debug 0)))
>
> difference is below measuring error
>
>
>
>
> *** after optimizing and inlining split-sequence (it even had apply
> #'position calls!)
>
> Evaluation took:
>   4.916 seconds of real time
>   4.504282 seconds of user run time
>   0.408026 seconds of system run time
>   [Run times include 1.044 seconds GC run time.]
>   0 calls to %EVAL
>   0 page faults and
>   796,540,592 bytes consed.
>
>
>
>
> *** after adding optimization to string-trim that avoids calling
> subseq if there's nothing to be trimmed (note: this is explicitly
> allowed by the spec):
>
> Evaluation took:
>   3.372 seconds of real time
>   2.940184 seconds of user run time
>   0.428027 seconds of system run time
>   [Run times include 1.048 seconds GC run time.]
>   0 calls to %EVAL
>   0 page faults and
>   728,696,000 bytes consed.
>
>
>
>
> *** after inlining string-trim:
>
> time difference is below measuring error, but consing went down a  
> little.
>
> Evaluation took:
>   3.28 seconds of real time
>   2.864179 seconds of user run time
>   0.396024 seconds of system run time
>   [Run times include 1.024 seconds GC run time.]
>   0 calls to %EVAL
>   0 page faults and
>   718,985,664 bytes consed.
>
>
>
>
> *** annotating (the simple-base-string ...) for the args of the
> string-trim calls:
>
> Evaluation took:
>   3.107 seconds of real time
>   2.740172 seconds of user run time
>   0.360022 seconds of system run time
>   [Run times include 1.048 seconds GC run time.]
>   0 calls to %EVAL
>   0 page faults and
>   718,925,760 bytes consed.
>
>
>
>
>
> *** at this point the profiling looks like this (sorry, it's
> unreadable without fixed font, but plain text mails are preferred):
>
> Rank Name                                                    Self%  
> Cumul% Total%
> 1    READ-LINE                                               39.34   
> 53.08  39.34
>      Callers
>        PARSE-TEXT                                                    
> 53.08
>      Calls
>        SB-INT:FAST-READ-CHAR- 
> REFILL                                  9.00
>        SB-KERNEL:%SHRINK- 
> VECTOR                                      3.32
>        SB-KERNEL:UB32-BASH- 
> COPY                                      1.42
> 2    (FLET #:CLEANUP-FUN-[CALL-WITHOUT-INTERRUPTS]24)        24.64   
> 24.64  63.98
> 3    SB-IMPL::FD-STREAM-READ-N-CHARACTERS/ASCII                
> 7.58   9.00  71.56
>      Callers
>        SB-INT:FAST-READ-CHAR- 
> REFILL                                  9.00
>      Calls
>        SB-IMPL::REFILL-INPUT- 
> BUFFER                                  1.42
> 4    PARSE-TEXT                                               6.16   
> 93.84  77.73
>      Callers
>        NIL                                                           
> 93.84
>      Calls
>        READ-LINE                                                     
> 53.08
>        SB- 
> IMPL::GETHASH3                                             5.21
>        SB-VM::GENERIC- 
> +                                              4.74
>        MAKE-HASH- 
> TABLE                                               3.32
>        SB-KERNEL:UB8-BASH- 
> COPY                                       2.84
>        SB-KERNEL:% 
> PUTHASH                                            0.95
> 5    SB-VM::GENERIC-+                                          
> 4.74   4.74  82.46
> 6    SB-KERNEL:%SHRINK-VECTOR                                  
> 3.32   3.32  85.78
> 7    SB-KERNEL:UB8-BASH-COPY                                   
> 2.84   2.84  88.63
> 8    SB-KERNEL:%SP-STRING-COMPARE                              
> 2.37   2.37  91.00
> 9    MAKE-HASH-TABLE                                           
> 2.37   3.32  93.36
> 10   SB-KERNEL:STRING=*                                        
> 1.42   3.79  94.79
> 11   SB-KERNEL:UB32-BASH-COPY                                  
> 1.42   1.42  96.21
> 12   (FLET #:BODY-FUN-[GETHASH3]1076)                          
> 0.95   5.21  97.16
> 13   (FLET SB-IMPL::TRICK)                                     
> 0.47   0.47  97.63
> 14   SB-IMPL::REFILL-INPUT-BUFFER                              
> 0.47   1.42  98.10
> 15   SB-IMPL::CEIL-POWER-OF-TWO                                
> 0.47   0.47  98.58
> 16   (FLET #:CLEANUP-FUN-[%PUTHASH]1409)                       
> 0.47   0.47  99.05
> 17   SB-KERNEL:%PUTHASH                                        
> 0.47   0.95  99.53
> 18   SB-IMPL::%MAKE-HASH-TABLE                                 
> 0.47   0.47 100.00
>
>
>
>
>
> *** after rising +ansi-stream-in-buffer-length+ to 4096 and  
> recompiling sbcl
>
> note that consing fall down drastically but due to the problem with
> REPLACE that Christoph mentioned it got slower. but hopefully Juho has
> something to say here.
>
> Evaluation took:
>   3.59 seconds of real time
>   3.272205 seconds of user run time
>   0.316019 seconds of system run time
>   [Run times include 0.528 seconds GC run time.]
>   0 calls to %EVAL
>   0 page faults and
>   260,260,240 bytes consed.
>
> Rank Name                                              Self% Cumul% Total%
> 1    REPLACE                                           16.42  49.25  16.42
>      Callers
>        READ-LINE                                              49.25
>        REPLACE                                                 0.37
>      Calls
>        SB-IMPL::OPTIMIZED-DATA-VECTOR-SET                     10.07
>        SB-KERNEL:HAIRY-DATA-VECTOR-SET                         9.70
>        SB-KERNEL:HAIRY-DATA-VECTOR-REF                         7.84
>        SB-IMPL::OPTIMIZED-DATA-VECTOR-REF                      2.61
>        SB-IMPL::OPTIMIZED-DATA-VECTOR-REF                      2.24
>        LENGTH                                                  0.37
>        REPLACE                                                 0.37
> 2    SB-IMPL::FD-STREAM-READ-N-CHARACTERS/ASCII        11.57  16.42  27.99
>      Callers
>        SB-INT:FAST-READ-CHAR-REFILL                           16.42
>      Calls
>        SB-IMPL::REFILL-INPUT-BUFFER                            4.85
> 3    (FLET #:CLEANUP-FUN-[CALL-WITHOUT-INTERRUPTS]24)  10.82  10.82  38.81
> 4    READ-LINE                                         10.07  76.12  48.88
>      Callers
>        PARSE-TEXT                                             76.12
>      Calls
>        REPLACE                                                49.25
>        SB-INT:FAST-READ-CHAR-REFILL                           16.42
>        SB-KERNEL:%SHRINK-VECTOR                                0.37
> 5    SB-IMPL::OPTIMIZED-DATA-VECTOR-SET                10.07  10.07  58.96
>      Callers
>        REPLACE                                                10.07
> 6    SB-KERNEL:HAIRY-DATA-VECTOR-SET                    9.70   9.70  68.66
> 7    PARSE-TEXT                                         7.84  95.90  76.49
> 8    SB-KERNEL:HAIRY-DATA-VECTOR-REF                    7.84   7.84  84.33
> 9    SB-IMPL::OPTIMIZED-DATA-VECTOR-REF                 2.61   2.61  86.94
> 10   SB-IMPL::OPTIMIZED-DATA-VECTOR-REF                 2.24   2.24  89.18
> 11   SB-KERNEL:UB8-BASH-COPY                            1.87   1.87  91.04
> 12   SB-KERNEL:STRING=*                                 1.49   2.24  92.54
> 13   SB-KERNEL:%PUTHASH                                 0.75   1.12  93.28
> 14   SB-KERNEL:%SP-STRING-COMPARE                       0.75   0.75  94.03
>
>
>
>
> *** after getting rid of the REPLACE problem with an ugly hack in
> ansi-stream-read-line that helps propagating the type to REPLACE:
>
> Evaluation took:
>   2.597 seconds of real time
>   2.356147 seconds of user run time
>   0.240015 seconds of system run time
>   [Run times include 0.532 seconds GC run time.]
>   0 calls to %EVAL
>   0 page faults and
>   260,262,768 bytes consed.
>
> Rank Name                                              Self% Cumul% Total%
> 1    PARSE-TEXT                                        16.40  96.83  16.40
> 2    SB-IMPL::FD-STREAM-READ-N-CHARACTERS/ASCII        14.81  18.52  31.22
>      Callers
>        SB-INT:FAST-READ-CHAR-REFILL                           18.52
>      Calls
>        SB-IMPL::REFILL-INPUT-BUFFER                            3.70
> 3    READ-LINE                                         13.76  61.90  44.97
>      Callers
>        PARSE-TEXT                                             61.90
>      Calls
>        REPLACE                                                29.63
>        SB-INT:FAST-READ-CHAR-REFILL                           18.52
> 4    (FLET #:CLEANUP-FUN-[CALL-WITHOUT-INTERRUPTS]24)  13.76  13.76  58.73
>      Callers
>        SB-UNIX::CALL-WITHOUT-INTERRUPTS                       13.76
> 5    REPLACE                                            8.47  29.63  67.20
>      Callers
>        READ-LINE                                              29.63
>        REPLACE                                                 0.53
>      Calls
>        SB-KERNEL:HAIRY-DATA-VECTOR-REF                         7.41
>        SB-KERNEL:HAIRY-DATA-VECTOR-SET                         5.82
>        SB-IMPL::OPTIMIZED-DATA-VECTOR-SET                      3.70
>        SB-IMPL::OPTIMIZED-DATA-VECTOR-REF                      3.70
>        REPLACE                                                 0.53
>        LENGTH                                                  0.53
> 6    SB-KERNEL:HAIRY-DATA-VECTOR-REF                    7.41   7.41  74.60
> 7    SB-KERNEL:HAIRY-DATA-VECTOR-SET                    5.82   5.82  80.42
> 8    SB-KERNEL:UB8-BASH-COPY                            4.23   4.23  84.66
> 9    SB-IMPL::OPTIMIZED-DATA-VECTOR-REF                 3.70   3.70  88.36
> 10   SB-IMPL::OPTIMIZED-DATA-VECTOR-SET                 3.70   3.70  92.06
> 11   (FLET #:BODY-FUN-[%PUTHASH]1355)                   1.59   2.12  93.65
> 12   SB-KERNEL:%SP-STRING-COMPARE                       1.06   1.06  94.71
> 13   (FLET #:BODY-FUN-[GETHASH3]1076)                   1.06   2.65  95.77
> 14   LENGTH                                             0.53   0.53  96.30
> 15   SB-KERNEL::%MAKE-INSTANCE-WITH-LAYOUT              0.53   0.53  96.83
> 16   SB-IMPL::REFILL-INPUT-BUFFER                       0.53   3.70  97.35
> 17   SB-KERNEL:%UNARY-TRUNCATE                          0.53   0.53  97.88
> 18   SB-THREAD::MAKE-SPINLOCK                           0.53   0.53  98.41
> 19   (FLET SB-IMPL::TRICK)                              0.53   0.53  98.94
> 20   SB-KERNEL:%SXHASH-SIMPLE-STRING                    0.53   1.06  99.47
>
>
>
>
> *** and an interesting last piece: using (ppcre:split "~" line
> :sharedp t), adding a (declare (type simple-base-string
> target-string)) to split and avoiding apply #'subseq, it's hardly
> slower:
>
> Evaluation took:
>   3.028 seconds of real time
>   2.772173 seconds of user run time
>   0.256016 seconds of system run time
>   [Run times include 0.556 seconds GC run time.]
>   0 calls to %EVAL
>   0 page faults and
>   269,862,768 bytes consed.
>
> -- 
>  attila
> <split-sequence.lisp><sbcl.diff>

--
Gary Warren King, metabang.com
Cell: (413) 559 8738
Fax: (206) 338-4052
gwkkwg on Skype * garethsan on AIM

Re: [Sbcl-devel] on an unsettling Lisp versus Python 'competition' or why isn't Lisp doing better here...?

From: David J. N. <dav...@gm...> - 2007-11-30 23:03:41

Hi Attila,

Thanks for all of your work!

We're very interested in improving the performance of the program that
Gary posted, and have tried some comparisons with and without your
changes, and using Python.

I used SBCL 1.0.12.7 from the git repo, and the fake-data.tgz file
that Gary provided.

Here's what I found:

1. Using the code as Gary provided:

Evaluation took:
  22.975 seconds of real time
  20.319805 seconds of user run time
  2.187422 seconds of system run time
  [Run times include 7.712 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  4,192,713,792 bytes consed.


2. Applying your sbcl.diff, using your split-sequence, and using your
version of parse-text, I got:

Evaluation took:
  12.328 seconds of real time
  11.457298 seconds of user run time
  0.800819 seconds of system run time
  [Run times include 2.122 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  911,974,696 bytes consed

3. Python: 4.188586 seconds


I'm wondering if you or anyone else had any thoughts on the following
questions:

a. While this is a /dramatic/ improvement:
   - It's still much slower than Python, and you mentioned that your
   results were comparable to Python.  Any thoughts why?  Did you get
   the same results on Gary's fake-data.tgz?

   - The reduction in consing in the example was dramatic, but didn't
     compare to your results.  Again, any thoughts why?  Did you get
     the same results on Gary's fake-data.tgz?

b. If I run (sb-ext:gc :full t) after each run, I can repeatedly call
PARSE-TEXT with no issues.  If I don't make the gc call, I get dropped
into the debugger.  Is this to be expected?

c. Juho had mentioned that tuning the garbage collector was discussed
on this list.  Would this address point b., and could someone point me
to the appropriate post?

I found this post
http://sourceforge.net/mailarchive/message.php?msg_id=l37iskysld.fsf%40kyle.netcamp.se
but am not sure how to directly apply the discussion.

d. You wrote:
> >*** and an interesting last piece: using (ppcre:split "~" line
> >:sharedp t), adding a (declare (type simple-base-string
> >target-string)) to split and avoiding apply #'subseq, it's hardly
> >slower: 
To clarify, did you put
(declare (type simple-base-string target-string))
in api.lisp?

Could you explain how you went about "avoiding apply #'subseq"?

Again, many thanks for your work on this, and any answers you can
provide!

Cheers,
David


Re: [Sbcl-devel] on an unsettling Lisp versus Python 'competition' or why isn't Lisp doing better here...?

From: Attila L. <att...@gm...> - 2007-12-01 15:09:40

> Hi Attila,
>
> Thanks for all of your work!

i'm glad it helps. i was just pasting the results into a file and at
the end i've thought it may be interesting to others.

> I'm wondering if you or anyone else had any thoughts on the following
> questions:
>
> a. While this is a /dramatic/ improvement:
>    - It's still much slower than Python, and you mentioned that your
>    results were comparable to Python.  Any thoughts why?  Did you get
>    the same results on Gary's fake-data.tgz?

i was using the fake-data file, but chopped off the end at around 40
MB, iirc. i didn't want swapping and disk reading to influence the
measurements.

at the end it was about 1/4th of the initial time, so that 23 secs for
you should have gone down to 5.75. it was on linux x86_64.

but please note that as i've mentioned, applying that
simple-base-string patch to sbcl disables optimizations for a REPLACE
call in ansi-stream-read-line. you can read that in the compiler notes
when C-cC-c'ing ansi-stream-read-line. it's relatively easy to rewrite
the code to reenable it by adding an ecase for the stream element type
and adding two distinct calls to REPLACE. that turned the 3x speedup
for me into a 4x speedup. but that's a kludge, so it's not included in
the patch i've sent. (see Christophe's comment on a possible proper
solution)
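
To make the kludge above concrete, here is a minimal sketch of the
"ecase plus two distinct calls to REPLACE" idea. It is not the code
from ansi-stream-read-line or from the patch; COPY-CHARS and its
argument names are hypothetical and only illustrate how dispatching on
the element type gives each REPLACE call fully declared argument types.

```lisp
;; Illustrative only: COPY-CHARS is a made-up helper, not SBCL source.
(defun copy-chars (target source count element-type)
  "Copy the first COUNT characters of SOURCE into the start of TARGET.
ELEMENT-TYPE is BASE-CHAR or CHARACTER and must describe both strings."
  (declare (optimize speed) (type fixnum count))
  (ecase element-type
    (base-char
     ;; Both arguments are asserted to be simple-base-strings, so the
     ;; compiler sees concrete types at this call site.
     (replace (the simple-base-string target)
              (the simple-base-string source)
              :end2 count))
    (character
     ;; Same call, specialized for full character strings.
     (replace (the (simple-array character (*)) target)
              (the (simple-array character (*)) source)
              :end2 count))))

;; Usage: copy three characters into a base-char buffer.
(let ((line (make-string 8 :element-type 'base-char :initial-element #\-)))
  (copy-chars line (coerce "abc~def" 'simple-base-string) 3 'base-char))
```

Whether the compiler then picks a specialized copy can be checked from
the compiler notes on each of the two REPLACE calls.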

>   - The reduction in consing in the example, was dramatic, but didn't
>     compare to your results.  Again, any thoughts why?  Did you get
>     the same results on Gary's fake-data.tgz?
>
> b. If I run (sb-ext:gc :full t) after each run, I can repeatedly call
> PARSE-TEXT with no issues, if I don't make the gc call, I get dropped
> into the debugger, is this to be expected?

that's a limitation of the current copying gc in sbcl, but i don't
know much more about the details.

> d. You wrote:
> > >*** and an interesting last piece: using (ppcre:split "~" line
> > >:sharedp t), adding a (declare (type simple-base-string
> > >target-string)) to split and avoiding apply #'subseq, it's hardly
> > >slower:
> To clarify, did you put
> (declare (type simple-base-string target-string))
> in api.lisp?

yes

> Could you explain how you: "avoiding apply #'subseq"?

i've replaced the (funcall substr-fn ...) with a simple (subseq ...) and
C-cC-c'd the function (as opposed to my bogus comment about #'apply i
wrote from memory). using funcall/apply hinders the application of
numerous compiler optimizations.
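
A small sketch of the difference between calling through a function
object and calling SUBSEQ directly. The two functions below are
hypothetical stand-ins, not the cl-ppcre code: both return the text
before the first #\~, but only the second gives the compiler a call to
SUBSEQ whose argument types it can see.

```lisp
;; Illustrative only: neither function is taken from cl-ppcre.
(defun first-field/funcall (line substr-fn)
  "Extract the first ~-delimited field by calling SUBSTR-FN indirectly."
  (declare (optimize speed)
           (type simple-base-string line)
           (type function substr-fn))
  ;; The callee is only known at run time, so the call stays generic.
  (funcall substr-fn line 0 (or (position #\~ line) (length line))))

(defun first-field/direct (line)
  "Extract the first ~-delimited field with a direct SUBSEQ call."
  (declare (optimize speed)
           (type simple-base-string line))
  ;; SUBSEQ on a declared simple-base-string is visible to the compiler.
  (subseq line 0 (or (position #\~ line) (length line))))

;; Usage: both forms return the same field.
(let ((line (coerce "id~attr~value" 'simple-base-string)))
  (list (first-field/funcall line #'subseq)
        (first-field/direct line)))
```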

the key thing is using the profiler and checking the compiler notes
about failed optimizations when the (optimize (speed 3)) declaration
is in effect. i was using sb-sprof with Juho's brilliant
slime-profile-browser:
http://jsnell.iki.fi/blog/archive/2006-11-19-sb-sprof.html
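
For anyone who wants to try the profiler without the slime browser, a
minimal sketch of driving sb-sprof from the REPL follows. COUNT-DIGITS
is a hypothetical stand-in workload (in this thread the profiled call
would be PARSE-TEXT), and the :max-samples value is an arbitrary
choice.

```lisp
;; SBCL only. Load the profiler before the reader sees SB-SPROF symbols.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :sb-sprof))

;; Hypothetical workload, standing in for (parse-text).
(defun count-digits (n)
  "Return the total printed length of the integers 0 .. N-1."
  (let ((total 0))
    (dotimes (i n total)
      (incf total (length (princ-to-string i))))))

;; Sample the call and print a flat report when it finishes.
(sb-sprof:with-profiling (:max-samples 10000 :report :flat)
  (count-digits 200000))
```

Passing :report :graph instead prints the caller/callee breakdown in
the style of the listings quoted earlier in the thread.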

hth,

-- 
 attila

Re: [Sbcl-devel] on an unsettling Lisp versus Python 'competition' or why isn't Lisp doing better here...?

From: Christophe R. <cs...@ca...> - 2007-12-05 21:22:48

Brian Downing <bd-...@la...> writes:

>   * I couldn't use any of the nice external CL libraries that deal in
>     strings.  (*cough* CL-PPCRE *cough*)

So, as the major culprit for making the theoretical decision to have
base-strings handle ASCII only, I'll ask: how hard would it be to
adapt CL-PPCRE or similar libraries to work on octet vectors, using
the same programmer-friendly stringy input format that currently
exists?

The other point, parsing file contents where the encoding is not a
priori known, sounds like an excellent argument for implementing (setf
stream-external-format) and bivalent streams, but not so much for
stuffing arbitrary binary data into a sequence of characters.

I'll admit, I have a strong bias against pandering to a particular
interpretation of what happens to be convenient today when that is the
only argument that is advanced for doing something; if people could
convince me that latin-1 is the Right Thing in all circumstances for
BASE-CHAR to represent, then I would not argue against it.  (Duh. :-)

Cheers,

Christophe

Re: [Sbcl-devel] on an unsettling Lisp versus Python 'competition' or why isn't Lisp doing better here...?

From: Brian D. <bd-...@la...> - 2007-12-05 22:24:27

On Wed, Dec 05, 2007 at 09:22:37PM +0000, Christophe Rhodes wrote:
> Brian Downing <bd-...@la...> writes:
> >   * I couldn't use any of the nice external CL libraries that deal in
> >     strings.  (*cough* CL-PPCRE *cough*)
> 
> So, as the major culprit for making the theoretical decision to have
> base-strings handle ASCII only, I'll ask: how hard would it be to
> adapt CL-PPCRE or similar libraries to work on octet vectors, using
> the same programmer-friendly stringy input format that currently
> exists?

I dunno, but I do know that I didn't want to.  :)

> The other point, parsing file contents where the encoding is not a
> priori known, sounds like an excellent argument for implementing (setf
> stream-external-format) and bivalent streams, but not so much for
> stuffing arbitrary binary data into a sequence of characters.

The thing is what I really want is a "bivalent array."  I wanted it
all, in memory, ready to be accessed in either form.  Calling stream
accessors per character was way too slow.

A latin-1 base-string is as close as I could figure to getting that and
still being able to do string operations on it.
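
As a rough sketch of the workaround available today, the octets can be
mapped one-to-one onto characters with the same code, which is what a
latin-1 decode amounts to. The function names are hypothetical, and the
sketch assumes character codes 0-255 coincide with latin-1, as they do
in a Unicode-enabled SBCL. The resulting string is a full character
string, not a base-string, so it pays the memory cost discussed above.

```lisp
;; Illustrative only: hypothetical helpers, assuming char codes 0-255
;; match latin-1 (true when char codes are Unicode code points).
(defun octets->latin-1-string (octets)
  "Return a string whose i-th character has code (AREF OCTETS I)."
  (map 'string #'code-char octets))

(defun latin-1-string->octets (string)
  "Inverse of OCTETS->LATIN-1-STRING for characters with code < 256."
  (map '(vector (unsigned-byte 8)) #'char-code string))

;; Usage: ASCII text mixed with arbitrary bytes survives a round trip.
(let* ((octets (coerce '(72 105 126 233 255 0) '(vector (unsigned-byte 8))))
       (text (octets->latin-1-string octets)))
  (values (position #\~ text)
          (equalp octets (latin-1-string->octets text))))
```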

> I'll admit, I have a strong bias against pandering to a particular
> interpretation of what happens to be convenient today when that is the
> only argument that is advanced for doing something; if people could
> convince me that latin-1 is the Right Thing in all circumstances for
> BASE-CHAR to represent, then I would not argue against it.  (Duh. :-)

Well, latin-1 is already a higher-class citizen than any other encoding,
as the first 256 Unicode code points are latin-1.  Also, it's not as if
the first 128 are any more stable.  A lot of Japanese code pages have
#\¥ at #x5C, for instance.  And then there's EBCDIC...

So I'm not sure assuming latin-1 for BASE-CHAR is any less sane than
assuming ASCII for BASE-CHAR.  And it's quite a bit more convenient for
things like I just described.

-bcd

Re: [Sbcl-devel] on an unsettling Lisp versus Python 'competition' or why isn't Lisp doing better here...?

From: David J. N. <dav...@gm...> - 2007-12-03 22:12:05

Hi Juho,

Many thanks for taking the time to craft a solution - it runs
extremely fast - much faster than the Python script and dramatically
reduced the number of bytes consed:

Evaluation took:
  1.937 seconds of real time
  1.791011 seconds of user run time
  0.119532 seconds of system run time
  [Run times include 0.432 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  103,786,536 bytes consed.

However, on the third attempt to run it I get

System call error 12 (Cannot allocate memory)
   [Condition of type SB-POSIX:SYSCALL-ERROR]

is there a way around this?

> Ok. May I ask why you're interested in this?
Sure, for two reasons:

1. We have a production app that processes large quantities of textual
and binary data.  As part of the textual side, we're trying to tune some
learning algorithms, and want to work in CL.

2. Reading text files is a common task, and we're assuming that if
we're running into problems, it's likely someone else is too, and this
could be a barrier to them adopting CL.

> Reading text files and manipulating strings are really the bread and
> butter features of scripting languages, and will have been optimized
> accordingly. So it's somewhat unreasonable to expect SBCL to be
> exactly as fast in the generic case.
OK, but:

LispWorks performs quite nicely compared to the Python program's
4.246053 seconds:

LispWorks against a text file w/out unicode using original code
as posted by Gary:
User time    =        6.106
System time  =        0.202
Elapsed time =        6.349
Allocation   = 756144540 bytes
0 Page faults
Timing the evaluation of (PARSE-TEXT "fake-data.txt")

LispWorks against a text file w/out unicode using original code
as posted by Gary using cl-ppcre:split rather than split-sequence:
User time    =        6.426
System time  =        0.240
Elapsed time =        6.716
Allocation   = 602020240 bytes
0 Page faults
Timing the evaluation of (PARSE-TEXT-USING-SPLIT "fake-data.txt")

SBCL 1.0.12.14, with --dynamic-space-size 1500, against a text file
w/out unicode using original code as posted by Gary:
TEXT-PARSER> (time (parse-text "fake-data.txt"))
Evaluation took:
  19.346 seconds of real time
  16.760124 seconds of user run time
  2.497303 seconds of system run time
  [Run times include 8.315 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  4,183,290,560 bytes consed.

SBCL 1.0.12.14 with --dynamic-space-size 1500, against a text file
w/out unicode using original code as posted by Gary using
cl-ppcre:split rather than split-sequence:
Evaluation took:
  16.284 seconds of real time
  13.648916 seconds of user run time
  2.546111 seconds of system run time
  [Run times include 8.323 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  3,356,522,032 bytes consed.

And, then there's the gc issue - the original sbcl program can't be run
twice without dropping into the gc ... so thanks for the link about gc
characteristics and tuning:

> http://thread.gmane.org/gmane.lisp.steel-bank.devel/9879/focus=9883

unfortunately, after doing
  (define-alien-variable gencgc-oldest-gen-to-gc char)
  (setf gencgc-oldest-gen-to-gc 0) 
at the top level, the program drops into the ldb before
completing even one run.

BTW, Attila's solution can be run repeatedly without memory issues, I
assume due to the "non-unicode" savings.

My question to the sbcl developers is: Is there a solution to reading
a plain (non-unicode) text file into a data structure, that can be
incorporated into sbcl, maybe something along the lines of what Attila
provided, that will result in speed and memory usage on par with
Python, or LispWorks, or in the best case Juho's program?

Again, many thanks for the help - we hope the solutions being offered
here are useful to others!

Cheers,
David

On Sun, Dec 02, 2007 at 12:58:52AM +0200, Juho Snellman wrote:
> "David J. Neu" <dav...@gm...> writes:
> > Hi Attila,
> > 
> > Thanks for all of your work!
> > 
> > We're very interested in improving the performance of the program that
> > Gary posted, and have tried some comparisons with and without your
> > changes, and using Python.
> 
> Ok. May I ask why you're interested in this? Reading text files and
> manipulating strings are really the bread and butter features of
> scripting languages, and will have been optimized accordingly. So it's
> somewhat unreasonable to expect SBCL to be exactly as fast in the
> generic case.
> 
> That said, I've attached a version of the benchmark program that runs
> in 2.5s versus 4.3s for the Python version.
> 

> ;;;; Utilities to mmap a file directly into an SBCL base string
> 
> (defmacro with-mmaped-base-string ((string file) &body body)
>   (let ((handle-var (gensym))
>         (stream-var (gensym)))
>     `(with-open-file (,stream-var ,file)
>        (let ((,handle-var (mmap-as-base-string ,stream-var)))
>          (unwind-protect
>               (let ((,string (mmap-handle-string ,handle-var)))
>                 ,@body)
>            (mmap-close ,handle-var))))))
> 
> (defstruct mmap-handle
>   (string (coerce "" 'base-string) :type simple-base-string)
>   fd
>   address
>   length)
> 
> (defun mmap-close (handle)
>   (sb-posix:munmap (mmap-handle-address handle)
>                    (mmap-handle-length handle)))
> 
> (defun mmap-as-base-string (stream)
>   (declare (optimize debug)
>            (notinline sb-posix::mmap))
>   (with-open-file (devnull "/dev/null")
>     (let* ((length (file-length stream))
>            (sap1 (sb-posix:mmap nil
>                                 (+ length 4096)
>                                 (logior sb-posix:prot-read sb-posix:prot-write)
>                                 sb-posix:map-private
>                                 (sb-impl::fd-stream-fd stream)
>                                 0))
>            (sap2 (sb-posix:mmap (sb-sys:sap+ sap1 4096)
>                                 length
>                                 sb-posix:prot-read
>                                 (logior sb-posix:map-private sb-posix:map-fixed)
>                                 (sb-impl::fd-stream-fd stream)
>                                 0))
>            (handle (make-mmap-handle :address sap1
>                                      :length length)))
>       ;; simple-base-string header word
>       (setf (sb-sys:sap-ref-word sap2 (- (* 2 sb-vm:n-word-bytes)))
>             sb-vm:simple-base-string-widetag)
>       ;; simple-base-string length word (as fixnum)
>       (setf (sb-sys:sap-ref-word sap2 (- (* 1 sb-vm:n-word-bytes)))
>             (ash length sb-vm:n-fixnum-tag-bits))
>       (setf (mmap-handle-string handle)
>             (sb-kernel:%make-lisp-obj
>              (logior sb-vm:other-pointer-lowtag
>                      (- (sb-sys:sap-int sap2) (* 2 sb-vm:n-word-bytes)))))
>       handle)))
> 
> ;;;; Implement the benchmark
> 
> (defun split-and-trim-sequence (delimiter string start-of-line end-of-line trim)
>   (declare (type simple-base-string string)
>            (type fixnum start-of-line end-of-line)
>            (optimize speed))
>   (loop for start = start-of-line then (1+ end)
>         for end = (position delimiter string :start start :end end-of-line)
>         for end-pos = (or end end-of-line)
>         for length = (- end-pos start)
>         do (loop while (< start end-pos)
>                  while (eql (aref string start) trim)
>                  do (incf start))
>         do (loop until (eql end-pos start)
>                  while (eql (aref string (1- end-pos)) trim)
>                  do (decf end-pos))
>         collect (if (< length 64)
>                     ;; A displaced array takes around 64 bytes. For strings
>                     ;; shorter than 64 base-chars we might as well just
>                     ;; make a simple string.
>                     (subseq string start end-pos)
>                     ;; Use the trimmed extent; LENGTH was computed before trimming.
>                     (make-array (- end-pos start)
>                                 :element-type 'base-char
>                                 :displaced-to string
>                                 :displaced-index-offset start))
>         while (and end (< end end-of-line))))
> 
> (defun parse-text (filename)
>   (declare (optimize speed))
>   (with-mmaped-base-string (string filename)
>     ;; Note that the contents of the hash-table won't be valid outside
>     ;; the dynamic scope of WITH-MMAPED-BASE-STRING, since we're
>     ;; displacing arrays to STRING.
>     (let ((ht (make-hash-table :test 'equal)))
>       (loop for start = 0 then (1+ end)
>             for end = (position #\Newline string :start start)
>             for end-pos = (or end (length string))
>             do (let ((fields (split-and-trim-sequence #\~ string start end-pos
>                                                       #\space)))
>                  (when (= (length (the list fields)) 3)
>                    (destructuring-bind (id attribute value) fields
>                      (when (not (gethash id ht))
>                        (setf (gethash id ht) (make-hash-table :test 'equal)))
>                      (let ((fields-ht (gethash id ht)))
>                        (setf (gethash attribute fields-ht) value)))))
>             while end)
>       (print (hash-table-count ht))
>       nil)))

> 
> (You might object that the program is more complex than the original
> code. But if the factor of 2 difference in runtimes between the
> earlier Lisp solution and Python is an issue for you, surely a little
> extra complexity is a price worth paying for something that's a factor
> of 2 faster than the Python program.)
> 
> > b. If I run (sb-ext:gc :full t) after each run, I can repeatedly call
> > PARSE-TEXT with no issues, if I don't make the gc call, I get dropped
> > into the debugger, is this to be expected?
> 
> Yes.
>  
> > c. Juho had mentioned that tuning the garbage collector was discussed
> > on this list, would this address point b., and could someone point me
> > to the appropriate post?
> > 
> > I found this post
> > http://sourceforge.net/mailarchive/message.php?msg_id=l37iskysld.fsf%40kyle.netcamp.se
> > but am not sure how to directly apply the discussion.
> 
> That is a completely unrelated discussion. See for example:
> 
> http://thread.gmane.org/gmane.lisp.steel-bank.devel/9879/focus=9883
> 
> -- 
> Juho Snellman

Re: [Sbcl-devel] on an unsettling Lisp versus Python 'competition' or why isn't Lisp doing better here...?

From: John W. <jo...@ne...> - 2007-12-04 03:18:34

This might not be apropos, but I just came across this tech note from
Allegro where they are doing a very similar thing with CL: using it to
parse strings from gigantic text files.  The way they combatted the
performance issue was to custom-build a non-consing version of
read-line.  I wonder if such a thing would improve SBCL's performance
in this example?

   http://www.franz.com/support/tech_corner/cons-tricks-121306.lhtml

John

On Dec 3, 2007, at 6:11 PM, David J. Neu wrote:

> Many thanks for taking the time to craft a solution - it runs
> extremely fast - much faster than the Python script and dramatically
> reduced the number of bytes consed:
>
> Evaluation took:
>  1.937 seconds of real time
>  1.791011 seconds of user run time
>  0.119532 seconds of system run time
>  [Run times include 0.432 seconds GC run time.]
>  0 calls to %EVAL
>  0 page faults and
>  103,786,536 bytes consed.

Re: [Sbcl-devel] on an unsettling Lisp versus Python 'competition' or why isn't Lisp doing better here...?

From: David J. N. <dav...@gm...> - 2007-12-04 12:21:45

Yes, this would be very nice, and there have also been many suggestions
in this thread about "improving" read-line itself such as:

- having its return type depend on its stream's element-type, so e.g.
  that it would in this example return a base-string

- increasing +ANSI-STREAM-IN-BUFFER-LENGTH+
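
As a rough illustration of the first suggestion, here is a minimal
sketch of a wrapper that hands back base-strings. READ-BASE-LINE is a
made-up name, not an SBCL function. It still goes through the ordinary
READ-LINE (so the temporary unicode string is consed anyway) and it
assumes the input really is plain ASCII, since COERCE will signal an
error on any character that is not a BASE-CHAR.

```lisp
;; Sketch only: READ-BASE-LINE is a hypothetical helper, not part of SBCL.
(defun read-base-line (stream)
  "Read one line from STREAM and return it as a SIMPLE-BASE-STRING,
or NIL at end of file. Assumes every character is a BASE-CHAR."
  (let ((line (read-line stream nil nil)))
    (and line (coerce line 'simple-base-string))))

;; Usage: only the base-string copies are kept, not the wide strings.
(with-input-from-string (in (format nil "abc~%def"))
  (loop for line = (read-base-line in)
        while line
        collect line))
```

The saving here is only in what is retained after the read; the
per-line garbage is still produced.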

Cheers,
David

On Mon, Dec 03, 2007 at 11:17:58PM -0400, John Wiegley wrote:
> This might not be apropos, but I just came across this tech note from
> Allegro where they are doing a very similar thing with CL: using it to
> parse strings from gigantic text files.  The way they combatted the
> performance issue was to custom-build a non-consing version of
> read-line.  I wonder if such a thing would improve SBCL's performance
> in this example?
> 
>    http://www.franz.com/support/tech_corner/cons-tricks-121306.lhtml
> 
> John
> 
> On Dec 3, 2007, at 6:11 PM, David J. Neu wrote:
> 
> > Many thanks for taking the time to craft a solution - it runs
> > extremely fast - much faster than the Python script and dramatically
> > reduced the number of bytes consed:
> >
> > Evaluation took:
> >  1.937 seconds of real time
> >  1.791011 seconds of user run time
> >  0.119532 seconds of system run time
> >  [Run times include 0.432 seconds GC run time.]
> >  0 calls to %EVAL
> >  0 page faults and
> >  103,786,536 bytes consed.
> 

Re: [Sbcl-devel] on an unsettling Lisp versus Python 'competition' or why isn't Lisp doing better here...?

From: Juho S. <js...@ik...> - 2007-12-04 16:15:20

"David J. Neu" <dav...@gm...> writes:
> However, on the third attempt to run it I get
> 
> System call error 12 (Cannot allocate memory)
>    [Condition of type SB-POSIX:SYSCALL-ERROR]
> 
> is there a way around this?

Hmm... looks like an off-by-4096, the last mmaped page was never freed
-> address space becomes fragmented. I wonder why it didn't fail on
Linux. Maybe the following would help?

(defvar +page-size+ (sb-posix:getpagesize))

(defun mmap-as-base-string (stream)
  (declare (optimize debug)
           (notinline sb-posix::mmap))
  (with-open-file (devnull "/dev/null")
    (let* ((file-length (file-length stream))
           (map-length (+ file-length +page-size+))
           (sap1 (sb-posix:mmap nil
                                map-length
                                (logior sb-posix:prot-read sb-posix:prot-write)
                                sb-posix:map-private
                                (sb-impl::fd-stream-fd stream)
                                0))
           (sap2 (sb-posix:mmap (sb-sys:sap+ sap1 +page-size+)
                                file-length
                                sb-posix:prot-read
                                (logior sb-posix:map-private sb-posix:map-fixed)
                                (sb-impl::fd-stream-fd stream)
                                0))
           (handle (make-mmap-handle :address sap1
                                     :length map-length)))
      ;; simple-base-string header word
      (setf (sb-sys:sap-ref-word sap2 (- (* 2 sb-vm:n-word-bytes)))
            sb-vm:simple-base-string-widetag)
      ;; simple-base-string length word (as fixnum)
      (setf (sb-sys:sap-ref-word sap2 (- (* 1 sb-vm:n-word-bytes)))
            (ash file-length sb-vm:n-fixnum-tag-bits))
      (setf (mmap-handle-string handle)
            (sb-kernel:%make-lisp-obj
             (logior sb-vm:other-pointer-lowtag
                     (- (sb-sys:sap-int sap2) (* 2 sb-vm:n-word-bytes)))))
      handle)))
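
For what it's worth, the size of the leak in the earlier version can be
checked with a little arithmetic. This is only a sketch: it assumes
4096-byte pages and that munmap releases the whole pages covering the
requested length. PAGES-SPANNED and LEAKED-PAGES are names invented
here for illustration.

```lisp
;; Sketch only: models the page accounting, does not call mmap/munmap.
(defun pages-spanned (bytes page-size)
  "Number of whole pages needed to cover BYTES."
  (ceiling bytes page-size))

(defun leaked-pages (file-length page-size)
  "Pages left mapped when FILE-LENGTH + PAGE-SIZE bytes are mapped
but only FILE-LENGTH bytes are passed to munmap."
  (- (pages-spanned (+ file-length page-size) page-size)
     (pages-spanned file-length page-size)))

(leaked-pages 1000000 4096) ; => 1
```

So each call to the old MMAP-AS-BASE-STRING left one page behind, which
is why the handle above records MAP-LENGTH instead of the file length.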

> > Ok. May I ask why you're interested in this?
> Sure, for two reasons:
> 
> 1. We have a production app that processes large quantities of textual
> and binary data.  As part of the textual side, we're trying to tune some
> learning algorithms, and want to work in CL.

It's not obvious to me that Gary's benchmark accurately models most
processes I've seen that process large quantities of text
data. Usually they for example don't hang on to all of the data for
the duration of the whole process.

> > Reading text files and manipulating strings are really the bread and
> > butter features of scripting languages, and will have been optimized
> > accordingly. So it's somewhat unreasonable to expect SBCL to be
> > exactly as fast in the generic case.
> OK, but:
> 
> LispWorks performs quite nicely compared to the Python program's
> 4.246053 seconds:
>
> LispWorks against a text file w/out unicode using original code
> as posted by Gary:
> User time    =        6.106

That's still basically crappy performance. Obviously not nearly as
crappy as SBCL, but it's still 2x slower than an idiomatically written
Python program would be.

> > http://thread.gmane.org/gmane.lisp.steel-bank.devel/9879/focus=9883
> 
> unfortunately, after doing
>   (define-alien-variable gencgc-oldest-gen-to-gc char)
>   (setf gencgc-oldest-gen-to-gc 0) 
> at the top level, the program drops into the ldb before
> completing even one run.

Sure. You're trying to stuff 230*4 (unicode)*2 (copying gc)+a bit
(non-strings) of data into 1500MB, which can reasonably be expected to
fail.
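
Spelled out as a back-of-the-envelope calculation (a sketch only: it
assumes the 230 is the input size in megabytes, four bytes per
character for unicode strings, and a factor of two for the copying
collector; the function names are made up):

```lisp
;; Rough estimate only; 230 is assumed to be the input size in MB.
(defun estimated-heap-mb (file-mb &key (bytes-per-char 4) (gc-copy-factor 2))
  "Lower bound on the heap needed to keep FILE-MB of text as unicode
strings while a copying GC may briefly hold two copies."
  (* file-mb bytes-per-char gc-copy-factor))

(defun fits-in-dynamic-space-p (file-mb dynamic-space-mb)
  (< (estimated-heap-mb file-mb) dynamic-space-mb))

(estimated-heap-mb 230)            ; => 1840
(fits-in-dynamic-space-p 230 1500) ; => NIL
```

That is before counting the "bit" of non-string data, so 1500MB is
already too small.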

> My question to the sbcl developers is: Is there a solution to reading
> a plain (non-unicode) text file into a data structure, that can be
> incorporated into sbcl, maybe something along the lines of what Attila
> provided

I'm going to be committing a nicer implementation of READ-LINE once I
get some machine with my ssl keys back on the net. This is basically
just fixing the issues that the current implementation has with long
lines (this reduces the amount of time for reading in the input file
from 5.5s to 3.5s).

I don't think doing any base-string specific handling in READ-LINE
would make much sense as long as the fd-stream character input buffer
is implemented as a (SIMPLE-ARRAY CHARACTER).

The simple solution would be to just wrap the mmap hack into a couple
of gray streams for base-char or (unsigned-byte 8) input. Should be
trivial, it's only one more data copy than that direct solution, but
it would hide all the ugliness. I would rather see this in a user
library though, since once fu-streams have been implemented (ha, ha),
those gray streams would basically be obsolete.
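
A minimal sketch of what such a wrapper could look like, assuming
SBCL's SB-GRAY package. The class name and the BSIS- accessors are
hypothetical, and the buffer is just any SIMPLE-BASE-STRING (it could
be the one produced by the mmap code, but nothing here depends on
that). STREAM-READ-LINE makes the one extra copy mentioned above.

```lisp
;; Sketch only: a Gray stream over an in-memory simple-base-string.
;; BASE-STRING-INPUT-STREAM and the BSIS- accessors are made-up names.
(defclass base-string-input-stream
    (sb-gray:fundamental-character-input-stream)
  ((buffer :initarg :buffer :reader bsis-buffer)
   (index :initform 0 :accessor bsis-index)))

(defmethod sb-gray:stream-read-char ((s base-string-input-stream))
  (let ((buf (bsis-buffer s))
        (i (bsis-index s)))
    (if (< i (length buf))
        (prog1 (char buf i)
          (setf (bsis-index s) (1+ i)))
        :eof)))

(defmethod sb-gray:stream-unread-char ((s base-string-input-stream) char)
  (declare (ignore char))
  (decf (bsis-index s))
  nil)

(defmethod sb-gray:stream-read-line ((s base-string-input-stream))
  ;; SUBSEQ of a base-string yields a base-string, so the lines handed
  ;; to the caller stay one byte per character.
  (let* ((buf (bsis-buffer s))
         (start (bsis-index s))
         (newline (position #\Newline buf :start start))
         (end (or newline (length buf))))
    (setf (bsis-index s) (if newline (1+ newline) end))
    (values (subseq buf start end) (null newline))))

;; Usage: ordinary READ-LINE works on the wrapper.
(let ((in (make-instance 'base-string-input-stream
                         :buffer (coerce (format nil "ab~%cd")
                                         'simple-base-string))))
  (loop for line = (read-line in nil)
        while line
        collect line))
```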

> that will result in speed and memory usage on par with Python

Unlikely to ever happen.

> or LispWorks,

Somewhat improbable in the near future on that exact program out of
the box. With slight rewriting and using a gray stream like above,
sure, right now.

> or in the best case Juho's program?

That program itself is obviously not going to go into SBCL :-) I'm not
wild about adding support for mmaping stuff as lisp arrays in SBCL
either, though it has been proposed before.

-- 
Juho Snellman

Re: [Sbcl-devel] on an unsettling Lisp versus Python 'competition' or why isn't Lisp doing better here...?

From: David J. N. <dav...@gm...> - 2007-12-04 18:09:17

Hi Juho,

On Tue, Dec 04, 2007 at 06:15:40PM +0200, Juho Snellman wrote:
> "David J. Neu" <dav...@gm...> writes:
> > However, on the third attempt to run it I get
> > 
> > System call error 12 (Cannot allocate memory)
> >    [Condition of type SB-POSIX:SYSCALL-ERROR]
> > 
> > is there a way around this?
> 
> Hmm... looks like an off-by-4096, the last mmaped page was never freed
> -> address space becomes fragmented. I wonder why it didn't fail on
> Linux. Maybe the following would help?
> 
Yep, that fixed it - BTW, I'm running FreeBSD 6.2.

> It's not obvious to me that Gary's benchmark accurately models most
> processes I've seen that process large quantities of text
> data. Usually they for example don't hang on to all of the data for
> the duration of the whole process.
I can't say for sure, but it's sure fast when running experiments, or
for keeping a big lookup table in a production app.

> I'm going to be committing a nicer implementation of READ-LINE once I
> get some machine with my ssl keys back on the net. This is basically
> just fixing the issues that the current implementation has with long
> lines (this reduces the amount of time for reading in the input file
> from 5.5s to 3.5s).
Great - thanks!

> I don't think doing any base-string specific handling in READ-LINE
> would make much sense as long as the fd-stream character input buffer
> is implemented as a (SIMPLE-ARRAY CHARACTER).
Hmmm, does that imply that there's no way (using classic functions
like read-line) to pull in a plain old ASCII text file w/out ending up
with the unicode bloat issue?
 
> The simple solution would be to just wrap the mmap hack into a couple
> of gray streams for base-char or (unsigned-byte 8) input. Should be
> trivial, it's only one more data copy than that direct solution, but
> it would hide all the ugliness. I would rather see this in a user
> library though, since once fu-streams have been implemented (ha, ha),
> those gray streams would basically be obsolete.
Seems like a wonderfully powerful tool to have available. 

> > or LispWorks,
> 
> Somewhat improbable in the near future on that exact program out of
> the box. With slight rewriting and using a gray stream like above,
> sure, right now.
Can I ask why?
 
> > or in the best case Juho's program?
> 
> That program itself is obviously not going to go into SBCL :-) I'm not
> wild about adding support for mmaping stuff as lisp arrays in SBCL
> either, though it has been proposed before.
> 
Again, seems like a wonderfully powerful tool.

Bottom line, I understand that SBCL can't beat every language on every
benchmark, but on a task as common as reading a text file, it seems
that there'd be great benefit in:

- the mods to read-line that you suggested, as well as a way to get
  plain vanilla ascii strings

- the mmaping stuff

- something like Franz has implemented
http://franz.com/support/tech_corner/cons-tricks-121306.lhtml
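
To make the last item concrete, here is a small sketch of the general
idea of a non-consing line reader: one reusable buffer is refilled for
every line instead of allocating a fresh string each time. This is not
the Franz implementation, READ-LINE-INTO is a made-up name, and it
assumes ASCII input (pushing a non-BASE-CHAR into the buffer would
signal an error).

```lisp
;; Sketch only, not the Franz code: refill one buffer per line.
(defun read-line-into (buffer stream)
  "Overwrite BUFFER (an adjustable BASE-CHAR vector with a fill pointer)
with the next line of STREAM. Return BUFFER, or NIL at end of file."
  (setf (fill-pointer buffer) 0)
  (loop for char = (read-char stream nil nil)
        do (cond ((null char)
                  ;; EOF: report a final unterminated line if there is one.
                  (return (if (plusp (fill-pointer buffer)) buffer nil)))
                 ((char= char #\Newline)
                  (return buffer))
                 (t
                  (vector-push-extend char buffer)))))

;; Usage: count the lines that contain the field delimiter.
(with-input-from-string (in (format nil "id1~~color~~red~%id2~~size~~large"))
  (let ((buffer (make-array 128 :element-type 'base-char
                                :adjustable t :fill-pointer 0)))
    (loop while (read-line-into buffer in)
          count (find #\~ buffer))))
```

The catch is that the buffer contents are only valid until the next
call, so anything worth keeping has to be copied out first.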

Many thanks for your help, we use SBCL here every day in production, and
truly value the improvements that are being made.

Cheers,
David

Re: [Sbcl-devel] on an unsettling Lisp versus Python 'competition' or why isn't Lisp doing better here...?

From: Gary K. <gw...@me...> - 2007-12-09 19:21:23

Hi Nathan,

Can you say more or point me towards a reference for how Python does  
handle strings? Why can't Python be using destructive operations?

thanks,

On Nov 21, 2007, at 4:22 PM, Nathan Froyd wrote:

>> 1. (Common) Lisp doesn't include any convenient destructive string
>> operations or even a buffered read-line approach (Franz has one now
>> but it's not standard). Having these around would, I think, be a good
>> thing.
>
> The Python code you posted earlier doesn't use destructive string
> operations--indeed, it can't, due to the nature of Python strings.

--
Gary Warren King, metabang.com
Cell: (413) 559 8738
Fax: (206) 338-4052
gwkkwg on Skype * garethsan on AIM

Re: [Sbcl-devel] on an unsettling Lisp versus Python 'competition' or why isn't Lisp doing better here...?

From: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX - 2007-12-09 21:49:55

On Dec 9, 2007 8:21 PM, Gary King <gw...@me...> wrote:
> Hi Nathan,
>
> Can you say more or point me towards a reference for how Python does
> handle strings? Why can't Python be using destructive operations?

Because its strings are by definition immutable.


bye,


Erik.

>
> thanks,
>
> On Nov 21, 2007, at 4:22 PM, Nathan Froyd wrote:
>
> >> 1. (Common) Lisp doesn't include any convenient destructive string
> >> operations or even a buffered read-line approach (Franz has one now
> >> but it's not standard). Having these around would, I think, be a good
> >> thing.
> >
> > The Python code you posted earlier doesn't use destructive string
> > operations--indeed, it can't, due to the nature of Python strings.
>
> --
> Gary Warren King, metabang.com
> Cell: (413) 559 8738
> Fax: (206) 338-4052
> gwkkwg on Skype * garethsan on AIM

