From: Attila L. <att...@gm...> - 2007-11-22 14:19:30
|
hi, i've optimized it a bit which i think is in par with the python speed if not faster now. two things are attached to the mail: one is the optimized split-sequence and the other is the diff to sbcl HEAD. this latter diff contains the previously posted base-char changes to read-line and an optimization to string-trim. (the big change in stream.lisp is only the make-result-string macrolet and the (declare (type index len index)) type declaration). i was working with a smaller version of the file to keep it in the disk cache and avoid swapping. it went down from 9.927 to 2.597. the final form of the defun: (defun parse-text (&optional (filename "/home/ati/fake-data.txt")) (declare (optimize speed (debug 0)) (inline split-sequence:split-sequence string-trim)) (with-open-file (in filename :element-type 'base-char :external-format :ascii :direction :input :if-does-not-exist :error) (let ((ht (make-hash-table :test 'equal))) (loop for line of-type simple-base-string = (read-line in nil) while line do (let ((fields (split-sequence:split-sequence #\~ line))) (when (= (length fields) 3) (let ((id (string-trim " " (the simple-base-string (first fields)))) (attribute (string-trim " " (the simple-base-string (second fields)))) (value (string-trim " " (the simple-base-string (third fields))))) (when (not (gethash id ht)) (setf (gethash id ht) (make-hash-table :test 'equal))) (let ((fields-ht (gethash id ht))) (setf (gethash attribute fields-ht) value)))))) ;;(print (hash-table-count ht)) (values)))) i've recorded the interesting steps: CL-USER> (progn (sb-ext:gc :full t) (time (parse-text))) *** unoptimized: Evaluation took: 9.927 seconds of real time 9.056566 seconds of user run time 0.856054 seconds of system run time [Run times include 2.352 seconds GC run time.] 0 calls to %EVAL 0 page faults and 1,633,159,856 bytes consed. *** after applying the patch for read-line to return base-string's: difference is below measuring error (but it's important for the split-sequence inlining) *** after adding (declare (optimize speed (debug 0))) difference is below measuring error *** after optimizing and inlining split-sequence (it even had apply #'position calls!) Evaluation took: 4.916 seconds of real time 4.504282 seconds of user run time 0.408026 seconds of system run time [Run times include 1.044 seconds GC run time.] 0 calls to %EVAL 0 page faults and 796,540,592 bytes consed. *** after adding optimization to string-trim that avoids calling subseq if there's nothing to be trimmed (note: this is explicitly allowed by the spec): Evaluation took: 3.372 seconds of real time 2.940184 seconds of user run time 0.428027 seconds of system run time [Run times include 1.048 seconds GC run time.] 0 calls to %EVAL 0 page faults and 728,696,000 bytes consed. *** after inlining string-trim: time difference is below measuring error, but consing went down a little. Evaluation took: 3.28 seconds of real time 2.864179 seconds of user run time 0.396024 seconds of system run time [Run times include 1.024 seconds GC run time.] 0 calls to %EVAL 0 page faults and 718,985,664 bytes consed. *** annotating (the simple-base-string ...) for the args of the string-trim calls: Evaluation took: 3.107 seconds of real time 2.740172 seconds of user run time 0.360022 seconds of system run time [Run times include 1.048 seconds GC run time.] 0 calls to %EVAL 0 page faults and 718,925,760 bytes consed. *** at this point the profiling looks like this (sorry, it's unreadable without fixed font, but plain text mails are preferred): Rank Name Self% Cumul% Total% 1 READ-LINE 39.34 53.08 39.34 Callers PARSE-TEXT 53.08 Calls SB-INT:FAST-READ-CHAR-REFILL 9.00 SB-KERNEL:%SHRINK-VECTOR 3.32 SB-KERNEL:UB32-BASH-COPY 1.42 2 (FLET #:CLEANUP-FUN-[CALL-WITHOUT-INTERRUPTS]24) 24.64 24.64 63.98 3 SB-IMPL::FD-STREAM-READ-N-CHARACTERS/ASCII 7.58 9.00 71.56 Callers SB-INT:FAST-READ-CHAR-REFILL 9.00 Calls SB-IMPL::REFILL-INPUT-BUFFER 1.42 4 PARSE-TEXT 6.16 93.84 77.73 Callers NIL 93.84 Calls READ-LINE 53.08 SB-IMPL::GETHASH3 5.21 SB-VM::GENERIC-+ 4.74 MAKE-HASH-TABLE 3.32 SB-KERNEL:UB8-BASH-COPY 2.84 SB-KERNEL:%PUTHASH 0.95 5 SB-VM::GENERIC-+ 4.74 4.74 82.46 6 SB-KERNEL:%SHRINK-VECTOR 3.32 3.32 85.78 7 SB-KERNEL:UB8-BASH-COPY 2.84 2.84 88.63 8 SB-KERNEL:%SP-STRING-COMPARE 2.37 2.37 91.00 9 MAKE-HASH-TABLE 2.37 3.32 93.36 10 SB-KERNEL:STRING=* 1.42 3.79 94.79 11 SB-KERNEL:UB32-BASH-COPY 1.42 1.42 96.21 12 (FLET #:BODY-FUN-[GETHASH3]1076) 0.95 5.21 97.16 13 (FLET SB-IMPL::TRICK) 0.47 0.47 97.63 14 SB-IMPL::REFILL-INPUT-BUFFER 0.47 1.42 98.10 15 SB-IMPL::CEIL-POWER-OF-TWO 0.47 0.47 98.58 16 (FLET #:CLEANUP-FUN-[%PUTHASH]1409) 0.47 0.47 99.05 17 SB-KERNEL:%PUTHASH 0.47 0.95 99.53 18 SB-IMPL::%MAKE-HASH-TABLE 0.47 0.47 100.00 *** after rising +ansi-stream-in-buffer-length+ to 4096 and recompiling sbcl note that consing fall down drastically but due to the problem with REPLACE that Christoph mentioned it got slower. but hopefully Juho has something to say here. Evaluation took: 3.59 seconds of real time 3.272205 seconds of user run time 0.316019 seconds of system run time [Run times include 0.528 seconds GC run time.] 0 calls to %EVAL 0 page faults and 260,260,240 bytes consed. Rank Name Self% Cumul% Total% 1 REPLACE 16.42 49.25 16.42 Callers READ-LINE 49.25 REPLACE 0.37 Calls SB-IMPL::OPTIMIZED-DATA-VECTOR-SET 10.07 SB-KERNEL:HAIRY-DATA-VECTOR-SET 9.70 SB-KERNEL:HAIRY-DATA-VECTOR-REF 7.84 SB-IMPL::OPTIMIZED-DATA-VECTOR-REF 2.61 SB-IMPL::OPTIMIZED-DATA-VECTOR-REF 2.24 LENGTH 0.37 REPLACE 0.37 2 SB-IMPL::FD-STREAM-READ-N-CHARACTERS/ASCII 11.57 16.42 27.99 Callers SB-INT:FAST-READ-CHAR-REFILL 16.42 Calls SB-IMPL::REFILL-INPUT-BUFFER 4.85 3 (FLET #:CLEANUP-FUN-[CALL-WITHOUT-INTERRUPTS]24) 10.82 10.82 38.81 4 READ-LINE 10.07 76.12 48.88 Callers PARSE-TEXT 76.12 Calls REPLACE 49.25 SB-INT:FAST-READ-CHAR-REFILL 16.42 SB-KERNEL:%SHRINK-VECTOR 0.37 5 SB-IMPL::OPTIMIZED-DATA-VECTOR-SET 10.07 10.07 58.96 Callers REPLACE 10.07 6 SB-KERNEL:HAIRY-DATA-VECTOR-SET 9.70 9.70 68.66 7 PARSE-TEXT 7.84 95.90 76.49 8 SB-KERNEL:HAIRY-DATA-VECTOR-REF 7.84 7.84 84.33 9 SB-IMPL::OPTIMIZED-DATA-VECTOR-REF 2.61 2.61 86.94 10 SB-IMPL::OPTIMIZED-DATA-VECTOR-REF 2.24 2.24 89.18 11 SB-KERNEL:UB8-BASH-COPY 1.87 1.87 91.04 12 SB-KERNEL:STRING=* 1.49 2.24 92.54 13 SB-KERNEL:%PUTHASH 0.75 1.12 93.28 14 SB-KERNEL:%SP-STRING-COMPARE 0.75 0.75 94.03 *** after getting rid of the REPLACE problem with an ugly hack in ansi-stream-read-line that helps propagating the type to REPLACE: Evaluation took: 2.597 seconds of real time 2.356147 seconds of user run time 0.240015 seconds of system run time [Run times include 0.532 seconds GC run time.] 0 calls to %EVAL 0 page faults and 260,262,768 bytes consed. Rank Name Self% Cumul% Total% 1 PARSE-TEXT 16.40 96.83 16.40 2 SB-IMPL::FD-STREAM-READ-N-CHARACTERS/ASCII 14.81 18.52 31.22 Callers SB-INT:FAST-READ-CHAR-REFILL 18.52 Calls SB-IMPL::REFILL-INPUT-BUFFER 3.70 3 READ-LINE 13.76 61.90 44.97 Callers PARSE-TEXT 61.90 Calls REPLACE 29.63 SB-INT:FAST-READ-CHAR-REFILL 18.52 4 (FLET #:CLEANUP-FUN-[CALL-WITHOUT-INTERRUPTS]24) 13.76 13.76 58.73 Callers SB-UNIX::CALL-WITHOUT-INTERRUPTS 13.76 5 REPLACE 8.47 29.63 67.20 Callers READ-LINE 29.63 REPLACE 0.53 Calls SB-KERNEL:HAIRY-DATA-VECTOR-REF 7.41 SB-KERNEL:HAIRY-DATA-VECTOR-SET 5.82 SB-IMPL::OPTIMIZED-DATA-VECTOR-SET 3.70 SB-IMPL::OPTIMIZED-DATA-VECTOR-REF 3.70 REPLACE 0.53 LENGTH 0.53 6 SB-KERNEL:HAIRY-DATA-VECTOR-REF 7.41 7.41 74.60 7 SB-KERNEL:HAIRY-DATA-VECTOR-SET 5.82 5.82 80.42 8 SB-KERNEL:UB8-BASH-COPY 4.23 4.23 84.66 9 SB-IMPL::OPTIMIZED-DATA-VECTOR-REF 3.70 3.70 88.36 10 SB-IMPL::OPTIMIZED-DATA-VECTOR-SET 3.70 3.70 92.06 11 (FLET #:BODY-FUN-[%PUTHASH]1355) 1.59 2.12 93.65 12 SB-KERNEL:%SP-STRING-COMPARE 1.06 1.06 94.71 13 (FLET #:BODY-FUN-[GETHASH3]1076) 1.06 2.65 95.77 14 LENGTH 0.53 0.53 96.30 15 SB-KERNEL::%MAKE-INSTANCE-WITH-LAYOUT 0.53 0.53 96.83 16 SB-IMPL::REFILL-INPUT-BUFFER 0.53 3.70 97.35 17 SB-KERNEL:%UNARY-TRUNCATE 0.53 0.53 97.88 18 SB-THREAD::MAKE-SPINLOCK 0.53 0.53 98.41 19 (FLET SB-IMPL::TRICK) 0.53 0.53 98.94 20 SB-KERNEL:%SXHASH-SIMPLE-STRING 0.53 1.06 99.47 *** and an interesting last piece: using (ppcre:split "~" line :sharedp t), adding a (declare (type simple-base-string target-string)) to split and avoiding apply #'subseq, it's hardly slower: Evaluation took: 3.028 seconds of real time 2.772173 seconds of user run time 0.256016 seconds of system run time [Run times include 0.556 seconds GC run time.] 0 calls to %EVAL 0 page faults and 269,862,768 bytes consed. -- attila |