Hi, I am trying to do some text file processing in SBCL, but I found that splitting and joining strings is too slow. For example:
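(defvar *x*)
(defvar *y*)

;; Read test.txt line by line and split each line on commas.
(defun f1 ()
  (setf *x* (iter (for r in-file "test.txt" using #'read-line)
                  (collect (split-sequence:split-sequence #\, r))))
  nil)

;; Join each list of fields back into a comma-separated string.
(defun f2 ()
  (setf *y* (iter (for i in *x*)
                  (collect (format nil "~{~a~^,~}" i))))
  nil)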
where "test.txt" is a file with 1 million identical rows, each row being "1,2,3,4,5". I installed split-sequence and iterate via asdf-install.
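For anyone who wants to reproduce this, a file like that can be generated with something along these lines. This is only a sketch: the name make-test-file and its parameters are made up for illustration and were not part of the timed runs.

```lisp
;; Hypothetical helper (not part of the timings in this post):
;; writes N copies of the line "1,2,3,4,5" to PATH.
(defun make-test-file (&optional (path "test.txt") (n 1000000))
  (with-open-file (out path :direction :output
                            :if-exists :supersede
                            :if-does-not-exist :create)
    (dotimes (i n)
      (write-line "1,2,3,4,5" out))))
```

Calling (make-test-file) with no arguments writes the 1-million-row "test.txt" into the current directory.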
When I ran it with SBCL 1.0.28 on my Linux machine, I got:
* (time (f1))
Evaluation took:
6.660 seconds of real time
5.404337 seconds of total run time (4.940308 user, 0.464029 system)
[ Run times consist of 2.460 seconds GC time, and 2.945 seconds non-GC time. ]
81.14% CPU
14,016,425,883 processor cycles
224,696,648 bytes consed
* (time (f2))
Evaluation took:
23.080 seconds of real time
21.113320 seconds of total run time (19.593225 user, 1.520095 system)
[ Run times consist of 8.037 seconds GC time, and 13.077 seconds non-GC time. ]
91.48% CPU
48,570,513,254 processor cycles
9,079,813,896 bytes consed
This is already much better than CLISP 2.34 on the same machine, where f1 and f2 took 15.24 and 61.92 seconds, respectively. However, on the same machine, a Perl script like this
open F, "<test.txt";
@x = ();
while (<F>) {
    chomp;
    $r = [];
    @$r = split /,/;
    push @x, $r;
}
close F;
@y = ();
for $r (@x) {
    push @y, (join ",", @$r);
}
can finish in 8 seconds. So why does the Lisp version take so long? Thanks!