I presume that you want base-char, not standard-char.

--
With best regards, Stas.

On Mar 28, 2012 8:58 PM, "Akshay Srinivasan" <akshaysrinivasan@gmail.com> wrote:
Hello,
       I found that reading large files is about 10x slower than with C
read().

----Vanilla version-----------------------------------
(defun file->string (path)
 (with-open-file (s path :element-type 'standard-char)
   (let* ((len (file-length s))
          (data (make-array len :element-type 'standard-char)))
     (values data (read-sequence data s)))))
------------------------------------------------------

However, this one is just as fast as C:
------------------------------------------------------------------------
(defun file->string (path)
 (let* ((fsize (with-open-file (s path)
                 (file-length s)))
        (data (make-array fsize :element-type 'standard-char))
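        (fd (sb-posix:open path sb-posix:o-rdonly)))
   ;; Pin DATA so the GC cannot move it while read(2) fills it in place.
   (unwind-protect (sb-sys:with-pinned-objects (data)
                     (sb-posix:read fd (sb-sys:vector-sap data) fsize))
     (sb-posix:close fd))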
   (values data fsize)))
------------------------------------------------------------------------

and so is this one, although the data comes back as char codes (octets)
rather than characters:
------------------------------------------------------------------
(defun file->string (path)
 (with-open-file (s path :element-type '(unsigned-byte 8))
   (let* ((len (file-length s))
          (data (make-array len :element-type '(unsigned-byte 8))))
     (values data (read-sequence data s)))))
------------------------------------------------------------------

I traced this to sb-impl:make-fd-stream replacing the requested stream
element type 'standard-char with 'character, so the copy inside
ansi-read-sequence takes the slow branch. This doesn't look like a
bug, though. Still, it'd be nice to have fast file reads without
resorting to clumsy hacks (Python 2.7 is faster, by the way :).
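A quick way to see the mismatch described above, assuming SBCL on a
POSIX system: compare the element type the stream reports with the
element type an array of STANDARD-CHAR is upgraded to. EXAMPLE-FILE and
REQUESTED-VS-ACTUAL-ELEMENT-TYPE are hypothetical helpers written for
this sketch, and the scratch path under /tmp is an assumption. The
printed values can differ between builds (with or without :sb-unicode),
so none are shown; per the report above, the stream side should come
back as CHARACTER.

```lisp
;; Hypothetical helper: create a small scratch file so the sketch is
;; self-contained. The /tmp path is an assumption (POSIX system).
(defun example-file ()
  (let ((path "/tmp/element-type-example.txt"))
    (with-open-file (out path :direction :output :if-exists :supersede)
      (write-string "hello" out))
    path))

;; Hypothetical helper: open the scratch file with ELEMENT-TYPE and
;; return the element type the resulting stream actually reports.
(defun requested-vs-actual-element-type (element-type)
  (with-open-file (s (example-file) :element-type element-type)
    (stream-element-type s)))

;; Usage: the stream's element type versus the array's upgraded type.
;; If these differ, READ-SEQUENCE is copying between mismatched types.
(print (requested-vs-actual-element-type 'standard-char))
(print (upgraded-array-element-type 'standard-char))
```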

I wouldn't mind helping out with implementing this, though I'm afraid
I might break something in the process.

Octets-to-string also defaults to the wider CHARACTER type, so
converting the '(unsigned-byte 8) array to a string is quite slow as
well.
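One possible workaround for the octets path, sketched under the
assumption that the file is pure ASCII: build a BASE-STRING directly
from the octet vector instead of going through octets-to-string.
OCTETS->BASE-STRING is a hypothetical helper, not part of SBCL. On
Unicode-enabled SBCL builds BASE-CHAR only covers codes below 128, so
the helper signals an error on any other octet rather than quietly
falling back to a wider string. No speed claim is made here; it only
shows how to end up with a base-string rather than a
(simple-array character).

```lisp
;; Hypothetical helper: convert an octet vector to a SIMPLE-BASE-STRING.
;; Assumes ASCII input; non-BASE-CHAR octets signal an error.
(defun octets->base-string (octets)
  (declare (type (simple-array (unsigned-byte 8) (*)) octets))
  (let ((result (make-string (length octets) :element-type 'base-char)))
    (dotimes (i (length octets) result)
      (let ((char (code-char (aref octets i))))
        ;; Refuse characters that do not fit in a base-string.
        (unless (typep char 'base-char)
          (error "Octet ~D at index ~D is not a BASE-CHAR."
                 (aref octets i) i))
        (setf (schar result i) char)))))
```

It pairs naturally with the '(unsigned-byte 8) read-sequence version
above: read the octets fast, then convert once.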

This was on Linux 3.2 (AMD64) with an ISO-8859-1 locale, using SBCL 1.0.54.

Akshay

_______________________________________________
Sbcl-devel mailing list
Sbcl-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sbcl-devel