From: Jefferson P. <jp...@cs...> - 2002-02-01 00:29:35

On 1/31/02 5:40 PM, "Tunc Simsek" <si...@ee...> wrote:
> what do you mean by native. both allegro and cmucl produce native code.
> looking at mtimes.lisp; it looks like all the declarations are in place
> for m.*! so the only reason that it may be slow is if the optimization
> flags were not set during compilation. I personally never checked that
> they are. It is worth checking this.

Sorry, I misspoke: by native I meant implemented in Fortran.

Here are the average times of a few operations:

operation   | msecs (mean of 100 runs)
------------+-------------------------
(m.*! a b)  | 1351.6
(m.+! a b)  | 8.6
(m* c d)    | 15.2

A, B = 500x500 real matrices
C = 500x1 real matrix
D = 1x500 real matrix

(Machine: 1.2GHz Athlon w/ 512M RAM running Allegro 6.0 on Debian Linux)

Jeff
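
For reference, a minimal sketch of the kind of timing loop behind figures
like these, assuming Matlisp's ONES constructor (the actual benchmark code
was not posted):

(let ((a (ones 500 500))
      (b (ones 500 500)))
  ;; TIME is standard CL; divide the reported total by 100 for the mean.
  (time (dotimes (i 100)
          (m.*! a b))))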
From: Jefferson P. <jp...@cs...> - 2002-02-01 17:25:07

On 2/1/02 7:28 AM, "Raymond Toy" <to...@rt...> wrote:
> Jefferson> D = 1x500 real matrix
>
> You do realize that a*b is O(500^3), a+b is O(500^2) and c*d is also
> O(500^2). So it's not surprising that a*b takes 157 times longer than
> a+b and 88 times longer than c*d.

Yes, I realize that. But I wasn't doing a*b, I was doing a.*b, which is
O(500^2), the same as a+b and c*d.

Looking in mtimes.lisp, m* is implemented by calling GEMM, which is
written in Fortran, but m.* and m.*! are written in Lisp. It doesn't
surprise me that the routines written in Lisp are much slower; I was just
surprised to find them written in Lisp in the first place. I have no
problem writing the Fortran myself, but I wanted to make sure I wasn't
missing something first.

J.
From: Raymond T. <to...@rt...> - 2002-02-01 17:34:18

>>>>> "Jefferson" == Jefferson Provost <jp...@cs...> writes:

Jefferson> On 2/1/02 7:28 AM, "Raymond Toy" <to...@rt...> wrote:
>>
>> You do realize that a*b is O(500^3), a+b is O(500^2) and c*d is also
>> O(500^2). So it's not surprising that a*b takes 157 times longer than
>> a+b and 88 times longer than c*d.

Jefferson> Yes, I realize that. But I wasn't doing a*b, I was doing
Jefferson> a.*b, which is O(500^2), the same as a+b and c*d.

Sorry, I missed the dot! You are right.

A peek at the generated code on Solaris seems to show that we are calling
out to generic functions way too often and could be vastly optimized.

I know with CMUCL, a Lisp version of m+ was almost as fast (< 5% slower?)
than Fortran, so m.* should be as fast, even in Lisp.

I'll look into it soon.

Ray
From: Jefferson P. <jp...@cs...> - 2002-02-01 18:29:24

On 2/1/02 11:34 AM, "Raymond Toy" <to...@rt...> wrote:
> You are right. A peek at the generated code on Solaris seems to show
> that we are calling out to generic functions way too often and could
> be vastly optimized.
>
> I know with CMUCL, a Lisp version of m+ was almost as fast (< 5%
> slower?) than Fortran, so m.* should be as fast, even in Lisp.
>
> I'll look into it soon.

Thanks. I'd be happy to just write the thing in Fortran myself, but I'm
totally unfamiliar with the foreign function interfaces for Allegro and
CMUCL.

Actually, I have another routine that I use a lot which I was hoping
might be in MATLISP, but which I couldn't find. I've had to write it
myself (in Lisp) and it's slower than I'd like it to be. It's basically
"covariance normalization", i.e. the normalization that turns a
covariance matrix into a correlation coefficient matrix.

J.
From: Tunc S. <si...@ee...> - 2002-02-01 22:56:01

Hi Ray, Jefferson;

I've played with Allegro 6.0 as follows (I asked it to explain the
compilation of m.*!):

(defmethod m.*! ((a real-matrix) (b real-matrix))
  (let* ((nxm (number-of-elements b)))
    (declare (type fixnum nxm)
             (optimize (speed 3) (safety 0)))
    (dotimes (k nxm b)
      (declare (type fixnum k))
      (let ((a-val (matrix-ref a k))
            (b-val (matrix-ref b k)))
        (declare (type real-matrix-element-type a-val b-val)
                 (:explain :calls :types :boxing))
        (setf (matrix-ref b k) (* a-val b-val))))))

and here is what I get:

;;; Compiling file C:\usr\gigascale\shift\src\matlisp\src\mtimes.lisp
;Examining a (possibly unboxed) call to *_2OP with arguments:
;  symeval A-VAL type (DOUBLE-FLOAT * *)
;  symeval B-VAL type (DOUBLE-FLOAT * *)
; which returns a value of type (DOUBLE-FLOAT * *)
;Generating a DOUBLE-FLOAT box
;Examining a call to FUNCALL with arguments:
;  call to CDR type KNOWN-FUNCTION
;  symeval G16043 type (DOUBLE-FLOAT * *)
;  symeval G16044 type #<STANDARD-CLASS REAL-MATRIX>
;  symeval G16045 type (INTEGER -536870912 536870911)
; which returns a value of type T
;Examining a call to CDR with arguments:
;  constant ((EVAL-WHEN-LOADED) SETF-METHOD-LOCATIVE (QUOTE MATRIX-REF) ...) type TRUE-LIST
; which returns a value of type KNOWN-FUNCTION
;;; Writing fasl file C:\usr\gigascale\shift\src\matlisp\bin\mtimes.fasl

So my conclusion is that the MATRIX-REF used to access the elements is
waaaaay more expensive than AREF, inasmuch as it carries the overhead of
a generic function dispatch and also boxes its results. This explains
the performance figures given by Jefferson, which I was able to
duplicate (roughly on the same order) on my Windows 2000 Pentium
machine.

I quickly tried the following change and got a performance increase of
exactly 10-fold (i.e. from 1.7 seconds per call to 0.13 seconds), and
the compiler is still saying that there is boxing. I don't know the
Allegro flags very well, but it seems that the matlisp sources can be
modified so that this performance problem disappears and there is no
need to do m.*! in Fortran. (Disclaimer: I DIDN'T TEST IF THE FOLLOWING
MODIFICATIONS GIVE CORRECT COMPUTATIONAL RESULTS OR ARE CONSISTENT WITH
MATLISP CODING STANDARDS.)

(defmethod m.*! ((a real-matrix) (b real-matrix))
  (let* ((nxm (number-of-elements b))
         (aa (store a))
         (bb (store b)))
    (declare (type fixnum nxm)
             (type (real-matrix-store-type (*)) aa bb)
             (optimize (speed 3) (safety 0)))
    (dotimes (k nxm b)
      (declare (type fixnum k))
      (let ((a-val (aref aa k))
            (b-val (aref bb k)))
        (declare (type real-matrix-element-type a-val b-val)
                 (:explain :calls :types :boxing))
        (setf (aref bb k) (* a-val b-val))))))

Good luck and keep this list posted on your results. I think a tested
version of these changes should be checked in to the repository.

-- Tunc

ps. I don't know your terminology on covariance normalization, but it
seems to me that it should be possible using matlisp functions.

Jefferson Provost wrote:
> Thanks, though I'd be happy to just write the thing in fortran myself,
> but I'm totally unfamiliar with the foreign function interfaces for
> Allegro and CMUCL.
>
> Actually, I have another routine that I use a lot which I was hoping
> might be in MATLISP, but which I couldn't find. [...] It's basically
> the "covariance normalization" i.e. the normalization that turns a
> covariance matrix into a correlation coefficients matrix.
From: Raymond T. <to...@rt...> - 2002-06-13 16:22:14

>>>>> "Tunc" == Tunc Simsek <si...@ee...> writes:

Tunc> Hi Ray, Jefferson;

Tunc> I've played with Allegro 6.0 as follows (I asked it to explain
Tunc> the compilation of m.*!):

Tunc> (defmethod m.*! ((a real-matrix) (b real-matrix))
Tunc>   (let* ((nxm (number-of-elements b)))
Tunc>     (declare (type fixnum nxm)
Tunc>              (optimize (speed 3) (safety 0)))
Tunc>     (dotimes (k nxm b)
Tunc>       (declare (type fixnum k))
Tunc>       (let ((a-val (matrix-ref a k))
Tunc>             (b-val (matrix-ref b k)))
Tunc>         (declare (type real-matrix-element-type a-val b-val)
Tunc>                  (:explain :calls :types :boxing))
Tunc>         (setf (matrix-ref b k) (* a-val b-val))))))

Jefferson, do you still need this optimized? I think Tunc's version is
good enough when both matrices are real. The complex-valued versions
still need to be done.

Ray
From: Jefferson P. <jp...@cs...> - 2002-06-15 19:55:32

I'm not sure. The short answer is that I would love to have it better
optimized, but if it's not a priority for you guys, I understand. I
looked into optimizing it further myself, but I don't know enough about
Lisp optimization to proceed. I managed to get around it by changing my
algorithm so I don't need to call it as often, but that solution may be
sub-optimal for me in the long run. However, I may end up reimplementing
my stuff in C++ (with Guile bindings) for other reasons. I'm not sure...

Jeff

On 6/13/02 11:21 AM, "Raymond Toy" <to...@rt...> wrote:
> Jefferson, do you still need this optimized? I think Tunc's version
> is good enough when both matrices are real. The complex-valued
> versions need to be done.
From: Raymond T. <to...@rt...> - 2002-06-15 20:01:32

>>>>> "Jefferson" == Jefferson Provost <jp...@cs...> writes:

Jefferson> I'm not sure. The short answer is that I would love to have
Jefferson> it better optimized, but if it's not a priority for you
Jefferson> guys, I understand.

If you can, please try the CVS version. I've put in an optimized version
that should run as fast as a Fortran version should (on CMUCL).

I'd be interested to know if it's fast enough for you.

Ray
From: Jefferson P. <jp...@cs...> - 2002-06-25 08:41:28

Raymond Toy wrote:
> If you can, please try the CVS version. I've put in an optimized
> version that should run as fast as a Fortran version should (on
> CMUCL).
>
> I'd be interested to know if it's fast enough for you.

Yes, it's quite fast. Multiplying two NxN matrices is still a little
slower than the outer product of two N-length vectors (which should also
be O(N^2) operations) -- but this is probably from going out of cache on
the large matrices, while the smaller vectors will stay in cache.

As I mentioned before, I had changed my code so I didn't need to call
M.* as often, but that was a sub-optimal solution. But I realized that
my main use of M.* was to "mask" a matrix, i.e. set a bunch of elements
to 0.0 using another matrix as a guide: (m.*! mask x). So I wrote a
special function just to do that masking. It's MUCH faster, since it
does no multiplication at all...

(defmethod mmask! ((a real-matrix) (mask real-matrix))
  (let* ((nxm (number-of-elements a))
         (aa (store a))
         (bb (store mask)))
    (declare (type fixnum nxm)
             (type (real-matrix-store-type (*)) aa bb)
             (optimize (speed 3) (safety 0)))
    (dotimes (k nxm a)
      (declare (type fixnum k))
      (let ((b-val (aref bb k)))
        (declare (type real-matrix-element-type b-val)
                 (:explain :calls :types :boxing))
        (when (zerop b-val)
          (setf (aref aa k) 0.0d0))))))

J.
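
A hypothetical usage sketch, assuming Matlisp's RAND and EYE
constructors: masking with the identity matrix keeps only the diagonal.

(let ((x (rand 5 5))      ; RAND: random matrix (assumed constructor)
      (mask (eye 5 5)))   ; EYE: identity, zero off the diagonal
  (mmask! x mask))        ; x is modified in place: only its diagonal survives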
From: Nicolas N. <Nic...@iw...> - 2002-06-27 17:30:43

Jefferson Provost <jp...@cs...> writes:
> Yes it's quite fast. Multiplying two NxN matrices is still a little
> slower than the outer product of two N-length vectors (which should
> also be O(N^2) operations) -- but this is probably from going out of
> cache on the large matrices, while the smaller vectors will stay in
> cache.

Please rethink how many operations a matrix-matrix multiplication
needs...

Nicolas.
From: Raymond T. <to...@rt...> - 2002-06-27 17:36:27

>>>>> "Nicolas" == Nicolas Neuss <Nic...@iw...> writes:

Nicolas> Please rethink how many operations a matrix-matrix
Nicolas> multiplication needs...

You missed the context, I think, and made the same mistake I originally
made. I think Jefferson was talking about element-by-element matrix
multiply, not a real matrix multiply.

Ray
From: Nicolas N. <Nic...@iw...> - 2002-06-27 17:40:05

Raymond Toy <to...@rt...> writes:
> You missed the context, I think, and made the same mistake I
> originally made. I think Jefferson was talking about
> element-by-element matrix multiply, not a real matrix multiply.

Of course.

N. (searching for a hole in the ground)
From: Jefferson P. <jp...@cs...> - 2002-02-02 01:26:37

Tunc,

Thanks for the help and the code. I'll look at it, see what I can do,
and let you all know. I may post with some questions on optimization,
since I haven't really tried to optimize my code much before.

Re: "covariance normalization": I don't know if there's an official name
for it, but basically it's an operation on a square matrix A that
produces a new matrix B, where

  B(x,y) = A(x,y) / sqrt(A(x,x) * A(y,y))

If A is a covariance matrix, B will be the corresponding correlation
matrix, with 1's on the diagonal and all the other values in the range
[-1,1]. (I.e. B(x,y) is the correlation between variable x and variable
y.) I have implemented it using MATLISP, but it's not optimized at
all... I see how to optimize it now, I think, so hopefully it should
speed up a lot.

Thanks!

Jeff

On 2/1/02 4:55 PM, "Tunc Simsek" <si...@ee...> wrote:
> Now I quickly tried the following change and got a performance increase
> of exactly 10 fold (ie. from 1.7 seconds per call to .13 seconds) [...]
>
> ps. I don't know your terminology on covariance normalization.
> But it seems to me that it should be possible using matlisp functions.
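
A minimal, unoptimized sketch of that operation, assuming Matlisp's
NROWS, two-index MATRIX-REF, and MAKE-REAL-MATRIX; COV->CORR is a
hypothetical name, not an existing Matlisp function:

(defun cov->corr (a)
  "Return the correlation matrix corresponding to covariance matrix A."
  (let* ((n (nrows a))
         (b (make-real-matrix n n)))
    (dotimes (i n b)
      (dotimes (j n)
        ;; B(i,j) = A(i,j) / sqrt(A(i,i) * A(j,j))
        (setf (matrix-ref b i j)
              (/ (matrix-ref a i j)
                 (sqrt (* (matrix-ref a i i)
                          (matrix-ref a j j)))))))))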
From: Raymond T. <to...@rt...> - 2002-02-01 13:28:25

>>>>> "Jefferson" == Jefferson Provost <jp...@cs...> writes:

Jefferson> Sorry, I misspoke: by native I meant implemented in Fortran.

Jefferson> Here are the average times of a few operations.

Jefferson> operation   | msecs (mean of 100 runs)
Jefferson> ------------+-------------------------
Jefferson> (m.*! a b)  | 1351.6
Jefferson> (m.+! a b)  | 8.6
Jefferson> (m* c d)    | 15.2

Jefferson> A, B = 500x500 real matrices
Jefferson> C = 500x1 real matrix
Jefferson> D = 1x500 real matrix

You do realize that a*b is O(500^3), a+b is O(500^2) and c*d is also
O(500^2). So it's not surprising that a*b takes 157 times longer than
a+b and 88 times longer than c*d.

To speed things up, you may want to get a copy of ATLAS and build
matlisp with that, if you haven't already.

Ray