Re: [Matlisp-users] Calling Fortran routines on short arrays


Here are the numbers on my ACL6.1/WinXP system:

DDOT-long: 26.38 MFLOPS
DDOT-short: 89.36 MFLOPS
DAXPY-long: 20.31 MFLOPS
DAXPY-short: 75.32 MFLOPS

BLAS-DDOT-long: 74.48 MFLOPS
BLAS-DDOT-short: 34.01 MFLOPS
BLAS-DAXPY-long: 36.24 MFLOPS
BLAS-DAXPY-short: 31.33 MFLOPS

and for reference, here are the original figures you posted:

DDOT-long: 271.15 MFLOPS
DDOT-short: 679.58 MFLOPS
DAXPY-long: 143.55 MFLOPS
DAXPY-short: 488.06 MFLOPS

BLAS-DDOT-long: 267.10 MFLOPS
BLAS-DDOT-short: 63.31 MFLOPS
BLAS-DAXPY-long: 149.13 MFLOPS
BLAS-DAXPY-short: 61.01 MFLOPS

I still don't understand the problem, since BLAS seems to be
doing better than native Lisp in both my test and your test.
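
To make the comparison concrete, the ratio of BLAS throughput to native
Lisp throughput can be computed for each case. Below is a minimal Python
sketch, not part of Matlisp; the figures are copied from the two lists
above, and the names acl_winxp, original and blas_over_lisp are only
illustrative.

```python
# Throughput figures in MFLOPS, copied from the two lists in this message.
acl_winxp = {
    "DDOT-long": 26.38, "DDOT-short": 89.36,
    "DAXPY-long": 20.31, "DAXPY-short": 75.32,
    "BLAS-DDOT-long": 74.48, "BLAS-DDOT-short": 34.01,
    "BLAS-DAXPY-long": 36.24, "BLAS-DAXPY-short": 31.33,
}
original = {
    "DDOT-long": 271.15, "DDOT-short": 679.58,
    "DAXPY-long": 143.55, "DAXPY-short": 488.06,
    "BLAS-DDOT-long": 267.10, "BLAS-DDOT-short": 63.31,
    "BLAS-DAXPY-long": 149.13, "BLAS-DAXPY-short": 61.01,
}


def blas_over_lisp(figures):
    """Return the BLAS/Lisp throughput ratio for each native-Lisp case."""
    return {
        case: figures["BLAS-" + case] / figures[case]
        for case in figures
        if not case.startswith("BLAS-")
    }


if __name__ == "__main__":
    for label, figures in (("ACL6.1/WinXP", acl_winxp),
                           ("original post", original)):
        for case, ratio in blas_over_lisp(figures).items():
            print(f"{label:>14}  {case:<12} BLAS/Lisp = {ratio:.2f}")
```

A ratio above 1 means the BLAS call is faster for that case; a ratio
below 1 means the native Lisp loop is faster.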

Tunc

----- Original Message -----
From: Raymond Toy <to...@rt...>
Date: Monday, November 24, 2003 8:35 am
Subject: Re: [Matlisp-users] Calling Fortran routines on short arrays

> >>>>> "Nicolas" == Nicolas Neuss <Nicolas.Neuss@iwr.uni-heidelberg.de> writes:
> 
>    Nicolas> Raymond Toy <to...@rt...> writes:
>    >> I'll try to look into this.  There's probably some improvement to be
>    >> had, but I doubt we can improve it enough for you.  I think the
>    >> overhead comes from computing the necessary addresses, and also having
>    >> to turn off GC during the computation.  IIRC, this involves an
>    >> unwind-protect which does add quite a bit of code.
> 
>    Nicolas> Yes, you are right.  I see this now.  If switching off multithreading is
>    Nicolas> expensive, there is a problem here.  I don't know enough of these things to
>    Nicolas> help you here.
> 
> It's not multithreading, per se.  It's because we can't have GC
> suddenly move the vectors before doing the foreign call, otherwise the
> foreign function will be reading and writing to some random place in
> memory.
> 
>    >> Note that I also noticed long ago that a simple vector add in Lisp was
>    >> at least as fast as calling BLAS.
> 
>    Nicolas> Probably this was before I started using Matlisp.
> 
> Yeah, probably before matlisp became matlisp.
> 
>    Nicolas> I will have to do this at least for a small part of the routines, if the
>    Nicolas> foreign call cannot be achieved with really little overhead (say two times
>    Nicolas> a Lisp function call).  I want to implement flexible sparse block matrices,
> 
> A factor of 2 will be very difficult to achieve, since a Lisp function
> call basically loads up a bunch of pointers and calls the function.
> We need to compute addresses, do the without-gc/unwind-protect stuff,
> load up the registers for a foreign call and then call it.
> 
>    Nicolas> and choosing Matlisp data for the blocks would be a possibility.  But the
>    Nicolas> blocks can be small, therefore I cannot make compromises when operating on
>    Nicolas> those blocks.
> 
> I assume you've profiled it so that the small blocks really are the
> bottleneck?
> 
>    Nicolas> P.S.: BTW, how does ACL perform in this respect?  Just today I read Duane
> 
> Don't know since I don't have a version of ACL that can run matlisp.
> 
> Ray