Here are the numbers on my ACL 6.1/WinXP system:
DDOT-long: 26.38 MFLOPS
DDOT-short: 89.36 MFLOPS
DAXPY-long: 20.31 MFLOPS
DAXPY-short: 75.32 MFLOPS
BLAS-DDOT-long: 74.48 MFLOPS
BLAS-DDOT-short: 34.01 MFLOPS
BLAS-DAXPY-long: 36.24 MFLOPS
BLAS-DAXPY-short: 31.33 MFLOPS
and for reference, here are the original figures you posted:
DDOT-long: 271.15 MFLOPS
DDOT-short: 679.58 MFLOPS
DAXPY-long: 143.55 MFLOPS
DAXPY-short: 488.06 MFLOPS
BLAS-DDOT-long: 267.10 MFLOPS
BLAS-DDOT-short: 63.31 MFLOPS
BLAS-DAXPY-long: 149.13 MFLOPS
BLAS-DAXPY-short: 61.01 MFLOPS
I still don't understand the problem, since BLAS seems to be
doing better than native Lisp in both my test and your test.
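Just so we are comparing the same thing: by "native Lisp" I mean a plain
typed loop along the following lines. This is only my sketch of the shape
of the benchmark, not the exact code; LISP-DDOT and LISP-DAXPY are made-up
names, and the vectors are assumed to be simple double-float arrays.

;; Straightforward Lisp versions of the two kernels being timed above.
(defun lisp-ddot (x y)
  (declare (type (simple-array double-float (*)) x y)
           (optimize (speed 3) (safety 0)))
  (let ((sum 0d0))
    (declare (type double-float sum))
    (dotimes (i (length x) sum)
      (incf sum (* (aref x i) (aref y i))))))

(defun lisp-daxpy (alpha x y)
  "Y := ALPHA*X + Y, element by element in Lisp."
  (declare (type double-float alpha)
           (type (simple-array double-float (*)) x y)
           (optimize (speed 3) (safety 0)))
  (dotimes (i (length x) y)
    (incf (aref y i) (* alpha (aref x i)))))
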
Tunc
----- Original Message -----
From: Raymond Toy <to...@rt...>
Date: Monday, November 24, 2003 8:35 am
Subject: Re: [Matlisp-users] Calling Fortran routines on short arrays
> >>>>> "Nicolas" == Nicolas Neuss <Nicolas.Neuss@iwr.uni-heidelberg.de> writes:
>
> Nicolas> Raymond Toy <to...@rt...> writes:
> >> I'll try to look into this. There's probably some improvement to be
> >> had, but I doubt we can improve it enough for you. I think the
> >> overhead comes from computing the necessary addresses, and also having
> >> to turn off GC during the computation. IIRC, this involves an
> >> unwind-protect which does add quite a bit of code.
>
> Nicolas> Yes, you are right. I see this now. If switching off
> Nicolas> multithreading is expensive, there is a problem here. I don't
> Nicolas> know enough of these things to help you here.
>
> It's not multithreading, per se. It's because we can't have GC
> suddenly move the vectors before doing the foreign call, otherwise the
> foreign function will be reading and writing to some random place in
> memory.
>
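[For concreteness, the call path Ray describes looks roughly like the
following. This is only a sketch: it assumes CMUCL's SYS package provides
WITHOUT-GCING and VECTOR-SAP, and %DDOT stands in for the foreign-function
stub; none of this is meant to be matlisp's actual code.]

;; Each call has to take the raw addresses of the vector data and keep GC
;; from moving the vectors while the foreign routine holds those pointers.
;; (Presumably the unwind-protect mentioned above is what restores GC on
;; exit, even on a non-local exit.)
(defun call-blas-ddot (x y)
  (declare (type (simple-array double-float (*)) x y))
  (sys:without-gcing
    (%ddot (length x)
           (sys:vector-sap x) 1       ; pointer to X's data, stride 1
           (sys:vector-sap y) 1)))    ; pointer to Y's data, stride 1
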
> >> Note that I also noticed long ago that a simple vector add in Lisp was
> >> at least as fast as calling BLAS.
>
> Nicolas> Probably this was before I started using Matlisp.
>
> Yeah, probably before matlisp became matlisp.
>
> Nicolas> I will have to do this at least for a small part of the
> Nicolas> routines, if the foreign call cannot be achieved with really
> Nicolas> little overhead (say two times a Lisp function call). I want to
> Nicolas> implement flexible sparse block matrices,
>
> A factor of 2 will be very difficult to achieve, since a Lisp function
> call basically loads up a bunch of pointers and calls the function.
> We need to compute addresses, do the without-gc/unwind-protect stuff,
> load up the registers for a foreign call and then call it.
>
> Nicolas> and choosing Matlisp data for the blocks would be a possibility.
> Nicolas> But the blocks can be small, therefore I cannot make compromises
> Nicolas> when operating on those blocks.
>
> I assume you've profiled it so that the small blocks really are the
> bottleneck?
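[In case it is useful: with CMUCL this kind of check is easy with the
bundled PROFILE package. The function names below are placeholders I made
up for illustration.]

;; Profile the block operations, run the computation, then see where the
;; time and consing actually go.
(profile:profile fill-block block-gemm!)
(run-sparse-solver)            ; placeholder for the real workload
(profile:report-time)          ; per-function time, consing, and call counts
(profile:unprofile)            ; turn profiling back off
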
>
> Nicolas> P.S.: BTW, how does ACL perform in this respect? Just today I
> Nicolas> read Duane
>
> Don't know since I don't have a version of ACL that can run matlisp.
>
> Ray