From: <si...@EE...> - 2003-11-24 17:51:08
Here are the numbers on my ACL6.1/WinXP system:

DDOT-long:        26.38 MFLOPS
DDOT-short:       89.36 MFLOPS
DAXPY-long:       20.31 MFLOPS
DAXPY-short:      75.32 MFLOPS
BLAS-DDOT-long:   74.48 MFLOPS
BLAS-DDOT-short:  34.01 MFLOPS
BLAS-DAXPY-long:  36.24 MFLOPS
BLAS-DAXPY-short: 31.33 MFLOPS

and for reference here were the original figures you posted:

DDOT-long:        271.15 MFLOPS
DDOT-short:       679.58 MFLOPS
DAXPY-long:       143.55 MFLOPS
DAXPY-short:      488.06 MFLOPS
BLAS-DDOT-long:   267.10 MFLOPS
BLAS-DDOT-short:   63.31 MFLOPS
BLAS-DAXPY-long:  149.13 MFLOPS
BLAS-DAXPY-short:  61.01 MFLOPS

I still don't understand the problem since BLAS seems to be doing better
than native Lisp in both my test and your test.

Tunc

----- Original Message -----
From: Raymond Toy <to...@rt...>
Date: Monday, November 24, 2003 8:35 am
Subject: Re: [Matlisp-users] Calling Fortran routines on short arrays

> >>>>> "Nicolas" == Nicolas Neuss <Nicolas.Neuss@iwr.uni-heidelberg.de> writes:
>
> Nicolas> Raymond Toy <to...@rt...> writes:
> >> I'll try to look into this. There's probably some improvement to be
> >> had, but I doubt we can improve it enough for you. I think the
> >> overhead comes from computing the necessary addresses, and also having
> >> to turn off GC during the computation. IIRC, this involves an
> >> unwind-protect which does add quite a bit of code.
>
> Nicolas> Yes, you are right. I see this now. If switching off multithreading is
> Nicolas> expensive, there is a problem here. I don't know enough of these things to
> Nicolas> help you here.
>
> It's not multithreading, per se. It's because we can't have GC
> suddenly move the vectors before doing the foreign call, otherwise the
> foreign function will be reading and writing to some random place in
> memory.
>
> >> Note that I also noticed long ago that a simple vector add in Lisp was
> >> at least as fast as calling BLAS.
>
> Nicolas> Probably this was before I started using Matlisp.
>
> Yeah, probably before matlisp became matlisp.
>
> Nicolas> I will have to do this at least for a small part of the routines, if the
> Nicolas> foreign call cannot be achieved with really little overhead (say two times
> Nicolas> a Lisp function call). I want to implement flexible sparse block matrices,
>
> A factor of 2 will be very difficult to achieve, since a Lisp function
> call basically loads up a bunch of pointers and calls the function.
> We need to compute addresses, do the without-gc/unwind-protect stuff,
> load up the registers for a foreign call and then call it.
>
> Nicolas> and choosing Matlisp data for the blocks would be a possibility. But the
> Nicolas> blocks can be small, therefore I cannot make compromises when operating on
> Nicolas> those blocks.
>
> I assume you've profiled it so that the small blocks really are the
> bottleneck?
>
> Nicolas> P.S.: BTW, how does ACL perform in this respect? Just today I read Duane
>
> Don't know since I don't have a version of ACL that can run matlisp.
>
> Ray
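
The two sets of figures in this message can be compared row by row as BLAS-to-native ratios. The sketch below is only illustrative arithmetic on the numbers as posted. It assumes that the rows prefixed with BLAS- are the foreign calls and the unprefixed rows are the native Lisp loops, which the thread suggests but does not state outright. The names FIGURES and blas_to_native_ratios are made up for this example.

```python
# MFLOPS figures copied from the message: benchmark -> (native Lisp, BLAS).
# Assumption: unprefixed rows are native Lisp, "BLAS-" rows are foreign calls.
FIGURES = {
    "ACL6.1/WinXP": {
        "DDOT-long": (26.38, 74.48),
        "DDOT-short": (89.36, 34.01),
        "DAXPY-long": (20.31, 36.24),
        "DAXPY-short": (75.32, 31.33),
    },
    "original post": {
        "DDOT-long": (271.15, 267.10),
        "DDOT-short": (679.58, 63.31),
        "DAXPY-long": (143.55, 149.13),
        "DAXPY-short": (488.06, 61.01),
    },
}


def blas_to_native_ratios(figures):
    """Return (system, benchmark, ratio) rows, where ratio = BLAS / native.

    A ratio above 1 means the BLAS call was faster; below 1 means the
    native Lisp loop was faster.
    """
    rows = []
    for system, benchmarks in figures.items():
        for name, (native, blas) in benchmarks.items():
            rows.append((system, name, blas / native))
    return rows


if __name__ == "__main__":
    for system, name, ratio in blas_to_native_ratios(FIGURES):
        faster = "BLAS" if ratio > 1.0 else "native Lisp"
        print(f"{system:14s} {name:12s} BLAS/native = {ratio:5.2f}  ({faster} faster)")
```

Read this way, the short-vector rows come out below 1 in both sets of figures, so the native loops are ahead there, while the long-vector rows are roughly even in the original post and favor BLAS on the ACL system. That pattern is consistent with a fixed per-call cost for the foreign call that only matters when the vectors are short.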