From: Nicolas N. <Nic...@iw...> - 2003-11-24 16:13:03
Raymond Toy <to...@rt...> writes:

> >>>>> "Nicolas" == Nicolas Neuss <Nic...@iw...> writes:
>
>     Nicolas> From the numbers it is obvious that the call is even much more expensive
>     Nicolas> than a daxpy for 256 double-floats.  How comes?
>
>     >> as the daxpy for the case +N-short+=256, while calling Lisp functions is
>     >> much faster.  Is it possible to cut down these costs?
>     >>
>     >> Thanks, Nicolas.
>     >>
>
> I'll try to look into this.  There's probably some improvement to be
> had, but I doubt we can improve it enough for you.  I think the
> overhead comes from computing the necessary addresses, and also having
> to turn off GC during the computation.  IIRC, this involves an
> unwind-protect which does add quite a bit of code.

Yes, you are right.  I see this now.  If switching off multithreading is
expensive, there is a problem here.  I don't know enough of these things to
help you here.

> Note that I also noticed long ago that a simple vector add in Lisp was
> at least as fast as calling BLAS.

Probably this was before I started using Matlisp.

> However, having everything go through FFI to BLAS at least allows us to
> take advantage of any special libraries that might be available.
>
> I, however, am not opposed to implementing the BLAS in Lisp.  Other
> LAPACK routines will still use the original BLAS, and Lisp code can
> get the faster versions.  Will need thinking, design, and
> experimentation.

I will have to do this at least for a small part of the routines, if the
foreign call cannot be achieved with really little overhead (say two times
a Lisp function call).  I want to implement flexible sparse block matrices,
and choosing Matlisp data for the blocks would be a possibility.  But the
blocks can be small, therefore I cannot make compromises when operating on
those blocks.

Thanks, Nicolas.

P.S.: BTW, how does ACL perform in this respect?  Just today I read Duane
writing about interoperability of ACL with C and C++.  If the overhead we
are suffering from is necessary in general, this might be quite a problem
for some applications.
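
For reference, a pure-Lisp daxpy of the kind discussed above might look roughly like the sketch below. It is only an illustration, not Matlisp code: the name `lisp-daxpy` is hypothetical, and it assumes both arguments are simple one-dimensional double-float arrays of equal length, so no foreign call (and none of its setup cost) is involved.

```lisp
;; Hypothetical sketch, not part of Matlisp: y <- a*x + y in plain Lisp.
;; Assumes X and Y are (simple-array double-float (*)) of the same length.
(defun lisp-daxpy (a x y)
  (declare (type double-float a)
           (type (simple-array double-float (*)) x y)
           (optimize speed))
  ;; Guard against mismatched lengths before entering the loop.
  (assert (= (length x) (length y)))
  ;; Update Y in place and return it.
  (dotimes (i (length x) y)
    (setf (aref y i)
          (+ (* a (aref x i)) (aref y i)))))

;; Example use with the vector length mentioned in the thread (256).
(defparameter *x*
  (make-array 256 :element-type 'double-float :initial-element 1d0))
(defparameter *y*
  (make-array 256 :element-type 'double-float :initial-element 2d0))
(lisp-daxpy 3d0 *x* *y*)
```

Whether such a loop actually beats the FFI call for small blocks depends on the compiler and on how well the type declarations are exploited, so it would have to be timed on the implementation in question.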