From: David C. <da...@ar...> - 2006-11-09 06:33:13
Josh Marshall wrote:
> I don't see how you are going to get around doing the copies. Matlab
> is in a separate process from the Python interpreter, and there is no
> shared memory. In what way do you want these proxy classes to "look
> like numpy arrays"?

I am not talking about the copy in the matlab <-> python interaction. That is done through a pipe, handled by the OS. I don't know the details, but I know that communication through a pipe is quite fast under Linux (see below), and it is not the bottleneck.

> Note that mlabwrap creates proxy arrays, and only copies the data if
> you actually request it to. (AFAIRemember) Otherwise you aren't losing
> any speed, because there aren't going to be any copies.

There may be no copy for returned data you don't need, but that's not the case I am talking about. For all other cases, I don't think this is what's happening. If you look at the C mlabraw module in mlabwrap, the function mlabraw_put always calls numeric2mx for arrays, which itself always calls makeMxFromNumeric, which makes a copy. The same happens in the other direction once you call mlabwrap_get. I am doing the same in my module, because that's the simplest thing to do.

The problem is that when you use the function engPutVariable of the matlab engine API, you need to give it a pointer to an mxArray structure, which is the C representation of a matlab array. You cannot say "build an mxArray from existing data" (this is one of the brain-damaged things of the matlab C API I was talking about in another mail). So the data has to be copied into a freshly created mxArray: this is the copy I am talking about, and this one is expensive.
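To make the cost of that copy concrete, here is a minimal sketch in NumPy terms of what "filling an mxArray" has to do with the data. It assumes the classic (pre-R2018a) mxArray layout: column-major storage, with the real and imaginary parts of complex arrays held in two separate buffers. The helper name `to_mx_layout` is hypothetical and no MATLAB API is called; it only reproduces the byte layout. A Fortran-ordered real array can be copied in one contiguous pass (a memcpy). A C-ordered array of rank >= 2, or any complex array (whose real and imaginary parts are strided views), needs a strided copy.

```python
import numpy as np


def to_mx_layout(a):
    """Return (real_bytes, imag_bytes) in column-major order, mimicking a
    classic split-complex mxArray. imag_bytes is None for real input.

    Hypothetical helper: it only reproduces the data layout and does not
    call any MATLAB API.
    """
    a = np.asarray(a)
    if np.iscomplexobj(a):
        # numpy interleaves re/im, classic MATLAB keeps them in two arrays:
        # a.real and a.imag are strided views (stride = 2 * itemsize), so
        # gathering them is a non-contiguous copy even for Fortran input.
        re = np.asfortranarray(a.real)
        im = np.asfortranarray(a.imag)
        return re.tobytes(order="F"), im.tobytes(order="F")
    if a.flags.f_contiguous:
        # Best case: already column-major, so one contiguous copy suffices.
        return a.tobytes(order="F"), None
    # Common case: C-ordered (rank >= 2) input needs a strided, transposing copy.
    return np.asfortranarray(a).tobytes(order="F"), None
```

For example, the C-ordered array `np.arange(6.0).reshape(2, 3)`, i.e. `[[0, 1, 2], [3, 4, 5]]`, comes out as the column-major sequence 0, 3, 1, 4, 2, 5. Its Fortran-ordered copy gives the same bytes through the contiguous path.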
In the best case (real numpy arrays with Fortran storage), you can do a memcpy. In most cases, though, you need something that takes strides into account, either because complex matlab arrays are not actually Fortran-ordered, or because most numpy arrays use C storage by default, which makes a difference for rank >= 2. That implies non-contiguous memory access, which is *really* expensive (around 2 cycles/byte at best on my dual Xeon 3.2 GHz).

Basically, if you want to do something like calling the matlab resample function on a numpy array and using the result later in numpy, here is what happens right now:

1. copy the numpy data (numarray in the case of mlabwrap, but this should not matter, I guess) into an mxArray
2. send the mxArray to the matlab engine: this is done through a pipe (which implies a copy? At least it is a contiguous array copy)
3. compute the thing in matlab
4. send the result back to python as an mxArray
5. copy the data of the mxArray into a numpy array

Quick profiling shows that if you don't do any processing in matlab, just sending an array and getting it back, steps 1 and 5 take roughly 80-90% of the time in my implementation. The remaining 10-20% is spent on communication through the pipe. My implementation is faster than mlabwrap, but I think this is just due to the much fancier API of mlabwrap. The core mechanism for passing arrays should be roughly the same: mlabwrap uses the C function makeMxFromNumeric, and I am using a similar function myself through ctypes.

I believe that most typical use cases involve steps 1 and 5. Step 5 should be avoidable in many cases if I knew how to build a proxy class around the mxArray so that the proxy behaves like a numpy array, with the buffer owned by the mxArray. But I don't know how to do that, particularly how to handle the destruction of the data, since the proxy should destroy the mxArray once the proxy object is garbage collected.
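A minimal sketch of the ownership part of the question follows. It assumes the proxy exposes the NumPy array interface (`__array_interface__`, the mechanism Josh suggests in the quoted reply). When numpy builds an array from an object's `__array_interface__` with a raw data pointer, it stores that object in the array's `.base`. The proxy, and hence the buffer it owns, therefore stays alive as long as any array viewing it. The class name `OwnedBufferProxy` and the `release` callback are hypothetical. The ctypes buffer only stands in for the data pointer of an mxArray; in a real binding the memory would belong to the mxArray and `release` would be the place to call `mxDestroyArray`.

```python
import ctypes
import weakref

import numpy as np


class OwnedBufferProxy:
    """Hypothetical stand-in for a proxy wrapping an mxArray.

    It owns a column-major float64 buffer and runs `release` once the proxy
    is garbage collected (a real binding would call mxDestroyArray there).
    numpy sees the memory through __array_interface__, so
    np.asarray(proxy) makes no copy.
    """

    def __init__(self, rows, cols, release):
        # Stands in for the data pointer of an mxArray (assumption).
        self._buf = (ctypes.c_double * (rows * cols))()
        self._shape = (rows, cols)
        itemsize = ctypes.sizeof(ctypes.c_double)
        # Column-major strides: next row is 1 item away, next column `rows` items.
        self._strides = (itemsize, itemsize * rows)
        # The callback must not reference self, or the proxy would never die.
        self._finalizer = weakref.finalize(self, release)

    @property
    def __array_interface__(self):
        return {
            "version": 3,
            "shape": self._shape,
            "typestr": np.dtype(np.float64).str,  # native byte order
            "strides": self._strides,
            "data": (ctypes.addressof(self._buf), False),  # (pointer, read-only)
        }
```

Because the array's `.base` is the proxy, deleting the proxy name does not trigger `release`. The callback fires only when the last numpy view of the buffer goes away.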
Step 1 would be easy if the matlab C API were sane, which is not the case: they provide functions that are impossible to use correctly (mxSetPr and mxSetData).

> What could be possible to do is add an array interface to the mlabwrap
> proxy classes so they can be used as numpy arrays when required for
> passing to numpy functions (or PIL, etc). Thus we only copy when we
> want to use numpy functions. Then we could define the operators on the
> proxy class to perform their operations on the other side of the bridge.

Yes, that's what I want to do, and in theory it should be possible without a copy. My initial question at the beginning of the thread was how to build a numpy proxy class from an existing buffer of data, with the proxy becoming the owner of the data (i.e. it should do all the deallocation, which here includes cleaning up the mxArray structures).

cheers,

David