Thread: [Pyobjc-dev] Re: [Python-Dev] PyBuffer* vs. array.array()
From: Scott G. <xs...@ya...> - 2003-01-05 22:19:14
--- Guido van Rossum <gu...@py...> wrote:
> > In writing the unit tests, I came across a problematic situation that
> > could easily arise in code (feel free to comment on the silliness of
> > this code, if any... and note that I'm using the comprehension style
> > even after that long rant I posted earlier :-):
> >
> >     singlePlane = array.array('B')
> >     singlePlane.fromlist([0 for x in range(0, width*height*3)])
>
> I'm not sure if you were joking, but why not write
>
>     singlePlane.fromlist([0] * (width*height*3))
>
> ???

Or cheaper and faster for large width and height:

    singlePlane = array.array('B', [0]) * width * height * 3
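For reference, the two initializations being discussed build the same zeroed
plane; a minimal sketch (the width and height values here are made up purely
for illustration):

    import array

    width, height = 4, 2                  # hypothetical dimensions

    # via an intermediate Python list
    a1 = array.array('B')
    a1.fromlist([0] * (width * height * 3))

    # via a sequence repeat on a one-element array (no intermediate list)
    a2 = array.array('B', [0]) * (width * height * 3)

    assert a1 == a2
    assert len(a1) == width * height * 3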
From: Bill B. <bb...@co...> - 2003-01-06 15:06:31
On Sunday, Jan 5, 2003, at 16:58 US/Eastern, Guido van Rossum wrote:
>> singlePlane = array.array('B')
>> singlePlane.fromlist([0 for x in range(0, width*height*3)])
>
> I'm not sure if you were joking, but why not write
>
>     singlePlane.fromlist([0] * (width*height*3))
>
> ???

Not joking; not thinking -- and I haven't really done large blob manipulation
in Python before.

That answers another question, though: if I were to build an image with four
channels -- red, green, blue, alpha -- and wanted the alpha channel to be set
to 255 throughout, then I would do...

    singlePlane.fromlist([0, 0, 0, 255] * (width * height))

... or ...

    array.array('B', [0, 0, 0, 255]) * width * height

>> ...........
>> --
>
> I'm not sure I understand the problem.

I was hoping that there was a single object type that could easily be used
from both the C and Python side and that could contain a large buffer of
binary/byte data.

What I really need is a fixed-length buffer that supports slicing-style
assignments/getters. The type of the elements is largely irrelevant, save
that each element needs to be accessed as a single byte.

The fixed-length requirement comes from the need to encapsulate buffers of
memory as returned by various APIs outside of Python. In this case, I'm
providing access to hunks of memory controlled by the APIs provided by the
Foundation and the AppKit within Cocoa (or GNUstep).

I also need to allocate a hunk of memory -- an array of bytes, a string, a
buffer, whatever -- and pass it off through the AppKit/Foundation APIs. Once
those APIs have the address and length of the buffer, that address and length
must remain constant over time. I would really like to be able to do the
allocation from the Python side of the fence: allocate, initialize with a
particular byte pattern, and pass it off to Foundation/AppKit (while still
being able to manipulate the contents in Python).

The PyBuffer* C API seems ideal in that a buffer object produced via the
PyBuffer_New() function is read/write (unlike a buffer produced by buffer()
in Python), contains a reference to a fixed-length array at a fixed address,
and is truly a bag o' bytes.

At this point, I'll probably add some kind of an 'allocate' function to the
'objc' module that simply calls PyBuffer_New(). Did that -- it works, except
of course the resulting buffer is an array of chars, so slicing assignments
have to take strings. Inconvenient, but workable:

>>> import objc
>>> b = objc.allocateBuffer(100)
>>> type(b)
<type 'buffer'>
>>> b[0:10] = range(0,10)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: bad argument type for built-in operation
>>> b[0:10] = [chr(x) for x in range(0,10)]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: bad argument type for built-in operation
>>> b[0:10] = "".join([chr(x) for x in range(0,10)])
>>> b
<read-write buffer ptr 0x1ad4bc, size 100 at 0x1ad4a0>
>>> b[0:15]
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\x00\x00\x00\x00\x00'

> You could use the 'c' code for creating an array instead of 'B'.

Right; as long as it is a byte, it doesn't matter. I chose 'B' because it is
an unsigned numeric type. Since I'm generating numeric data that is shoved
into the bitmap as R,G,B triplets, a numeric type seemed to be the most
convenient.

> Or you can use the tostring() method on the array to convert it to a
> string.
>
> Or you could use buffer() on the array.
>
> But why don't you just use strings for binary data, like everyone
> else?

Because strings are variable length, do not support slice-style assignments,
and require all numeric data to be converted to a string before being used
as 'data'.

b.bum
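For what it's worth, a small helper can paper over the string-only slice
assignment shown in the session above. This is only a sketch against the
Python 2.2-era API quoted in this message: objc.allocateBuffer() and
array.tostring() appear in the thread, while fill_bytes() and the
width/height values are made up for illustration.

    import array
    import objc   # PyObjC -- provides the allocateBuffer() used above

    width, height = 8, 8   # hypothetical dimensions, just for the example

    def fill_bytes(buf, offset, values):
        # Convert numeric byte values into the string form that the
        # char-based buffer's slice assignment accepts, then write them.
        data = array.array('B', values).tostring()
        buf[offset:offset + len(data)] = data

    plane = objc.allocateBuffer(width * height * 4)           # fixed-length RGBA plane
    fill_bytes(plane, 0, [0, 0, 0, 255] * (width * height))   # alpha channel = 255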
From: Christian T. <ti...@ti...> - 2003-01-05 23:25:33
Guido van Rossum wrote:
>>> I'm not sure if you were joking, but why not write
>>>
>>>     singlePlane.fromlist([0] * (width*height*3))
>>
>> Or cheaper and faster for large width and height:
>>
>>     singlePlane = array.array('B', [0])*width*height*3
>
> Correct; then even better:
>
>     singlePlane = array.array('B', [0]) * (width*height*3)
>
> i.e. do only one sequence repeat rather than three.

For "large" widths and heights, like 1000*1000, this effect is remarkably
small: about 3 percent only. The above is true for simple lists. There are
also counterexamples where you are extremely wrong (sorry), most probably due
to the implementation, but also due to the effect that medium-sized flat
objects can be copied more efficiently than very small ones.

>>> if 1:
...     t = time.clock()
...     for i in xrange(100):
...         s = ' ' * 1000 * 1000
...     print time.clock()-t
...
0.674784644417
>>> if 1:
...     t = time.clock()
...     for i in xrange(100):
...         s = ' ' * 1000000
...     print time.clock()-t
...
6.28695295072
>>>

Did I hear your head knocking on the keyboard?

ciao - chris

--
Christian Tismer             :^)   <mailto:ti...@ti...>
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04  9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/
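The same measurement can be packaged as a small function instead of the
interactive loops above; a sketch against the same 2003-era interpreter
(time.clock() and xrange as in the session, the bench() helper is a made-up
name, and the absolute numbers will differ by machine and Python version):

    import time

    def bench(make, repeat=100):
        # Time `repeat` calls of a zero-argument callable, mirroring the
        # interactive loops in the session above.
        t = time.clock()
        for i in xrange(repeat):
            make()
        return time.clock() - t

    print "chunked:", bench(lambda: ' ' * 1000 * 1000)   # (' ' * 1000) * 1000
    print "direct: ", bench(lambda: ' ' * 1000000)       # one huge repeat of ' '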
From: Christian T. <ti...@ti...> - 2003-01-05 23:44:58
Christian Tismer wrote:
> Guido van Rossum wrote:
...
>> Correct; then even better:
>>
>>     singlePlane = array.array('B', [0]) * (width*height*3)
>>
>> i.e. do only one sequence repeat rather than three.

Here is an addition to my former note. Doing some simple analysis of this, I
found that it is generally safer *not* to do huge repetitions of very small
objects. If you always use intermediate steps, you create some slight
overhead, but you will never step into traps like these:

> >>> if 1:
> ...     t = time.clock()
> ...     for i in xrange(100):
> ...         s = ' ' * 1000 * 1000
> ...     print time.clock()-t
> ...
> 0.674784644417
> >>> if 1:
> ...     t = time.clock()
> ...     for i in xrange(100):
> ...         s = ' ' * 1000000
> ...     print time.clock()-t
> ...
> 6.28695295072
> >>>

Analysis:
The central copying code in stringobject.c is the following tight loop:

    for (i = 0; i < size; i += a->ob_size)
        memcpy(op->ob_sval+i, a->ob_sval, (int) a->ob_size);

For my example, this memcpy is started for every single one of the one
million bytes. So the overhead of memcpy, be it a function call or a macro,
is paid a million times. On the other hand, doing ' ' * 1000 * 1000 only has
to call memcpy 2000 times.

My advice: do not go from very small to very large in one big step, but go in
reasonable chunks.

ciao - chris

--
Christian Tismer             :^)   <mailto:ti...@ti...>
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04  9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/
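Christian's advice translated into code might look like the sketch below; the
chunk size of 1000 is an arbitrary "reasonable" value rather than a figure
from the thread, and repeat_in_chunks() is a made-up name:

    def repeat_in_chunks(pattern, count, chunk=1000):
        # Build pattern * count via a medium-sized intermediate block, so the
        # copying loop in stringobject.c moves chunks instead of tiny objects.
        whole, rest = divmod(count, chunk)
        block = pattern * chunk                  # the intermediate step
        return block * whole + pattern * rest    # bulk copies, then the remainder

    # equivalent to ' ' * 1000000, but built as (' ' * 1000) * 1000
    s = repeat_in_chunks(' ', 1000000)
    assert s == ' ' * 1000000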
From: Jack J. <Jac...@or...> - 2003-01-06 01:45:41
On maandag, jan 6, 2003, at 00:46 Europe/Amsterdam, Christian Tismer wrote:
> The central copying code in stringobject.c is the following
> tight loop:
>
>     for (i = 0; i < size; i += a->ob_size)
>         memcpy(op->ob_sval+i, a->ob_sval, (int) a->ob_size);
>
> For my example, this memcpy is started for every single one
> of the one million bytes. So the overhead of memcpy,
> be it a function call or a macro, is paid a million times.

Oops, I replied before seeing this message; this does sound plausible. But it
suggests an easy way to fix it: for repeat counts larger than a certain
factor, copy the source object once, then keep duplicating the result until
you're at size/2, then duplicate the last bit. That is, if it is worth the
trouble to optimize this.
--
- Jack Jansen        <Jac...@or...>        http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -
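A sketch of the doubling strategy Jack describes, written in Python rather
than in stringobject.c's C; repeat_by_doubling() is an illustrative name and
nothing below is actual CPython source:

    def repeat_by_doubling(pattern, count):
        # Build pattern * count by copying the source once, doubling the
        # partial result up to half the target size, then taking the
        # remaining bytes from what has already been built.
        if count <= 0:
            return pattern[:0]
        result = pattern
        copied = 1
        while copied * 2 <= count:
            result = result + result          # double the partial result
            copied = copied * 2
        tail = (count - copied) * len(pattern)
        return result + result[:tail]         # the last, smaller-than-half bit

    assert repeat_by_doubling(' ', 1000000) == ' ' * 1000000
    assert repeat_by_doubling('ab', 7) == 'ab' * 7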