Thread: [Pyobjc-dev] Re: [Python-Dev] PyBuffer* vs. array.array()
From: Scott G. <xs...@ya...> - 2003-01-05 22:19:14
--- Guido van Rossum <gu...@py...> wrote:
> > In writing the unit tests, I came across a problematic situation that
> > could easily arise in code (feel free to comment on the silliness of
> > this code, if any... and note that I'm using the comprehension style
> > even after that long rant I posted earlier :-):
> >
> >     singlePlane = array.array('B')
> >     singlePlane.fromlist([0 for x in range(0, width*height*3)])
>
> I'm not sure if you were joking, but why not write
>
>     singlePlane.fromlist([0] * (width*height*3))
>
> ???

Or cheaper and faster for large width and height:

    singlePlane = array.array('B', [0]) * width * height * 3
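For reference, the two initializations being discussed build the same zeroed
plane; a minimal sketch (the width and height values here are made up purely
for illustration):

    import array

    width, height = 4, 2                  # hypothetical dimensions

    # via an intermediate Python list
    a1 = array.array('B')
    a1.fromlist([0] * (width * height * 3))

    # via a sequence repeat on a one-element array (no intermediate list)
    a2 = array.array('B', [0]) * (width * height * 3)

    assert a1 == a2
    assert len(a1) == width * height * 3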
From: Bill B. <bb...@co...> - 2003-01-06 15:06:31
On Sunday, Jan 5, 2003, at 16:58 US/Eastern, Guido van Rossum wrote:
>> singlePlane = array.array('B')
>> singlePlane.fromlist([0 for x in range(0, width*height*3)])
>
> I'm not sure if you were joking, but why not write
>
>     singlePlane.fromlist([0] * (width*height*3))
>
> ???

Not joking; not thinking -- and I haven't really done large blob manipulation
in Python before.

That answers another question, though: if I were to build an image with four
channels -- red, green, blue, alpha -- and wanted the alpha channel to be set
to 255 throughout, then I would do...

    singlePlane.fromlist([0, 0, 0, 255] * (width * height))

... or ...

    array.array('B', [0, 0, 0, 255]) * width * height

>> ...........
>> --
>
> I'm not sure I understand the problem.

I was hoping that there was a single object type that could easily be used
from both the C and Python side and that could contain a large buffer of
binary/byte data.

What I really need is a fixed-length buffer that supports slicing-style
assignments/getters. The type of the elements is largely irrelevant, save
that each element needs to be accessed as a single byte.

The fixed-length requirement comes from the need to encapsulate buffers of
memory as returned by various APIs outside of Python. In this case, I'm
providing access to hunks of memory controlled by the APIs provided by the
Foundation and the AppKit within Cocoa (or GNUstep).

I also need to allocate a hunk of memory -- an array of bytes, a string, a
buffer, whatever -- and pass it off through the AppKit/Foundation APIs. Once
those APIs have the address and length of the buffer, that address and length
must remain constant over time. I would really like to be able to do the
allocation from the Python side of the fence: allocate, initialize with a
particular byte pattern, and pass it off to Foundation/AppKit (while still
being able to manipulate the contents in Python).

The PyBuffer* C API seems ideal in that a buffer object produced via the
PyBuffer_New() function is read/write (unlike a buffer produced by buffer()
in Python), contains a reference to a fixed-length array at a fixed address,
and is truly a bag o' bytes.

At this point, I'll probably add some kind of an 'allocate' function to the
'objc' module that simply calls PyBuffer_New(). Did that -- it works, except
of course the resulting buffer is an array of chars, so slicing assignments
have to take strings. Inconvenient, but workable:

>>> import objc
>>> b = objc.allocateBuffer(100)
>>> type(b)
<type 'buffer'>
>>> b[0:10] = range(0,10)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: bad argument type for built-in operation
>>> b[0:10] = [chr(x) for x in range(0,10)]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: bad argument type for built-in operation
>>> b[0:10] = "".join([chr(x) for x in range(0,10)])
>>> b
<read-write buffer ptr 0x1ad4bc, size 100 at 0x1ad4a0>
>>> b[0:15]
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\x00\x00\x00\x00\x00'

> You could use the 'c' code for creating an array instead of 'B'.

Right; as long as it is a byte, it doesn't matter. I chose 'B' because it is
an unsigned numeric type. Since I'm generating numeric data that is shoved
into the bitmap as R,G,B triplets, a numeric type seemed to be the most
convenient.

> Or you can use the tostring() method on the array to convert it to a
> string.
>
> Or you could use buffer() on the array.
>
> But why don't you just use strings for binary data, like everyone
> else?

Because strings are variable length, do not support slice-style assignments,
and require all numeric data to be converted to a string before being used
as 'data'.

b.bum
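For what it's worth, a small helper can paper over the string-only slice
assignment shown in the session above. This is only a sketch against the
Python 2.2-era API quoted in this message: objc.allocateBuffer() and
array.tostring() appear in the thread, while fill_bytes() and the
width/height values are made up for illustration.

    import array
    import objc   # PyObjC -- provides the allocateBuffer() used above

    width, height = 8, 8   # hypothetical dimensions, just for the example

    def fill_bytes(buf, offset, values):
        # Convert numeric byte values into the string form that the
        # char-based buffer's slice assignment accepts, then write them.
        data = array.array('B', values).tostring()
        buf[offset:offset + len(data)] = data

    plane = objc.allocateBuffer(width * height * 4)           # fixed-length RGBA plane
    fill_bytes(plane, 0, [0, 0, 0, 255] * (width * height))   # alpha channel = 255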
From: Christian T. <ti...@ti...> - 2003-01-05 23:25:33
Guido van Rossum wrote:
>>> I'm not sure if you were joking, but why not write
>>>
>>>     singlePlane.fromlist([0] * (width*height*3))
>>
>> Or cheaper and faster for large width and height:
>>
>>     singlePlane = array.array('B', [0])*width*height*3
>
> Correct; then even better:
>
>     singlePlane = array.array('B', [0]) * (width*height*3)
>
> i.e. do only one sequence repeat rather than three.

For "large" widths and heights, like 1000*1000, this effect is remarkably
small: about 3 percent only. The above is true for simple lists. There are
also counterexamples where you are extremely wrong (sorry), most probably due
to the implementation, but also due to the effect that medium-sized flat
objects can be copied more efficiently than very small ones.

>>> if 1:
...     t = time.clock()
...     for i in xrange(100):
...         s = ' ' * 1000 * 1000
...     print time.clock()-t
...
0.674784644417
>>> if 1:
...     t = time.clock()
...     for i in xrange(100):
...         s = ' ' * 1000000
...     print time.clock()-t
...
6.28695295072
>>>

Did I hear your head knocking on the keyboard?

ciao - chris

--
Christian Tismer             :^)   <mailto:ti...@ti...>
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04  9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/
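The same measurement can be packaged as a small function instead of the
interactive loops above; a sketch against the same 2003-era interpreter
(time.clock() and xrange as in the session, the bench() helper is a made-up
name, and the absolute numbers will differ by machine and Python version):

    import time

    def bench(make, repeat=100):
        # Time `repeat` calls of a zero-argument callable, mirroring the
        # interactive loops in the session above.
        t = time.clock()
        for i in xrange(repeat):
            make()
        return time.clock() - t

    print "chunked:", bench(lambda: ' ' * 1000 * 1000)   # (' ' * 1000) * 1000
    print "direct: ", bench(lambda: ' ' * 1000000)       # one huge repeat of ' '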
From: Christian T. <ti...@ti...> - 2003-01-05 23:44:58
Christian Tismer wrote:
> Guido van Rossum wrote:
...
>> Correct; then even better:
>>
>>     singlePlane = array.array('B', [0]) * (width*height*3)
>>
>> i.e. do only one sequence repeat rather than three.

Here is an addition to my former note. Doing some simple analysis of this, I
found that it is generally safer *not* to do huge repetitions of very small
objects. If you always use intermediate steps, you create some slight
overhead, but you will never step into traps like these:

> >>> if 1:
> ...     t = time.clock()
> ...     for i in xrange(100):
> ...         s = ' ' * 1000 * 1000
> ...     print time.clock()-t
> ...
> 0.674784644417
> >>> if 1:
> ...     t = time.clock()
> ...     for i in xrange(100):
> ...         s = ' ' * 1000000
> ...     print time.clock()-t
> ...
> 6.28695295072
> >>>

Analysis:
The central copying code in stringobject.c is the following tight loop:

    for (i = 0; i < size; i += a->ob_size)
        memcpy(op->ob_sval+i, a->ob_sval, (int) a->ob_size);

For my example, this memcpy is started for every single one of the one
million bytes. So the overhead of memcpy, be it a function call or a macro,
is paid a million times. On the other hand, doing ' ' * 1000 * 1000 only has
to call memcpy 2000 times.

My advice: do not go from very small to very large in one big step, but go in
reasonable chunks.

ciao - chris

--
Christian Tismer             :^)   <mailto:ti...@ti...>
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04  9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/
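Christian's advice translated into code might look like the sketch below; the
chunk size of 1000 is an arbitrary "reasonable" value rather than a figure
from the thread, and repeat_in_chunks() is a made-up name:

    def repeat_in_chunks(pattern, count, chunk=1000):
        # Build pattern * count via a medium-sized intermediate block, so the
        # copying loop in stringobject.c moves chunks instead of tiny objects.
        whole, rest = divmod(count, chunk)
        block = pattern * chunk                  # the intermediate step
        return block * whole + pattern * rest    # bulk copies, then the remainder

    # equivalent to ' ' * 1000000, but built as (' ' * 1000) * 1000
    s = repeat_in_chunks(' ', 1000000)
    assert s == ' ' * 1000000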
From: Jack J. <Jac...@or...> - 2003-01-06 01:45:41
On maandag, jan 6, 2003, at 00:46 Europe/Amsterdam, Christian Tismer wrote:
> The central copying code in stringobject.c is the following
> tight loop:
>
>     for (i = 0; i < size; i += a->ob_size)
>         memcpy(op->ob_sval+i, a->ob_sval, (int) a->ob_size);
>
> For my example, this memcpy is started for every single one
> of the one million bytes. So the overhead of memcpy,
> be it a function call or a macro, is paid a million times.

Oops, I replied before seeing this message; this does sound plausible. But it
suggests an easy way to fix it: for repeat counts larger than a certain
factor, copy the source object once, then keep duplicating the result until
you're at size/2, then duplicate the last bit. That is, if it is worth the
trouble to optimize this.
--
- Jack Jansen        <Jac...@or...>        http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -
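A sketch of the doubling strategy Jack describes, written in Python rather
than in stringobject.c's C; repeat_by_doubling() is an illustrative name and
nothing below is actual CPython source:

    def repeat_by_doubling(pattern, count):
        # Build pattern * count by copying the source once, doubling the
        # partial result up to half the target size, then taking the
        # remaining bytes from what has already been built.
        if count <= 0:
            return pattern[:0]
        result = pattern
        copied = 1
        while copied * 2 <= count:
            result = result + result          # double the partial result
            copied = copied * 2
        tail = (count - copied) * len(pattern)
        return result + result[:tail]         # the last, smaller-than-half bit

    assert repeat_by_doubling(' ', 1000000) == ' ' * 1000000
    assert repeat_by_doubling('ab', 7) == 'ab' * 7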