#14 compiled code doesn't get compiled ;o)

Status: closed

Owner: Armin Rigo

Labels: Psyco compiler (33)

Priority: 5

Updated: 2002-10-03

Created: 2002-09-16

Creator: Alexandre Fayolle

Private: No

Hello,

I'm working on a project where I need to eval()
user-supplied expressions a huge number of times.
Typical expressions are 'a+b>0'. I use the compile()
builtin function to compile the expression into a code
object, and then call the eval() builtin function with
the appropriate local dictionary.

I think I could get a huge performance boost if psyco
could work on code objects, but this is unfortunately
not currently possible. Is there a problem I'm not
aware of, or is this just a use case you had not imagined?

>>> import psyco
>>> a = compile('a==b','toto','eval')
>>> a
<code object ? at 0x815ae58, file "toto", line -1>
>>> b = psyco.proxy(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/alf/lib/python/psyco/__init__.py", line 89, in proxy
    raise TypeError, 'function or method required'
TypeError: function or method required

Cheers,

Alexandre

Discussion

Armin Rigo - 2002-09-16

Logged In: YES
user_id=4771

I haven't thought about this case. The main problem is about
the locals: Psyco only efficiently handles the so-called "fast
locals" of Python, which are the variables that can be found to
be locals at compile-time. For example, in "lambda a: a+1",
the 'a' always refers to a local, but in "compile('a+1')", the 'a'
might be a local or a global. Thus the code object is not the
same in the previous two examples, and Psyco would not be
efficient on the second one (even if it could be made to work
on it).
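
The difference is visible on the code objects themselves. The following is only an illustration, written for present-day Python 3 rather than the Python of this report, and the file name "<expr>" is an arbitrary placeholder. The lambda lists 'a' among its fast locals (co_varnames), whereas the bare expression only records 'a' as a name to be looked up at run time (co_names).

```python
# Illustration only: inspect how each code object classifies the name 'a'.
func_code = (lambda a: a + 1).__code__
expr_code = compile("a + 1", "<expr>", "eval")

# In the lambda, 'a' is an argument, hence a fast local known at compile time.
print("lambda:    ", func_code.co_varnames, func_code.co_names)

# In the bare expression, 'a' is resolved by name (local or global) at run time.
print("expression:", expr_code.co_varnames, expr_code.co_names)
```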

I will try to come up with a good solution; right now, I suggest
a hack: embed your compiled code into a function. For
example, try

user_expr = "a==b"
f = eval("lambda a,b: %s" % user_expr)
g = psyco.proxy(f)
g(5,6)

In fact, the above idiom seems clean enough (cleaner than
having to build a custom locals dict). Maybe Python itself
would benefit from a standard function that compiles an
expression into a lambda, given specified arguments;
something like

compile_lambda('a==b', ['a','b'])
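
A minimal sketch of what such a helper might look like, written for present-day Python 3. compile_lambda is hypothetical: it is neither a Python builtin nor part of Psyco, and the name is_equal is only there for the usage example. It simply wraps the eval("lambda ...") idiom shown above.

```python
def compile_lambda(expr, argnames):
    """Hypothetical helper: return a function of `argnames` that evaluates `expr`."""
    source = "lambda %s: (%s)" % (", ".join(argnames), expr)
    # eval() executes arbitrary code, so `expr` must come from a trusted source.
    return eval(source, {})


is_equal = compile_lambda("a==b", ["a", "b"])
print(is_equal(5, 6))
```

Because the argument names are given explicitly, every name in the expression that matches one of them becomes a fast local of the resulting function.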

Alexandre Fayolle - 2002-09-24

Logged In: YES
user_id=116727

Hello,

I've tested what you suggest in your reply. The good news is
that using a lambda instead of a compiled code object is
much faster, so I actually gain some speed (about 40% faster).
The bad news is that the proxyfied code is slower than the
original.

Looping over f and g in the code you gave as an example
illustrates this dramatically: it takes 3 times longer with
g than with f on my machine.
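
The first comparison (code object versus lambda) can be reproduced without Psyco. The harness below is a rough sketch for present-day Python 3, not the script used for the figures in this comment; the helper names with_code_object and with_lambda and the iteration count are assumptions, and the timings it prints will differ from the ones reported here.

```python
import timeit

user_expr = "a==b"
code = compile(user_expr, "<expr>", "eval")   # original approach: a code object
f = eval("lambda a,b: %s" % user_expr)        # suggested approach: a lambda


def with_code_object(a, b):
    # Evaluate the code object against a freshly built locals dictionary.
    return eval(code, {}, {"a": a, "b": b})


def with_lambda(a, b):
    # Call the lambda directly; 'a' and 'b' are its fast locals.
    return f(a, b)


for name, fn in [("code object", with_code_object), ("lambda", with_lambda)]:
    seconds = timeit.timeit(lambda: fn(5, 6), number=100000)
    print("%-12s %.3f s" % (name, seconds))
```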

Alexandre

Armin Rigo - 2002-09-24

Logged In: YES
user_id=4771

Calling a proxyfied object involves some overhead, so that you cannot
hope to speed up a single very simple operation between Python objects
coming from "outside" Psyco. You will get much better results by
compiling the function that contains the loop. In general, you will want
Psyco to compile the function that contains the core of your algorithm,
and not just a loop-less function.

There are subtle considerations involved in calling variable functions with
Psyco. The following should be fast because 'f' is a global variable that
doesn't get modified:

user_expr = "a==b"
f = eval("lambda a,b: %s" % user_expr)
def test(list1, list2):
    for a,b in zip(list1, list2):
        f(a,b)
test(range(100000), range(100000,0,-1))
psyco.bind(test)
test(range(100000), range(100000,0,-1))

Alexandre Fayolle - 2002-09-24

Logged In: YES
user_id=116727

Well, in that case, I'm already doing this, since the
calling function is in a class deriving from psyobj.

You were asking for benchmarks the other day on c.l.py.
Here's one.

Solving the N-queens problem using logilab.constraint for 9
queens takes about 21 seconds on my machine without psyco.
With psyco, it goes down to 13 seconds (about 35% shorter).

Armin Rigo - 2002-09-24

Logged In: YES
user_id=4771

Ok. I was expecting that calling variable functions from the
same point in the source code would call them without
Psyco, so that they would run at Python speed -- but not
three times slower! I guess there is a problem in the code
calling Psyco proxies. I will try to come up with a solution
that can massively speed up the calls of variable but
explicitly proxyfied functions (but not variable and
non-proxyfied functions -- this might blow up the memory by
unexpectedly compiling new code over and over).

I moved your report back to the "bugs" category :-)

Armin Rigo - 2002-09-24

labels: --> Psyco compiler

assigned_to: nobody --> arigo

Armin Rigo - 2002-10-03

Logged In: YES
user_id=4771

According to some tests I made:

def testing(user_expr):
    f = eval("lambda a,b: %s" % user_expr)
    # f = psyco.proxy(f)
    for i, j in something:
        f(i, j)

Timings, with or without the commented proxy(f) line, with or
without a prior psyco.bind(testing):

* no psyco at all: 0.39 s
* with bind(testing) only: 0.07 s
* with proxy(f) only: 1.49 s
* with both: 0.07 s

so bind(testing) is the thing to do. In this case, adding a
proxy(f) doesn't hurt and doesn't help; but just don't do it: it
hurts a lot if Python has to call your proxy at each iteration of
the loop.

I'm considering this behavior as "expected" and closing the
report. I expect future work on profilers to automatically
detect that it is a good idea to bind testing() and not just f().
A reasonable heuristic might be to choose to bind the
functions with the highest time-spent-per-call ratio.

Armin Rigo - 2002-10-03

status: open --> closed