[Sqlalchemy-tickets] Issue #3176: faster KeyedTuple (zzzeek/sqlalchemy)

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

New issue 3176: faster KeyedTuple
https://bitbucket.org/zzzeek/sqlalchemy/issue/3176/faster-keyedtuple

Mike Bayer:

collections.namedtuple() is much much faster on creation of tuples, and KeyedTuple is actually slower than straight object.  But namedtuple() is very slow to create new types as the code to do so uses eval(), stack frame inspection, and other overbuilt things that slow us down too much on a per-query basis.  What to do?   We can hedge between both approaches.  Here is a speed breakdown:

```
#!python

import collections
import timeit

from sqlalchemy.util import KeyedTuple, lightweight_named_tuple

def go1(size):
    nt = collections.namedtuple('a', ['x', 'y', 'z'])
    result = [
        nt(1, 2, 3) for i in range(size)
    ]

def go2(size):
    labels = ['x', 'y', 'z']
    result = [
        KeyedTuple([1, 2, 3], labels) for i in range(size)
    ]

def go3(size):
    nt = lightweight_named_tuple('a', ['x', 'y', 'z'])
    result = [
        nt([1, 2, 3]) for i in range(size)
    ]

for size, num in [
    (10, 10000),
    (100, 1000),
    (10000, 100),
    (1000000, 10),
]:
    print "-----------------"
    print "size=%d num=%d" % (size, num)
    print "namedtuple:", timeit.timeit("go1(%s)" % size, "from __main__ import go1", number=num)
    print "keyedtuple:", timeit.timeit("go2(%s)" % size, "from __main__ import go2", number=num)
    print "lw keyed tuple:", timeit.timeit("go3(%s)" % size, "from __main__ import go3", number=num)

```

output:

```
#!

-----------------
size=10 num=10000
namedtuple: 3.60116696358
keyedtuple: 0.257042884827
lw keyed tuple: 0.571335792542
-----------------
size=100 num=1000
namedtuple: 0.362484931946
keyedtuple: 0.24974322319
lw keyed tuple: 0.0887930393219
-----------------
size=10000 num=100
namedtuple: 0.562417030334
keyedtuple: 2.53507685661
lw keyed tuple: 0.607440948486
-----------------
size=1000000 num=10
namedtuple: 5.84964299202
keyedtuple: 28.8070271015
lw keyed tuple: 6.69921588898

```

we can see that namedtuple is very slow for lots of distinct types.  But then that keyedtuple is *really* slow for a lot of instances on a small number of types.   the new lw_tuple is almost as fast as namedtuple on instance create and just a bit slower than keyedtuple on making new types.   it is definitely the best option in the graph.