Re: [Modeling-users] Fetching raw rows

I wrote:
[...]
> [raw row] 1st fetch (ec empty):                0.618131995201
> [raw row] 2nd fetch (objects already loaded):  0.61448597908
> [raw row] 3rd fetch (objects already loaded):  2.24008309841
[...]
>   2. You probably already noticed that fetching raw rows is
>      significantly slower when the objects are loaded. The reason is
>      that objects already loaded are checked for modification, because
>      as I explained it in my previous post we have to return the
>      modifications, not the fetched data, if an object has been
>      modified.
>
>      I'm currently studying this, I have an implementation that does not
>      consume more time when no objects are modified:
>
> [raw row] 1st fetch (ec empty):                0.595005989075
> [raw row] 2nd fetch :                          0.585139036179
> [raw row] 3rd fetch (objects already loaded):  0.607128024101
>
>      However, still, we cannot avoid the additional payload when some
>      objects are modified, and the more modified objects we have, the
>      slower will be the fetch (first figure, 2.24, would be the upper
>      limit here, when all objects are modified).
>
>      Second, I do not want to commit this quicker implementation now,
>      because a problem remains: if the database has been changed in the
>      meantime, you can get raw rows whose values differ from those of
>      _unmodified_ objects in the EditingContext. I'm not sure if this is
>      a significant problem or, to put it differently, if we have to pay
>      extra CPU time to ensure that this does not happen. But I feel a
>      little touchy about making an exception to the general rule.

Okay, I thought I had solved this, and I committed it to CVS
[DatabaseChannel.fetchObject() v1.15].

It gave the following figures, on the same basis (py2.2 -O / no psyco):

[raw row] 1st fetch (ec empty):                0.60
[raw row] 2nd fetch :                          0.59
[raw row] 3rd fetch (objects already loaded):  0.66

  Alas, I then tried it with all objects modified, and it took... about
  1 minute for 5000 objects. I thought testing whether the object was
  modified was a good idea, and it was when no objects were modified.

  But when all 5000 objects are modified, checking whether an object is
  in a list of length 5000 takes 2500 comparisons on average, hence 2500
  calls to __eq__. For 5000 objects, that is 5000*2500=12.5e6 calls to
  __eq__!
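
  To make that arithmetic concrete, here is a minimal sketch. It is not
  the framework's code: the class Counted and both functions are
  hypothetical names made up for the illustration, and it is written for
  a current Python rather than py2.2. It counts the calls to __eq__ when
  every object is looked up in a plain list of "modified" objects, and
  compares that with a lookup in a set of id() values, which is only one
  possible alternative and assumes that object identity is the right
  test. Current CPython skips __eq__ for the identical object, so the
  list version makes n*(n-1)/2 calls, i.e. roughly n*n/2.

```python
class Counted:
    """Hypothetical stand-in for a modified object; counts __eq__ calls."""

    eq_calls = 0  # class-wide counter, reset before each measurement

    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        Counted.eq_calls += 1
        return isinstance(other, Counted) and self.key == other.key


def eq_calls_with_list(n):
    """Look up each of n objects in a plain list holding all of them."""
    objects = [Counted(i) for i in range(n)]
    modified = list(objects)  # worst case: every object is modified
    Counted.eq_calls = 0
    for obj in objects:
        # Linear scan: __eq__ runs against each item stored before obj.
        found = obj in modified
    return Counted.eq_calls


def eq_calls_with_ids(n):
    """Same lookups, but against a set of id() values (an assumption)."""
    objects = [Counted(i) for i in range(n)]
    modified_ids = {id(obj) for obj in objects}
    Counted.eq_calls = 0
    for obj in objects:
        # Hash lookup on plain integers: Counted.__eq__ is never involved.
        found = id(obj) in modified_ids
    return Counted.eq_calls


if __name__ == "__main__":
    print(eq_calls_with_list(1000))
    print(eq_calls_with_ids(1000))
```

  The count for the list grows quadratically with the number of modified
  objects, which matches the order of magnitude given above for 5000
  objects; the id()-based lookup trades that for the memory of one extra
  set.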

So back to the old behaviour, and the following figures (py2.2 -O):

[raw row] 1st fetch (ec empty):                0.522832036018
[raw row] 2nd fetch :                          0.516697049141
[raw row] 3rd fetch (objects already loaded):  1.80115604401
[raw row] 4th fetch (all objects modified):    1.70906305313

    I'll commit this soon. And I guess it's time for me to stop annoying
    you w/ all these performance considerations.

-- Sébastien.