From: Derrick 'd. H. <dm...@dm...> - 2002-06-26 18:21:22
On Wed, Jun 26, 2002 at 01:26:41AM +0200, Michael Ströder wrote:
| Derrick 'dman' Hudson wrote:
| >I tried using python-ldap today (1.9.999.pre04-1, python 2.1.3-3), but
| >it is way too inefficient. A simple search that results in 2 entries
| >returned takes 30 seconds.
|
| If that would be normal I would not use python-ldap. Let's see.
:-).
| >Watching with top shows nearly 100% CPU
| >usage for the 30 seconds, on an otherwise idle Athlon XP 1800+.
| >OpenLDAP (2.0) is running on that same machine, however using
| >ldapsearch or exim yields immediate results.
|
| Frankly this is a very unprecise performance measurement.
True, but the ability to measure the time by my watch is rather, umm,
noticeable :-).
| > '(mailGroupLocalPart=%s)' % listname ,
|
| Is attribute mailGroupLocalPart indexed?
No (AFAIK).
| An equality index should be sufficient here.
| > I need to do some integration of LDAP and some web-based programs,
| > and would like to work with python, but this sort of performance
| > hit just won't be usable.
|
| As you might have noticed I'm doing web programming with
| python-ldap. ;-) I'm using web2ldap for maintaining and searching
| my personal address book and it's pretty responsive when using a
| fast browser. I'm also browsing very large data sets (>150000).
That's good. I installed web2ldap and ldapexplorer yesterday to
evaluate them. web2ldap didn't look very useful -- only the
modification/timestamp attributes were shown for any entry.
Actually, trying it again today, but using a different
host as the LDAP server (not localhost), it looks much more useful (it
shows all the attributes). I wonder why that is.
| Just to give you a clue what I'm actually doing with python-ldap
| in a commercial pilot project: I'm scanning 170000 entries in far
| less than an hour (mainly just reading the uid attribute). I'm
| doing diffing whole entries at a rate of 50 entries/second (some
| other work with a SQL DB is involved here).
Interesting. The "People" node on our tree has 500 child nodes, each
of which has no children.
| The process runs on a P-III 450 Mhz box against a 4-CPU, 1GB RAM
| server running iPlanet Directory Server 5.1.
I notice that your ldap server is not on the same machine as
python-ldap. As Jens mentioned, and I subsequently discovered for
myself, running python-ldap on a separate host from the ldap server
doesn't have the performance problem. I only experience the problem
with python-ldap and slapd on the same machine.
| > I'm willing to help with the code, if you point me
| >to the interesting parts (and help me learn the C API of python and
| >openldap as I go).
|
| To find out the interesting parts one has to do proper performance
| measuring.
:-).
| And I would be really glad to see some *real* numbers. Please take
| this advice to produce numbers I can take serious:
Ok, here we go.
| 1. Eliminate all disk access => turn off all logging.
Logging is (and was) off.
| 2. Eliminate caching issues => do many searches, throw away first result.
Right. My earlier, crude measurements were repeatable every time.
If it were a caching issue I would expect the first search to be slow
but not the later ones.
| 3. Eliminate DB backend issues => only search RootDSE.
| (This hint by Kurt Zeilenga.)
DSE ... I don't think I've run across this TLA before.
| 4. Maximize performance impact of python-ldap => use faster LDAP server.
By "server" are you referring to hardware or software?
| I took some numbers on my P-III laptop against a locally installed
| Netscape Directory Server 4.16SP1 which is much faster than recent
| OpenLDAP.
Can I get Netscape Directory Server for Debian? Is it Free?
If not then there is very little possibility of using it.
| Test script is attached.
Thanks!
| There are three test cases especially for the guys who are blaming
| python-ldap for bad performance but are reconnecting to the LDAP
| server for each query. ;-)
Is it allowed to reconnect for each query if each query is run from a
separate process at disparate times? ;-)
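For anyone who wants to repeat this kind of measurement without the attached script, here is a minimal sketch of the idea. It is not Michael's script. It assumes a current Python 3 and a current python-ldap (ldap.initialize, search_s, unbind_s, ldap.SCOPE_BASE); the helper names searches_per_second and rootdse_readers are made up for illustration. The harness itself takes any callable, so it can be tried without a directory server.

```python
import time


def searches_per_second(do_search, count=1000, warmup=1):
    """Call do_search() `count` times and return the rate in calls/second.

    The first `warmup` calls are made but not timed, so one-off costs
    (caches, lazy imports) do not skew the result.
    """
    for _ in range(warmup):
        do_search()
    start = time.perf_counter()
    for _ in range(count):
        do_search()
    elapsed = time.perf_counter() - start
    return count / elapsed


def rootdse_readers(uri):
    """Return (reuse, reconnect) callables that each read the RootDSE.

    Assumption: python-ldap is installed and a server answers at `uri`.
    """
    import ldap  # imported here so the harness has no LDAP dependency

    conn = ldap.initialize(uri)

    def reuse():
        # A base-scope search on the empty DN reads the RootDSE.
        conn.search_s('', ldap.SCOPE_BASE, '(objectClass=*)')

    def reconnect():
        # New connection for every query, no explicit simple bind.
        fresh = ldap.initialize(uri)
        fresh.search_s('', ldap.SCOPE_BASE, '(objectClass=*)')
        fresh.unbind_s()

    return reuse, reconnect
```

With a server available, something like `reuse, reconnect = rootdse_readers("ldap://localhost")` followed by `searches_per_second(reuse)` and `searches_per_second(reconnect)` would give the two rates to compare; the URI is only a placeholder.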
Here is what I get :

Server :
    Hardware :
        Athlon XP 1800+ (1.5 GHz clock, I think)
        256 MB DDR RAM
        IDE disk (I don't know much on the specs, but it is
            relatively new and fairly quick)
    Software :
        OpenLDAP 2.0.23 , ldbm backend
        Debian woody/sid
        Linux 2.4.18
    Other Load :
        light

Client 1 :
    Hardware :
        same
    Software :
        (same)
        OpenLDAP client library (v 2.0.23)
        python 2.1.3
        python-ldap 1.9.999.pre04
    Other Load :
        same

Client 2 :
    Hardware :
        Duron 750 (750 MHz clock)
        256 MB PC133 SDRAM
    Software :
        (same)
        OpenLDAP client library (v 2.0.23)
        python 2.1.3
        python-ldap 1.9.999.pre04
    Other Load :
        light-moderate

Test 1 :
    Client 1

    *** Read the RootDSE on same connection
    1719.493984 searches/second
    *** Read the RootDSE on newly created connection without extra simple bind
    30.879468 searches/second
    *** Read the RootDSE on newly created connection with an extra simple bind
    26.940053 searches/second

    Watching with top shows the system working as hard as it could,
    with the client eating most of the CPU and slapd using very very
    little.

Test 2 :
    Client 2

    *** Read the RootDSE on same connection
    895.992294 searches/second
    *** Read the RootDSE on newly created connection without extra simple bind
    161.131824 searches/second
    *** Read the RootDSE on newly created connection with an extra simple bind
    146.020832 searches/second

    Watching both systems with top showed (CPU wise) the server not
    even sweating while the client got a decent workout.

Test 3 :
    Client 1 ,
    without the async implementation of LDAPObject.result(),
    directly wrapping the built-in C implementation:

    *** Read the RootDSE on same connection
    2039.669716 searches/second
    *** Read the RootDSE on newly created connection without extra simple bind
    279.827530 searches/second
    *** Read the RootDSE on newly created connection with an extra simple bind
    262.486823 searches/second

    Watching with top shows _both_ python-ldap and slapd getting a
    fair amount of CPU time, and the CPU was only running at ~50%
    of its capacity (good things, IMO :-)).

Test 4 :
    same as test 3 but with Client 2

    *** Read the RootDSE on same connection
    1069.199140 searches/second
    *** Read the RootDSE on newly created connection without extra simple bind
    163.548164 searches/second
    *** Read the RootDSE on newly created connection with an extra simple bind
    151.169328 searches/second

    Similar observations in top, except that the server wasn't working
    nearly as hard as in test 3.

This seems to show that the difference in the result() method is more
significant/noticeable on a fast system than on a slow one.
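To make the comparison easier to read, the ratio between the two result() variants can be worked out directly from the numbers reported above. This is only a small sketch in current Python 3; the figures are copied from tests 1 to 4, and the names results and speedups are made up for illustration.

```python
# searches/second as reported in tests 1-4:
# (async result() wrapper, direct C result())
results = {
    ("Client 1", "same connection"): (1719.493984, 2039.669716),
    ("Client 1", "new connection, no bind"): (30.879468, 279.827530),
    ("Client 1", "new connection, simple bind"): (26.940053, 262.486823),
    ("Client 2", "same connection"): (895.992294, 1069.199140),
    ("Client 2", "new connection, no bind"): (161.131824, 163.548164),
    ("Client 2", "new connection, simple bind"): (146.020832, 151.169328),
}


def speedups(table):
    """Map each (client, case) to direct-C rate divided by async rate."""
    return {key: after / before for key, (before, after) in table.items()}


for (client, case), ratio in speedups(results).items():
    print("%-8s %-28s %.2fx" % (client, case, ratio))
```

On these figures the new-connection cases on Client 1 improve by roughly a factor of nine, while the same cases on Client 2 change by only a few percent.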
| => I have yet to see some serious numbers proving the "30 seconds
| vs. immediate results".
If I add
    index mailGroupLocalPart eq
to slapd.conf and restart the daemon, my script runs really fast on
"client1", but (incorrectly) doesn't return any results.
Using that script on "client2" yields these results (just for
comparison) :
    real 0m0.147s
    user 0m0.120s
    sys 0m0.020s
This sort of time is quite acceptable :-).
Does this help?
Oh, BTW, some of the docs are missing (404) on the web site. For
example, go to
http://python-ldap.sourceforge.net/pydoc/ldap.html
and click on the "functions" link.
-D
--
The way of a fool seems right to him,
but a wise man listens to advice.
Proverbs 12:15
http://dman.ddts.net/~dman/