[Fwd: Re: [Cpu-users] Slow user creation]


Blake Matheny wrote:

>Well, there are a couple of issues here to consider. If there is a large
>userbase, grabbing all of the IDs could be terribly slow.
>

I guess part of the problem is LDAP's limited support for complex, 
optimized queries.  There is no "select max(uid) from users" :)

I would think that even in the worst case, a single large query 
would be an order of magnitude faster than several hundred individual 
queries, and would scale a lot better.  Right now I think the real speed 
problem is not the sorting but the constant queries to the directory. 
Even a linear search of an unordered list grabbed from the directory 
in one query would be a lot faster than the current setup.
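
As a rough sketch of that single-query idea: assuming the listing is 
passwd-style with the numeric UID in field 3 (as the awk commands in this 
thread imply), the highest UID can be found in one linear pass, with no 
sorting at all. The function name next_uid is hypothetical and is not part 
of cpu.

```shell
# next_uid: read passwd-style lines on stdin and print (highest UID + 1).
# Assumption: field 3 holds the numeric UID, as in /etc/passwd.
# Prints nothing if no numeric UID is seen.
next_uid() {
    awk -F: '
        # Keep the largest numeric UID seen so far (one linear pass).
        $3 ~ /^[0-9]+$/ && ($3 + 0 > max || !seen) { max = $3 + 0; seen = 1 }
        END { if (seen) print max + 1 }
    '
}

# Hypothetical usage against a single directory listing:
#   cpu -w cat | next_uid
```

Because the comparison is numeric, this also avoids the pitfall of a 
plain lexical sort, where 999 would sort after 5545.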

Instead of a binary tree you could set up a linked list and do a 
quicksort. Not having a complex data structure should be OK since we are 
not "searching" the list; we are sorting it and grabbing the max 
value. This does not find holes in the list of UIDs, but it does give you 
constantly incrementing UIDs and GIDs. Of course, if you want to find 
holes, a binary tree insertion plus a search for the lowest available gap 
would work too; perhaps that is the more general solution.
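
For the hole-finding variant, a minimal sketch along the same lines: 
sort the UIDs numerically once, then walk the list until the first gap at 
or above a minimum UID. The function name lowest_free_uid and the default 
minimum of 1000 are assumptions for illustration, not cpu behaviour; the 
input is again assumed to be passwd-style with the UID in field 3.

```shell
# lowest_free_uid [MIN]: read passwd-style lines on stdin and print the
# smallest UID >= MIN that is not in use. MIN defaults to 1000 (assumed).
lowest_free_uid() {
    min=${1:-1000}
    awk -F: '$3 ~ /^[0-9]+$/ { print $3 }' | sort -n -u |
    awk -v want="$min" '
        $1 + 0 < want  { next }          # below the range of interest
        $1 + 0 == want { want++; next }  # candidate is taken, try the next
        { exit }                         # first gap found, stop reading
        END { print want }
    '
}

# Hypothetical usage:
#   cpu -w cat | lowest_free_uid 1000
```

The cost is one sort plus one linear walk over a single listing, so the 
directory is still queried only once.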

In the interim I guess I will use the following technique, either 
manually or in a script which accomplishes the same thing externally to cpu.

# cpu -w cat | awk -F: '{print $3}' | sort -n | uniq | tail -n 1
5545
# cpu -w useradd -u 5546 -g 5546 -ptest test3

By the way, Blake, thank you for cpu. It is exactly the kind of tool that 
is needed to work well with LDAP-based authentication. There are very few 
similar tools that I have found, and none that work nearly as well.

Thanks,

Terrence

> If there isn't a
>large user base, the problem is that the order in which users (or groups) are
>returned is not by UID/GID, but rather by time of last modification (usually).
>So as we are grabbing UIDs/GIDs, we would need to insert them into a binary
>tree (or something similar), then search the tree. We could do this also with
>a bitvector (which would likely be faster and smaller) but both take up extra
>memory and time. I'll have to think about it.
>
>Of course, setting MIN_UID to the MAX_UID in use would still allow for fast
>linear searches as Eric suggested, but it's not ideal.
>
>Ideas welcome.
>
>-Blake
>
>Whatchu talkin' 'bout, Willis?
>  
>
>>That should work, but I would like both: linear and relatively quick.
>>
>>CPU seems to do a lot of small searches. Could it not get the list of 
>>UIDs in one go and then figure out the UID to use from there internally? 
>>A cpu -w cat comes back in a little over 2 seconds on my system and only 
>>makes one query to the directory, rather than searching one entry at a 
>>time to determine its relative usefulness.
>>
>>For example, if you run the following you quickly get a sorted list of 
>>all the existing UIDs:
>>
>>cpu -w cat | awk -F: '{print $3}' | sort -n | uniq
>>
>>Then just grab the last one, increment by one and you are done. I can do 
>>this by hand of course but having CPU do it for me is nicer. :)
>>
>>That would also avoid the need to cache the UID.
>>
>>Terrence
>>