[pyDC-devel] Latest pyDC updates
From: Anakim B. <ab...@us...> - 2004-03-15 14:01:17
In the last two weeks I've made a number of changes to the pyDC sources. Here's the list:

pyDClib/AsyncSocket.py: added unrecv() method
pyDClib/AsyncSocket.py: added getsockname() method
pyDClib/AsyncSocket.py: some speed optimizations
pyDClib/AsyncSocket.py: use lambda function to avoid memory leak
pyDClib/bloom.py: initial import
pyDClib/Channel.py: added ChannelEventListener
pyDClib/Channel.py: added onChannelClosed event notification
pyDClib/DCChannel.py: avoid losing commands when changing handlers (call sock.unrecv())
pyDClib/DCChannel.py: replaced deprecated apply()
pyDClib/DCChannel.py: some speed optimizations
pyDClib/DCChannel.py: new calling scheme for command handlers
pyDClib/DCDownload.py: support for segmented zlib-compressed downloads
pyDClib/DCDownload.py: support for new removeSource() signature
pyDClib/DCFileList.py: no longer needed
pyDClib/DCHub.py: match new DCLocalList interface
pyDClib/DCHub.py: avoid memory leaks by using class.method instead of self.method references (see the sketch after this list)
pyDClib/DCHub.py: better PM parsing
pyDClib/DCHub.py: fixed registerListener signature
pyDClib/DCHub.py: cleanup of socket event handlers
pyDClib/DCHubEventListener.py: now a subclass of ChannelEventListener
pyDClib/DCItem.py: __slots__ to reduce memory consumption
pyDClib/DCLocalList.py: full rewrite
pyDClib/DCLocalList.py: Bloom filter support
pyDClib/DCLocalListWriter.py: match DCLocalList changes
pyDClib/DCQueue.py: new addItem() signature
pyDClib/DCQueue.py: upper limit on the number of queue items processed by a single poll() call
pyDClib/DCQueueItem.py: full rewrite
pyDClib/DCQueueItem.py: new removeSource() signature
pyDClib/DCSearchResult.py: __slots__ to reduce memory consumption
pyDClib/DCSettings.py: moved default values to class constants
pyDClib/DCUpload.py: support for segmented zlib-compressed uploads
pyDClib/DCUpload.py: avoid memory leaks by using class.method instead of self.method references
pyDClib/DCUser.py: __slots__ to reduce memory consumption
pyDClib/DCUserList.py: DCUserList is no longer a Job
pyDClib/DCUserList.py: removeSource() dequeues item
pyDClib/DCUserList.py: moved UserDir and UserFile from DCFileList.py
pyDClib/DCUserList.py: delete downloaded list after parsing
pyDClib/DCUserList.py: new removeSource() signature
pyDClib/DCWorker.py: call DCLocalList.freeze()
pyDClib/DCWorker.py: experimental IP change detection
pyDClib/DCXfer.py: support for segmented zlib-compressed xfers
pyDClib/DCXfer.py: avoid memory leaks by using class.method instead of self.method references
pyDClib/DCXferWorker.py: xferBandwidth() returns 0 to signal unlimited bandwidth
pyDClib/DNS.py: improved /etc/resolv.conf parsing
pyDClib/DNS.py: avoid memory leaks by using lambda functions
ChatViewer.py: multirow message support
FileListViewer.py: removed DCFileList references
FileListViewer.py: support for delayed queue refresh
HubViewer.py: added 'Copy nick' and 'Remove user from queue' commands
HubViewer.py: support for multirow messages
HubViewer.py: match DCUserList changes
HubViewer.py: sendMsg displays error messages in the main chat view
HubsPanel.py: listen for Channel events
HubsPanel.py: handle onChannelClosed instead of onHubDisconnection
HubsPanel.py: bugfix: call deregisterListener() to free resources
MainWnd.py: bugfix: searchChat no longer messes with the viewer stack
MainWnd.py: removed DCFileList references
MainWnd.py: added enableQueueRefresh() method
MultirowMessage.py: initial import
QueuePanel.py: match DCQueueItem changes
QueuePanel.py: identify user lists by user nick
QueuePanel.py: experimental support for delayed refresh
SearchesPanel.py: call deregisterListener() to free resources
SearchViewer.py: match DCUserList changes
Settings.py: use default values from DCSettings
XfersPanel.py: 3:2 ratio for downloads/uploads lists
XfersPanel.py: added popup menu
XfersPanel.py: '*' to signal zlib-compressed xfers
XfersPanel.py: '!' to signal multi-source downloads
pydc.py: catch all queue errors
pydc.py: clear clipboard on exit to avoid segfaults
pydc.py: don't try to save malformed settings
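Several of the entries above mention the same trick: registering class.method (or a small lambda) instead of self.method so that long-lived dispatchers don't hold strong references to the objects behind the callbacks. The snippet below is only a minimal illustration of that general pattern, not the actual pyDC code; the registry and class names are made up.

import weakref

# Hypothetical long-lived callback registry, standing in for the socket/event
# dispatcher that outlives individual hubs and transfers.
HANDLERS = []

class LeakyHub:
    def register(self):
        # A bound method holds a strong reference to `self`, so the hub
        # object stays alive for as long as the registry does.
        HANDLERS.append(self.on_data)

    def on_data(self, data):
        pass

class FrugalHub:
    def register(self):
        # Keep only a weak reference to the instance and call the method
        # through the class; the hub can be collected as soon as the rest
        # of the program drops it.
        ref = weakref.ref(self)
        def handler(data):
            hub = ref()               # None once the hub has been collected
            if hub is not None:
                FrugalHub.on_data(hub, data)
        HANDLERS.append(handler)

    def on_data(self, data):
        pass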
-------------------

pyDC now supports the 'GetZBlock' protocol extension, allowing segmented zlib-compressed file transfers. To get the most out of it, I rewrote the queue item manager from scratch. The new code can deal with segments of variable size and rearrange them on the fly; each segment is independent of the others, which means that pyDC can download a single item from multiple sources at once. Since most hub owners forbid multi-source downloads, the feature is disabled by default: if you are allowed to use it and want to try it out, simply enter the following command in the Python interpreter:

  DCQueue.MAX_CONCURRENT_DOWNLOADS = 5

You may change the value depending on your needs, but keep it low: since your bandwidth is limited, setting it to 8 or to 100 makes no difference; in fact bigger values also mean bigger overhead, so 100 may even be *worse* than 8. Resetting MAX_CONCURRENT_DOWNLOADS to 1 disables multidownloads (0 will stop downloads altogether :-) ).

The GUI has been updated to inform you about the extension: a '*' mark before the filename in the Xfers panel means that the xfer is using zlib compression; a '!' means that the download refers to part of a multi-source item.

The other important area I've been working on is the search engine (the piece of code that handles search requests coming from other peers). Believe it or not, this is where pyDC spends most of its time, so it is crucial to make "local" searches as fast as possible. The old code seemed fast; it was also severely buggy (it never returned results for queries containing more than one search term). As a first task I wrote a new generic search engine; by replacing objects with tuples and by using a smarter algorithm, I was able to make the new code twice as fast.

Then I moved on to size-constrained searches (you know: at least 1 MB / at most 1 GB); measurements showed me that around 85% of all searches fall into this category. One can exploit those constraints to avoid (slow) filename pattern searches on files outside the size limits; that reduces the search space and, hopefully, makes the search engine faster. The algorithm I used is the following: when the file list is initially created, the maximum and minimum file sizes are noted. All files are then placed into one of two big lists, depending on their size:

  [ all files with size below (max - min)/2 + min, all files above ]

The process continues by splitting each of those two lists in the same way; at the end all files are grouped into classes, each covering a fixed size range. When a search request with size constraints comes in, the search engine finds the first class matching the limits; only at that point does the filename pattern search start (a small sketch of this scheme appears below, after the Bloom filter note).

The last improvement is also the biggest. It is based on the brilliant idea that goes under the name of "Bloom filter". I won't cover the details here, since a number of guides on the subject already exist on the web (Google is your friend) and since the implementation is straightforward (take a look at pyDClib/bloom.py).
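For readers who would rather see code than hunt for a guide, here is a minimal, generic Bloom filter sketch. It is not pyDClib/bloom.py, just an illustration of the idea under the usual assumptions: a fixed bit array and k hash functions, here derived from salted md5 digests.

import hashlib

class BloomSketch:
    """Toy Bloom filter: 'no' answers are certain, 'yes' answers may be wrong."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, word):
        # Derive num_hashes bit positions from salted md5 digests.
        for salt in range(self.num_hashes):
            digest = hashlib.md5(("%d:%s" % (salt, word)).encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, word):
        # If any of the k bits is clear, the word was definitely never added.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(word))

In a search-engine context such a filter can reject most non-matching query terms with a couple of bit tests before any pattern matching runs; a negative answer is definitive, a positive one may be a false positive, so the full search still has to confirm it.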
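And here is the promised sketch of the size-class splitting described a few paragraphs up. Again, this is a hypothetical illustration rather than the DCLocalList code: files are plain (name, size) tuples and the recursion depth is fixed.

def build_classes(files, depth=4):
    """files: list of (name, size) tuples -> list of (lo, hi, group) classes."""
    sizes = [size for _name, size in files]
    lo, hi = min(sizes), max(sizes)

    def split(group, lo, hi, level):
        if level == 0 or len(group) <= 1:
            return [(lo, hi, group)]
        mid = (hi - lo) / 2.0 + lo        # the (max - min)/2 + min pivot
        below = [f for f in group if f[1] <= mid]
        above = [f for f in group if f[1] > mid]
        return (split(below, lo, mid, level - 1) +
                split(above, mid, hi, level - 1))

    return split(files, lo, hi, depth)

def size_constrained_search(classes, pattern, min_size, max_size):
    pattern = pattern.lower()
    hits = []
    for lo, hi, group in classes:
        if hi < min_size or lo > max_size:
            continue                      # whole class outside the limits: skip it
        hits.extend(name for name, size in group
                    if min_size <= size <= max_size and pattern in name.lower())
    return hits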
As a last note I'd like to give you an idea of the impact of the optimizations. On my Athlon 1800 server, with 293 indexed files, pyDC's CPU usage as reported by top used to fluctuate around 15% before the changes; now it is around 0.7%.

As always, you can get all the changes via CVS, or you may download the packages from:

http://pydc.sourceforge.net/pydc/rc2.html

-- 
Anakim Border
ab...@us...
http://pydc.sourceforge.net