[pyDC-devel] Latest pyDC updates
From: Anakim B. <ab...@us...> - 2004-03-15 14:01:17
In the last two weeks I've made a number of changes to the pyDC sources. Here's the list:

pyDClib/AsyncSocket.py: added unrecv() method
pyDClib/AsyncSocket.py: added getsockname() method
pyDClib/AsyncSocket.py: some speed optimizations
pyDClib/AsyncSocket.py: use lambda function to avoid memory leak
pyDClib/bloom.py: initial import
pyDClib/Channel.py: added ChannelEventListener
pyDClib/Channel.py: added onChannelClosed event notification
pyDClib/DCChannel.py: avoid losing commands when changing handlers (call sock.unrecv())
pyDClib/DCChannel.py: replaced deprecated apply()
pyDClib/DCChannel.py: some speed optimizations
pyDClib/DCChannel.py: new calling scheme for command handlers
pyDClib/DCDownload.py: support for segmented zlib-compressed downloads
pyDClib/DCDownload.py: support for new removeSource() signature
pyDClib/DCFileList.py: no longer needed
pyDClib/DCHub.py: match new DCLocalList interface
pyDClib/DCHub.py: avoid memory leaks by using class.method instead of self.method references (see the sketch after this list)
pyDClib/DCHub.py: better PM parsing
pyDClib/DCHub.py: fixed registerListener signature
pyDClib/DCHub.py: cleanup of socket event handlers
pyDClib/DCHubEventListener.py: now a subclass of ChannelEventListener
pyDClib/DCItem.py: __slots__ to reduce memory consumption
pyDClib/DCLocalList.py: full rewrite
pyDClib/DCLocalList.py: Bloom filter support
pyDClib/DCLocalListWriter.py: match DCLocalList changes
pyDClib/DCQueue.py: new addItem() signature
pyDClib/DCQueue.py: upper limit on the number of queue items processed by a single poll() call
pyDClib/DCQueueItem.py: full rewrite
pyDClib/DCQueueItem.py: new removeSource() signature
pyDClib/DCSearchResult.py: __slots__ to reduce memory consumption
pyDClib/DCSettings.py: moved default values to class constants
pyDClib/DCUpload.py: support for segmented zlib-compressed uploads
pyDClib/DCUpload.py: avoid memory leaks by using class.method instead of self.method references
pyDClib/DCUser.py: __slots__ to reduce memory consumption
pyDClib/DCUserList.py: DCUserList is no longer a Job
pyDClib/DCUserList.py: removeSource() dequeues item
pyDClib/DCUserList.py: moved UserDir and UserFile from DCFileList.py
pyDClib/DCUserList.py: delete downloaded list after parsing
pyDClib/DCUserList.py: new removeSource() signature
pyDClib/DCWorker.py: call DCLocalList.freeze()
pyDClib/DCWorker.py: experimental IP change detection
pyDClib/DCXfer.py: support for segmented zlib-compressed xfers
pyDClib/DCXfer.py: avoid memory leaks by using class.method instead of self.method references
pyDClib/DCXferWorker.py: xferBandwidth() returns 0 to signal unlimited bandwidth
pyDClib/DNS.py: improved /etc/resolv.conf parsing
pyDClib/DNS.py: avoid memory leaks by using lambda functions
ChatViewer.py: multirow message support
FileListViewer.py: removed DCFileList references
FileListViewer.py: support for delayed queue refresh
HubViewer.py: added 'Copy nick' and 'Remove user from queue' commands
HubViewer.py: support for multirow messages
HubViewer.py: match DCUserList changes
HubViewer.py: sendMsg displays error messages in the main chat view
HubsPanel.py: listen for Channel events
HubsPanel.py: handle onChannelClosed instead of onHubDisconnection
HubsPanel.py: bugfix: call deregisterListener() to free resources
MainWnd.py: bugfix: searchChat no longer messes with the viewer stack
MainWnd.py: removed DCFileList references
MainWnd.py: added enableQueueRefresh() method
MultirowMessage.py: initial import
QueuePanel.py: match DCQueueItem changes
QueuePanel.py: identify user lists by user nick
QueuePanel.py: experimental support for delayed refresh
SearchesPanel.py: call deregisterListener() to free resources
SearchViewer.py: match DCUserList changes
Settings.py: use default values from DCSettings
XfersPanel.py: 3:2 ratio for downloads/uploads lists
XfersPanel.py: added popup menu
XfersPanel.py: '*' to signal zlib-compressed xfers
XfersPanel.py: '!' to signal multi-source downloads
pydc.py: catch all queue errors
pydc.py: clear clipboard on exit to avoid segfaults
pydc.py: don't try to save malformed settings
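Several of the entries above mention the same trick: registering class.method (or a small lambda) instead of self.method so that long-lived dispatchers don't hold strong references to the objects behind the callbacks. The snippet below is only a minimal illustration of that general pattern, not the actual pyDC code; the registry and class names are made up.

import weakref

# Hypothetical long-lived callback registry, standing in for the socket/event
# dispatcher that outlives individual hubs and transfers.
HANDLERS = []

class LeakyHub:
    def register(self):
        # A bound method holds a strong reference to `self`, so the hub
        # object stays alive for as long as the registry does.
        HANDLERS.append(self.on_data)

    def on_data(self, data):
        pass

class FrugalHub:
    def register(self):
        # Keep only a weak reference to the instance and call the method
        # through the class; the hub can be collected as soon as the rest
        # of the program drops it.
        ref = weakref.ref(self)
        def handler(data):
            hub = ref()               # None once the hub has been collected
            if hub is not None:
                FrugalHub.on_data(hub, data)
        HANDLERS.append(handler)

    def on_data(self, data):
        pass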
-------------------

pyDC now supports the 'GetZBlock' protocol extension, allowing segmented zlib-compressed file transfers. To get the most out of it, I rewrote the queue item manager from scratch. The new code can deal with segments of variable size and rearrange them on the fly; each segment is independent of the others, which means that pyDC can download a single item from multiple sources at once. Since most hub owners forbid multi-source downloads, the feature is disabled by default: if you are allowed to use it and want to try it out, simply enter the following command in the Python interpreter:

  DCQueue.MAX_CONCURRENT_DOWNLOADS = 5

You may change the value depending on your needs, but keep it low: since your bandwidth is limited, setting it to 8 or to 100 makes no difference; in fact bigger values also mean bigger overhead, so 100 may even be *worse* than 8. Resetting MAX_CONCURRENT_DOWNLOADS to 1 disables multidownloads (0 will stop downloads altogether :-) ).

The GUI has been updated to inform you about the extension: a '*' mark before the filename in the Xfers panel means that the xfer is using zlib compression; a '!' means that the download refers to part of a multi-source item.

The other important area I've been working on is the search engine (the piece of code that handles search requests coming from other peers). Believe it or not, this is where pyDC spends most of its time, so it is crucial to make "local" searches as fast as possible. The old code seemed fast; it was also severely buggy (it never returned results for queries containing more than one search term). As a first task I wrote a new generic search engine; by replacing objects with tuples and by using a smarter algorithm, I was able to make the new code twice as fast.

Then I moved on to size-constrained searches (you know: at least 1 MB / at most 1 GB); measurements showed me that around 85% of all searches fall into this category. One can exploit those constraints to avoid (slow) filename pattern searches on files outside the size limits; that reduces the search space and, hopefully, makes the search engine faster. The algorithm I used is the following: when the file list is initially created, the maximum and minimum file sizes are noted. All files are then placed into one of two big lists, depending on their size:

  [ all files with size below (max - min)/2 + min, all files above ]

The process continues by splitting each of those two lists in the same way; at the end all files are grouped into classes, each covering a fixed size range. When a search request with size constraints comes in, the search engine finds the first class matching the limits; only at that point does the filename pattern search start (a small sketch of this scheme appears below, after the Bloom filter note).

The last improvement is also the biggest. It is based on the brilliant idea that goes under the name of "Bloom filter". I won't cover the details here, since a number of guides on the subject already exist on the web (Google is your friend) and since the implementation is straightforward (take a look at pyDClib/bloom.py).
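For readers who would rather see code than hunt for a guide, here is a minimal, generic Bloom filter sketch. It is not pyDClib/bloom.py, just an illustration of the idea under the usual assumptions: a fixed bit array and k hash functions, here derived from salted md5 digests.

import hashlib

class BloomSketch:
    """Toy Bloom filter: 'no' answers are certain, 'yes' answers may be wrong."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, word):
        # Derive num_hashes bit positions from salted md5 digests.
        for salt in range(self.num_hashes):
            digest = hashlib.md5(("%d:%s" % (salt, word)).encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, word):
        # If any of the k bits is clear, the word was definitely never added.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(word))

In a search-engine context such a filter can reject most non-matching query terms with a couple of bit tests before any pattern matching runs; a negative answer is definitive, a positive one may be a false positive, so the full search still has to confirm it.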
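And here is the promised sketch of the size-class splitting described a few paragraphs up. Again, this is a hypothetical illustration rather than the DCLocalList code: files are plain (name, size) tuples and the recursion depth is fixed.

def build_classes(files, depth=4):
    """files: list of (name, size) tuples -> list of (lo, hi, group) classes."""
    sizes = [size for _name, size in files]
    lo, hi = min(sizes), max(sizes)

    def split(group, lo, hi, level):
        if level == 0 or len(group) <= 1:
            return [(lo, hi, group)]
        mid = (hi - lo) / 2.0 + lo        # the (max - min)/2 + min pivot
        below = [f for f in group if f[1] <= mid]
        above = [f for f in group if f[1] > mid]
        return (split(below, lo, mid, level - 1) +
                split(above, mid, hi, level - 1))

    return split(files, lo, hi, depth)

def size_constrained_search(classes, pattern, min_size, max_size):
    pattern = pattern.lower()
    hits = []
    for lo, hi, group in classes:
        if hi < min_size or lo > max_size:
            continue                      # whole class outside the limits: skip it
        hits.extend(name for name, size in group
                    if min_size <= size <= max_size and pattern in name.lower())
    return hits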
As a last note I'd like to give you an idea of the impact of the optimizations. On my Athlon 1800 server, with 293 indexed files, pyDC's CPU usage as reported by top used to fluctuate around 15% before the changes; now it is around 0.7%.

As always, you can get all the changes via CVS, or you may download the packages from:

http://pydc.sourceforge.net/pydc/rc2.html

-- 
Anakim Border
ab...@us...
http://pydc.sourceforge.net