[pyDC-devel] pyDClib - part 1
From: Anakim B. <ab...@us...> - 2003-12-13 14:24:56
I finally found some time to start describing pyDC internals :-) As I said before, don't hesitate to tell me if you find something unclear.

------------------------------------------------------------------

As I've already mentioned, pyDC is composed of three parts:

* pyDClib;
* cli;
* gui.

In this message I'll talk about the first one.

==> pyDClib <==

pyDClib is a standalone library providing an API to access the Direct Connect P2P network. It's not meant to be used directly by the end user; scripters, on the other hand, can use it to extend pyDC.

- Low level job model

Most networking apps take one of the following approaches when dealing with sockets:

1) use asynchronous sockets, calling select() or poll() to sleep until there's something to read or write;
2) use blocking sockets, creating as many threads as needed to handle the open streams.

pyDClib uses neither of them.

The first solution is perfectly suited to stateless protocols such as HTTP: each request-reply sequence is completely independent of the others and lasts only a few seconds; the goal there is to serve content as soon as a request arrives (low delay). pyDClib deals with a completely different model: connection lifetimes are measured in hours (possibly days), and the priority goes to throughput (downloads/uploads) and conservative CPU usage (hub commands). There's no point in processing hub messages conveying user state info as soon as they arrive; a much more efficient solution is to buffer them and go through an entire bunch in a single run.

Approach 2, though probably the easiest to code, has its own drawbacks. As the number of threads increases, the system spends more and more time bookkeeping; published tests show that once you reach a certain number of threads, the net effect of adding a new one is not to increase but to reduce the total throughput!

pyDClib uses state machines to mimic threads without paying their full cost.
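A quick aside on the "buffer them and go through an entire bunch" idea: the rough sketch below is only an illustration, not code taken from pyDClib. The class name HubMessageBuffer and its methods are made up, the '|' delimiter is assumed to be the command terminator used by Direct Connect hubs, and the sample commands are just placeholder data.

```python
class HubMessageBuffer:
    """Hypothetical helper: collects raw hub bytes, hands out commands in batches."""

    DELIMITER = b"|"  # assumption: hub commands are '|'-terminated

    def __init__(self):
        self._pending = b""

    def feed(self, chunk):
        """Store newly received bytes; nothing is parsed at this point."""
        self._pending += chunk

    def drain(self):
        """Return all complete commands buffered so far, keeping any partial tail."""
        *complete, self._pending = self._pending.split(self.DELIMITER)
        return complete


buf = HubMessageBuffer()
buf.feed(b"$Hello alice|$Hello b")     # arrives in arbitrary fragments
buf.feed(b"ob|$Quit alice|$MyIN")
batch = buf.drain()                    # process the whole bunch in a single run
```

feed() is cheap and can be called whenever data shows up; the comparatively expensive processing happens once per drain(), on however many commands have piled up in the meantime.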
To see how these state machines work, start by taking a look at Job.py. As you may notice, the file defines a single interface (the class Job provides no implementation); "real" jobs subclass it.

Now suppose you want to implement a Job whose only task is to send data through an open socket. A naive implementation is the following:

    from Job import Job    # assuming Job.py is on the import path

    class EasyJob(Job):
        def __init__(self, source, sock):
            self.source = source
            self.sock = sock    # store for later usage

        def poll(self):
            # one small slice of work per call
            data = self.source.read(1024)
            if len(data) == 0:
                return True     # nothing left to send
            self.sock.send(data)    # naive: ignores partial sends
            return False        # call me again

The key point here is the following: every time poll() is called, it executes part of the task the class has been written to achieve. Since poll() will be called many times, its execution time can be reduced to a fraction of a second (by subdividing the job into small parts).

Now take a pool of jobs and call their poll() methods in sequence (as JobPool [JobPools.py] does) and you'll get something similar to a multithreaded program (in fact this *is* non-preemptive multithreading, the kind Win 3.x used).

And now some details:

- poll() returns True when it has nothing more to do; this way JobPools can detect completed jobs and remove them from their internal list;
- JobPools are used in a single place: inside DCWorker [DCWorker.py]. When DCWorker.start() is called a new thread is created (the only "real" one) and DCWorker.run() is executed. From that point on pyDClib lives inside a loop, alternating job execution [line 98 and following] and sleep [line 101];
- Jobs must implement two other methods: isAlive() and stop() (doing the obvious things). Please note that after you call stop() you still need to call poll() until it returns True; this way Jobs can complete their cleanups.

------------------------------------------------------------------

Bye

--
Anakim Border
ab...@us...
http://pydc.sf.net
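
P.S. For those who prefer code to words, here is a toy sketch of the polling scheme described above. It is NOT the actual JobPools.py / DCWorker.py code: CountdownJob, SimplePool and run() are names invented for this example, and unlike the real worker loop this one simply returns once the pool is empty so that the example terminates.

```python
import time


class CountdownJob:
    """Toy job: performs one small unit of work per poll() call."""

    def __init__(self, name, steps, log):
        self.name = name
        self.remaining = steps
        self.log = log
        self.stopped = False

    def poll(self):
        if self.stopped or self.remaining == 0:
            return True   # nothing more to do (cleanup would go here)
        self.remaining -= 1
        self.log.append((self.name, self.remaining))
        return False      # still busy, poll again later

    def isAlive(self):
        return not self.stopped and self.remaining > 0

    def stop(self):
        # Only record the request; the next poll() completes the job.
        self.stopped = True


class SimplePool:
    """Hypothetical stand-in for a job pool: polls every job in turn."""

    def __init__(self):
        self.jobs = []

    def add(self, job):
        self.jobs.append(job)

    def poll(self):
        # Keep only the jobs whose poll() says they still have work left.
        self.jobs = [job for job in self.jobs if not job.poll()]
        return not self.jobs   # True once every job has completed


def run(pool, delay=0.0):
    """Alternate job execution and sleep until the pool is empty."""
    while not pool.poll():
        time.sleep(delay)


log = []
pool = SimplePool()
pool.add(CountdownJob("a", 2, log))
pool.add(CountdownJob("b", 3, log))
run(pool)
```

After run() returns, log holds the steps of "a" and "b" interleaved, which is the "something similar to a multithreaded program" effect, obtained without creating any extra thread.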