I finally found some time to start describing pyDC internals :-)
As I said before, don't hesitate to tell me if you find something unclear.
------------------------------------------------------------------
As I've already mentioned, pyDC is composed of three parts:
* pyDClib;
* cli;
* gui.
In this message I'll talk about the first one.
==> pyDClib <==
pyDClib is a standalone library providing an API to access the Direct Connect
P2P network. It's not meant to be used directly by the end user; scripters,
on the other hand, can access it to extend pyDC.
- Low-level job model
Most networking apps use one of the following approaches when dealing with
sockets:
1) use asynchronous sockets, calling select() or poll() to sleep until there's
something to read or write;
2) use blocking sockets, creating as many threads as needed to handle open
streams.
pyDClib uses neither of them.
The first solution is perfectly suited to stateless protocols such as HTTP:
each request-reply sequence is completely independent of other ones and it
only lasts a few seconds; the goal here is to serve content as soon as a
request arrives (low delay). pyDClib deals with a completely different model:
connection lifetimes are measured in hours (possibly days) and the priority
goes to throughput (downloads/uploads) and conservative CPU usage (hub
commands). There's no point in processing hub messages conveying user state
info as soon as they arrive; a much more efficient solution is to buffer them
and to go through an entire bunch in a single run.
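
To make the buffering idea concrete, here is a minimal sketch. It is not
pyDClib's actual code: the class BufferedHubReader and its feed()/drain()
methods are names invented for this example. The only protocol detail it
relies on is that Direct Connect commands are terminated by '|'.

```python
class BufferedHubReader:
    """Hypothetical helper: collect raw hub data, parse it in batches."""

    def __init__(self):
        self.buffer = b""

    def feed(self, chunk):
        # Called whenever bytes arrive; deliberately does no parsing.
        self.buffer += chunk

    def drain(self):
        # Split everything received so far on the '|' terminator.
        # The last piece is an incomplete command (possibly empty),
        # so it goes back into the buffer for the next run.
        *complete, self.buffer = self.buffer.split(b"|")
        return [command for command in complete if command]
```

feed() would be called as data comes in, while drain() is called only once
in a while, so that a whole bunch of commands is handled in a single run.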
Approach 2, though probably the easiest to code, has its own drawbacks. As the
number of threads increases, the system spends more and more time
bookkeeping; published tests show that once you reach a certain number of
threads, the net effect of adding a new one is not to increase, but to reduce
the total throughput!
pyDClib uses state machines to mimic threads without paying their full cost.
To understand what I mean, start by taking a look at Job.py. As you may
notice, the file defines a single interface (the class Job provides no
implementation); "real" jobs subclass it. Now suppose you want to implement a
Job whose only task is to send data through an open socket. A naive
implementation is the following:
    class EasyJob(Job):
        def __init__(self, source, sock):
            self.source = source
            self.sock = sock    # store for later usage

        def poll(self):
            data = self.source.read(1024)
            if len(data) == 0:
                return True     # nothing left to send: job completed
            self.sock.send(data)
            return False        # more data to send on the next call
The key point here is the following: every time poll() is called, it executes
part of the task the class has been written to achieve. Since poll() will be
called many times, its execution time can be reduced to a fraction of a
second (by subdividing the job into small parts).
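
One thing a less naive version would probably have to handle is that
socket.send() may accept only part of the data it is given. The sketch below
shows one possible way to deal with that while keeping each poll() call
short. SendJob, chunk_size and pending are names made up for this example,
and the Job stub only stands in for the real interface in Job.py.

```python
import io
import socket


class Job:
    """Stand-in for the interface defined in Job.py."""

    def poll(self):
        raise NotImplementedError


class SendJob(Job):
    """Hypothetical job: sends a file-like source through a socket."""

    def __init__(self, source, sock, chunk_size=1024):
        self.source = source
        self.sock = sock
        self.chunk_size = chunk_size
        self.pending = b""  # bytes read from source but not yet sent

    def poll(self):
        if not self.pending:
            self.pending = self.source.read(self.chunk_size)
            if not self.pending:
                return True  # source exhausted: job completed
        try:
            # send() returns how many bytes were actually accepted
            sent = self.sock.send(self.pending)
        except BlockingIOError:
            return False  # non-blocking socket is full; retry next poll
        self.pending = self.pending[sent:]
        return False
```

Whatever was not accepted by send() stays in pending and is retried on the
next call, so no data is lost and no single poll() has to wait for the whole
chunk to go out.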
Now take a pool of jobs and call their poll() methods in sequence (as
JobPool [JobPools.py] does) and you'll get something similar to a
multithreaded program (in fact this *is* non-preemptive multithreading, the
kind Windows 3.x used).
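
A minimal sketch of such a pool follows. This is not the code in
JobPools.py: SimplePool and CountdownJob are invented names, and the Job
stub only stands in for the real interface. It just shows how calling
poll() in sequence interleaves the work of several jobs.

```python
class Job:
    """Stand-in for the interface defined in Job.py."""

    def poll(self):
        raise NotImplementedError


class CountdownJob(Job):
    """Hypothetical job: needs a fixed number of poll() calls to finish."""

    def __init__(self, name, steps, log):
        self.name = name
        self.steps = steps
        self.log = log

    def poll(self):
        self.log.append(self.name)  # record that this job got a turn
        self.steps -= 1
        return self.steps == 0      # True means "nothing more to do"


class SimplePool:
    """Hypothetical pool: one poll() gives each job exactly one turn."""

    def __init__(self):
        self.jobs = []

    def add(self, job):
        self.jobs.append(job)

    def poll(self):
        # Keep only the jobs that still have work to do.
        self.jobs = [job for job in self.jobs if not job.poll()]
        return not self.jobs        # True once every job has completed
```

With a two-step job "a" and a three-step job "b", repeated calls to
SimplePool.poll() run them in the order a, b, a, b, b: each job advances a
little at every pass and nobody is interrupted in the middle of a step.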
And now some details:
- poll() returns True when it has nothing more to do; this way JobPools can
detect completed jobs and remove them from their internal list;
- JobPools are used in a single place: inside DCWorker [DCWorker.py]. When
DCWorker.start() is called a new thread is created (the only "real" one) and
DCWorker.run() is executed. From that point on pyDClib lives inside a loop,
alternating job execution [line 98 and following] and sleep [line 101];
- Jobs must implement two other methods: isAlive() and stop() (doing obvious
things). Please note that after you call stop() you still need to call poll()
until it returns True; this way Jobs can complete their cleanups.
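
The details above can be sketched as follows. This is only an illustration,
not the code in DCWorker.py: TickJob, Worker and the pause parameter are
invented names, and the exact meaning given here to isAlive() (True until
cleanup has finished) is my assumption about the "obvious thing".

```python
import threading
import time


class Job:
    """Stand-in for the interface defined in Job.py."""

    def poll(self):
        raise NotImplementedError

    def isAlive(self):
        raise NotImplementedError

    def stop(self):
        raise NotImplementedError


class TickJob(Job):
    """Hypothetical job that runs until someone calls stop()."""

    def __init__(self):
        self.ticks = 0
        self.stopping = False
        self.cleaned_up = False

    def poll(self):
        if self.stopping:
            self.cleaned_up = True  # cleanup happens inside poll()
            return True
        self.ticks += 1
        return False

    def isAlive(self):
        return not self.cleaned_up

    def stop(self):
        self.stopping = True        # only a request; poll() does the rest


class Worker(threading.Thread):
    """Hypothetical worker: the only real thread, alternating work/sleep."""

    def __init__(self, jobs, pause=0.01):
        super().__init__(daemon=True)
        self.jobs = list(jobs)
        self.pause = pause

    def run(self):
        while self.jobs:
            # Job execution: one poll() per job, drop the completed ones.
            self.jobs = [job for job in self.jobs if not job.poll()]
            time.sleep(self.pause)  # then sleep before the next pass
```

Note that stop() does no cleanup by itself: the job is released only when
the worker's next pass calls poll() and gets True back.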
------------------------------------------------------------------
Bye
--
Anakim Border
ab...@us...
http://pydc.sf.net