[pyDC-devel] pyDClib - part 1


I finally found some time to start describing pyDC internals :-)
As I said before, don't hesitate to tell me if you find something unclear.

------------------------------------------------------------------

As I've already mentioned, pyDC is composed of three parts:
* pyDClib;
* cli;
* gui.

In this message I'll talk about the first one.

==> pyDClib <==

pyDClib is a standalone library providing an API to access the Direct Connect 
P2P network. It's not meant to be used directly by the end user; scripters, 
on the other hand, can access it to extend pyDC.

- Low level job model

Most networking apps take one of the following approaches when dealing 
with sockets:

1) use asynchronous sockets, calling select() or poll() to sleep until there's 
something to read or write;
2) use blocking sockets, creating as many threads as needed to handle open 
streams.

pyDClib uses neither of them.
The first solution is perfectly suited to stateless protocols such as HTTP: 
each request-reply sequence is completely independent of other ones and it 
only lasts a few seconds; the goal here is to serve content as soon as a 
request arrives (low delay). pyDClib deals with a completely different model: 
connection lifetimes are measured in hours (possibly days) and the priority 
goes to throughput (downloads/uploads) and conservative CPU usage (hub 
commands). There's no point in processing hub messages conveying user state 
info as soon as they arrive; a much more efficient solution is to buffer them 
and go through an entire bunch in a single run.
Approach 2, though probably the easiest to code, has its own drawbacks. As the 
number of threads increases, the system spends more and more time 
bookkeeping; published tests show that once you reach a certain number of 
threads, the net effect of adding a new one is not to increase, but to reduce 
the total throughput!
pyDClib uses state machines to mimic threads without paying their full cost. 
To understand what I mean, start by taking a look at Job.py. As you may 
notice, the file defines a single interface (the class Job provides no 
implementation); "real" jobs subclass it. Now suppose you want to implement a 
Job whose only task is to send data through an open socket. A naive 
implementation is the following:

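class EasyJob(Job):
	def __init__(self, source, sock):
		self.source = source   # file-like object to read from
		self.sock = sock       # open socket, stored for later usage

	def poll(self):
		# Do one small slice of the work per call.
		data = self.source.read(1024)
		if len(data) == 0:
			return True    # source exhausted: the job is complete
		# Naive: send() may transmit only part of data.
		self.sock.send(data)
		return False       # more work left for the next call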

The key point here is the following: every time poll() is called, it executes 
part of the task the class has been written to achieve. Since poll() will be 
called many times, each call can be kept down to a fraction of a 
second (by subdividing the job into small parts).
Now take a pool of jobs and call their poll() methods in sequence (as 
JobPool [JobPools.py] does) and you'll get something similar to a 
multithreaded program (in fact this *is* non-preemptive multithreading, the 
kind Windows 3.x used).
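
To make the idea concrete, here is a minimal sketch of a pool that calls poll() on its jobs in sequence. It is not the actual JobPool code: SimplePool, CountdownJob and their methods are hypothetical names invented for this example, and the Job class is reduced to the single poll() method described above.

```python
class Job:
    """Interface as described above: poll() returns True when the job is done."""
    def poll(self):
        raise NotImplementedError


class CountdownJob(Job):
    """Hypothetical job: performs one unit of work per poll() call."""
    def __init__(self, name, steps, log):
        self.name = name
        self.steps = steps
        self.log = log

    def poll(self):
        if self.steps == 0:
            return True           # nothing more to do
        self.log.append(self.name)
        self.steps -= 1
        return False


class SimplePool:
    """Hypothetical stand-in for JobPool: round-robin over its jobs."""
    def __init__(self):
        self.jobs = []

    def add(self, job):
        self.jobs.append(job)

    def poll(self):
        # Give every job one slice and drop the ones that reported completion.
        self.jobs = [job for job in self.jobs if not job.poll()]
        return len(self.jobs) == 0


log = []
pool = SimplePool()
pool.add(CountdownJob("a", 2, log))
pool.add(CountdownJob("b", 3, log))
while not pool.poll():
    pass
print(log)
```

The log shows the two jobs interleaving their work even though only one flow of control exists; no job can be interrupted in the middle of a poll() call, which is what makes the scheme non-preemptive.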

And now some details:
- poll returns True when it has nothing more to do; this way JobPools can 
detect completed jobs and remove them from their internal list;
- JobPools are used in a single place: inside DCWorker [DCWorker.py]. When 
DCWorker.start() is called a new thread is created (the only "real" one) and 
DCWorker.run() is executed. From that point on pyDClib lives inside a loop, 
alternating job execution [line 98 and following] and sleep [line 101];
- Jobs must implement two other methods: isAlive() and stop() (doing obvious 
things). Please note that after you call stop() you still need to call poll() 
until it returns True; this way Jobs can complete their cleanups.
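
The details above can be sketched roughly as follows. This is only an illustration under my own assumptions, not the DCWorker code: SimpleWorker, StoppableJob, the pause parameter and the attribute names are all hypothetical, and the worker is assumed to be a plain threading.Thread subclass.

```python
import threading
import time


class StoppableJob:
    """Hypothetical job following the poll()/isAlive()/stop() contract."""
    def __init__(self):
        self.stopping = False
        self.alive = True
        self.cleaned_up = False
        self.ticks = 0

    def isAlive(self):
        return self.alive

    def stop(self):
        # Only request termination; the cleanup itself happens in poll().
        self.stopping = True

    def poll(self):
        if self.stopping:
            self.cleaned_up = True   # e.g. close sockets, flush buffers
            self.alive = False
            return True              # now the pool may forget about us
        self.ticks += 1              # one small slice of normal work
        return False


class SimpleWorker(threading.Thread):
    """Hypothetical stand-in for DCWorker: the only real thread."""
    def __init__(self, jobs, pause=0.01):
        threading.Thread.__init__(self)
        self.jobs = list(jobs)
        self.pause = pause

    def run(self):
        # Alternate job execution and sleep until every job has completed.
        while self.jobs:
            self.jobs = [job for job in self.jobs if not job.poll()]
            time.sleep(self.pause)


job = StoppableJob()
worker = SimpleWorker([job])
worker.start()
time.sleep(0.05)       # let the job run for a few slices
job.stop()             # request termination...
worker.join(timeout=2) # ...and let the loop poll it until it returns True
```

Note that stop() does no cleanup by itself: the job is released only when the loop polls it one more time and gets True back.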

------------------------------------------------------------------

Bye

-- 
	Anakim Border
	ab...@us...
	http://pydc.sf.net