Thanks Michael,
Don't be too surprised by the similarities because I've been a fan of
Kamaelia for some time now. I really like the message-passing style,
so many of my ideas came from Kamaelia. Although there are
similarities, I don't think Pypes is quite as ambitious. Pypes started
out as a way of feeding huge volumes of data to a search index. It was
originally linear and I was using function composition to chain
together the new style generators in Python (PEP 342).
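To illustrate the linear chaining idea, here is a minimal sketch and not the actual pypes code. It assumes ordinary pull-style generators rather than the send()-based coroutines that PEP 342 adds, and the stage names (strip_whitespace, drop_empty, to_packet) and the compose helper are made up for the example.

```python
from functools import reduce


def compose(*stages):
    """Chain generator stages left to right: compose(f, g)(src) == g(f(src))."""
    def pipeline(source):
        return reduce(lambda stream, stage: stage(stream), stages, source)
    return pipeline


# Hypothetical stages; each one consumes an iterable and yields results lazily.
def strip_whitespace(docs):
    for doc in docs:
        yield doc.strip()


def drop_empty(docs):
    for doc in docs:
        if doc:
            yield doc


def to_packet(docs):
    for number, doc in enumerate(docs):
        yield {"id": number, "body": doc}


feed = compose(strip_whitespace, drop_empty, to_packet)
packets = list(feed(["  first doc ", "", "second doc\n"]))
```

Because every stage is lazy, a document flows through the whole chain before the next one is read, which keeps memory flat on large feeds. The downside is that the chain is fixed once compose() is called.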
Although this worked well, it made it a bit tough to set up at runtime
(thinking in terms of UI work). I also wanted the ability to publish
to multiple locations and/or branch based on certain attributes in a
packet. I turned to Stackless because I think it's really cool. I
love seeing these sorts of old-school concepts resurrected (like
Flow-Based Programming).
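The branching and multi-publish behaviour can be sketched in plain Python. This is only an illustration under assumed names: the route and publish functions, the port names, and the packet attributes (mime, size) are hypothetical, and it uses neither Stackless nor the real pypes component API.

```python
def route(packet, routes, default="out"):
    """Return every output port whose predicate matches the packet."""
    matched = [port for port, predicate in routes if predicate(packet)]
    return matched or [default]


def publish(packets, routes):
    """Deliver each packet to all matching ports (fan-out), collecting per port."""
    outputs = {}
    for packet in packets:
        for port in route(packet, routes):
            outputs.setdefault(port, []).append(packet)
    return outputs


# Hypothetical predicates that branch on packet attributes.
def is_pdf(packet):
    return packet.get("mime") == "application/pdf"


def is_large(packet):
    return packet.get("size", 0) > 1000


routes = [("pdf", is_pdf), ("large", is_large)]
packets = [
    {"mime": "application/pdf", "size": 10},
    {"mime": "text/html", "size": 5000},
    {"mime": "application/pdf", "size": 9000},
    {"mime": "text/plain", "size": 1},
]
outputs = publish(packets, routes)
```

A packet that matches several predicates is published to every matching port, and one that matches none falls through to the default port.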
Too many of us become complacent with "what is" and just assume that
it's the best (or only) way to do things. Both Stackless (coroutines)
and Flow-Based Programming challenge some very fundamental concepts in
computer science. More importantly, they seem to be ideal concepts
that are quite applicable to the sort of problems that we're trying to
solve with these frameworks.
Quite honestly I see pypes (Visual Design Studio) as more of an ETL
style framework and for this reason, I haven't addressed the issue of
cycles. I wanted to keep things as simple as possible because lots of
complexity leads to the potential for more bugs. With this ETL
mindset, performance is a priority. A typical installation of pypes
(VDS) in an enterprise setting is 3 quad Xeon machines all processing
content in parallel. It's not uncommon to push 15 million documents
through the system.
If something fails after 3 million documents, we have to go back and
start the feed over again which can really start to eat up time as
well as the customer's patience. For this reason simplicity is also a
priority because it allows us to minimize the possibility for errors.
The UI work was really inspired by Yahoo Pipes. I hadn't done any Ajax
or JavaScript coding prior to this and I would never have thought this
type of UI was possible had it not been for Yahoo Pipes. The idea of a
web UI was really cool because I had been writing an HTTP service
layer on top of pypes that exposed a REST API where external
applications could inject content into the system by issuing HTTP POST
commands. This style of processing is easy to scale out using hardware
load balancing.
The fact that I was able to build the UI as a web application still
baffles me. Best of all, it really cuts down on dependencies. Visual
Design Studio is pure Python with no C-extensions so it's simple to
install. Of course, C-extensions can be shipped/built separately to
address specific performance concerns. Right now I'm using ElementTree
which is shipped with Python 2.6 and it's blazing fast (about twice as
fast as libxml -- yes I deal with a LOT of XML content).
I have some Bayesian and Fisher classifiers as well as tools for
creating decision trees since I deal with a lot of taxonomy mapping.
Information Extraction is another focal point.
If pypes gains a following I suspect we'll see some RSS mashup tools
surface. I've been in a few organizations that would love to leverage
Yahoo Pipes but are afraid because there's no SLA. Yahoo Pipes also
doesn't allow you to write custom components, and both of these problems
are addressed in pypes.
At any rate, thanks for taking the time to check out pypes and provide
some feedback. I could definitely see some collaboration in our
future. Will you be at PyCon this year by any chance? I thought I saw
a tweet stating that it wasn't looking too promising.
-Eric