Thread: Re: [Pypes-user] Pypes meet Axon ?
From: Eric G. <eg...@py...> - 2009-08-18 03:05:29
Thanks Michael,

Don't be too surprised by the similarities, because I've been a fan of Kamaelia for some time now. I really like the message-passing style, so many of my ideas came from Kamaelia. Although there are similarities, I don't think Pypes is quite as ambitious.

Pypes started out as a way of feeding huge volumes of data to a search index. It was originally linear, and I was using function composition to chain together the new-style generators in Python (PEP 342). Although this worked well, it made things a bit tough to set up at runtime (thinking in terms of UI work). I also wanted the ability to publish to multiple locations and/or branch based on certain attributes in a packet.

I turned to Stackless because I think it's really cool. I love seeing these sorts of old-school concepts resurrected (like Flow-Based Programming). Too many of us become complacent with "what is" and just assume that it's the best (or only) way to do things. Both Stackless (coroutines) and Flow-Based Programming challenge some very fundamental concepts in computer science. More importantly, they seem to be ideal concepts that are quite applicable to the sort of problems that we're trying to solve with these frameworks.

Quite honestly, I see pypes (Visual Design Studio) as more of an ETL-style framework, and for this reason I haven't addressed the issue of cycles. I wanted to keep things as simple as possible because complexity increases the potential for bugs.

With this ETL mindset, performance is a priority. A typical installation of pypes (VDS) in an enterprise setting is 3 quad Xeon machines all processing content in parallel. It's not uncommon to push 15 million documents through the system. If something fails after 3 million documents, we have to go back and start the feed over again, which can really start to eat up time as well as the customer's patience. For this reason simplicity is also a priority, because it allows us to minimize the possibility of errors.
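For readers unfamiliar with the linear approach mentioned earlier, chaining generators through function composition might look roughly like the sketch below. This is not pypes code: the `compose` helper, the stage names, and the dict-based packets are all hypothetical, and it uses plain pull-style generators rather than the `send()` protocol that PEP 342 adds.

```python
from functools import reduce


def compose(*stages):
    """Chain generator stages left to right into one callable (hypothetical helper)."""
    def pipeline(source):
        # Feed the output stream of each stage into the next one.
        return reduce(lambda stream, stage: stage(stream), stages, source)
    return pipeline


# Hypothetical stages; each consumes an iterable of packets and yields packets.
def strip_whitespace(packets):
    for packet in packets:
        yield dict(packet, body=packet["body"].strip())


def drop_empty(packets):
    for packet in packets:
        if packet["body"]:
            yield packet


def add_length(packets):
    for packet in packets:
        yield dict(packet, length=len(packet["body"]))


# The whole feed is fixed when compose() is called.
feed = compose(strip_whitespace, drop_empty, add_length)

docs = [
    {"id": 1, "body": "  hello  "},
    {"id": 2, "body": "   "},
    {"id": 3, "body": "pypes"},
]
result = list(feed(docs))
```

Because every stage is wired in when `compose()` is called, a pipeline built this way is strictly linear, which is consistent with the difficulties described above around runtime setup and branching on packet attributes.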
The UI work was really inspired by Yahoo Pipes. I hadn't done any Ajax or JavaScript coding prior to this, and I would never have thought this type of UI was possible had it not been for Yahoo Pipes. The idea of a web UI was really cool because I had been writing an HTTP service layer on top of pypes that exposed a REST API where external applications could inject content into the system by issuing HTTP POST requests. This style of processing is easy to scale out using hardware load balancing. The fact that I was able to build the UI as a web application still baffles me. Best of all, it really cuts down on dependencies.

Visual Design Studio is pure Python with no C extensions, so it's simple to install. Of course, C extensions can be shipped/built separately to address specific performance concerns. Right now I'm using ElementTree, which is shipped with Python 2.6, and it's blazing fast (about twice as fast as libxml; yes, I deal with a LOT of XML content). I have some Bayesian and Fisher classifiers as well as tools for creating decision trees, since I deal with a lot of taxonomy mapping. Information extraction is another focal point.

If pypes gains a following, I suspect we'll see some RSS mashup tools surface. I've been in a few organizations that would love to leverage Yahoo Pipes but are afraid to because there's no SLA. Yahoo Pipes also doesn't allow you to write custom components. Both of these problems are addressed in pypes.

At any rate, thanks for taking the time to check out pypes and provide some feedback. I could definitely see some collaboration in our future. Will you be at PyCon this year by any chance? I thought I saw a tweet stating that it wasn't looking too promising.

-Eric