From: Kieren D. <ki...@di...> - 2007-08-08 12:52:16
Hi,

I've been wanting to write this email for a while, and you can consider me a strong supporter of what Connotea/NPG is trying to do. However, there are some major problems that I think need to be addressed if Connotea is to become a self-sustaining open source project with a sustainable developer community.

A bit about me: I'm a social researcher and Perl hacker. Most of the Perl I write is for research data management, but I do bibliographic stuff, web robots and occasional web application programming too. I've been a significant contributor to the Catalyst web application framework (mainly with the documentation and example code). I'm computer literate, and I've used mod_perl enough to have a good idea of what irritates me about it. I think that Catalyst is the natural successor for 95% of the mod_perl or Apache::Registry scripts out there.

Problems I've experienced with Connotea as an open source project, roughly in order of importance. Suggested solutions are in [square brackets].

1. No test suite. The lack of unit/behaviour tests, along with the impoverished debugging environment of mod_perl, makes for painful development. [SOLUTION: Port to Catalyst, which has built-in testing utilities and trivial support for perl -d.]

2. Class::DBI + memcached. These seem to be very tightly coupled. Class::DBI is flawed software, and while it has proved very useful, it clearly has severe limitations. One example is the difficulty of inspecting the SQL that it's generating. Its main problem is a general lack of transparency and implicitly generated code. CDBI "died" as an ongoing open source project about a year ago. The last project undertaken by CDBI's original author resulted in him fixing DBIx::Class' CDBI compatibility layer so that he could port his client's code from CDBI to DBIx::Class. [SOLUTION: As implied by the last comment, DBIx::Class (DBIC), originally developed by the author of CDBI::Sweet, is the natural successor to CDBI. DBIC + memcached has production users.
Discussion on IRC (#dbix-class on irc.perl.org) indicates to me that swappable caching engines (including a null cache for debugging) should be trivial and transparent with a DBIC-based data model. The killer DBIC feature for me is that running your DBIC script like so: "DBIC_TRACE=1 perl run_my_script.pl" gives you the exact SQL that DBIC is passing to the database. The trace goes to standard error, which makes debugging hugely easier. As noted, there's a CDBICompat layer to ease the transition from CDBI to DBIC. For good measure, CDBICompat has a more extensive test suite than CDBI itself. Finally, DBIC tends to result in much more efficient SQL than CDBI, and swapping out database engines (e.g. MySQL to PostgreSQL to SQLite) is much easier: I've seen this done trivially from PostgreSQL to SQLite for a database representing a directed acyclic graph.]

3. Template Toolkit. The templates in Connotea seem to be populated with weird coderefs that make debugging/interrogation even more difficult. TT is one of the best templating solutions out there, but Connotea seems to misuse it. [SOLUTION: Port to explicit templates resident in files, using PROCESS, INCLUDE and MACRO blocks where appropriate. This should be fairly simple to do concurrently with the rest of the Catalyst/DBIC port.]

4. Connotea doesn't scale down well. I've used it on an iBook G4 for testing (performance verging on reasonable), and on a Pentium 3 Linux machine with 128 MB of RAM, where performance was unacceptable. [SOLUTION: Port to Catalyst, where FastCGI, mod_perl and other more exotic engines are all viable solutions. Removing the mod_perl dependency opens up shared hosting possibilities.]

There are some excellent things about Connotea (auto-import and the database schema being two big examples), but the above are show stoppers which are going to cause huge problems for the sustainability of the project.
The usability for end users is great, but as far as programmer usability goes, Connotea needs major improvements; otherwise it doesn't have a future as a viable open source project. Personally I'd like to hack in storage of PDF/other fulltext into the database, but I can't do this in its current state. I can also see why NPG won't do this themselves.

My vision is for distributed collaborative bibliographies, which is why I'd so much like the project to scale down to a level where it could be used by 2-10 researchers on shared hosting. I think optional re-import back to a master Connotea would be fairly easy to implement once the scaling-down problem was addressed.

Part of the problem is that Connotea came about at a time of great flux in the web app programming space, and the technology for programming these things has improved massively since then. Unfortunately this leaves Connotea with more technical debt.

I'm happy to clarify anything I've written here on request. If the response is going to be "isn't going to happen, sorry about that", I'm prepared to accept that too. As a first step I'd recommend popping on to #catalyst at irc.perl.org and asking about porting mod_perl apps to Catalyst.

-- 
Kieren Diment
Centre for Leadership and Knowledge Management
School of Management, Marketing