Thread: [Connotea-code-devel] feedback

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hi,

I've been wanting to write this email for a while, and you can  
consider me a strong supporter of what Connotea/NPG  is trying to  
do.  However there are some major problems that I think need to be  
addressed if connotea is to become a self sustaining open source  
project with a sustainable developer community.

A bit about me:  I'm a social researcher and perl hacker.  Most of  
the perl I write is for research data management, but I do  
bibliographic stuff, web robots and occasional web applications  
programming too.  I've been a significant contributor to the Catalyst  
web application framework (mainly with the documentation and example  
code).  I'm computer literate, and I've used mod_perl enough to have  
a good idea of what irritates me about it.  I think that Catalyst is  
the natural successor for 95% of the mod_perl or Apache::Registry  
scripts out there.

Problems I've experienced with connotea as an open source project.   
Suggested solutions in [square brackets].  Roughly in order of  
importance.

1.  No test suite.  Lack of unit/behaviour tests along with the  
impoverished debugging environment with mod_perl make for painful  
development. [ SOLUTION:  Port to catalyst with built in testing  
utilities and trivial support for perl -d ]

2.  Class::DBI + memcached.  These seem to be very tightly coupled.   
Class::DBI is flawed software, and while it has proved very useful,  
it clearly has severe limitations.  One example is the difficulty of  
inspecting the sql that's it's generating.  It's main problem is a  
general lack of transparency and implicitly generated code.  CDBI   
"died" as an ongoing open source project about a year ago.  The  
original author of CDBI's last project resulted in him fixing  
DBIx::Class' CDBI Compatibility layer  so he could port his client's  
code from CDBI  to DBIx::Class.

[SOLUTION:  implied by the last comment DBIx::Class (DBIC)  is the  
natural successor to CDBI, originally developed by the author of  
CDBI::Sweet.  DBIC + memcached has production users.  Discussion on  
IRC  (#dbix-class on irc.perl.org) indicates to me that swappable  
caching engines (including a null cache for debugging) ) should be  
trivial and transparent with a DBIC based data model.  The killer  
dbic feature for me is that running your dbic script like so:   
"DBIC_TRACE=1 perl run_my_script.pl" gives you the exact sql being  
passed to the dbic classes.  This outputs to standard error, thus  
making debugging hugely easier.  As noted, there's a CDBICompat layer  
to ease transition from CDBI to DBIC.  CDBICompat has a more  
extensive test suite than CDBI for good measure.  Finally DBIC tends  
to result in much more efficent SQL than CDBI and swapping out  
database engines (eg mysql to pg to sqlite) is much easier - I've  
seen this done trivially from pg to sqlite for a database  
representing a directed acyclic graph. ]

3.  Template toolkit.  The templates in connotea seem to be populated  
with weird coderefs that make debugging/interrogation even more  
difficult.  TT is one of the best templating solutions out there, but  
connetea seems to misuse it. [SOLUTION:  Port to explicit templates  
resident in files, using PROCESS, INCLUDE and MACRO blocks where  
appropriate.  This should be fairly simple to do concurrently with  
the rest of the catalyst/dbic port. ]

4.  Connotea doesn't scale down well.  I've used it on an ibook g4  
for testing (performance verging on reasonable), and a pentium 3  
linux machine with 128mb of RAM where performance was unacceptable.  
[SOLUTON:  port to catalyst where fast_cgi, mod_perl and other more  
exotic engines are all viable solutions.  Removing the mod_perl  
depencency opens up shared hosting possibilities]

There are some excellent things about connotea (auto-import and the  
database schema being two big examples), but the above are show  
stoppers which are going to cause huge problems for the  
sustainability of the project.  The usability for end users is great,  
but as far as programmer-usability goes connotea needs major  
improvements otherwise it doesn't have a future as a viable open  
source project.  Personally I'd like to hack in storage of pdf/other  
fulltext into the database, but I can't do this in it's current  
state.  I can also see why NPG won't do this themselves.

My vision is for distributed collaborative bibliographies, which is  
why I'd like so much for the project to scale down to a level where  
it could be used by 2-10 researchers on shared hosting.  I think  
optional  re-import back to a master connotea  would be fairly easy  
to implement after the scaling down problem was addressed.

Part of the problem is that connotea came about at a time of great  
flux during the web app programming space, and the technology to  
program these things has improved massively during this time.   
Unfortunately it leaves connotea with more technical debt.

I'm happy to clarify anything I've written here on request.  If the  
response is going to be "isn't going to happen, sorry about that",  
I'm prepared to accept that too.  As a first step I'd recommend  
popping on to #catalyst at irc.perl.org and asking about porting  
mod_perl apps to Catalyst.

--
Kieren Diment
Centre for Leadership and Knowledge Management
School of Management, Marketing

Thread: [Connotea-code-devel] feedback

connotea-code-devel