From: Kieren D. <ki...@di...> - 2007-08-08 22:05:38
Thanks for the quick feedback to my feedback :). This appears to be quite complicated, hence the top posting (sorry ;-) ). I'd love to hack on connotea, but in its present state I can't, mainly because of the mod_perl dependency and the tight coupling of the CDBI code with the application logic. I'd like to have a look at connotea.cgi - if it's reasonably simple I might even be able to demonstrate a partial catalyst port (no guarantees).

Fundamentally the connotea codebase lacks the MVC pattern. Moving to a more MVC architecture with test coverage is going to make life easier for everyone, senior and junior developers alike, as well as casual hackers like me. So as a second iteration I'm going to deal with this as an MVC problem (CMV actually).

1. Controller

Porting to Catalyst provides the framework for this, with sane url/action dispatching (I couldn't work out how this works looking at the connotea code base with my limited available time - which is another problem). Using the catalyst dispatcher makes it very quick for new developers to get their heads around the flow of logic within the application (the controller part). The main task here is refactoring the mod_perl handler dispatching logic into something catalyst compatible. I've seen people do this a few times, and it should be pretty quick and straightforward for someone familiar with the existing mod_perl code base.

An experienced mod_perl hacker (which I'm not) from the catalyst project was going to have a look at the connotea code base for me to tell me if my assumptions above are correct, but because there's no code viewable in a web browser, this proved too much friction ... for now. (This illustrates a major benefit of open source software communities - a lot of my comments don't come from my expertise, but from knowing how to ask the right questions on the appropriate irc channel.)
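To make the dispatching point concrete, here's a minimal sketch of what one ported controller action could look like. This is not taken from the connotea codebase - the app name (MyConnotea), the /user path and the template name are all hypothetical - it just illustrates the declarative URL-to-action mapping that Catalyst gives you in place of a hand-rolled mod_perl handler:

```perl
package MyConnotea::Controller::User;   # hypothetical app/controller name
use strict;
use warnings;
use base 'Catalyst::Controller';

# Catalyst maps URLs to actions declaratively: the attributes below
# route GET /user/<username> to this sub, replacing mod_perl handler
# code that would otherwise have to parse $r->uri by hand.
sub view : Path('/user') Args(1) {
    my ($self, $c, $username) = @_;
    $c->stash(
        username => $username,
        template => 'user/view.tt',  # rendered by the TT view at request end
    );
}

1;
```

Because each action declares its own path and argument count, a new developer can read the controller top to bottom and see the whole URL space of the application.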
A major benefit of catalyst is that perl -d script/connotea_server.pl (with appropriate $DB::single = 1 statements in your code) provides excellent debugging support. I suspect that porting to catalyst would pay for itself in reduced development and training time very quickly.

2. Model

DBIC is much more flexible and much faster than CDBI. A port would again reduce technical overhead. However, if you can get the CDBI model working independently of the mod_perl app, it becomes usable within catalyst, and you can defer the decision to port while improving the structure of the codebase. That said, the tight coupling of the cdbi models with memcached is another impediment to casual hacking on connotea. I suspect that porting to DBIC would make life a lot easier for junior and senior developers alike, and be the easiest way to get a web-app-independent model as well. Thanks for reminding me about the sql trace stuff you added. It was a while ago, but not being able to see the do() stuff was a show stopper for me I'm afraid.

3. View

Based on your feedback, if using catalyst, you'd probably want to make a souped-up Catalyst::View::ConnoteaTT based on Catalyst::View::TT. Maybe also make a Catalyst::View::RDF as well. Improving the architecture to something more MVC is going to make all these little components much more modular and easier for new (junior and senior) developers alike to understand.

A core problem here is that with the current architecture/libraries of connotea you don't have a very good ability to leverage the perl community to help with development. You should also upload the connotea perl api libraries to CPAN so that connotea becomes more visible to the perl community at large, and make a post to use.perl.org once the new version is out with its vc repository. I didn't suggest Rails because Catalyst is far superior both in library support and in that it's much less opinionated, and thus has the flexibility that connotea requires.
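For the model point, here's a sketch of the kind of explicit table class a DBIC port would involve. Again, this is illustrative only - the schema namespace and the bookmark table/columns are hypothetical stand-ins, not the real connotea schema:

```perl
package MyConnotea::Schema::Bookmark;   # hypothetical result class
use strict;
use warnings;
use base 'DBIx::Class';

# A DBIC result class spells out the table/column mapping explicitly,
# in contrast to CDBI's implicitly generated accessors - one of the
# transparency problems noted above.
__PACKAGE__->load_components('Core');
__PACKAGE__->table('bookmark');
__PACKAGE__->add_columns(qw/ bookmark_id url user_id /);
__PACKAGE__->set_primary_key('bookmark_id');

1;
```

Any script using such a schema can then be run as "DBIC_TRACE=1 perl myscript.pl" (script name hypothetical) to dump every generated SQL statement, including bind parameters, to standard error - the tracing feature described below.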
:-D

Kieren

On 9 Aug 2007, at 05:00, Martin Flack wrote:

> Hi Kieren,
>
> Thanks for your email. I'm responsible for most of the codebase so I will step into the firing line first. ;-)
>
> Kieren Diment wrote:
>> I've been wanting to write this email for a while, and you can consider me a strong supporter of what Connotea/NPG is trying to do. However there are some major problems that I think need to be addressed if connotea is to become a self-sustaining open source project with a sustainable developer community.
>
> Again thanks. I'm aware of most of what you raise, and in fact have discussed them with folks, but it doesn't hurt to acknowledge on the devel list.
>
> Actually this works out nicely since we are about to release a new version of Connotea Code literally any day now, and we are committed to posting a public darcs repository and inviting more public support. I'm actually thinking, from some of your comments, that you have seen the new code.
>
>> A bit about me: I'm a social researcher and perl hacker. Most of the perl I write is for research data management, but I do bibliographic stuff, web robots and occasional web application programming too. I've been a significant contributor to the Catalyst web application framework (mainly with the documentation and example code). I'm computer literate, and I've used mod_perl enough to have a good idea of what irritates me about it. I think that Catalyst is the natural successor for 95% of the mod_perl or Apache::Registry scripts out there.
>
> I've looked at Catalyst but I've not used it.
>
>> 1. No test suite. Lack of unit/behaviour tests, along with the impoverished debugging environment with mod_perl, makes for painful development. [SOLUTION: Port to catalyst, with built-in testing utilities and trivial support for perl -d.]
>
> The new version of Connotea Code sports a fledgling test suite. It doesn't shoot for code coverage, just to test the basic functions.
> It's an area I'd like to strengthen.
>
> I concur on the mod_perl debugging situation.
>
>> 2. Class::DBI + memcached. These seem to be very tightly coupled. Class::DBI is flawed software, and while it has proved very useful, it clearly has severe limitations. One example is the difficulty of inspecting the SQL that it's generating. Its main problem is a general lack of transparency and implicitly generated code. CDBI "died" as an ongoing open source project about a year ago. The original author of CDBI's last project resulted in him fixing DBIx::Class's CDBI compatibility layer so he could port his client's code from CDBI to DBIx::Class.
>
> Accurate observation. At the time it seemed like the best way to provide abstractions that made the model possible.
>
> I'd add a point that you didn't make, which is that Bibliotech::Query is very complicated, not very clear, and the queries it generates are huge, but to some extent that's by necessity.
>
> For something like Connotea, Class::DBI is way too slow left to its own devices - a page was taking 800 queries to pull up all the pieces of data, and we solved that with a scary-looking but fast-executing query. Another problem might be that it was designed to be a very flexible query engine, and as it turns out that flexibility is obscuring some of the functionality, i.e. tightening the restraints might make prettier source code.
>
> What we got from Class::DBI was a convenient abstraction that database rows would be objects that could relate to each other. Beyond that we've extended the heck out of it, and probably not canonically, although in our defense Class::DBI is only designed to have a canonical usage for the theoretical database model, not a whole application.
>
> The new version of Connotea Code has a Bibliotech::Query with even more speed optimizations, but again it's another layer of complexity.
>
>> [SOLUTION: implied by the last comment, DBIx::Class (DBIC) is the natural successor to CDBI, originally developed by the author of CDBI::Sweet. DBIC + memcached has production users. Discussion on IRC (#dbix-class on irc.perl.org) indicates to me that swappable caching engines (including a null cache for debugging) should be trivial and transparent with a DBIC-based data model. The killer DBIC feature for me is that running your DBIC script like so: "DBIC_TRACE=1 perl run_my_script.pl" gives you the exact SQL being passed to the DBIC classes. This outputs to standard error, thus making debugging hugely easier. As noted, there's a CDBICompat layer to ease the transition from CDBI to DBIC. CDBICompat has a more extensive test suite than CDBI for good measure. Finally, DBIC tends to result in much more efficient SQL than CDBI, and swapping out database engines (eg mysql to pg to sqlite) is much easier - I've seen this done trivially from pg to sqlite for a database representing a directed acyclic graph.]
>
> I've got SQL tracing - everything except do() calls and without question-mark translation - if you call Bibliotech::DBI->activate_warn_sql(). In mod_perl this ends up in Apache's error_log.
>
> Again, I've heard of DBIx::Class but was not aware that it offered any major benefits.
>
>> 3. Template Toolkit. The templates in connotea seem to be populated with weird coderefs that make debugging/interrogation even more difficult. TT is one of the best templating solutions out there, but connotea seems to misuse it. [SOLUTION: Port to explicit templates resident in files, using PROCESS, INCLUDE and MACRO blocks where appropriate. This should be fairly simple to do concurrently with the rest of the catalyst/dbic port.]
>
> Anything that looks like misuse of TT is probably a result of yielding functionality from our original Component system to TT. The rest of the Component system is still there; it was more capable than what we perceived TT could do, and we needed to preserve functionality.
>
> It may be that TT could do more than we realized, or could not then but can now, and in that case it would be, as you point out, better to use the native TT functionality rather than homegrown functionality. I agree in principle.
>
> As an example, our Component system was designed to put all the logic for a component in one place, and avoid expensive recalculation. To that end, a component's result is not just a snippet of HTML like an INCLUDE, but an object that contains HTML parts that can be put in different places on the calling web page, as well as the Javascript for the <head> and Javascript for <body onload=""> that facilitate the component, so if you want to, say, position the cursor in the first form field, you can have the component return the HTML form and that Javascript together.
>
> The new Connotea Code release has a section in the README to cover the TT function calls, which should explain things better.
>
>> 4. Connotea doesn't scale down well. I've used it on an iBook G4 for testing (performance verging on reasonable), and a Pentium 3 linux machine with 128mb of RAM where performance was unacceptable. [SOLUTION: port to catalyst, where fast_cgi, mod_perl and other more exotic engines are all viable solutions. Removing the mod_perl dependency opens up shared hosting possibilities.]
>
> We used to have a bibliotech.cgi that could do exactly that (using Apache::Emulator) but we dropped it because NPG had no need for it and it was after all just a hack. I'd be happy to share it with you if you are interested, but the added memcached requirement is the main obstacle I think.
>
>> There are some excellent things about connotea (auto-import and the database schema being two big examples), but the above are show stoppers which are going to cause huge problems for the sustainability of the project. The usability for end users is great, but as far as programmer-usability goes connotea needs major improvements, otherwise it doesn't have a future as a viable open source project. Personally I'd like to hack in storage of pdf/other fulltext into the database, but I can't do this in its current state. I can also see why NPG won't do this themselves.
>
> Thanks for the compliments!
>
> At least you're not complaining it's not Ruby on Rails. ;-)
>
> That's a joke, but I acknowledge that CPAN marches forwards even after we have selected our libraries, and yes, platforms make a huge difference. We selected solutions to match the engineering, and then we tried to move to some newer approaches as dictated by practical concerns. Even moving to TT was only a direct response to needing non-programmers to do editing and design work.
>
>> My vision is for distributed collaborative bibliographies, which is why I'd like so much for the project to scale down to a level where it could be used by 2-10 researchers on shared hosting. I think optional re-import back to a master connotea would be fairly easy to implement after the scaling-down problem was addressed.
>
> Your work sounds interesting.
>
>> Part of the problem is that connotea came about at a time of great flux in the web app programming space, and the technology to program these things has improved massively during this time.
>
> You are saying something that sounds to me like:
>
> - refactoring for better library support would make easier development
>
> I'd also add:
>
> - like all things, more man-hours on the codebase for general refactoring or more test suite scripts would make easier development, keeping the same libraries even
>
> - refactoring to remove some early assumptions would make a cleaner codebase as well
>
> ...and I'd argue that those two points are just as important.
>
>> Unfortunately it leaves connotea with more technical debt.
>
> Well, I think you may be being a bit hard on us here. ;-)
>
> This is a classic challenge facing companies with working code: whether or not to spend time refactoring things.
>
> When allocating sparse resources carefully, while being customer driven, a lot of the push is inevitably on keeping it running and adding features rather than revising the abstractions for something that already works.
>
> Further, things that may make programming easier but don't actually lower the complexity level of the codebase are not as appealing in general as refactoring projects; e.g. switching from Class::DBI to DBIx::Class is not going to make it easier to understand, (maybe) just easier to work with. The concepts don't change much; you still have an abstraction layer that provides SQL support. You and I can appreciate that, but companies generally are interested in refactoring code when it can introduce junior programmers, not just make it easier for the senior programmers.
>
> Having said that, your suggestions that we may be doing something in a non-standard manner, if it can be done in a standard manner, I do consider bugs to fix, as we should do things the same way as everyone else where possible.
>
> If Class::DBI -> DBIx::Class is thought of as that type of problem, then I can appreciate the desire to switch over.
> But I'm speaking here more of things like using homegrown approaches over TT directives, as discussed above.
>
>> I'm happy to clarify anything I've written here on request. If the response is going to be "isn't going to happen, sorry about that", I'm prepared to accept that too. As a first step I'd recommend popping on to #catalyst at irc.perl.org and asking about porting mod_perl apps to Catalyst.
>
> I'm not the final word, and even if I was, what you've presented deserves study and thought before issuing the final word. ;-)
>
> I suggest we keep the conversation going.
>
> Regards,
> Martin Flack