Re: [sprog-users] Re: DBI Gear

On Fri, 2005-07-01 at 21:42 +1000, Matthew Keene wrote:
> --- Grant McLean <gr...@mc...> wrote:
> 
> > On Thu, 2005-06-30 at 19:45 +1000, Matthew Keene
> > wrote:
> > > >Obviously Sprog needs the ability to source data
> > from
> > > >a database via DBI. 
> > > 
> > > I'm trying to write one at the moment.
> > 
> > Great!  (See I said it was an obvious need).
> > 
> I've got a synchronous DBIGear working, 

Excellent.  I just tried it and had no problems.

> I'd appreciate any comments (I'll attach it here).

And likewise, I'd appreciate comments from people who are 
building gears and have questions about how things are meant 
to work (or why) or comments about how things could be better.

Here are some thoughts from browsing through your code ...

Package Namespace:

You might not have got the memo, but all 'third-party' extensions
to Sprog should come under the SprogEx namespace.  Also, in your 
case the word 'Gear' appears redundantly.  I'd suggest changing 
from:

  package Sprog::Gear::DBIGear;

to something like:

  package SprogEx::Gear::DBIQuery;

The 'engage' Method:

Your gear does all its work in the 'engage' method.  The result 
is that a query which returns a million-row result set will dump 
a million messages in the input queue of the next gear - which 
isn't how I intended it to be used.

The framework relies on cooperative multitasking, so if any one 
method is working for a long time and not returning, no other 
gear is getting a chance to process its incoming messages.  By 
sending messages in chunks, machines can work with arbitrarily 
large data sets.  

It's not usually a consideration for filter gears since they 
just process each message as it is received.  Data source gears 
which generate messages are expected to be a bit 'nicer'.
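
The effect of sending in small chunks can be seen in a toy, pure-Perl 
round-robin loop.  This is only an illustration and not Sprog's 
scheduler: make_source and run_all are made-up names, and each 
"message" is just a string pushed onto a log.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a data source: each call to the returned
# closure emits at most one message and reports whether it did so.
sub make_source {
    my ($name, $count, $log) = @_;
    my $sent = 0;
    return sub {
        return 0 if $sent >= $count;        # exhausted: nothing more to send
        $sent++;
        push @$log, "${name}:${sent}";      # "send" one message, then yield
        return 1;
    };
}

# Toy round-robin loop (not Sprog's scheduler): give every source one
# turn per pass and drop the ones that report they are finished.
sub run_all {
    my @sources = @_;
    while (@sources) {
        @sources = grep { $_->() } @sources;
    }
    return;
}

my @log;
run_all(make_source('a', 3, \@log), make_source('b', 2, \@log));
print join(' ', @log), "\n";
```

Because each source returns after one message, the output of the two 
sources is interleaved.  A source that looped over all its rows in a 
single call would emit everything before the other got a turn.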

The Sprog scheduler will call the 'send_data' method of a data 
source gear whenever it is ready for the next chunk of data.  The 
gear may respond by sending more than one message, but shouldn't 
send an arbitrarily large number.  I'd suggest restructuring your 
code so that the connect/prepare/execute happen in the engage 
method, but the fetch happens in the send_data method.  Something 
like this:

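sub send_data {
  my $self = shift;

  # Send one row per call, then return so other gears get a turn
  if(my $hash_ref = $self->{_sth}->fetchrow_hashref) {
    $self->msg_out(record => $hash_ref);
    return;
  }

  # No more rows: close the connection and disengage the gear
  $self->{_dbh}->disconnect;
  $self->disengage;
}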
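
The matching 'engage' method might look roughly like the sketch below.  
It is a guess at the shape rather than tested gear code: the property 
names (dsn, user, password, sql) are hypothetical, the return value is 
an assumption, and only the DBI calls themselves are standard.

```perl
package SprogEx::Gear::DBIQuery;

use strict;
use warnings;
use DBI;

# Sketch only.  Assumes the gear keeps its settings in
# $self->{dsn}, {user}, {password} and {sql} (hypothetical names).
sub engage {
    my $self = shift;

    # RaiseError makes DBI die on failure instead of returning undef
    my $dbh = DBI->connect(
        $self->{dsn}, $self->{user}, $self->{password},
        { RaiseError => 1, PrintError => 0 },
    );

    my $sth = $dbh->prepare($self->{sql});
    $sth->execute;

    # Stash the handles where send_data expects to find them
    $self->{_dbh} = $dbh;
    $self->{_sth} = $sth;

    return 1;   # assumption: a true value means the gear engaged OK
}

1;
```

With this split, the only potentially slow step left in 'engage' is the 
execute itself.  The rows are then fetched one per call to 'send_data'.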

Password Input:

You can set the password text to be obscured (eg: ******), if 
you want to.  In Glade, select the password Entry widget and 
toggle the 'Text Visible' button from 'Yes' to 'No'.

POD:

You obviously haven't gotten around to doing the POD yet, but you really
ought to put your own name in the copyright section at the very least.

> I'm working on a design which will allow the query to
> run asynchronously, I'm going to start a worker
> process which will use sockets to communicate back to
> the caller.  This is apparently what POE does to
> simulate asynchronous database calls, and I figure
> that it might be a bit easier to have structured (and
> potentially binary) communication than IPC::Open*.

The only reason I was thinking of IPC::Open3 was because it is 
meant to handle spawning another process properly on Win32 (without
using fork emulation).
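
For reference, a minimal IPC::Open3 sketch, using only core modules.  
The worker here is a made-up child perl that prints three 
tab-separated rows, and the parent reads them with a plain blocking 
loop.  A real gear would need to hook the filehandle into the event 
loop instead.

```perl
use strict;
use warnings;
use IPC::Open3;
use Symbol 'gensym';

# Hypothetical worker: a child perl that prints three tab-separated rows
my @worker = ($^X, '-e', 'print "$_\tname$_\n" for 1..3');

my $err = gensym;    # open3 needs a real handle to keep STDERR separate
my $pid = open3(my $in, my $out, $err, @worker);
close $in;           # nothing to send to the worker

my @rows;
while (my $line = <$out>) {
    chomp $line;
    push @rows, [ split /\t/, $line ];
}
waitpid $pid, 0;     # reap the child

print scalar(@rows), " rows read\n";
```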

> I'm a bit concerned that starting a worker process and
> using interprocess communication (of some form) is
> quite cumbersome and error-prone (not to mention
> slower)

Ultimately, I intend to have more support in the Sprog framework 
for doing this type of thing.  It will be needed for other data 
sources (and sinks) as well.

> and I'll probably end up making it a
> user-selectable option to run it asynchronously or
> not.  This gives the user the ability to run it in
> process if they're confident that the select statement
> will return in a reasonable amount of time or are
> willing to wait for it.  

Personally, I don't see the need for two different approaches, 
but if you want to then the simplest approach is to just make 
two gears.  But whether the user chooses a different gear or a 
checkbox in a dialog, it's not clear to me what they'd base 
their choice on.

> It seems a bit like the tail
> wagging the dog if the whole thing has to run out of
> process just so someone can click the stop button,
> when this is probably the exceptional case.  Your
> thoughts ?

It is more complex than the equivalent Perl script you'd run 
from the command line.  But that's the nature of working in an 
event driven environment.

Cheers
Grant