From: Jonathan M. <jma...@fa...> - 2008-03-26 07:26:00
Frank W. Zammetti wrote:
> What I was saying is that if we're going to be in the business of
> writing reports to test with, which we don't have right now, wouldn't it
> be better if they were in a form that we could easily add to the
> build script as automated tests? Clearly we run the test task before we
> roll a release, and hopefully before we commit code, and I think it'd be
> hella-cool, to quote South Park, if we could also add to that test task
> the running of some test reports with automated checking of the results,
> as I suggested earlier. I think this is feasible.

OK. Sure. I suppose I'm seeing a set of cross-database tests, for which more initial setup is by definition needed (you need to have lots of database backends ready to go!). If we're already doing that, then one backend that is easy to install, and relatively popular among the kinds of databases likely to need multiple schemas, is a good choice for initial testing of schema-related changes. And given that starting point, Derby seemed an odd choice.

> I'm also saying that I'd prefer to not raise the bar with regard to what
> a DV developer has to have installed.

Developer? Or committer? Or release engineer? To *develop* you need the development environment, and that's all. You need any one database backend of your choice to test against (at least, in an ideal setup you would need that -- right now I'm not clear whether ant test just assumes Derby is available, or doesn't need a backend at all).

Going forward, additionally, we are (hopefully) creating some tests that are intended to reveal how well we cope with the differences between backend databases. For those, by definition, whoever runs them will need multiple databases available. So for that not-yet-existent set of tests, more setup will be needed. That's unavoidable.
That is *only* a new burden on all developers *if* we have a rule that "all developers must be able to run all the new multi-back-end tests on all available backends"... but that is not the current situation; there is no such rule, because there are no such tests yet. We're breaking new ground. Until we start Wiki pages called "Commit Requirements" and "Pre-release Testing Requirements" or whatever, the definition of what a committer or a release engineer "has to have installed" remains very informal. Basically, it is whatever you tell me it is, and adding new tests doesn't change that :-)

> ... saying they need MySQL, Postgres, maybe SQL Server, Oracle,
> whatever else, would be much more unusual.

Definitely. But adding a new set of tests does not require every developer, nor even every committer, to install the prerequisites those tests demand.

> I'd claim Oracle is the most important RDBMS out there anyway, and I
> suspect I'd get little argument from most users ...

Interesting; I'd have thought those who could afford Oracle could probably also afford commercial report generators to go with it? :-) In one sense, it's a nice indicator of DataVision's quality and usefulness if it is commonly being used in environments where high initial software cost is not a significant obstacle to deployment!

> ... from experience I can say Oracle is not as trivial.

My last experience doing an Oracle install was at least five years ago, and I agree, it definitely wasn't trivial! But that's not the point: surely, long term, we should have an automated set of tests that can be run against a large set of different database backends, with output that indicates which backends were tested and which were skipped, as well as how the tests did on each platform on which they were attempted.
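Such a test task could probe each backend at runtime and skip the ones it can't reach. A minimal sketch, assuming nothing about DataVision's real test code (the class name, backend list, and driver class strings below are all illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: backend names, driver classes, and this class itself are
// illustrative, not part of DataVision's actual test code.
public class BackendTestRunner {

    // Candidate backends: logical name -> JDBC driver class.
    static final Map<String, String> BACKENDS = new LinkedHashMap<>();
    static {
        BACKENDS.put("derby", "org.apache.derby.jdbc.EmbeddedDriver");
        BACKENDS.put("postgresql", "org.postgresql.Driver");
        BACKENDS.put("mysql", "com.mysql.jdbc.Driver");
    }

    // A backend is testable only if its JDBC driver is on the classpath.
    static boolean available(String driverClass) {
        try {
            Class.forName(driverClass);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> b : BACKENDS.entrySet()) {
            if (available(b.getValue())) {
                System.out.println("RUN  " + b.getKey());
                // ...connect and run the report tests against this backend...
            } else {
                System.out.println("SKIP " + b.getKey() + " (driver not installed)");
            }
        }
    }
}
```

A real version would hang the actual report tests off the RUN branch and collect the results into a per-platform summary; the point is just that availability is probed at runtime, so the same task runs everywhere and reports what it skipped.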
What percentage of the possible backends we require a given release to have been run against before we roll out DataVision 2.6.0 is totally undecided at this point :-) We could decide that a release needs someone, somewhere, to have run the test set such that all tests have succeeded on all backends for which the tests are configured; given enough developers that might be reasonable. Expecting one person to have all of them... is unreasonable. Supposing we add the ability for this test suite to run against a JDBC driver for a backend that only runs on VMS minis? Or on Amigas? :-)

> And even saying those others aren't a big deal to install and configure,
> do we really want to add those hurdles for developers anyway?

No. But increasing hurdles for some roles... committers, release engineers, QA managers... as the team grows, that is entirely reasonable. Initial developer hurdles should be lowered as far as we can, definitely -- but that doesn't mean all possible development roles will have equally low hurdles.

> You also might want to say that only those rolling releases need worry
> about those tests, and therefore the requirements bar is only higher for
> them... I *might* agree with that to a point, except that (a) we want to
> ensure every DV developer can roll a release if need be, that's only
> prudent with community-driven projects where people can come and go as
> they please, and (b) I'd hope every committer (I'm being optimistic here
> and saying it won't just be you and I forever!) runs *all* available
> automated tests before checking any code in.

For (a): even "developer" and "committer" are different levels of commitment to a project, in larger projects at least. And in an emergency where you previously made a rule saying "thou shalt run all these N tests successfully before making a release" but no one who can do this is available... those left in the game declare a temporary emergency, change that rule to suit current circumstances, and keep right on going! In general, humans are fairly well-equipped to break human-made rules and regulations :-)

For (b): "running" the set of tests, as long as they automatically skip all locally unavailable database backends, is fine with me! Then being a committer requires only whatever small set of backends "the rule" says must be reported as used and successfully tested for each commit. That is a policy decision, not a technical one.

Of course we can't reasonably expect every developer to have Oracle, DB2, Sybase, and MS SQL Server available to them for testing before they can do anything at all with the code. We'd (maybe) like a 16-year-old gifted genius using a 4-year-old PC running FreeBSD in his parents' basement to be able to develop with us! But we *can* still automate a bunch of tests that work on all those (big or expensive or OS-specific) database platforms, and several easier-to-install and less costly platforms too, and get useful overall testing done that is not currently being attempted at all :-)

One more (possibly blue-sky) idea: could we set up a bunch of test database backends on a SourceForge shell machine, and set up automated test runs on that system nightly, or something along those lines? I think we are some distance away from this kind of thing being the most useful way to spend developer time (!), but it's not inconceivable as a project once we have a significant set of automated tests across several backends.

> So even if we were to say,
> split them out, i.e., have the test task that does what it does today,
> and a testReports task maybe that does the report-based testing with
> the idea that the latter only needs to be run by those rolling releases,
> I'd hope every developer ran both all the time anyway before committing,
> not just the test task.
And that's unrealistic, if the new set of tests 100% requires every single possible database backend that it can use. But that's not something I've ever imagined it doing.

> I was only saying Derby, and the other
> embeddable RDBMS's, because they can easily be included with the source
> tree.

Well, yes and no. The sqlite one in there now doesn't work except on Windows (I'll fix that soon!). And, BTW, we may be slightly shy of the LGPL licensing requirements here or there, too, with some of the .jar files in 1.1.0. I think including third-party stuff is more work than it looks, to do 100% right.

I think including one smallish and portable embeddable backend (Derby is a logical choice for this) makes good sense, so tutorials and examples etc. can depend on it being there. More than that is extra work keeping them up to date etc., for minimal real benefit, IMO. Better to have our documentation say where to get and how to use JDBC drivers for 40 databases of all shapes and sizes than to spend equivalent developer effort including perhaps 4 or 5 embeddable ones directly, I'd think. Let's keep the release tarball size sane, and let the end user choose what database software to use once he knows his DataVision setup basically works using the included single backend, and has learned how to use it.

>> We also may want to avoid keeping test and
>> example database files (like dvtestdb) at the top of the subversion
>> tree? ...

> Well, I think that maybe leads to a larger question... at present, what
> you download when you grab a DV release is binary AND source combined.
> That's a bit atypical; most open-source projects (but not all) split the
> two out.

I think reducing release tarball size would be more readily effected by removing test data, unnecessary libraries, etc. than by removing the source code and build.xml file... only once we have removed other larger items that are not really as much a part of the project, and still feel a need to get it even smaller, would the idea of removing the source code come up in my own mind :-) Removing developer test data, and the test result files that are the "correct" output, from the release package, though, might well be smart, if that collection grows as big as I hope it eventually will!

> I'm inclined to just leave things as they are and add a new test
> directory right off the root as you describe here.

OK, can do.

>> Hmm, now I think about it, do we really want those database files in the
>> subversion tree at all?

> I agree there, and in fact I've run into issues with SVN and locked
> files in the sample database. I think probably the ideal approach is
> what you describe, and also have the database created fresh as part of
> the automated testing, assuming we go down that path at all.

Yes, we're in agreement there.

> My only concern is that my whole intent with the live database example
> report was to give something in the distro that was never there before,
> namely an example report that actually ran against a database, that a
> new user could run and play with *immediately*. I think that's an
> attractive thing to include, and yet I do agree with your point about
> binaries in the source tree (also something I'm real anal about with my
> other projects). I'm not sure what the best answer is, but you can see
> how I'm trying to balance the two concerns.

Yes. If we already include derby.jar and jruby.jar, couldn't the startup function in an example report actually call back into a little "example database creation" DataVision component, and so cause it to create the example database? Best of both worlds, if that is doable via Ruby and BSF -- no static binary data lurking in subversion, but no extra work for novices running the example report.
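For what it's worth, here is a minimal sketch of what that creation component might look like on the Java side. Everything here (the `ExampleDatabase` class, the `dvexample` database name, the `jobs` table) is invented for illustration; the real entry point would be whatever the report's BSF/Ruby startup script can reach:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch only: class, database, and table names are invented; the real
// entry point would be whatever the example report's startup script calls.
public class ExampleDatabase {

    // True if the embedded Derby driver is on the classpath (derby.jar shipped).
    public static boolean derbyPresent() {
        try {
            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Build the example database on first use. Derby's ";create=true" URL
    // attribute only creates the database when it doesn't already exist;
    // a fuller version would also check the catalog before re-issuing
    // CREATE TABLE on later runs.
    public static boolean createIfMissing() {
        if (!derbyPresent())
            return false;  // no derby.jar: the example report simply can't run
        try (Connection c = DriverManager
                 .getConnection("jdbc:derby:dvexample;create=true");
             Statement s = c.createStatement()) {
            s.executeUpdate(
                "CREATE TABLE jobs (id INT PRIMARY KEY, name VARCHAR(64))");
            s.executeUpdate("INSERT INTO jobs VALUES (1, 'example job')");
            return true;
        } catch (SQLException e) {
            return false;  // e.g. the table already exists from an earlier run
        }
    }
}
```

Since `;create=true` is a no-op for an existing database, the novice can run the example report over and over; only the very first run pays the creation cost.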
The example database just magically appears when needed :-)

> I *do* however think there's a *ton* of value in an automated
> test, part of the build script in some way, that runs a number of
> reports against a number of back-ends and verifies the output. That's
> an extremely attractive idea to me. I'm no expert in the current test
> tasks to be sure, but it doesn't seem that it goes to that extent; it's
> much more unit-test-oriented, which is of course very useful, but what
> we're both describing is probably more accurately termed automated
> integration testing.

Good. So maybe I won't worry too much about integrating what I'm doing back into the existing ant test stuff (at least not yet).

> Now, if we go down the path of having the DV code smart enough to make
> the determination, and assuming it works in all cases we can imagine,
> then no, clearly there's no compatibility issue, or at least there
> shouldn't be. But that does away with the need for any report setting I
> think, so we're really talking about two different things, and my
> comment about compatibility was with regard to the report setting, not
> the automatic determination code.

OK, sounds like I got confused there. Thanks.

> I think the right answer, the one I'd most like to see implemented, is
> "automatic determination by DV". The user has to do nothing, there's no
> settings in the report file, no command-line switches, etc., it's just
> figured out by DV and everything works, old reports, new reports, etc.
> I believe that's the right answer.
>
> However...
>
> How hard is that to implement?

I don't know. Actually, it's beginning to look rather possible... :-) Of course, even if I had a 100% working implementation I felt good about, I still wouldn't have a library of thousands of "old reports" and their corresponding databases to automatically regression test against, so taking this approach means that at some point we trust our understanding of how things work.
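As an aside, to make that integration-testing idea concrete: the heart of such a test is just "run the report, diff the output against a stored golden copy". A sketch, where `runReport` is a stub standing in for whatever actual DataVision invocation we settle on:

```java
import java.util.List;

// Sketch only: runReport() is a stub; a real test would invoke DataVision
// on a report + backend and read the golden text from a checked-in file.
public class ReportGoldenTest {

    // Stub: pretend to run a report and capture its text output as lines.
    static List<String> runReport(String reportName) {
        return List.of("id,name", "1,smith", "2,jones");
    }

    // Return the 1-based number of the first differing line, or -1 on a match.
    static int firstDiffLine(List<String> actual, List<String> golden) {
        int n = Math.min(actual.size(), golden.size());
        for (int i = 0; i < n; i++)
            if (!actual.get(i).equals(golden.get(i)))
                return i + 1;
        return actual.size() == golden.size() ? -1 : n + 1;
    }

    public static void main(String[] args) {
        List<String> golden = List.of("id,name", "1,smith", "2,jones");
        int diff = firstDiffLine(runReport("example.xml"), golden);
        System.out.println(diff < 0 ? "PASS example.xml"
                                    : "FAIL example.xml at line " + diff);
    }
}
```

The golden files themselves could live in the new test directory, one per report/backend combination; reporting the first differing line makes a backend-specific formatting difference much quicker to diagnose than a bare pass/fail.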
But I think we can (not in 1.2.0!!) get to the point where "automatic determination by DV" works, and seems clean enough that we're likely to "trust ourselves" enough to release it.

Jonathan (finally well enough to go to work tomorrow, yay!)