From: Jonathan M. <jma...@fa...> - 2008-03-26 07:26:00
Frank W. Zammetti wrote:
> What I was saying is that if we're going to be in the business of
> writing reports to test with, which we don't have right now, wouldn't it
> be better if they were in a form that we could easily add to the
> build script as automated tests? Clearly we run the test task before we
> roll a release, and hopefully before we commit code, and I think it'd be
> hella-cool, to quote South Park, if we could also add to that test task
> the running of some test reports with automated checking of the results,
> as I suggested earlier. I think this is feasible.

OK. Sure. I suppose I'm seeing a set of cross-database tests, for which more initial setup is by definition needed (you need to have lots of database backends ready to go!). If we're already doing that, then one backend that is easy to install, and relatively popular among the kinds of databases likely to need multiple schemas, is a good choice for initial testing of schema-related changes. And given that starting point, Derby seemed an odd choice.

> I'm also saying that I'd prefer to not raise the bar with regard to what
> a DV developer has to have installed.

Developer? Or committer? Or release engineer? To *develop* you need the development environment, and that's all. You need any one database backend of your choice to test against (at least, in an ideal setup you would need that -- right now I'm not clear whether ant test just assumes Derby is available, or doesn't need a backend at all).

Going forward, additionally, we are (hopefully) creating some tests that are intended to reveal how well we cope with the differences between backend databases. For those, by definition, whoever runs them will need multiple databases available. So for that not-yet-existent set of tests, more setup will be needed. That's unavoidable.
That is *only* a new burden on all developers *if* we have a rule that "all developers must be able to run all the new multi-back-end tests on all available backends"... but that is not the current situation; there is no such rule, because there are no such tests yet. We're breaking new ground. Until we start Wiki pages called "Commit Requirements" and "Pre-release Testing Requirements" or whatever, the definition of what a committer or a release engineer "has to have installed" remains very informal. Basically, it is whatever you tell me it is, and adding new tests doesn't change that :-)

> ... saying they need MySQL, Postgres, maybe SQL Server, Oracle,
> whatever else, would be much more unusual.

Definitely. But adding a new set of tests does not require every developer, nor even every committer, to install the prerequisites those tests demand.

> I'd claim Oracle is the most important RDBMS out there anyway, and I
> suspect I'd get little argument from most users ...

Interesting; I'd have thought those who could afford Oracle could probably also afford commercial report generators to go with it? :-) In one sense, it's a nice indicator of DataVision's quality and usefulness if it is commonly being used in environments where high initial software cost is not a significant obstacle to deployment!

> ... from experience I can say Oracle is not as trivial.

My last experience doing an Oracle install was at least five years ago, and I agree, it definitely wasn't trivial! But that's not the point: surely, long term, we should have an automated set of tests that can be run against a large set of different database backends, with output that indicates which backends were tested and which were skipped, as well as how the tests did on each platform on which they were attempted.
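Such a test task could probe each backend at runtime and skip the ones it can't reach. A minimal sketch, assuming nothing about DataVision's real test code (the class name, backend list, and driver class strings below are all illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: backend names, driver classes, and this class itself are
// illustrative, not part of DataVision's actual test code.
public class BackendTestRunner {

    // Candidate backends: logical name -> JDBC driver class.
    static final Map<String, String> BACKENDS = new LinkedHashMap<>();
    static {
        BACKENDS.put("derby", "org.apache.derby.jdbc.EmbeddedDriver");
        BACKENDS.put("postgresql", "org.postgresql.Driver");
        BACKENDS.put("mysql", "com.mysql.jdbc.Driver");
    }

    // A backend is testable only if its JDBC driver is on the classpath.
    static boolean available(String driverClass) {
        try {
            Class.forName(driverClass);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> b : BACKENDS.entrySet()) {
            if (available(b.getValue())) {
                System.out.println("RUN  " + b.getKey());
                // ...connect and run the report tests against this backend...
            } else {
                System.out.println("SKIP " + b.getKey() + " (driver not installed)");
            }
        }
    }
}
```

A real version would hang the actual report tests off the RUN branch and collect the results into a per-platform summary; the point is just that availability is probed at runtime, so the same task runs everywhere and reports what it skipped.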
What percentage of the possible backends we require a given release to have been run against before we roll out DataVision 2.6.0 is totally undecided at this point :-) We could decide that a release needs someone, somewhere, to have run the test set such that all tests have succeeded on all backends for which the tests are configured; given enough developers that might be reasonable. Expecting one person to have all of them... is unreasonable. Supposing we add the ability for this test suite to run against a JDBC driver for a backend that only runs on VMS minis? Or on Amigas? :-)

> And even saying those others aren't a big deal to install and configure,
> do we really want to add those hurdles for developers anyway?

No. But increasing hurdles for some roles... committers, release engineers, QA managers... as the team grows, that is entirely reasonable. Initial developer hurdles should be lowered as far as we can, definitely -- but that doesn't mean all possible development roles will have equally low hurdles.

> You also might want to say that only those rolling releases need worry
> about those tests, and therefore the requirements bar is only higher for
> them... I *might* agree with that to a point, except that (a) we want to
> ensure every DV developer can roll a release if need be, that's only
> prudent with community-driven projects where people can come and go as
> they please, and (b) I'd hope every committer (I'm being optimistic here
> and saying it won't just be you and I forever!) runs *all* available
> automated tests before checking any code in.

For (a): even "developer" and "committer" are different levels of commitment to a project, in larger projects at least. And in an emergency where you previously made a rule saying "thou shalt run all these N tests successfully before making a release" but no one who can do this is available... those left in the game declare a temporary emergency, change that rule to suit current circumstances, and keep right on going! In general, humans are fairly well-equipped to break human-made rules and regulations :-)

For (b): "running" the set of tests, as long as they automatically skip all locally unavailable database backends, is fine with me! Then being a committer requires only whatever small set of backends "the rule" says must be reported as used and successfully tested for each commit. That is a policy decision, not a technical one.

Of course we can't reasonably expect every developer to have Oracle, DB2, Sybase, and MS SQL Server available to them for testing before they can do anything at all with the code. We'd (maybe) like a 16-year-old gifted genius using a 4-year-old PC running FreeBSD in his parents' basement to be able to develop with us! But we *can* still automate a bunch of tests that work on all those (big or expensive or OS-specific) database platforms, and several easier-to-install and less costly platforms too, and get useful overall testing done that is not currently being attempted at all :-)

One more (possibly blue-sky) idea: could we set up a bunch of test database backends on a SourceForge shell machine, and set up automated test runs on that system nightly, or something along those lines? I think we are some distance away from this kind of thing being the most useful way to spend developer time (!), but it's not inconceivable as a project once we have a significant set of automated tests across several backends.

> So even if we were to say,
> split them out, i.e., have the test task that does what it does today,
> and a testReports task maybe that does the report-based testing with
> the idea that the latter only needs to be run by those rolling releases,
> I'd hope every developer ran both all the time anyway before committing,
> not just the test task.
And that's unrealistic, if the new set of tests 100% requires every single possible database backend that it can use. But that's not something I've ever imagined it doing.

> I was only saying Derby, and the other
> embeddable RDBMS's, because they can easily be included with the source
> tree.

Well, yes and no. The sqlite one in there now doesn't work except on Windows (I'll fix that soon!). And, BTW, we may be slightly shy of the LGPL licensing requirements here or there, too, with some of the .jar files in 1.1.0. I think including third-party stuff is more work than it looks, to do 100% right.

I think including one smallish and portable embeddable backend (Derby is a logical choice for this) makes good sense, so tutorials and examples etc. can depend on it being there. More than that is extra work keeping them up to date etc., for minimal real benefit, IMO. Better to have our documentation say where to get and how to use JDBC drivers for 40 databases of all shapes and sizes than to spend equivalent developer effort including perhaps 4 or 5 embeddable ones directly, I'd think. Let's keep the release tarball size sane, and let the end user choose what database software to use once he knows his DataVision setup basically works using the included single backend, and has learned how to use it.

>> We also may want to avoid keeping test and
>> example database files (like dvtestdb) at the top of the subversion
>> tree? ...

> Well, I think that maybe leads to a larger question... at present, what
> you download when you grab a DV release is binary AND source combined.
> That's a bit atypical; most open-source projects (but not all) split the
> two out.

I think reducing release tarball size would be more readily effected by removing test data, unnecessary libraries, etc. than by removing the source code and build.xml file... only once we have removed other larger items that are not really as much a part of the project, and still feel a need to get it even smaller, would the idea of removing the source code come up in my own mind :-) Removing developer test data, and the test result files that are the "correct" output, from the release package, though, might well be smart, if that collection grows as big as I hope it eventually will!

> I'm inclined to just leave things as they are and add a new test
> directory right off the root as you describe here.

OK, can do.

>> Hmm, now I think about it, do we really want those database files in the
>> subversion tree at all?

> I agree there, and in fact I've run into issues with SVN and locked
> files in the sample database. I think probably the ideal approach is
> what you describe, and also have the database created fresh as part of
> the automated testing, assuming we go down that path at all.

Yes, we're in agreement there.

> My only concern is that my whole intent with the live database example
> report was to give something in the distro that was never there before,
> namely an example report that actually ran against a database, that a
> new user could run and play with *immediately*. I think that's an
> attractive thing to include, and yet I do agree with your point about
> binaries in the source tree (also something I'm real anal about with my
> other projects). I'm not sure what the best answer is, but you can see
> how I'm trying to balance the two concerns.

Yes. If we already include derby.jar and jruby.jar, couldn't the startup function in an example report actually call back into a little "example database creation" DataVision component, and so cause it to create the example database? Best of both worlds, if that is doable via Ruby and BSF -- no static binary data lurking in subversion, but no extra work for novices running the example report.
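For what it's worth, here is a minimal sketch of what that creation component might look like on the Java side. Everything here (the `ExampleDatabase` class, the `dvexample` database name, the `jobs` table) is invented for illustration; the real entry point would be whatever the report's BSF/Ruby startup script can reach:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch only: class, database, and table names are invented; the real
// entry point would be whatever the example report's startup script calls.
public class ExampleDatabase {

    // True if the embedded Derby driver is on the classpath (derby.jar shipped).
    public static boolean derbyPresent() {
        try {
            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Build the example database on first use. Derby's ";create=true" URL
    // attribute only creates the database when it doesn't already exist;
    // a fuller version would also check the catalog before re-issuing
    // CREATE TABLE on later runs.
    public static boolean createIfMissing() {
        if (!derbyPresent())
            return false;  // no derby.jar: the example report simply can't run
        try (Connection c = DriverManager
                 .getConnection("jdbc:derby:dvexample;create=true");
             Statement s = c.createStatement()) {
            s.executeUpdate(
                "CREATE TABLE jobs (id INT PRIMARY KEY, name VARCHAR(64))");
            s.executeUpdate("INSERT INTO jobs VALUES (1, 'example job')");
            return true;
        } catch (SQLException e) {
            return false;  // e.g. the table already exists from an earlier run
        }
    }
}
```

Since `;create=true` is a no-op for an existing database, the novice can run the example report over and over; only the very first run pays the creation cost.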
The example database just magically appears when needed :-)

> I *do* however think there's a *ton* of value in an automated
> test, part of the build script in some way, that runs a number of
> reports against a number of back-ends and verifies the output. That's
> an extremely attractive idea to me. I'm no expert in the current test
> tasks to be sure, but it doesn't seem that it goes to that extent; it's
> much more unit-test-oriented, which is of course very useful, but what
> we're both describing is probably more accurately termed automated
> integration testing.

Good. So maybe I won't worry too much about integrating what I'm doing back into the existing ant test stuff (at least not yet).

> Now, if we go down the path of having the DV code smart enough to make
> the determination, and assuming it works in all cases we can imagine,
> then no, clearly there's no compatibility issue, or at least there
> shouldn't be. But that does away with the need for any report setting I
> think, so we're really talking about two different things, and my
> comment about compatibility was with regard to the report setting, not
> the automatic determination code.

OK, sounds like I got confused there. Thanks.

> I think the right answer, the one I'd most like to see implemented, is
> "automatic determination by DV". The user has to do nothing, there's no
> settings in the report file, no command-line switches, etc., it's just
> figured out by DV and everything works, old reports, new reports, etc.
> I believe that's the right answer.
>
> However...
>
> How hard is that to implement?

I don't know. Actually, it's beginning to look rather possible... :-) Of course, even if I had a 100% working implementation I felt good about, I still wouldn't have a library of thousands of "old reports" and their corresponding databases to automatically regression test against, so taking this approach means that at some point we trust our understanding of how things work.
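As an aside, to make that integration-testing idea concrete: the heart of such a test is just "run the report, diff the output against a stored golden copy". A sketch, where `runReport` is a stub standing in for whatever actual DataVision invocation we settle on:

```java
import java.util.List;

// Sketch only: runReport() is a stub; a real test would invoke DataVision
// on a report + backend and read the golden text from a checked-in file.
public class ReportGoldenTest {

    // Stub: pretend to run a report and capture its text output as lines.
    static List<String> runReport(String reportName) {
        return List.of("id,name", "1,smith", "2,jones");
    }

    // Return the 1-based number of the first differing line, or -1 on a match.
    static int firstDiffLine(List<String> actual, List<String> golden) {
        int n = Math.min(actual.size(), golden.size());
        for (int i = 0; i < n; i++)
            if (!actual.get(i).equals(golden.get(i)))
                return i + 1;
        return actual.size() == golden.size() ? -1 : n + 1;
    }

    public static void main(String[] args) {
        List<String> golden = List.of("id,name", "1,smith", "2,jones");
        int diff = firstDiffLine(runReport("example.xml"), golden);
        System.out.println(diff < 0 ? "PASS example.xml"
                                    : "FAIL example.xml at line " + diff);
    }
}
```

The golden files themselves could live in the new test directory, one per report/backend combination; reporting the first differing line makes a backend-specific formatting difference much quicker to diagnose than a bare pass/fail.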
But I think we can (not in 1.2.0!!) get to the point where "automatic determination by DV" works, and seems clean enough that we're likely to "trust ourselves" enough to release it.

Jonathan (finally well enough to go to work tomorrow, yay!)