[matplotlib-devel] John: Thoughts on a standard test system


John,
Sometime in January, we are going to spend some time fixing a few minor MPL bugs we've hit and probably work on a few enhancements (I'll send you a list in Jan before we start anything - it's nothing major).  We're also going to work on writing a set of tests that try various plots w/ units.  I was thinking this would be a good time to introduce a standard test harness into the MPL CM tree.

I think we should:

1) Select a standard test harness.  The two big hitters seem to be unittest and nose.  unittest has the advantage that it's shipped w/ Python.  nose seems to do better with automatic discovery of test cases.

2) Establish a set of testing requirements.  Naming conventions, usage conventions, etc.  Things like tests should never print anything to the screen (i.e. correct behavior is encoded in the test case) or rely on a GUI unless that's what is being tested (allows tests to be run w/o an X-server).  Basically write some documentation for the test system that includes how to use it and what's required of people when they add tests.

3) Write a test 'template' for people to use.  This would define a test case and put TODO statements or something like it in place for people to fill in.  More than one might be good for various classes of tests (maybe an image comparison template for testing agg drawing and a non-plot template for testing basic computations like transforms?).
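A minimal sketch of what such a template could look like, assuming plain unittest.  The class name, method names, and the placeholder computation are all hypothetical; the point is only that the expected result lives in the test itself.

```python
import unittest


class TestTemplate(unittest.TestCase):
    """TODO: describe what this test case covers."""

    def setUp(self):
        # TODO: build the inputs shared by the tests in this class.
        self.values = [1.0, 2.0, 3.0]

    def tearDown(self):
        # TODO: remove any files the tests wrote.
        pass

    def test_something(self):
        # TODO: replace this placeholder computation with the code under test.
        result = [2.0 * v for v in self.values]
        # The expected answer is encoded here; nothing is printed or shown.
        self.assertEqual(result, [2.0, 4.0, 6.0])
```

Someone adding a test would copy the file, rename the class, and fill in the TODO items.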

Some things we do on my project for our Python test systems:

We put all unit tests in a 'test' directory inside the python package being tested.  The disadvantage of this is that potentially large tests are inside the code to be delivered (though a nice delivery script can easily strip them out).  The advantage of this is that it makes coverage checking easier.  You can run the test case for a package and then check the coverage in the module w/o trying to figure out which things should be coverage checked or not.  If you put the test cases in a different directory tree, then it's much harder to identify coverage sources.  Though in our case we have 100's of python modules - in MPL's case, there is really just MPL, projections, backends, and numerix so maybe that's not too much of a problem.

Automatic coverage isn't a must-have, but it is really nice.  I've found that it actually causes developers to write more tests because they can run the coverage and get a "score" that other people will see.  It's also a good way to check a new submission to see if the developer has done basic testing of the code.

For our tests, we require that the test never print anything to the screen, clean up any of its output files (i.e. leave the directory in the same state it was before), and only report that the test passed or failed and if it failed, add some error message.  The key thing is that the conditions for correctness are encoded into the test itself.  We have a command line option that gets passed to the test cases to say "don't clean up" so that you can examine the output from a failing test case w/o modifying the test code.  This option is really useful when an image comparison fails.
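One way the "don't clean up" behavior could be sketched with unittest is a shared base class that owns a scratch directory.  KEEP_OUTPUT, OutputTestCase, and TestWritesFile are hypothetical names, not anything that exists in MPL; a run script would set the flag from its command line.

```python
import os
import shutil
import tempfile
import unittest

# Hypothetical switch; a run script would set this from a command-line option.
KEEP_OUTPUT = False


class OutputTestCase(unittest.TestCase):
    """Base class giving each test a scratch directory for its output files."""

    def setUp(self):
        self.outdir = tempfile.mkdtemp(prefix="mpltest_")

    def tearDown(self):
        if KEEP_OUTPUT:
            # Leave the files in place so a failing test can be examined.
            return
        shutil.rmtree(self.outdir, ignore_errors=True)


class TestWritesFile(OutputTestCase):
    def test_write(self):
        path = os.path.join(self.outdir, "result.txt")
        with open(path, "w") as f:
            f.write("42\n")
        with open(path) as f:
            self.assertEqual(f.read(), "42\n")
```

Because the cleanup lives in the base class, individual tests never need to be edited to keep their output around.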

We've wrapped the basic python unittest package.  It's pretty simple and reasonably powerful.  I doubt there is anything MPL would be doing that it can't handle.  The auto-discovery of nose is nice but unnecessary in my opinion.  As long as people follow a standard way of doing things, auto-discovery is fairly easy.  Of course if you prefer nose and don't mind the additional tool requirement, that's fine too.  Some things that are probably needed:

- command line executable that runs the tests.
        - support flags for running only some tests
        - support flags for running only tests that don't need a GUI backend
          (require Agg?).  This allows automated testing and visual testing to be
          combined.  GUI tests could be placed in identified directories and then
          only run when requested since by their nature they require specific backends
          and user interaction.
        - nice report on test pass/fail status
        - hooks to add coverage checking and reporting in the future
- test utilities
        - image comparison tools
        - ??? basically anything that helps w/ testing and could be common across
          test cases
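
A rough sketch of the command-line side of such a run script, using the standard argparse module.  The option names (--no-gui, --keep-output) and the positional category list are assumptions for illustration only.

```python
import argparse


def parse_args(argv=None):
    """Parse the (hypothetical) options of a test run script."""
    parser = argparse.ArgumentParser(description="Run the test suite (sketch).")
    parser.add_argument(
        "categories", nargs="*",
        help="run only these test_<category> directories (default: all)")
    parser.add_argument(
        "--no-gui", action="store_true",
        help="skip tests that need a GUI backend")
    parser.add_argument(
        "--keep-output", action="store_true",
        help="do not delete files written by the tests")
    return parser.parse_args(argv)
```

For example, parse_args(["transform", "--no-gui"]) selects only the 'transform' category and turns off the GUI tests.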

As a first cut, I would suggest something like this:

.../test/run.py
         mplTest/
         test_unit/
         test_transform/
         test_...

The run script would execute all/some of the tests.  Any common test code would be put in the mplTest directory.  Any directory named 'test_XXX' is for test cases where 'XXX' is some category name that can be used in the run script to run a subset of cases.  Inside each test_XXX directory, one unittest class per file.  The run script would find the .py files in the test_XXX directories, import them, find all the unittest classes, and run them.  The run script also sets up sys.path so that the mplTest package is available.

Links:
http://docs.python.org/library/unittest.html
http://somethingaboutorange.com/mrl/projects/nose/
http://kbyanc.blogspot.com/2007/06/pythons-unittest-module-aint-that-bad.html

coverage checking:
http://nedbatchelder.com/code/modules/coverage.html
http://darcs.idyll.org/~t/projects/figleaf/doc/

Thoughts?
Ted
ps: looking at the current unit directory, it looks like at least one test (nose_tests) is using nose even though it's not supplied w/ MPL.  Most of the tests do something and show a plot but the correct behavior is never written into the test.