From: Paul L. <pl...@us...> - 2005-11-16 23:14:50
My name is Paul Larson and I work with James on the LTC Test team. I've looked at the design document a little today and have a few questions/comments about it.

First, as Nigel already pointed out, NFS being called out specifically seemed a little odd. Was this just an example, or is there a particular reason why it needs to be NFS?

Is there a mechanism for specifying hardware requirements for the test to run correctly? I talked to James about this and I get the impression now that part of defining the job includes limiting the job to run on a set of specified machines.

What would be very nice to see here in the design is how a job is defined. I would assume that you have some definition of what a job control file should look like? Probably short of a full-blown language, but it's worth designing this now and specifying how it should work so that you can poke holes in it and find shortcomings at design time.

How is the searchspace definition structured?

What is the LoadModules step and what happens at this step?

What logging is performed during the build/install/test process? I would think it necessary to log test status and patch status, for instance... let's stop here a second. Let's say you have a set of changes in your searchspace: A,B,C,D,E,F,G. You know that A-1 passed and G+1 failed. So you take D, and either pull by changeset or date, or apply the patch. Now let's say it doesn't even build. How do you treat that failure? Probably you want to either back off some or move forward some, because you probably pulled at a point that either had merge patches following it or broke the build. You cannot say with certainty whether this would have passed the test or failed it, though, so this needs an additional state to classify it as both unknown and unusable.

It may also be necessary to log some other form of remote output.
In terms of kernel testing, this often takes the form of either output over a serial line, or output from another IP:port which is redirecting serial output from the box. This comes with the additional baggage of needing test specifications that cover not only a zero or non-zero exit status, but also the output expected over this other line. For instance, if your test crashes the machine, then there's probably something in the debug output, or at least a loss of heartbeat to the machine, that can tell you it failed. Perhaps this goes beyond the scope of what you are trying to accomplish. If so, then it should be explicitly called out in the design that the tool is only being designed with the idea of hunting for non-fatal errors. It is certainly more complicated to allow searching for fatal errors, but it is significantly more useful as well.

Thanks,
Paul Larson
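
To make the build-failure case above concrete, here is a minimal sketch of a search that carries a third state alongside pass and fail. Everything in it is hypothetical and not taken from the design document: the names `Outcome`, `bisect`, and `fake_run`, the `UNTESTABLE` label, and the convention that the list ends with the known-failing point (G+1) while the state before its first entry (A-1) is known to pass.

```python
from enum import Enum
from typing import Callable, List, Sequence


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    UNTESTABLE = "untestable"  # unknown and unusable, e.g. the tree did not build


def bisect(changes: Sequence[str], run: Callable[[str], Outcome]) -> List[str]:
    """Return the changes that could have introduced the failure.

    Assumption: the state before changes[0] passes and changes[-1] fails.
    With no untestable points the result has exactly one entry.
    """
    good = -1                 # index of the last known-passing change
    bad = len(changes) - 1    # index of the first known-failing change
    skipped = set()           # indices that could not be classified

    while True:
        # Points strictly between good and bad that we can still try.
        candidates = [i for i in range(good + 1, bad) if i not in skipped]
        if not candidates:
            break
        mid = (good + bad) // 2
        # Prefer the midpoint; if it was skipped, back off or move
        # forward to the nearest point that has not been tried.
        pick = min(candidates, key=lambda i: (abs(i - mid), i))
        result = run(changes[pick])
        if result is Outcome.PASS:
            good = pick
        elif result is Outcome.FAIL:
            bad = pick
        else:
            skipped.add(pick)

    # Everything after the last pass, up to and including the first
    # fail, remains a suspect (the untestable points in between too).
    return list(changes[good + 1 : bad + 1])


def fake_run(change: str) -> Outcome:
    """Stand-in for build/install/test: D does not build, E breaks the test."""
    if change == "D":
        return Outcome.UNTESTABLE
    return Outcome.PASS if change in ("A", "B", "C") else Outcome.FAIL


space = ["A", "B", "C", "D", "E", "F", "G", "G+1"]
print(bisect(space, fake_run))  # ['D', 'E']
```

Because D never builds, the search cannot tell D and E apart, so both are reported instead of one being blamed by guesswork.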