Re: [Teem-users] feedback requested: uniform documentation of whether and when a function uses biff

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hello,

I don't know how well this mailing list works at this point, but I thought that I'd point out here that I've gone ahead and implemented something for this.

teem/src/biff/README.txt documents the format of comments that document: whether a function uses biff, and if it uses biff, what return values constitute an error, and, if it only sometimes uses biff (via biffMaybeAddf), which function parameter value indicates that biff is being used.  Some examples:

  int /* Biff: 1 */
  nrrdSlice(Nrrd *nout, const Nrrd *cnin, unsigned int saxi, size_t pos) {

  int /* Biff: (private) maybe:2:nrrdField_unknown */
  _nrrdReadNrrdParseField(NrrdIoState *nio, int useBiff) {

  static int /* Biff: nope # unlike other parsers, for reasons described below */
  _nrrdReadNrrdParse_number(FILE *file, Nrrd *nrrd, NrrdIoState *nio, int useBiff) {

Because all of Teem has recently been reformatted with clang-format, the code is easier to analyze with things that aren't full C parsers; I've worked up some python scripts that seem to work.  Some malicious things could have escaped analysis (like hiding a biff call inside a #define), but dozens of bugs in biff usage were found and fixed by this analysis.  teem/src/biff/README.txt describes more.

The biff annotations are being used in the new teem/python/cffi python wrappers that turn biff errors into exceptions, and only check for those errors on the functions that use biff (all 522 of them).

Gordon

> On Jul 12, 2011, at 7:12 AM, Gordon L. Kindlmann <gl...@uc...> wrote:
> 
> Hi Michael,
> 
> I'm collecting your responses below.  I may have started this  
> discussion off on the wrong foot.  Fundamentally (and despite the  
> subject line), what I'm talking about here is not just about biff, but  
> about how to interpret the function return value as an indicator of  
> error.
> 
> Relevant to this discussion, there are really two kinds of information  
> about any given Teem function:
> 
> #1: How to interpret the return value as an error indicator.  For  
> nrrdNew, returning NULL means error, nrrdLoad returning 1 means error,  
> for nrrdCheck returning 0 means error.
> 
> #2: Whether a function uses biff.  nrrdNew does not, nrrdLoad and  
> nrrdCheck do, and nrrdSameSize does iff its "useBiff" argument is non- 
> zero.
> 
> For reference, to answer #2 for his python Teem wrappings, Carlos read  
> source and comments and generated the __no_biff list at line 283 here:
> http://code.google.com/p/python-teem/source/browse/trunk/teem/capi/__init__.py <http://code.google.com/p/python-teem/source/browse/trunk/teem/capi/__init__.py>
> I'd like to facilitate generating this kind of list in a more  
> automated way, by documenting #2 in the source.
> 
> I think that *learning* #1 and #2 by automated means is hard, and I'm  
> not going to try it (others are welcome to).
> 
> My proposal is simply a way of documenting, for some subset of Teem,  
> the answer to #2, and biff is used, the answer to #1.  The proposal  
> facilitates automated parsing, so that we can generate the right error  
> checker in the python wrapper.
> 
> I view this is as equivalent to using the \return or \retval commands  
> in Doxygen, and enforcing some uniformity in how they are used.  In  
> Doxygen, nothing automatically enforces the correctness of the  
> information in the comments that Doxygen parses, but it is nonetheless  
> a usefully uniform way of documenting what we know about functions.
> 
>> Why not always call the callback but make it be happy if there are no
>> errors detected?
> 
> I'm not sure how to do this without knowing the answer to #1.
> 
>> Biff doesn't use stderr.  Or it doesn't have to rather.  It uses the
> 
>> global structure in biffbiff.c to store errors.
> 
> Yes.  But, it prints "WARNING: no information for key <key>" to stderr  
> if you call biffGet() with an unused key.  That's the stderr usage I  
> was referring to.
> 
>> You could add a
>> callback for python to see if there's anything there and return status
>> or not based upon return value && isBiffMessage() or similar.
> 
> The return value is uninformative without knowing the answer to #1.
> 
> Your point reminds me that we do have a function biffCheck(key), which  
> returns the number of error messages for a key.  We could possibly  
> resort to using biffCheck as the sole indicator of #2 (and use that to  
> avoid calling biffGet() when there are no messages to get).  I'd feel  
> slightly defeated if that's the only way to go.
> 
>> Note that Biff is problematic anyway as it is not thread safe.  I
>> wouldn't do anything too fancy regarding a replacement without keeping
>> that in mind.
> 
> Agreed.
> 
>> Just thought I'd throw in that comments are not authoritative and  
>> using
>> them as such will result in a bunch of difficult to track and rare to
>> produce bugs.
> 
> I'll take as a vote against my proposal :) Alternative suggestions  
> welcome.
> 
> Gordon
> 
>> 
>> Michael
>> 
>> On Mon, 2011-07-11 at 13:44 -0500, Gordon L. Kindlmann wrote:
>>> Thanks for your thoughts.
>>> 
>>> Even if we could assume that all function use biff (and we can't),
>>> we'd still need a way of documenting what test on the return value
>>> indicates error; this is not (and cannot, without big API changes) be
>>> uniform across all functions.  That's part of the little comment  
>>> tag I
>>> was proposing (e.g. "if (ret)" vs "if (!ret)").
>>> 
>>> As for using stderr.  I agree it would be better if biff didn't, but
>>> this is a bigger problem that should probably be revisited for Teem
>>> 2.0.  If you "ack PANIC" (**) in a teem source tree, you see lots of
>>> times where a PANIC message to stderr precedes an exit() from Teem,
>>> usually to indicate an error or gap in implementation.  Given how
>>> surprising exit() can be, this should probably be revisited too.
>>> 
>>> Your point about the fragility of hand-written annotations, and
>>> ensuring their consistency with the code, is completely valid.  I'm
>>> hoping to hear alternative proposals.  Given how heterogenous biff
>>> usage is, I see this as more about (speaking uncharitably)  
>>> documenting
>>> the mess we're in, rather than fixing the mess.
>>> 
>>> Gordon
>>> 
>>> (**) http://betterthangrep.com/
>>> 
>>> On Jul 11, 2011, at 1:28 PM, James Bigler wrote:
>>> 
>>>> I was trying to determine if we could just assume that all functions
>>>> used biff.  This would simply the implementation for any wrapper
>>>> that uses teem (at the expense of performance). However if querying
>>>> a biff error for a non biff function generates stderr output, then
>>>> this isn't a satisfactory solution.
>>>> 
>>>> My general complaint about annotating code in such a way as
>>>> described below is making sure those annotations stay consistent
>>>> with the implementation.  The minute they start to disagree then all
>>>> trust in the annotations goes out the window, and users will be
>>>> forced to read the code anyway.
>>>> 
>>>> Though this brings up a point of using stderr within a library which
>>>> might be a separate thread.  Generally, using stderr is problematic,
>>>> because not all applications have a stderr stream.  Plus as
>>>> developers may not wish to have output they are not in control of.
>>>> In the case of the biff API, it might be useful to have a function
>>>> that can check if the enum is legitimate, before querying it and
>>>> generating the WARNING.
>>>> 
>>>> James
>>>> 
>>>> On Mon, Jul 11, 2011 at 10:48 AM, Gordon L. Kindlmann <gl...@uc...
>>>>> wrote:
>>>> 
>>>> If you're asking about the overhead of using python/ctypes/Teem in a
>>>> way that assumes all functions use biff- that's a good question.  It
>>>> depends on what kind of function is set for the error checker, and I
>>>> haven't started thinking about those details yet.  My thinking is
>>>> that the value of the ctypes interface is about convenience not
>>>> performance- from what I understand the python interpreter has non-
>>>> trivial work to do with every ctypes foreign function call, so its
>>>> not fast (this is the complaint about ctypes from cython
>>>> enthusiasts).  The ctypes's errcheck would add to this overhead, but
>>>> so would whatever error handling the user should be doing anyway,
>>>> for any functions that do use biff.  If functions don't use biff but
>>>> still use error codes to indicate problems, one might try to
>>>> retrieve biff error messages, but this won't work (and will cause
>>>> "WARNING: no information for key <key>" message to go to stderr).
>>>> That is, one can't really afford to be ignorant about whether or not
>>>> a function uses biff.
>>>> 
>>>> If what you're asking is the overhead, in the process of generating
>>>> info about which functions use biff and how, independent of python/
>>>> ctypes), of first assuming that all functions use biff, then I'd say
>>>> that this isn't workable.  A big class of functions don't use biff
>>>> at all (pretty much all *New() and *Nix() functions), and there are
>>>> many functions that return error codes but have non-biff ways of
>>>> communicating the details.  I'd like a consistent way of documenting
>>>> that subset that does use biff.
>>>> 
>>>> Gordon
>>>> 
>>>> 
>>>> On Jul 11, 2011, at 11:06 AM, James Bigler wrote:
>>>> 
>>>> What would be the overhead of assuming that all functions used biff?
>>>> 
>>>> James
>>>> 
>>>> On Mon, Jul 11, 2011 at 12:32 AM, Gordon L. Kindlmann <gl...@uc...
>>>>> wrote:
>>>> Hello,
>>>> 
>>>> In a continuing effort to improve automatically-generated python
>>>> wrappers to all of Teem (see teem/python/ctypes/teem.py, generated  
>>>> by
>>>> teem/python/ctypes/teem-gen.py), I want to help automate the calling
>>>> of biffGet() when a Teem function has had an error that resulted  
>>>> in an
>>>> error message being saved with biff.  The errcheck attribute of  
>>>> ctypes
>>>> provides a very nice hook for doing just this:
>>>> 
>>>> http://docs.python.org/library/ctypes.html#foreign-functions
>>>> 
>>>> With a little work, one can write a python function that is called
>>>> after each Teem call, *if* it is known that the function might
>>>> generate a biff error.
>>>> 
>>>> Unfortunately, this is not very useful unless you know whether a
>>>> function uses biff at all, and how to check whether it used biff  
>>>> after
>>>> this most recent call, in a uniform way for all the Teem functions  
>>>> you
>>>> might call.  Right now you can clean this info from reading the
>>>> comments preceeding the function definition in a .c file (at best)  
>>>> or
>>>> reading through the function itself (at worst).  In previous work on
>>>> wrapping Teem in python, Carlos Scheidegger ended up writing by  
>>>> hand a
>>>> list of functions that did NOT conform to expected biff behavior,  
>>>> but
>>>> this not sustainable, given how easy it would be to add new  
>>>> functions
>>>> to Teem but fail to update this special list in a completely  
>>>> unrelated
>>>> python file.
>>>> 
>>>> I'd like to start using a formatting convention for a comment in the
>>>> source code that documents how a function uses biff.  The short
>>>> comment always starts with /*biff (which should be invisible to the
>>>> comment parsing of doxygen) and then documents when there has been  
>>>> an
>>>> error. Some examples:
>>>> 
>>>> /* returns non-zero if there was a problem, else 0 */
>>>> int /*biff if (ret) */
>>>> nrrdCheck(const Nrrd *nrrd) {
>>>> static const char me[]="nrrdCheck";
>>>> 
>>>> if (_nrrdCheck(nrrd, AIR_TRUE, AIR_TRUE)) {
>>>>  biffAddf(NRRD, "%s: trouble", me);
>>>>  return 1;
>>>> }
>>>> return 0;
>>>> }
>>>> 
>>>> /* returns 0 if there was a problem, else non-zero */
>>>> int /*biff if (!ret) */
>>>> nrrdSanity(void) {
>>>> static const char me[]="nrrdSanity";
>>>> ...
>>>> 
>>>> if (_nrrdSanity) {
>>>>  /* no error, is well */
>>>>  return 1;
>>>> }
>>>> if (some check) {
>>>>   biffAddf(NRRD, "%s: error", me);
>>>>  return 0;
>>>> }
>>>> ...
>>>> 
>>>> /* only uses biff if arg[2] is non-zero; returns 0 if there was a
>>>> problem, else non-zero */
>>>> int /*biff?2 if (!ret) */
>>>> nrrdSameSize(const Nrrd *n1, const Nrrd *n2, int useBiff) {
>>>> static const char me[]="nrrdSameSize";
>>>> unsigned int ai;
>>>> 
>>>> if (!(n1 && n2)) {
>>>>  biffMaybeAddf(useBiff, NRRD, "%s: got NULL pointer", me);
>>>>  return 0;
>>>> }
>>>> ...
>>>> 
>>>> 
>>>> My plan would be to start going to Teem functions and annotating
>>>> functions according to this, and write a python script that looks  
>>>> for
>>>> comments of this form and annotates the function name (appearing in
>>>> the next line) accordingly.
>>>> 
>>>> Some kind of uniform documentation had to be done, sooner or later,
>>>> since biff isn't going anywhere.  Also, the work of generating this
>>>> information by automated source code analysis seems daunting  
>>>> (prove me
>>>> wrong!).  So I want to get feedback on this approach before going
>>>> ahead.  To me this scheme is simple, appropriately located (right in
>>>> the middle of the function declaration itself), and actually not
>>>> specific to python (this information could also be used in C++ Teem
>>>> wrappers).
>>>> 
>>>> Thanks,
>>>> Gordon
>>>> 
>>>> 
>>>> ------------------------------------------------------------------------------
>>>> All of the data generated in your IT infrastructure is seriously
>>>> valuable.
>>>> Why? It contains a definitive record of application performance,
>>>> security
>>>> threats, fraudulent activity, and more. Splunk takes this data and
>>>> makes
>>>> sense of it. IT sense. And common sense.
>>>> http://p.sf.net/sfu/splunk-d2d-c2
>>>> _______________________________________________
>>>> teem-users mailing list
>>>> tee...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/teem-users
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> ------------------------------------------------------------------------------
>>> All of the data generated in your IT infrastructure is seriously  
>>> valuable.
>>> Why? It contains a definitive record of application performance,  
>>> security
>>> threats, fraudulent activity, and more. Splunk takes this data and  
>>> makes
>>> sense of it. IT sense. And common sense.
>>> http://p.sf.net/sfu/splunk-d2d-c2
>>> _______________________________________________
>>> teem-users mailing list
>>> tee...@li...
>>> https://lists.sourceforge.net/lists/listinfo/teem-users
>> 
>> 
>> 
> 
> 
> ------------------------------------------------------------------------------
> All of the data generated in your IT infrastructure is seriously valuable.
> Why? It contains a definitive record of application performance, security 
> threats, fraudulent activity, and more. Splunk takes this data and makes 
> sense of it. IT sense. And common sense.
> http://p.sf.net/sfu/splunk-d2d-c2 <http://p.sf.net/sfu/splunk-d2d-c2>
> _______________________________________________
> teem-users mailing list
> tee...@li... <mailto:tee...@li...>
> https://lists.sourceforge.net/lists/listinfo/teem-users <https://lists.sourceforge.net/lists/listinfo/teem-users>

Re: [Teem-users] feedback requested: uniform documentation of whether and when a function uses biff

Tools to process and visualize scientific data and images

Re: [Teem-users] feedback requested: uniform documentation of whether and when a function uses biff