Re: [TCLCORE] TIP #329 [try] updated


Since I'm actually far shorter on time than this mail would suggest, this 
is as much to document my thoughts & findings for later reference as it 
is to explain my reasoning.
> Further feedback on error matching:
>
> It looks like glob matching is not going to cut it.  List prefix 
> matching will be a similarly-powerful and generally safer approach.
>   
Okay, we've done a quick analysis on our source repository (C, C++, Java 
and Tcl, more than 1m lines over hundreds of applications & utils 
developed over 10+ years).  In most cases we avoid branching based on 
errors/exceptions (a number of Best Practice authors advise against 
doing so), so we can categorise our exception handling into "log and 
ignore", "log and rethrow", "log and abort", "recover/retry" (just try 
it again, it may work) and "intelligent recover/retry" (attempt to 
overcome the specific problem then try it again).  Exceptions that 
demand user interaction are included in the "log and X" categories.  
Very few exceptions fall outside these categories.
Since this was a _quick_ analysis I can only talk in impressions, but 
our impression is that "log and X" is far and away the most common use 
case, and the vast majority (way above 80%) of these cases are "catch 
all, log and X". 
The next category down is intelligent retry (we have applications with 
some really domain specific retry logic), which needs to catch error 
classes (IO errors) and specific errors (ApiException with cause 1125).  
In Java and C++ we catch on classes near the top of the hierarchy and 
then switch or if/then for more specific errors.  In most cases the IO 
errors are coming from a subsystem and we catch all SystemIoExceptions 
rather than (say) java.io.IOException.

What we learned from this is that if we represent an error as a unique 
prefix word followed by a unique error name or code, then an exact 
prefix match is going to be good enough for us 80% of the time.  If we 
represent an error as a list of increasingly specific elements (API 
SUBSYSTEM ERRNAME ...) then an exact prefix match is going to be good 
enough upwards of 90% of the time _assuming we separate code into 
subsystems that have high cohesion and low coupling_ (which is generally 
a good idea).  Such a scheme is also capable of greater specificity in 
error handling than Java or C++ (but not of greater generality). 
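
To make the layered form concrete, here is a minimal sketch using only 
[catch] and [return -code error -errorcode].  The names MYAPI, IO and 
ETIMEOUT and the procs readRecord and classify are hypothetical, made up 
for illustration; they are not taken from our code base or from the TIP.

```tcl
# Hypothetical subsystem call that fails with a layered errorcode:
# API first, then SUBSYSTEM, then ERRNAME, then any extra detail.
proc readRecord {id} {
    return -code error \
        -errorcode [list MYAPI IO ETIMEOUT $id] \
        "timed out reading record $id"
}

# Decide how to react by comparing only the leading elements of the
# errorcode, most specific prefix first.
proc classify {script} {
    if {![catch {uplevel 1 $script} msg opts]} {
        return ok
    }
    set code [dict get $opts -errorcode]
    if {[lrange $code 0 2] eq [list MYAPI IO ETIMEOUT]} {
        return retry          ;# one specific error
    } elseif {[lrange $code 0 1] eq [list MYAPI IO]} {
        return io-failure     ;# a whole class of errors
    }
    return unknown
}

puts [classify {readRecord 17}]   ;# retry
puts [classify {error boom}]      ;# unknown (errorcode is NONE)
```

The trailing record id in the errorcode does not disturb either 
comparison, which is the property that makes later extension safe.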

We identified only one place in our entire code base that cannot be 
adequately handled by a prefix match against an errorcode list.  A base 
exception class has two integer fields indicating the cause of the 
error; each function in the API has its own associated exception class 
that inherits from the base class.  Yes, it sounds very weird (it is 
very weird), but it allows very high level code to determine which 
_function_ failed, which is the essential bit of information needed to 
determine how to recover.  Without getting into more details about the 
hierarchy, let me assure you that there is no list representation that 
can be matched with a prefix that covers all types of catch we need to 
do (i.e. catch on one of the error fields or on the subclass type). 
If we constructed the errorcode as "XAPI code1 code2 FUNCNAME" then a 
string glob match _could_ work (e.g. "XAPI * FUNCNAME").  But that 
solution isn't good enough -- there is a special case subclass of one of 
the function exceptions, and it was added after the first drop of the 
product.  If we made the errorcode "XAPI code1 code2 FUNCNAME sub1" it 
would break existing trap patterns (that don't have a trailing *).  
Using "XAPI code1 code2 sub1 FUNCNAME" may or may not be backwards 
compatible (e.g. code could be trapping code1 == 5 and then logging 
FUNCNAME ... but did it use lindex end or lindex 3?).
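
A small sketch of that breakage, assuming made-up values 5 and 7 for 
code1 and code2 (only the shape of the errorcode matters here):

```tcl
# Hypothetical errorcodes for one function exception, before and after
# the special-case subclass was introduced.
set original [list XAPI 5 7 FUNCNAME]
set extended [list XAPI 5 7 FUNCNAME sub1]

# An existing trap pattern written without a trailing *.
set pattern "XAPI * FUNCNAME"

puts [string match $pattern $original]   ;# 1
puts [string match $pattern $extended]   ;# 0: the old trap no longer fires

# The alternative layout "XAPI code1 code2 sub1 FUNCNAME" keeps the glob
# working but moves FUNCNAME for anyone parsing by position.
set reordered [list XAPI 5 7 sub1 FUNCNAME]
puts [string match $pattern $reordered]  ;# 1
puts [lindex $reordered end]             ;# FUNCNAME
puts [lindex $reordered 3]               ;# sub1, where FUNCNAME used to be
```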

So the general rules one must follow with -errorcode to avoid shooting 
yourself in the foot are:
(1) When trapping errors exact matching against the full errorcode is 
always a bad idea.  It prevents any future extension of the errorcode to 
distinguish between different errors that currently share the same 
errorcode (or new functionality that must conform to an existing error 
model and thus share an existing code).  A match (prefix, glob, etc.) is 
pretty much required if you want maintainable code.
(2) If you are building errorcodes with [list] and matching them with 
glob it becomes impossible to distinguish between error subclasses and 
adjacent errorcodes that share a common prefix.  e.g. "ABC 4" vs "ABC 
42", or "WIN32 INVALID_DATA" vs "WIN32 INVALID_DATATYPE".  To use glob 
you must build errorcodes as a string and add a trailing space or other 
appropriate delimiter, so that you can match "ABC 4 *" instead of "ABC 4*".
(3) If your errorcode information is represented as a list then you 
should assume that the user trapping the error is parsing the list to 
extract useful information, and you should further assume that such 
parsing involves positional arguments (e.g. lindex $errorcode 2).  It is 
therefore only safe to extend errorcodes at one end - you cannot safely 
add more fields in the middle of the errorcode.
(4) The only thing that a glob match can do - that a prefix/suffix match 
cannot - is match stuff in the middle of an errorcode.  Since you can't 
safely extend errorcodes in the middle this is of limited use unless 
you have two different dimensions on which to trap.  Most other 
languages don't support this sort of thing directly in their try/catch 
syntax.
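
The pitfall in rule (2) is easy to reproduce with [string match].  A 
short sketch using the example codes from that rule:

```tcl
# Errorcodes built with [list]: no trailing delimiter.
set a [list ABC 4]
set b [list ABC 42]

puts [string match "ABC 4*" $a]    ;# 1
puts [string match "ABC 4*" $b]    ;# 1: "ABC 42" is caught by accident

# The delimited pattern fixes the false positive, but it no longer
# matches the plain list form of the code we do want.
puts [string match "ABC 4 *" $b]   ;# 0
puts [string match "ABC 4 *" $a]   ;# 0

# It only works if the errorcode is built as a string with a trailing space.
set a2 "ABC 4 "
puts [string match "ABC 4 *" $a2]  ;# 1

# Same problem with names that share a prefix.
puts [string match "WIN32 INVALID_DATA*" [list WIN32 INVALID_DATATYPE]]  ;# 1
```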

So I'm calling it at this:  using glob is going to lead to mistakes and 
design inflexibility that are hard to overcome unless you notice them 
early, and there is little practical benefit associated with this cost.

A list prefix match (for each element in the pattern there must exist a 
corresponding element with the identical value in the list under 
consideration) is Good Enough, and far safer.
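
For reference, the match described above can be sketched in a few lines 
of Tcl.  The proc name listPrefixMatch is my own invention for this 
mail, not something the TIP defines, and this is only one possible way 
to write it:

```tcl
# Return 1 if every element of pattern is identical to the element at
# the same position in errorcode, 0 otherwise.  Extra trailing elements
# in errorcode are ignored; an empty pattern matches everything.
proc listPrefixMatch {pattern errorcode} {
    set n [llength $pattern]
    if {$n > [llength $errorcode]} {
        return 0
    }
    foreach p $pattern e [lrange $errorcode 0 [expr {$n - 1}]] {
        if {$p ne $e} {
            return 0
        }
    }
    return 1
}

puts [listPrefixMatch {ABC 4} {ABC 4 detail}]                        ;# 1
puts [listPrefixMatch {ABC 4} {ABC 42}]                              ;# 0
puts [listPrefixMatch {WIN32 INVALID_DATA} {WIN32 INVALID_DATATYPE}] ;# 0
```

Because elements are compared whole, the "ABC 4" vs "ABC 42" confusion 
from rule (2) cannot occur, and adding elements at the end of an 
errorcode never breaks an existing pattern.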

Anything else can be handled with an extension when we know more about 
the problem.

I'll update the TIP accordingly.

Regards,
Twylite
