From: Twylite <tw...@cr...> - 2008-12-02 13:56:43
Since I'm actually far shorter on time than this mail would suggest, this is as much to document my thoughts & findings for later reference as it is to explain my reasoning.

> Further feedback on error matching:
>
> It looks like glob matching is not going to cut it. List prefix
> matching will be a similarly-powerful and generally safer approach.

Okay, we've done a quick analysis on our source repository (C, C++, Java and Tcl, more than 1m lines over hundreds of applications & utils developed over 10+ years).

In most cases we avoid branching based on errors/exceptions (a number of Best Practice authors advise against doing so), so we can categorise our exception handling into "log and ignore", "log and rethrow", "log and abort", "recover/retry" (just try it again, it may work) and "intelligent recover/retry" (attempt to overcome the specific problem then try it again). Exceptions that demand user interaction are included in the "log and X" categories. Very few exceptions fall outside these categories.

Since this was a _quick_ analysis I can only talk in impressions, but our impression is that "log and X" is far and away the most common use case, and the vast majority (way above 80%) of these cases are "catch all, log and X".

The next category down is intelligent retry (we have applications with some really domain specific retry logic), which needs to catch error classes (IO errors) and specific errors (ApiException with cause 1125). In Java and C++ we catch on classes near the top of the hierarchy and then switch or if/then for more specific errors. In most cases the IO errors are coming from a subsystem and we catch all SystemIoExceptions rather than (say) java.io.IOException.

What we learned from this is that if we represent an error as a unique prefix word followed by a unique error name or code, then an exact prefix match is going to be good enough for us 80% of the time.
If we represent an error as a list of increasingly specific elements (API SUBSYSTEM ERRNAME ...) then an exact prefix match is going to be good enough upwards of 90% of the time _assuming we separate code into subsystems that have high cohesion and low coupling_ (which is generally a good idea), and it is capable of greater specificity in error handling than Java or C++ (but not of greater generality).

We identified only one place in our entire code base that cannot be adequately handled by a prefix match against an errorcode list. A base exception class has two integer fields indicating the cause of the error; each function in the API has its own associated exception class that inherits from the base class. Yes, it sounds very weird (it is very weird), but it allows very high level code to determine which _function_ failed, which is the essential bit of information needed to determine how to recover.

Without getting into more details about the hierarchy, let me assure you that there is no list representation that can be matched with a prefix that covers all types of catch we need to do (i.e. catch on one of the error fields or on the subclass type).

If we constructed the errorcode as "XAPI code1 code2 FUNCNAME" then a string glob match _could_ work (e.g. "XAPI * FUNCNAME"). But that solution isn't good enough -- there is a special case subclass of one of the function exceptions, and it was added after the first drop of the product. If we made the errorcode "XAPI code1 code2 FUNCNAME sub1" it would break existing trap patterns (that don't have a trailing *). Using "XAPI code1 code2 sub1 FUNCNAME" may or may not be backwards compatible (e.g. code could be trapping code1 == 5 and then logging FUNCNAME ... but did it use lindex end or lindex 3?).

So the general rules one must follow with -errorcode to avoid shooting yourself in the foot are:

(1) When trapping errors, exact matching against the full errorcode is always a bad idea.
It prevents any future extension of the errorcode to distinguish between different errors that currently share the same errorcode (or new functionality that must conform to an existing error model and thus share an existing code). A match (prefix, glob, etc.) is pretty much required if you want maintainable code.

(2) If you are building errorcodes with [list] and matching them with glob it becomes impossible to distinguish between error subclasses and adjacent errorcodes that share a common prefix, e.g. "ABC 4" vs "ABC 42", or "WIN32 INVALID_DATA" vs "WIN32 INVALID_DATATYPE". To use glob you must build errorcodes as a string and add a trailing space or other appropriate delimiter, so that you can match "ABC 4 *" instead of "ABC 4*".

(3) If your errorcode information is represented as a list then you should assume that the user trapping the error is parsing the list to extract useful information, and you should further assume that such parsing involves positional arguments (e.g. lindex $errorcode 2). It is therefore only safe to extend errorcodes at one end - you cannot safely add more fields in the middle of the errorcode.

(4) The only thing that a glob match can do - that a prefix/suffix match cannot - is match stuff in the middle of an errorcode. Since you can't safely extend errorcodes in the middle this is of limited use unless you have two different dimensions on which to trap. Most other languages don't support this sort of thing directly in their try/catch syntax.

So I'm calling it at this: using glob is going to lead to mistakes and design inflexibility that are hard to overcome unless you notice them early, and there is little practical benefit associated with this cost.

A list prefix match (for each element in the pattern there must exist a corresponding element with the identical value in the list under consideration) is Good Enough, and far safer. Anything else can be handled with an extension when we know more about the problem.
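
To make the list prefix rule concrete, here is a minimal Tcl sketch. The proc name errorcodeHasPrefix is hypothetical, made up for this illustration; it is not part of the TIP or of Tcl itself, and the errorcode values are just the examples used above. It compares element by element, and the lines after it show the rule (2) pitfall where an undelimited glob confuses "ABC 4" with "ABC 42".

```tcl
# Hypothetical helper (an assumption for illustration, not the TIP's
# implementation): returns 1 if every element of $pattern is identical
# to the element at the same position in $errorcode, else 0.
proc errorcodeHasPrefix {pattern errorcode} {
    set n [llength $pattern]
    # A pattern longer than the errorcode can never be a prefix of it.
    if {$n > [llength $errorcode]} {
        return 0
    }
    # Compare only the first $n elements, position by position.
    foreach p $pattern e [lrange $errorcode 0 [expr {$n - 1}]] {
        if {$p ne $e} {
            return 0
        }
    }
    return 1
}

# An errorcode built with [list]; its string form is "ABC 42 detail".
set code [list ABC 42 detail]

# Glob without a delimiter: matches although the code is 42, not 4.
puts [string match "ABC 4*" $code]                  ;# prints 1

# List prefix match: whole elements are compared, so 4 and 42 differ.
puts [errorcodeHasPrefix {ABC 4} $code]             ;# prints 0
puts [errorcodeHasPrefix {ABC 42} $code]            ;# prints 1

# Extending the errorcode at the end keeps existing prefixes working.
puts [errorcodeHasPrefix {XAPI 5} {XAPI 5 7 FUNCNAME sub1}]   ;# prints 1
```

Note that the pattern itself is a list, so a shorter pattern is simply a more general trap and an empty pattern matches everything.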

I'll update the TIP accordingly.

Regards,
Twylite