From: David H. <da...@op...> - 2007-11-24 20:33:59
On Nov 14, 2007, at 10:44 PM, Mike Huot wrote:

> Any thoughts?

Well, since you ask ;-)

I've been struggling with this dilemma that we introduced (OMG!) almost 2 years ago now. Where does the time go? I believe that, on the surface, the problem starts with the poor use of the noun "category". (BTW: I didn't jump in on this discussion earlier because I just didn't have the bandwidth to help and wouldn't have been anything more than a pest ;-)

Below the surface, the problem is the model itself. I think we all know that the current use of "Node Categories", now known as "Surveillance Categories", is to quickly present the "current status" of a group of nodes, as opposed to the "availability" of a group of services over "X amount of time".

The node categories were introduced during the development of the Model Importer and had nothing to do with Surveillance Views (that concept had not yet been conceived). The original design of the Model Importer began with a use case and an XSD that supported the importing of nodes. The <category> element was defined in the XSD to automatically populate the traditional OpenNMS SLA category in categories.xml. During development, this quickly proved infeasible and obviously not very well thought out, so, due to time constraints, we moved to a fallback position and introduced node categories persisted to the DB. This eased synchronization, aided performance, and shortened development time of the Importer. The "catinc" grammar was then added for use with filter rules.

So, the term category was:

a) A poor choice of naming. At the very least, another name should have been chosen: class, group, set, etc. Anything but category.

b) Weakly implemented... it should be entity-based categorization and not just limited to nodes (node entities).

This model is as obviously confusing as it is, not so obviously, broke.
It is so broke, in fact, that determining the status of a node in the surveillance view doesn't ask the Poller, where state is actually maintained; it asks the node, which has no state for its own status, so the status has to be recalculated, deterministically, each time it is asked. It does this by plowing through the current outages for each interface, every time it is asked. It can be asked possibly hundreds of times in a complex surveillance view... Yikes!

I propose that rather than throwing the baby out with the bath water (getting rid of one category vs. the other), we decide what to call these categories with respect to these two different functionalities, and perhaps find a way to integrate the concepts, in the underlying model, into something a lot more unified.

Currently we have:

   Noun      Source            Use
1  Category  categories.xml    SLA reporting
2  Category  categories table  Node (entity) status reporting

We've now changed the semantics to present #2 as "Surveillance View Categories", and I believe that does direct one more towards the notion of what node categories are and how they are currently being used (btw, thanks Bill!), but we're still left with a model that needs fixing.

What can we do before 1.8.0-1 to make all this more comprehensible or easier to use? Here are a couple of my thoughts:

1) Automate node categorizing.

2) Provide a tool to create node categories from the SLA categories.xml. One would define categories in categories.xml, generate the same categories in the categories table, and assign nodes based on the rule defined in categories.xml.

Perhaps we can be more aggressive than that prior to the 1.8 release, but I don't think we have time.

Thoughts? back at ya mhuot ;)

David Hustace
The OpenNMS Group, Inc.
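
To make the status-lookup cost described above concrete, here is a minimal sketch in Python. Every name in it (Outage, status_by_scanning, PollerStateCache, the "UP"/"DOWN" values) is hypothetical and chosen only for illustration; none of it is OpenNMS code, and the real status model is richer than two values. It contrasts recomputing a node's status by walking the outages for each interface on every request with asking a component that already tracks state.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical types for illustration only; not the OpenNMS data model.


@dataclass(frozen=True)
class Outage:
    node_id: int
    interface: str
    resolved: bool


def status_by_scanning(node_id: int, interfaces: List[str],
                       outages: List[Outage]) -> str:
    """Recompute status on every call by walking outages per interface.

    Cost is roughly len(interfaces) * len(outages) for each request.
    """
    for iface in interfaces:
        for outage in outages:
            if (outage.node_id == node_id
                    and outage.interface == iface
                    and not outage.resolved):
                return "DOWN"
    return "UP"


class PollerStateCache:
    """Stand-in for a component that maintains state as events arrive."""

    def __init__(self) -> None:
        # node_id -> number of currently open outages
        self._open: Dict[int, int] = {}

    def outage_opened(self, node_id: int) -> None:
        self._open[node_id] = self._open.get(node_id, 0) + 1

    def outage_resolved(self, node_id: int) -> None:
        self._open[node_id] = max(0, self._open.get(node_id, 0) - 1)

    def status(self, node_id: int) -> str:
        # Constant-time lookup; no outage scan per request.
        return "DOWN" if self._open.get(node_id, 0) > 0 else "UP"
```

In a view that asks for status hundreds of times, the first approach repeats the full scan on each request, while the second pays a small cost once per outage event. Whether the Poller could expose state this way is an assumption of the sketch, not something established in this thread.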