Hi Jean,
First of all, I want to thank you and all the other Shinken developers. I'm glad to see how Shinken is growing.
I think its own user interface can be a big plus for Shinken. A new interface could combine the advantages of Shinken with a modern interface offering the features that an operator/admin/manager really needs. I agree with you that root problems, impacts and criticity are very important, and that the focus of the new UI must be on simplicity (the KISS principle).
I think another important point is that the new UI should be extensible. It should be easy to customise or to add new features. Let me try to explain. A few years ago I used the TYPO3 CMS and I was amazed at how extensible it was. The TYPO3 core has a clear focus on the essentials, but it was and still is quite easy to add new features. As a result, a lot of people wrote extensions and shared them with the community. Such a dynamic community would be great for Shinken.
My personal "perfect" UI has a simple core, is focused on the business and is modularly extensible.
I think that with a well-thought-out concept, the UI project could work out well.

Andreas

On 06.05.2011 at 16:05, nap wrote:
Hi,

I would like us to take some time for a little break, like we did last year after the 0.2 version, to look at the project vision and, why not, change it.

**Beware**: once again, I wrote a lot, sorry :)

What we did well this last year

Let's look at what we did this last year, since the 0.1 version. We focused on putting the core in place, with huge distributed features. I think we fulfilled our goal for this part. We have a full-scale architecture, and we can handle all the classic network or organizational problems (DMZ, distant LANs, or customers). And I'm quite proud that we can say:
* distributed architecture: Done.

We added new modules (lots of retention ones, livestatus, ndo, merlin (not finished :) ), pnp, etc.). So, Ninja aside, we handle the main UIs of the Nagios world well. It's still a work in progress, but it should not require a lot of work, mainly bug fixes and small improvements. So we can say:
* export/presentation modules: really good.

Another thing we added is configuration enhancement and simplification (service generators or easy dependency definitions, for example). It's cool for people who write their conf with vi: they can now write it in an efficient way.
* configuration enhancement and simplification: Done.

We also added new quality methods, especially test-driven development, so we are sure that we only remove bugs and almost never add new ones to previous features. It's a very comfortable way of hacking code. Without it, we would not have as many features as we have, and maybe no production installation at all :)

Another thing I'm glad we added is a new way of looking at monitoring. I'm talking about root problems/impacts + criticity. It's very easy to use, because it needs just one parameter, from 0 to 5, for the criticity, but the implications are great:
* notification filters that are far easier to configure (only prod, nothing less)
* business rules that respect the root cause analysis feature and are easy to set up
* export of this information in LiveStatus (which became the default API), so UIs can use it to show only business-impacting problems.
So we can say:
* in core "focus on business feature & correlation": Done.
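
To make the criticity filter concrete, here is a minimal sketch in Python. It is not Shinken code: the `Element` class, the `should_notify` helper and the sample names are all hypothetical, and the only thing taken from the text above is the single criticity parameter from 0 to 5.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical monitored element (host or service, no distinction)."""
    name: str
    criticity: int  # 0 (least important) to 5 (most important), as in the mail
    state: str      # "OK" or "CRITICAL" in this sketch


def should_notify(element: Element, min_criticity: int) -> bool:
    """Notify only for failing elements at or above the chosen criticity."""
    return element.state != "OK" and element.criticity >= min_criticity


# Illustrative data only: one prod app down, one dev app down, one prod app fine.
elements = [
    Element("erp-prod", 5, "CRITICAL"),
    Element("erp-dev", 1, "CRITICAL"),
    Element("mail-prod", 4, "OK"),
]

# "Only prod": a threshold of 3 is an arbitrary choice for this example.
to_notify = [e.name for e in elements if should_notify(e, min_criticity=3)]
print(to_notify)
```

With a threshold like this, the dev ERP stays silent even though it is down, which is the "only prod" behaviour described above.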

So we can say that we have reached a very good product, far better than I first thought a year and a half ago. Big thanks and congrats, everyone :)


What we failed at too
But not everything was as good as all these points:
* My English skills are still very low :)
* Our wiki is very sparse in tutorials. Yes, we have the "official doc" from Nagios with the new features, but it's a nightmare to read and to get started with such documentation.
* The UIs did not follow us much. Yes, they fixed some bugs, but I think the main addition Shinken brings to monitoring is not its architecture, even if it's a great one, but really the root problem + criticity one. And this was not used by the UIs, Thruk aside with its Shinken-specific views.

I think these are our major problems right now for a Shinken domination of the world.... too much? Ok, for a large acceptance of Shinken by users, who see it as a "new Nagios" rather than as a much enhanced one that will help them in their day-to-day job.

As for my English skills, I started English 16 years ago, so I think it just won't be possible. I'll try to read the whole Harry Potter series again and watch films in English, it may help :)

For the wiki, I think it's mainly my fault. It's very, very hard to **start** a documentation, but far easier to enhance it. I didn't write in it for some weeks, and fortunately some people reminded me that features are useless without documentation. And I think it's more than that. It's not documentation we need, but tutorials about each feature. That's what I'm trying to create on our new wiki main page, with a lot of tutorials. It's the same thing with our web site: it's now "easier" to see what Shinken offers to solve users' problems.

I hope the wiki problem will no longer be one once the first 20 tutorials are written, and everyone helps to enhance them and write new ones.
I'll also open a forum, so users will have an easy way to ask for help, far less frightening than posting on a "devel" list :p (I don't think a user mailing list is useful, it serves the same purpose; we can start with a forum and wait some time to look at the result).

The third point is far more problematic. Today's admins are not the same as 10 years ago. Nowadays we can talk about "speed admins", because they no longer have the time to be an expert in one thing, but must be average in a lot of things (I'm personally a linux/windows/SAN/vmware/network/monitoring admin, and that's quite a short list). It will be even harder in the future, with the arrival of "devops".

Nearly everyone on this mailing list knows the difference between a core and a UI. But a LOT of admins don't. It's not that they are dumb, it's just that they do not have the time to look at such a "detail".

And this is a major problem for our (lovely) project. We have no visibility. Of course our web site is cool :) but the main page that people look at is .. screenshots!

So we face a double problem:
* we lack visibility for a lot of users, because we do not have a UI. Simple problem, but terrible impacts for us.
* the other UIs do not really follow us. We use standard APIs and add new features that are easy to access through them (especially LiveStatus), but it was not a success. Thruk was the most "following" UI, and I would really like to thank Sven for his support (especially because my Perl code was a nightmare, and he was kind enough to correct it). But even with this integration, it's still very hard to tell a Thruk with a Nagios/Icinga backend from one with a Shinken backend. Yes, we have two new views, but that's not enough to help the user focus on what we think is important for today's and especially tomorrow's monitoring: focus on business first.

So? What do we do?

The documentation and user help problems will get a solution very soon, but we must look at the UI one. We said last year in our project vision that we are not here to make a UI, and that if we can "enhance/influence" the current ones, it will be good enough.

I think we (mainly I) were 50% wrong. **Not** making a UI allowed us to focus on core enhancement, stabilization and a production-ready product. Now that we have this, it's time to look at how we can help users get the most power from the Shinken core in the most efficient way. I think adding plugins to current UIs is not enough. We can't make users focus on business first if we have the same view as Nagios 10 years ago. It's just not possible. We can't afford to have hosts and services managed in different ways anymore; both are "end user resources" after all, nothing more.

That's why I say the root problem/criticity work was so important last year: it will give admins a new way of "working" day to day. It should be simple to show links between elements, it should be immediate to see business impacts, and it should be immediate to see the root problems of these impacts. We do not need to see IT elements everywhere if they are not "important" (business-supporting IT); it's quite enough to look at them once or twice a day by default in such a UI.

I think it's just not possible to get such a new way with current UIs, because it would need Shinken hooks everywhere, and no one will want this, especially because some old-school users won't want this change; they have their habits and long hair and will never use such a monitoring UI. And that's fine, they already have such a UI. They even have plenty of them: nearly all UIs (NagVis and business process ones aside) propose the same way of thinking.

I think that now, with a stable core (the main needs are some retention parameters and an enhanced merlin module, not something that will give us work for one more year, more like one week :) ), it's time for us to think about such a UI.

I won't hide it: it's important to have "our own" to promote the project, of course, but I'm --> **strongly** <-- against doing a UI like all the others, putting our logo on it and saying "cool, we got our own UI, great isn't it?". No. Doing so is not great. If we do one, it should add a new dimension, a new way of seeing users' problems, like we did in the core for distributed. The main idea was not to ask "how can we make the current things scale well", but "how should it be done in a perfect world? Ok. Now, is it possible to do it with the current code? No? Ok, let's do a new one -> Shinken".


A UI? For whom? Which UI?
I think the main thing to ask is whether current admins and tomorrow's ones have their "perfect" UI. There are strong differences between monitoring users. We can split them into 3 main groups, I think:
* operators: they are dedicated to monitoring; they should look at ALL errors and solve them. Simple. Current UIs are good for them (maybe criticity sorting can help them, but plugins and patches are good enough for that)
* admins: they are asked more and more to focus on business, because they have less and less time to give to their monitoring solution. They should continuously watch the IT elements that impact production; qualification and dev ones should be looked at once or twice a day, not more.
* admins' bosses (N+1 for example): they want to look at business impacts and "easily" see what is causing them (so they can rush to the right admin and "help" him solve it :) ).

So in all cases, root problems/impacts + criticity are very, very important. It's even the difference between looking at a console full of red elements (like 500+) and a UI that shows that we lost the distant ERP and, one click later, that it's due to the distant firewall that cannot write logs because its hard drive is full. 10 minutes in the first case to find what to solve, 30 seconds in the second.
As such a console user, I began to look at how to get more productive with my monitoring console. And for now it's not possible: I just lose a lot of time during large impacts.
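
The ERP/firewall case can be sketched in a few lines of Python. This is only an illustration under assumptions: the dependency map, the element names, the states and the `root_problems` / `business_view` helpers are hypothetical, and are not the Shinken core or the LiveStatus API.

```python
# Hypothetical dependency data: each element lists what it depends on.
depends_on = {
    "distant-erp": ["distant-firewall"],
    "distant-printer": ["distant-firewall"],
    "distant-firewall": [],
}
state = {
    "distant-erp": "DOWN",
    "distant-printer": "DOWN",
    "distant-firewall": "DOWN",
}
criticity = {"distant-erp": 5, "distant-printer": 1, "distant-firewall": 3}


def root_problems(name):
    """Follow failing dependencies down to the elements that fail on their own."""
    failing_parents = [p for p in depends_on.get(name, []) if state.get(p) != "UP"]
    if not failing_parents:
        return {name}
    roots = set()
    for parent in failing_parents:
        roots |= root_problems(parent)
    return roots


def business_view(min_criticity):
    """Failing elements above a criticity threshold, most critical first,
    each paired with its root problems."""
    impacted = [
        n for n in state
        if state[n] != "UP" and criticity[n] >= min_criticity
    ]
    impacted.sort(key=lambda n: criticity[n], reverse=True)
    return [(n, sorted(root_problems(n))) for n in impacted]


print(business_view(4))
```

Instead of three red lines, the view at threshold 4 shows one impacted business app and points straight at the firewall, which is the "one click" path described above.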


I think we should first focus on what we add for pure monitoring that immediately helps the admins. I vote for a UI that is:
* very simple: who cares about having 20 different views? I think a very small set of very useful and well-thought-out ones is far better than plenty of mediocre ones.
* strongly focused on business: it should be clear that IT is just here to support end user apps. If the admin wants a classic UI, he can take one of the others; they will always be available. So the main view should be the critical (as in criticity, not the service status) user apps that are impacted. Then it should be very easy to show the root problems of these impacts. This view will be useful for our two user populations: admins with a LOT of elements, who should focus on business apps first (I think in the future most admins will be in this case), and admins' bosses, who focus on prod business only and don't care about other "environments".

We can add another "classic" view that shows hosts/services in problem for pure IT elements. And only ONE view for these 2 kinds of elements. That's another important thing: hosts and services are here for end user apps. They are only resources; there is no need to separate them.

It should be easy to "tag" end user apps (and so set the criticity).

It should also be easy to "select" realms. So if a guy has access to some realms, it should be easy for him to select them (enable/disable).

It should be easy to see the realms' status, and in fact the daemons' status.

Of course, there will be questions about the configuration part; we can leave this for a V2, after we have solved all of these points. A lot of huge IT departments use their own configuration tools (from a CMDB, etc.), so such a tool won't help them. So the "efficient visualization" (focus on critical root problems) should be added first.

The main spirit should be "small is beautiful". There are other UIs with a lot of features; users can still use them if they want :)

I think that for operators who must solve everything, the classic view is enough; old-school admins will use it too, and new hype admins will use the efficient one, like their bosses.

We should focus on what Shinken adds to monitoring, and I think the distributed architecture and root problem/criticity are the key points. There are also business rules that can be added quite easily (but not in a specific view, more like a hover layer that shows the tree if the user wants it, no more :) ).

With this, we avoid the dangerous risk of "the Shinken UI does all you want". No. For now it helps you focus on business, nothing more. Then we can look at user reactions and gather a lot of development power before going too far (we should NOT forget we have a core to maintain and develop! :) ).


So? Are you ok?
So? Is such a UI ok for you? Is this new project vision good? If it's ok, we will see how we can go about designing this UI (I've got some mockups waiting to be shown that are really different from current (monitoring) UIs :) ) and start this new adventure :D



Jean
_______________________________________________ Shinken-devel mailing list Shinken-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/shinken-devel