I think we should first focus on what we add for pure monitoring,
the things that help admins right away. I vote for a UI:
I would like us to take a little break, like we did last year
after the 0.2 version, to look at the project vision and, why
not, change it.
**Beware**: once again, I wrote a lot, sorry :)
What we did well last year
Let's look at what we did this last year, since the 0.1 version.
We focused on putting the core in place, with huge distributed
features. I think we fulfilled our goal for this part. We got a
full-scale architecture, and we can manage all the classic
network or organizational problems (DMZ, distant LANs, or
customers). And I'm quite proud we can say:
* distributed architecture : Done.
We added new modules (a lot of retention ones, livestatus, ndo,
merlin (not finished :) ), pnp, etc). So, Ninja put aside, we
handle the main UIs of the Nagios world well. It's still a work
in progress, but it should not need a lot of work, mainly bug
fixes and small improvements. So we can say:
* export/presentation modules : really good.
Another thing we added is configuration enhancement and
simplification (service generators or easy dependency
definitions, for example). It's cool for people who write their
conf with vi: they can now write it in an efficient way.
* configuration enhancement and simplification : Done.
We also added new quality methods, especially the test-driven
one, so we are sure we only remove bugs and almost never add new
ones in previous features. It's a very comfortable way of
hacking code. Without it, we would not have as many features as
we have, and maybe no production installation at all :)
Another thing I'm glad we added is a new way of looking at
monitoring. I'm talking about root problem/impacts + criticity.
It's very easy to use, because it needs just one parameter,
from 0 to 5 for the criticity, but the implications are just
great:
* far easier to configure notification filters (only prod, not
* business rules that respect the root cause analysis feature
and are easy to set up.
* export this information in LiveStatus (which became the
default API) so UIs can use it to show only Business impacting
So we can say :
* in core "focus on business feature & correlation" : Done.
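To make the "one parameter" idea concrete, here is a minimal
sketch in Python. It is not Shinken's code: the `Element` class,
the `should_notify` helper and the `min_criticity` threshold are
names invented for this illustration, and treating 5 as the most
critical value is an assumption. Only the 0 to 5 criticity scale
comes from the text above.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """A monitored host or service (hypothetical model, not Shinken's)."""
    name: str
    criticity: int  # 0 to 5; 5 = most business critical (assumed direction)
    in_problem: bool = False


def should_notify(element: Element, min_criticity: int) -> bool:
    """Notify only for elements in problem that are important enough."""
    return element.in_problem and element.criticity >= min_criticity


elements = [
    Element("erp-prod", criticity=5, in_problem=True),
    Element("wiki-dev", criticity=1, in_problem=True),
    Element("mail-prod", criticity=4, in_problem=False),
]

# A contact who only wants production-level alerts (threshold is an assumption).
to_notify = [e.name for e in elements if should_notify(e, min_criticity=3)]
print(to_notify)
```

With this sample data only `erp-prod` is kept: `wiki-dev` is in
problem but below the threshold, and `mail-prod` is important but
healthy.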
So we can say that we reached a very good product, far better
than I first thought a year and a half ago. Big thanks and
congrats, everyone :)
What we failed at, too
But not everything went as well as these points:
* My English skill is still very low :)
* Our wiki is very sparse in tutorials. Yes, we got the "official
doc" from Nagios with the new features, but it's a nightmare to
read and to get started with such documentation.
* The UIs did not follow us much. Yes, they fixed some bugs, but
I think the main addition Shinken brings to monitoring is not
its architecture, even if it's a great one, but the root
problem + criticity one, really. And this was not used by the
UIs, Thruk aside with its Shinken-specific views.
I think these are our major problems right now for a Shinken
domination of the world.... too much? ok, for a large Shinken
acceptance from users, who see it as a "new Nagios" rather than
a very enhanced one that will help them in their day-to-day job.
As for my English skills, I started English 16 years ago, so I
think it just won't be possible. I'll try to read the whole
Harry Potter series again and watch films in English; it can
help :)
For the wiki, I think it's mainly my fault. It's very, very hard
to **start** a documentation, but far easier to enhance it.
I didn't write in it for some weeks, and fortunately some people
reminded me that features are useless without documentation. And
I think it's more than that. It's not documentation we need, but
tutorials about each feature. That's what I'm trying to create
in our new wiki main page, with a lot of tutorials. It's the
same thing with our web site: it's "easier" to see what Shinken
offers to solve users' problems.
I hope the wiki problem won't be one anymore once the first 20
tutorials are written and everyone helps to enhance them and
write new ones.
I'll also open a forum, so users will have an easy way to ask
for help, far less frightening than posting on a "devel" list :p
(I don't think a user mailing list is useful, it serves the same
purpose; we can start with a forum and wait some time to look
at the result).
The third point is far more problematic. Today's admins are not
the same as 10 years ago. Nowadays, we can talk about "speed
admins", because they no longer have the time to be an expert in
one thing, but must be average at a lot of things (I'm
personally a linux/windows/SAN/vmware/network/monitoring admin,
and that's quite a short list). It will be even harder in the
future, with the arrival of "devops".
Nearly everyone on this mailing list knows the difference
between a core and a UI. But a LOT of admins don't. It's not
that they are dumb, it's just that they do not have the time to
look at
And this is a major problem for our (lovely) project. We have no
visibility. Of course our web site is cool :) but the main page
people look at is... screenshots!
So we face a double problem :
* we lack visibility for a lot of users, because we do not have
a UI. Simple problem, but terrible impacts for us.
* the other UIs do not really follow us. We use standard APIs
and add new features that are easy to access through them
(especially LiveStatus), but it was not a success. Thruk was the
most "following" UI, and I would like to thank Sven for his
support, really (especially because my perl code was a
nightmare, and he was kind enough to correct it). But even with
this inclusion, it's still very hard to tell a Thruk with a
Nagios/Icinga backend from one with a Shinken backend. Yes, we
got two new views, but it's not enough to help the user focus on
what we think is important for today's and especially
tomorrow's monitoring: focus on business first.
So? What do we do?
The documentation and user help problem will get a solution
very soon, but we must look at the UI one. We said last year in
our project vision that we are not here to make a UI, and that
if we can "enhance/influence" current ones, it will be good
enough.
I think we (mainly I) were 50% wrong. **Not** making a UI
allowed us to focus on core enhancement, stabilization and a
production-ready product. Now that we have this, it's time to
look at how we can help users get the most power from the
Shinken core in the most efficient way. I think adding plugins
to current UIs is not enough. We can't make users focus on
business first if we have the same views as Nagios 10 years ago.
It's just not possible. We can't afford to have hosts and
services managed in different ways anymore; both are "end user
resources" after all, nothing more.
That's why I say that the root problem/criticity was so
important last year: it will give admins a new way of "working"
in their day-to-day work. It should be simple to show links
between elements, it should be immediate to see business
impacts, it should be immediate to see the root problems of
these impacts, and we do not need to see IT elements everywhere
if they are not "important" (business-supporting IT); it's quite
enough to look at them once or twice a day by default in such a
UI.
I think it's just not possible to get such a new way with
current UIs, because it would need Shinken hooks everywhere, and
no one will want this, especially because some old-school users
won't want this change, have their habits and long hair, and
will never use such a monitoring UI. And that's fine, they
already have such a UI. They even have plenty of them: nearly
all UIs (nagvis and business process put aside) propose the same
way of
I think that now, with a stable core (the main need is for some
retention parameters and an enhanced merlin module, not
something that will give us work for one more year, more like
one week :) ), it's time for us to think about such a UI.
I won't hide it, it's important to have "our own" to promote the
project of course, but I'm --> **strongly** <-- against
doing a UI like all the others, putting our logo on it and
saying "cool, we got our own ui, great isn't it?". No. Doing so
is not great. If we do one, it should add a new dimension, a new
way of seeing users' problems, like we did in the core for
distributed. The main idea was not to ask "how can we make the
current things scale in a good way", but "how should it be done
in a perfect world? Ok. Now, is it possible to do it with the
current code? Ok, let's do a new
A UI? For whom? Which UI?
I think the main thing to ask is whether current admins and
tomorrow's ones have their "perfect" UI. There are strong
differences between monitoring users. We can split them into 3
main groups, I think:
* operators : they are dedicated to monitoring, they should look
at ALL errors and solve them. Simple. Current UIs are good for
them (maybe criticity sorting could help them, but plugins and
patches are good enough for them)
* admins : they are more and more asked to focus on business,
because they have less and less time to give to their
monitoring solution. They should continuously watch the IT
elements that impact production; qualif and dev ones should be
looked at once or twice a day, not more.
* admins' boss (N+1 for example) : they want to look at business
impacts, and see "easily" what is impacting them (so they can
rush to the right admin and "help" him solve it :) ).
So in all cases, the root problem/impacts + criticity is very,
very important. It's even the difference between looking at a
console full of red elements (like 500+) and a UI that shows
that we lost the distant ERP and, one click later, that it's due
to the distant firewall that cannot write logs because its hard
drive is full. 10 minutes in the first case to find "what to
solve", 30 sec for the N°2.
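The ERP example above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the
`problems` table, the element names and the two helper functions
are invented here and are not Shinken's data model or API. The
idea shown is simply "hide everything except important impacts,
and keep the link to the root problem one click away".

```python
from collections import defaultdict

# Hypothetical snapshot of the elements currently in problem.
# name -> (criticity 0-5, name of its root problem, or None if it is a root)
problems = {
    "remote-firewall/disk": (3, None),
    "remote-firewall/logs": (3, "remote-firewall/disk"),
    "remote-erp": (5, "remote-firewall/disk"),
    "remote-printer": (1, "remote-firewall/disk"),
    "dev-build": (1, None),
}


def business_view(problems, min_criticity=4):
    """Map each important impacted element to its root problem."""
    view = {}
    for name, (criticity, root) in problems.items():
        if root is not None and criticity >= min_criticity:
            view[name] = root
    return view


def impacts_by_root(problems):
    """Group every impacted element under its root problem."""
    grouped = defaultdict(list)
    for name, (_, root) in problems.items():
        if root is not None:
            grouped[root].append(name)
    return dict(grouped)


view = business_view(problems)
grouped = impacts_by_root(problems)
print(view)
print(grouped)
```

Instead of five red lines, the business view holds a single
entry (the ERP) pointing at the firewall disk, and the grouping
shows what else that one root problem drags down.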
As such a console user, I began to look at how to get more
productive with my monitoring console. And for now it's not
possible, I just lose a lot of time during large impacts.
* very simple : who cares about having 20 different views? I think
a very small set of very useful and well-thought-out ones is far
better than plenty of medium ones.
* strongly focus on business : it should be clear that IT is just
here to support end user apps. If the admin wants a classic UI, he
can take one of the others, they will always be available. So the
main view should be the critical (as in criticity, not the service
status) user apps impacted. Then it should be very easy to show
the root problems of these impacts. This view will be useful for
our two user populations (admins with a LOT of elements, who
should focus on business apps first, and I think most admins will
be in this case in the future; and admins' bosses, who focus on
prod business only and don't care about other "environments").
We can add another "classic" view that shows hosts/services in
problem for pure IT elements. And only ONE view for these 2
elements. It's another important thing: hosts and services are
here for end user apps. They are resources only, do not need to
separate
It should be easy to "tag" end user apps (so the criticity).
It should also be easy to "select" realms. So if a guy has access
to some realms, it should be easy for him to select them
It should be easy to see realms status, and in fact daemons
Of course, there will be questions about the configuration part;
we can leave this for a V2, after we have solved all of these
points. A lot of huge IT departments use their own configuration
tools (from a CMDB, etc), and so such a tool won't help them. So
the "efficient visualization" (focus on critical root problems)
should be added
The main spirit should be "small is beautiful". There are other
UIs with a lot of features; users can still use them if they
want :)
I think that for operators who must solve everything, the classic
view is enough; old-school admins will use it too, and new hype
admins will use the efficient one, like their bosses.
We should focus on what Shinken adds to monitoring, and I think
the distributed architecture and root problem/criticity are the
key points. There are also business rules, which can quite easily
be added (but not in a specific view, more like a hover layout
that shows the tree if the user wants it, no more :) ).
With this, we avoid the dangerous risk of "Shinken UI does all you
want". No. For now it helps you focus on business, nothing
more. Then we can look at user reactions, and gather a lot of
development power before going too far (we should NOT forget we
have a core to maintain and develop! :) ).
So? Are you ok?
So? Is such a UI ok for you? Is this new project vision good? If
it's ok, we will see how we can go about designing this UI (I've
got some mockups waiting to be shown, and they really are
different from current (monitoring) UIs :) ) and start this new
adventure :D
Shinken-devel mailing list