Hi munin-users and smartmontools-users,
I am cross-posting this because we are looking for a good way to
present smartctl_exit_status within Munin's
smartmontools plugin.
On Wednesday 17 August 2005 15:55 +0200,
Lupe Christoph wrote:
> Quoting Gabriele Pohl <gabriele@...>:
> > On Wednesday 17 August 2005 14:15, Lupe Christoph wrote:
> Seems the Fedora package does not work too well.
This was not the only issue.. :-((
I have already told Nico the story of the long and winding road
to get Munin running,
but if you are interested I can send you a copy
of that mail.
> Where did you get it?
From the Munin project on SourceForge:
http://sourceforge.net/project/showfiles.php?group_id=98117
> Is it part of FC4,
No.
> Is anything else in /etc/cron.d? If not, that directory
> is probably unused.
It seemed to be used:
-------------------- 8< --------------------
[root@... cron.d]# ls -ltr
insgesamt 8
-rw-r--r-- 1 root root 188 7. Mär 13:36 sysstat
-rw-r--r-- 1 root root 217 21. Apr 21:08 munin
[root@... cron.d]# cat sysstat
# run system activity accounting tool every 10 minutes
*/10 * * * * root /usr/lib/sa/sa1 1 1
# generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib/sa/sa2 -A
-------------------- 8< --------------------
But as I haven't activated sa so far,
I don't know whether it actually works.
I'll try that when I have more time.
> > The only thing left is, that smartctl_exit_status is 5
> > for the damaged device (hdb)
> >
> > But this is not the right way to handle it..
@Nico: I made this remark because of the *false* value 5.
Shouldn't it be 216? (That is smartctl's exit status after
executing the command within a shell.)
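To compare the two values bit by bit, here is a small Python sketch.
The helper name set_bits is made up for illustration; it is not part
of the plugin.

```python
def set_bits(status):
    """Return the numbers of the bits that are set in an 8-bit exit status."""
    return [bit for bit in range(8) if status & (1 << bit)]

print(set_bits(5))    # [0, 2]
print(set_bits(216))  # [3, 4, 6, 7]
```

So a value of 5 would mean bits 0 and 2, while 216 means bits 3, 4, 6
and 7.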
> > I would only check whether the 3rd bit is set.
>
> What would you think of plotting the bits separately,
> with their bit number as value to keep them separate?
> Nico and I are debating the best way to represent the
> exit value, but can't agree yet.
>
It's useless to draw graphs of information that does not change
over time.
In detail:
-- Bit 0: Command line did not parse.
With the same configuration, this should always have the same
value. I suppose we configure once at the start and keep the
configuration after that.
ACK?
-- Bit 1: Device open failed, or device did not return an
IDENTIFY DEVICE structure.
Could be interesting if a hardware defect could cause it.
-- Bit 2: Some SMART command to the disk failed, or there
was a checksum error in a SMART data structure.
Relevant, especially for reporting the checksum error.
-- Bit 3: SMART status check returned "DISK FAILING".
That is relevant!
-- Bit 4: SMART status check returned "DISK OK" but we found
prefail Attributes <= threshold.
Also relevant!
-- Bit 5: SMART status check returned "DISK OK" but we
found that some (usage or prefail) Attributes have been
<= threshold at some time in the past.
A bit redundant with Bit 4,
but relevant.
-- Bit 6: The device error log contains records of errors.
Relevant.
But it will never go back down once raised.
-- Bit 7: The device self-test log contains records of
errors.
Relevant.
But it will never go back down once raised.
I am not sure in this case (whether the next self-test that
succeeds clears the log).
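The bit list above could be turned into one value per bit roughly
like this. It is only a sketch in Python: the field names
(bit0 ... bit7) and the function names are my assumptions, not what
the plugin does today. Only the "name.value N" form is what Munin
plugins print.

```python
# Meaning of each bit in smartctl's exit status, as listed above.
BIT_MEANINGS = {
    0: "Command line did not parse",
    1: "Device open failed, or no IDENTIFY DEVICE structure returned",
    2: "SMART command failed, or checksum error in a SMART data structure",
    3: 'SMART status check returned "DISK FAILING"',
    4: "Prefail Attributes <= threshold",
    5: "Attributes have been <= threshold at some time in the past",
    6: "Device error log contains records of errors",
    7: "Device self-test log contains records of errors",
}

def decode_exit_status(status):
    """Map each bit number to 1 if it is set in status, else 0."""
    return {bit: (status >> bit) & 1 for bit in BIT_MEANINGS}

def munin_values(status):
    """Format one hypothetical Munin field per bit (bit0 ... bit7)."""
    return ["bit%d.value %d" % (bit, flag)
            for bit, flag in decode_exit_status(status).items()]

if __name__ == "__main__":
    status = 216  # the exit status discussed above
    for bit, flag in decode_exit_status(status).items():
        if flag:
            print("Bit %d: %s" % (bit, BIT_MEANINGS[bit]))
    print("\n".join(munin_values(status)))
```

With separate fields, a bit that never changes (like Bit 0) could
simply be left out of the graph, while Bits 3 and 4 could get a
warning or critical limit.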
> > Gabriele (who will now finish the article after all
> > these excursions ;)
>
> Please make sure to tell this mailing list when it has
> been published. If there will be a version on the web, we
> will want to link to it from the Munin webpage.
It will be printed on paper first and will not go online
until one year later.
The main topic is the use of smartmontools.
The integration into centralized monitoring services is
only a short extra (a small chapter).
But it will show a picture of
Munin's smartmontools attribute graph :-)
Cheers,
Gabriele