From: Jeremie Z. <jz...@to...> - 2005-10-26 13:44:18
hello!

i like smartmontools very much for my personal use, though i find the output sometimes difficult to read...

now i have to implement some automatic mailing of smartmontools results to some clients of mine who are, ahem... kinda dumb.

i am _sure_ that they won't understand anything (i'll spend a very long time explaining it to them anyway), so i was wondering:

- is there, somewhere in the smartmontools development process, a plan to implement some kind of "-h" switch (for "Human Readable") that would simplify the output? it could reduce the output to only the "severe" figures, or even synthesize it into "WARNING: WILL FAIL SOON!" or something like this.

- alternatively, is there some kind of script or tool that does such a job of filtering smartmontools output?

anyway, thanks a lot for reading, and congratulations on your very useful piece of software!

jz

--
--- Jeremie ZIMMERMANN --------------- http://tofz.org --
--- Save Authors' Rights! ------------ http://eucd.info --
--- Free Software Research ----------- http://april.org --
--
From: Bruno W. I. <br...@wo...> - 2005-10-28 15:16:21
On Wed, Oct 26, 2005 at 15:43:47 +0200,
  Jeremie ZIMMERMANN <jz...@to...> wrote:
> - is there somewhere in the development process of the smartmontools a
> project of implementing some kind of "-h" switch (for "Human Readable")
> that would simplify the output? reducing it to the only "severe"
> figures, why not synthetising it to "WARNING: WILL FAIL SOON!" or
> something like this?

Unless you have so much money that you are willing to toss disks at the first sign of trouble, I don't think there is a good way to distill things down to "this disk is good" or "this disk is bad". In particular, people need to be able to deal with sectors that have gone bad.
From: Malte G. <mal...@gm...> - 2005-10-29 18:12:53
On Friday 28 October 2005 17:18, Bruno Wolff III wrote:
> On Wed, Oct 26, 2005 at 15:43:47 +0200,
> Jeremie ZIMMERMANN <jz...@to...> wrote:
> > - is there somewhere in the development process of the
> > smartmontools a project of implementing some kind of "-h" switch
> > (for "Human Readable") that would simplify the output?
> Unless you have so much money that you are willing to toss disks at
> the first sign of trouble, I don't think there is a good way to
> distill things down to this disk is good or this disk is bad.

Maybe "smartctl -H" is what he's looking for: when it says "failed", it's time to act, quickly... Maybe this is enough for people who don't need or want to care about the precise condition of their disk.
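smartmontools itself has no one-line "human readable" mode, but Jeremie's "WARNING: WILL FAIL SOON!" idea and Malte's `smartctl -H` suggestion can be combined in a small wrapper. The sketch below assumes the exit-status bits documented in the RETURN VALUES section of the smartctl(8) man page (check the man page of your installed version, since older releases may not set every bit) and maps them to four hypothetical verdicts. The function names and wording are illustrative, not part of smartmontools, and as Volker argues in his reply, an "OK" here is no guarantee.

```python
import subprocess

# Meaning of each bit of smartctl's exit status, paraphrased from the
# RETURN VALUES section of smartctl(8). Older releases may not set every bit.
EXIT_BITS = {
    0: "command line did not parse",
    1: "device could not be opened",
    2: "a SMART command failed or SMART data had a checksum error",
    3: "SMART status check returned DISK FAILING",
    4: "a pre-fail attribute is at or below its threshold",
    5: "an attribute was at or below its threshold in the past",
    6: "the device error log contains errors",
    7: "the self-test log contains errors",
}


def describe_exit_status(status: int) -> list[str]:
    """Return a plain-language reason for every bit set in `status`."""
    return [text for bit, text in EXIT_BITS.items() if status & (1 << bit)]


def verdict(status: int) -> str:
    """Collapse smartctl's exit status into one line for a non-expert."""
    if status & 0b00000011:  # bits 0-1: smartctl could not even ask the disk
        return "UNKNOWN: the disk could not be checked"
    if status & 0b00001000:  # bit 3: the drive's own health check failed
        return "FAILING: back up the data now and replace the disk"
    if status & 0b11110100:  # bits 2 and 4-7: warning signs short of FAILED
        return "WARNING: the disk shows signs of trouble, have it looked at"
    return "OK: no problems reported"


def check(device: str) -> str:
    """Hypothetical wrapper: run `smartctl -a` (which includes -H) and summarize."""
    result = subprocess.run(["smartctl", "-a", device], capture_output=True, text=True)
    return verdict(result.returncode)
```

For example, `check("/dev/hda")` would print one line suitable for an e-mail to a non-technical client, with `describe_exit_status()` available for the details. Running `-a` rather than `-H` alone is a deliberate choice: it also reads the attribute table and logs, so more of the warning bits can actually be set.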
From: Volker K. <lis...@pa...> - 2005-10-29 19:26:55
> Maybe "smartctl -H" is what he's looking for, when it says "failed" it's
> time to act, quickly... Maybe this is enough for people who don't
> need/want to care about the precise condition of their disk.

No, it's not enough. You want to know that you need to back up the data and replace the disk before it fails, not after it has gone up in smoke. In theory the overall FAILED status ought to give some warning, but I think you may find that the warning comes rather late, and that a substantial number of disk blocks are already dead by then. Also, some disks never set the overall status to FAILED.

Volker

--
Volker Kuhlmann is possibly list0570 with the domain in header
http://volker.dnsalias.net/    Please do not CC list postings to me.
From: Mario 'B. H. <Mario.Holbe@TU-Ilmenau.DE> - 2005-10-30 12:51:36
Volker Kuhlmann <lis...@pa...> wrote:
>> Maybe "smartctl -H" is what he's looking for, when it says "failed" it's
>> time to act, quickly... Maybe this is enough for people who don't
> No it's not enough. You want to know before your disk fails that you
> want to backup the data and replace the disk, not after it has gone up

That's exactly what SMART is for. SMART is a sensor that tries to tell you that your disk could fail soon. If SMART reports FAILED, your disk is not dead; there is just some chance that it is going to fail. It can die earlier, and it can survive for years even if SMART tells you it's going to fail soon.

> in smoke. In theory the overall FAILED status ought to give some
> warning, but I think you may find that the warning is rather late, and a

Of course there is no guarantee that SMART detects every failure condition soon enough: if you throw your disk out of a window, SMART is quite unlikely to detect the preconditions for a head crash in time.

Of course you can try to apply heuristics on top of the SMART values; this is what SMART itself also does. In fact, the manufacturer burns its own heuristic into the thresholds and attribute types.

However, if you have no idea what you are doing, you probably do better to always keep your data safe (i.e. back it up) and just live with the disk manufacturer's heuristic, instead of trying to apply somewhat religious heuristics of your own that you don't even understand.

regards
   Mario
--
It is a capital mistake to theorize before one has data. Insensibly one
begins to twist facts to suit theories instead of theories to suit facts.
                           -- Sherlock Holmes by Arthur Conan Doyle
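Mario's point about the manufacturer's heuristic can be made concrete. For each ATA attribute, `smartctl -A` prints a normalized VALUE next to the vendor's THRESH, and an attribute counts as failing when VALUE drops to or below a non-zero THRESH. The sketch below parses that table and lists the failing attributes together with their TYPE, so a Pre-fail failure can be told apart from Old_age wear. The column layout is assumed from typical smartctl output for ATA disks (other versions or SCSI devices may differ), and the function names are hypothetical.

```python
def parse_attributes(text: str) -> list[dict]:
    """Parse the ATA attribute table printed by `smartctl -A`.

    Assumed column layout:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; skip headers and other text.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        try:
            value, worst, thresh = (int(f) for f in fields[3:6])
        except ValueError:
            continue  # normalized values not printed as numbers: skip the row
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": value,
            "worst": worst,
            "thresh": thresh,
            "type": fields[6],
            "raw": fields[9],  # first token only, e.g. "36" of "36 (Min/Max 20/45)"
        })
    return rows


def failing_attributes(rows: list[dict]) -> list[tuple[str, str]]:
    """Name and type of every attribute at or below its vendor threshold.

    A threshold of 0 is treated here as "never fails".
    """
    return [(r["name"], r["type"]) for r in rows
            if r["thresh"] > 0 and r["value"] <= r["thresh"]]
```

The text would typically come from `subprocess.run(["smartctl", "-A", device], capture_output=True, text=True).stdout`. Note that this only restates the vendor's own verdict per attribute; it adds no caution of its own.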
From: Volker K. <lis...@pa...> - 2005-10-30 21:57:24
On Mon 31 Oct 2005 01:49:35 NZDT +1300, Mario 'BitKoenig' Holbe wrote:
> That's exactly what SMART is for. SMART is a sensor that tries to tell
> you that your disk could fail soon. If SMART tells FAIL, your disk is
> not dead, there is just some kind of a chance that it's going to fail.
> It can die earlier and it can survive for years even if SMART tells you
> it's going to fail soon.

That's the theory, anyway. I wasn't arguing against taking smart = FAILED seriously; I was arguing against trusting smart = PASSED. I can see some aspects which reduce the usefulness of the overall SMART PASSED status from the user's point of view:

1) The disk manufacturer has an economic incentive not to set the status to FAILED too soon, because once the disk says FAILED, it's impossible to refute a warranty claim.

2) The SMART feature is not within the awareness of the general public. Disk manufacturers therefore have little incentive to put a lot of effort into making it work properly.

I only need to look at my own disks, or at 2 years of this list, to come to that conclusion. The disk database in smartmontools is proof that manufacturers aren't interested in following the standard too closely. It is prudent to apply a healthy portion of distrust to the output of smartctl -a (and that's not because of smartmontools).

> Of course you can try to apply heuristics on top of the SMART values,
> this is what SMART itself also does - in fact, the manufacturer burns
> his own heuristic in the thresholds and attribute types.

True. It's certainly better than nothing. But it's still possible to improve on it, for example by shifting towards greater caution. It's also a question of end-user usability: currently end users get 5 screenfuls of meaningless numbers which are badly (if at all) explained. A program which gives some interpretation of these screenfuls would go a long way towards a good desktop tool. The interpretation doesn't have to be perfect, and could have 2 or 3 levels of "sensitivity" or "cautiousness". The current state of affairs is pretty much "geeks only".

Obviously though, as you say, nothing beats a good backup...

Volker
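Volker's idea of 2 or 3 levels of "cautiousness" can be sketched as a small rule table over raw counters, independent of the vendor thresholds. The attribute IDs 5, 197 and 198 are the ones smartctl usually labels Reallocated_Sector_Ct, Current_Pending_Sector and Offline_Uncorrectable (vendors may use them differently). The level names and limits are hypothetical, chosen only to show the shape of such a tool, and the caller is assumed to have already extracted the raw counts, for example from `smartctl -A`.

```python
# Hypothetical sensitivity levels: the largest raw count of each watched
# attribute that is still tolerated. These limits are illustrative only,
# not vendor or smartmontools guidance.
LEVELS = {
    "relaxed":  {5: 50, 197: 10, 198: 10},
    "normal":   {5: 10, 197: 1, 198: 1},
    "cautious": {5: 0, 197: 0, 198: 0},
}

# Attribute IDs commonly used for these counters (meaning can vary by vendor).
DESCRIPTIONS = {
    5: "reallocated sectors",
    197: "sectors pending reallocation",
    198: "offline uncorrectable sectors",
}


def interpret(raw_counts: dict[int, int], level: str = "normal") -> list[str]:
    """Return one warning per watched attribute whose raw count exceeds its limit."""
    warnings = []
    for attr_id, limit in LEVELS[level].items():
        count = raw_counts.get(attr_id, 0)  # missing attribute: assume zero
        if count > limit:
            warnings.append(
                f"{count} {DESCRIPTIONS[attr_id]} (limit {limit} at '{level}' level)"
            )
    return warnings


def summary(raw_counts: dict[int, int], level: str = "normal") -> str:
    """One-line report in the spirit of 'WARNING: WILL FAIL SOON!'."""
    warnings = interpret(raw_counts, level)
    return "WARNING: " + "; ".join(warnings) if warnings else "OK"
```

The same disk can thus be "OK" at the relaxed level and "WARNING" at the cautious one, which matches the shift towards greater caution that Volker describes; a RAID owner who replaces drives early would simply pick "cautious".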
From: Jeremie Z. <jz...@to...> - 2005-11-02 13:33:10
Volker Kuhlmann wrote:
>
> True. It's sure better than nothing. But it's still possible to improve
> on it, and for example shift towards greater caution. It's also a
> question of end user usability: currently end users get 5 screenfuls of
> meaningless numbers which are badly (if at all) explained. Having a
> program which gives some interpretation of these screenfuls would go a
> long way towards a good desktop tool. The interpretation doesn't have to
> be perfect, and could have 2 or 3 levels of "sensitivity" or
> "cautiousness". The current state of affairs is pretty much "geeks
> only".
>

i completely agree with that!

concerning people who would want to throw away a disk at the first slight sign of failure, it's _exactly_ the kind of people i'll be working with! jerks with cash... (sigh!)

i'm putting the drives into a RAID5 array, which is very sensitive to errors (one bad drive is OK, two mean "bye-bye!"), so they very much want to be able to "throw away" a drive as soon as it looks like it may fail soon!

maybe somebody (or i myself) will afterwards keep the drive, check it for bad blocks and continue using it for years after that, but they will be happy with it....

jz
From: Bruno W. I. <br...@wo...> - 2005-11-03 16:46:30
On Wed, Nov 02, 2005 at 14:32:55 +0100,
  Jeremie ZIMMERMANN <jz...@to...> wrote:
>
> concerning people who would want to throw away a disk at the first
> slight sign of failure, it's _exactly_ the kind of people i'll be
> working with! jerks with cash... (sigh!)

I wouldn't call them jerks. It could easily be the case that the value of the data, or the cost of downtime, is such that it is cheaper to replace drives at the first sign of trouble than to have one fail.
From: Jeremy J. <jb...@fo...> - 2005-11-03 20:55:42
Bruno Wolff III wrote:
> On Wed, Nov 02, 2005 at 14:32:55 +0100,
>   Jeremie ZIMMERMANN <jz...@to...> wrote:
>> concerning people who would want to throw away a disk at the first
>> slight sign of failure, it's _exactly_ the kind of people i'll be
>> working with! jerks with cash... (sigh!)
>
> I wouldn't call them Jerks. It could easily be the case that the value of the
> data or of downtime is such, that it is cheaper to replace drives at the
> first sign of trouble than to have one fail.
>

Maxtor are quite happy to replace disks with any bad sectors under warranty (which is handy, since they fail more often than one would like). We pull out disks with SMART read errors and get them replaced; HD costs are insignificant compared to customer data costs.

For what it's worth, we use a script somewhat similar to http://jeremy.publication.org.uk/checkdisks (Python, server names altered etc.) that quickly monitors SMART errors on a large number of disks on about 10 different machines and points us towards failed disks. This is used in conjunction with nightly smartd short tests and smartd's built-in e-mailing feature, which warns us of more serious errors immediately.

Jeremy

(Hrm. Looking back at that script makes me think I could rewrite it in a much more pythonic way, but *shrug*, it gets the job done on a small network.)
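Jeremy's script is at the URL above and is not reproduced here. As a rough illustration of the same idea only, the sketch below loops over a hypothetical inventory of hosts, runs `smartctl -H` on each disk over ssh, and lists the disks whose smartctl exit status is non-zero. It assumes key-based ssh login and enough privileges on each host to run smartctl (usually root); the host names, device paths and function names are made up, and the `runner` parameter exists so the polling logic can be tried without real machines. It complements, rather than replaces, smartd's scheduled self-tests and e-mail warnings.

```python
import subprocess
from typing import Callable

# Hypothetical inventory: host name -> devices to check on that host.
HOSTS = {
    "server1": ["/dev/sda", "/dev/sdb"],
    "server2": ["/dev/hda"],
}


def run_over_ssh(host: str, device: str) -> int:
    """Run `smartctl -H` on a remote host and return its exit status.

    Assumes key-based ssh login and smartctl in the remote PATH.
    ssh itself returns 255 when the connection fails.
    """
    result = subprocess.run(["ssh", host, "smartctl", "-H", device],
                            capture_output=True, text=True)
    return result.returncode


def poll(hosts: dict[str, list[str]],
         runner: Callable[[str, str], int] = run_over_ssh) -> list[str]:
    """Return one line per disk whose smartctl exit status is non-zero."""
    problems = []
    for host in sorted(hosts):
        for device in hosts[host]:
            status = runner(host, device)
            if status != 0:
                problems.append(f"{host}:{device} exit status {status}")
    return problems
```

A nightly cron job could run `print("\n".join(poll(HOSTS)) or "all disks OK")` and mail the result; the non-zero statuses can then be decoded bit by bit using the RETURN VALUES section of smartctl(8).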
From: Jeremie Z. <jz...@to...> - 2005-11-04 13:07:12
Jeremy James wrote:
>
> For what it's worth, we use a script somewhat similar to
> http://jeremy.publication.org.uk/checkdisks (python, server names
> altered etc) that will quickly monitor SMART errors on a large number of
> disks on about 10 different machines and point us towards failed disks -
> this is used in conjunction with nightly smartd short tests and its

great! when will such a thing be integrated into smartmontools? ;)

jz
From: Bruce A. <ba...@gr...> - 2005-10-31 03:25:25
Mario,

> However, if you have no idea what you are doing, you probably do better
> if you always keep your data safe (i.e. back it up) and just live with
> the heuristic of the disk manufacturer instead of trying to apply
> somewhat religious heuristics on your own which you don't even
> understand.

I would say something even stronger: any data which cannot be easily replaced or reconstructed should always be backed up.

I view SMART as a tool 'for convenience'. It saves my research group and me time when we know in advance that a disk is likely to die. This allows us to deal with the potential failure in a way that is more planned, and less time-consuming and disruptive, than dealing with an unexpected failure.

Cheers,
    Bruce