For those using snapraid on Linux, here is a helper script that will run snapraid sync and email you the results (including a report on the file system changes) on completion. For added safety, the script first checks that the content and parity files exist (the default behaviour of snapraid v1.6 is to simply create new ones!). It also runs diff to check what has changed on the file system. If it detects more than X removed files (user configurable), it warns the user by email and does NOT proceed with the sync, to give the user a chance to recover the files in case of accidental deletion.
Notes:
1) no email will be sent if the file system and parity info are in sync.
2) the script is built on and for the Ubuntu distro. It should work with other distros but may require slight tweaks.
3) to run the script automatically, you will need to add it to cron or anacron.
4) to have the output emailed to you at your internet email address, you need to change the EMAIL_ADDRESS variable _AND_ configure postfix to use an internet mail gateway (like your ISP or gmail).
5) the deleted file detection threshold (DEL_THRESHOLD) is currently set to 20. You may want to increase/reduce it as per your personal preference (i.e. the hassle of having to run sync manually vs the risk of not being able to recover from accidental deletes!).
#!/bin/bash
#######################################################################
# this is a helper script that keeps snapraid parity info in sync with
# your data. Here's how it works:
# 1) it first calls diff to figure out if the parity info is out of sync
# 2) if there are changed files (i.e. new, changed, moved or removed),
#    it then checks how many files were removed.
# 3) if the deleted files exceed X (configurable), it triggers an
#    alert email and stops. (in case of accidental deletions)
# 4) otherwise, it will call sync.
# 5) when sync finishes, it sends an email with the output to user.
#
# $Author: sidney $
# $Revision: 5 $
# $Date: 2011-10-16 09:28:44 +1100 (Sun, 16 Oct 2011) $
# $HeadURL: file:///svnrepo/linuxScripts/snapraid_diff_n_sync.sh $
#
#######################################################################

EMAIL_SUBJECT_PREFIX="[NAS-HTPC-HPN36L] SnapRAID - "
EMAIL_ADDRESS="root"
DEL_THRESHOLD=20
CONTENT_FILE="/mnt/PPU/snapraid/content"
PARITY_FILE="/mnt/PPU/snapraid/parity"
LOG_FILE="/var/log/snapraid.log"

## INTERNAL TEMP VARS ##
TMP_OUTPUT="/tmp/snapraid.out"

# redirect all stdout to log file (leave stderr alone thou)
exec >> $LOG_FILE

# timestamp the job
echo "[`date`] SnapRAID Job started."
echo "SnapRAID DIFF Job started on `date`" > $TMP_OUTPUT
echo "----------------------------------------" >> $TMP_OUTPUT

#TODO - mount and unmount parity disk on demand!

#sanity check first to make sure we can access the content and parity files
if [ ! -e $CONTENT_FILE ]; then
  echo "[`date`] ERROR - Content file ($CONTENT_FILE) not found!"
  echo "ERROR - Content file ($CONTENT_FILE) not found!" >> $TMP_OUTPUT
  exit 1;
fi

if [ ! -e $PARITY_FILE ]; then
  echo "[`date`] ERROR - Parity file ($PARITY_FILE) not found!"
  echo "ERROR - Parity file ($PARITY_FILE) not found!" >> $TMP_OUTPUT
  exit 1;
fi

# run the snapraid DIFF command
echo "[`date`] Running DIFF Command."
snapraid diff >> $TMP_OUTPUT
# wait for the above cmd to finish
wait

echo "----------------------------------------" >> $TMP_OUTPUT
echo "SnapRAID DIFF Job finished on `date`" >> $TMP_OUTPUT

DEL_COUNT=$(grep ^Remove $TMP_OUTPUT | wc -l - | cut -d' ' -f1)
ADD_COUNT=$(grep ^Add $TMP_OUTPUT | wc -l - | cut -d' ' -f1)
MOVE_COUNT=$(grep ^Move $TMP_OUTPUT | wc -l - | cut -d' ' -f1)
UPDATE_COUNT=$(grep ^Update $TMP_OUTPUT | wc -l - | cut -d' ' -f1)

echo "SUMMARY of changes - Added [$ADD_COUNT] - Deleted [$DEL_COUNT] - Moved [$MOVE_COUNT] - Updated [$UPDATE_COUNT]" >> $TMP_OUTPUT

# check if files have changed
#if [ "grep -E 'Add|Remove' $TMP_OUTPUT 1> /dev/null" ]; then
if [ $DEL_COUNT -gt 0 -o $ADD_COUNT -gt 0 -o $MOVE_COUNT -gt 0 -o $UPDATE_COUNT -gt 0 ]; then
  # YES, check if number of deleted files exceed DEL_THRESHOLD
  if [ $DEL_COUNT -gt $DEL_THRESHOLD ]; then
    # YES, lets inform user and not proceed with the sync just in case
    echo "Number of deleted files ($DEL_COUNT) exceeded threshold ($DEL_THRESHOLD). NOT proceeding with sync job. Please run sync manually if this is not an error condition." >> $TMP_OUTPUT
    /usr/bin/mail -s "$EMAIL_SUBJECT_PREFIX WARNING - Number of deleted files ($DEL_COUNT) exceeded threshold ($DEL_THRESHOLD)" "$EMAIL_ADDRESS" < $TMP_OUTPUT
    echo "WARNING - Deleted files ($DEL_COUNT) exceeded threshold ($DEL_THRESHOLD). Check $TMP_OUTPUT for details. NOT proceeding with sync job."
  else
    # NO, delete threshold not reached, lets run the sync job
    echo "Deleted files ($DEL_COUNT) did not exceed threshold ($DEL_THRESHOLD), proceeding with sync job." >> $TMP_OUTPUT
    echo "[`date`] Changes detected [A-$ADD_COUNT,D-$DEL_COUNT,M-$MOVE_COUNT,U-$UPDATE_COUNT] and deleted files ($DEL_COUNT) is below threshold ($DEL_THRESHOLD). Running SYNC Command."
    echo "SnapRAID SYNC Job started on `date`" >> $TMP_OUTPUT
    echo "----------------------------------------" >> $TMP_OUTPUT
    snapraid sync >> $TMP_OUTPUT
    #wait for the job to finish
    wait
    echo "----------------------------------------" >> $TMP_OUTPUT
    echo "SnapRAID SYNC Job finished on `date`" >> $TMP_OUTPUT
    /usr/bin/mail -s "$EMAIL_SUBJECT_PREFIX Sync Job COMPLETED" "$EMAIL_ADDRESS" < $TMP_OUTPUT
  fi
else
  # NO, so lets log it and exit
  echo "[`date`] No change detected. Nothing to do"
fi

echo "[`date`] Job ended."

exit 0;
Many thanks for this script, I've run it manually and it's exactly what I need!
However, it doesn't seem to work correctly when run via cron. The script runs through but for some reason, the "snapraid diff >> $TMP_OUTPUT" command is not being re-directed to $TMP_OUTPUT (no idea where it goes) and as a result the script thinks there is nothing that needs to be done.
Can anyone shed any light on this issue please?
Thanks,
Ian.
@ian-f : maybe the script does not find the snapraid executable when run from the cron environment ?
Run 'which snapraid' in a shell, and replace all occurrences of 'snapraid' with the full path as returned (in my case it is /usr/local/bin/snapraid).
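For anyone hitting the same cron problem, here is a minimal sketch of two ways to avoid depending on cron's reduced PATH. The `resolve_bin` helper and the `SNAPRAID_BIN` variable are made-up names for illustration, and the directories added to PATH are assumptions about where snapraid might be installed; adjust them to whatever `which snapraid` reports on your box.

```shell
# Hypothetical helper (not part of sidney's script): look a command up on
# PATH once and print its absolute path, or fail if it cannot be found.
resolve_bin() {
    local name="$1"
    local path
    path=$(command -v "$name") || return 1
    printf '%s\n' "$path"
}

# Option 1: extend PATH at the top of the script, since cron usually starts
# jobs with a much shorter PATH than an interactive shell.
# (The two directories below are assumptions.)
PATH="/usr/local/sbin:/usr/local/bin:$PATH"
export PATH

# Option 2: resolve the binary once and complain loudly if it is missing,
# instead of letting the diff step silently produce no output.
if SNAPRAID_BIN=$(resolve_bin snapraid); then
    echo "using $SNAPRAID_BIN"
else
    echo "snapraid not found on PATH=$PATH" >&2
fi
```

With option 2 the rest of the script would call `"$SNAPRAID_BIN" diff` and `"$SNAPRAID_BIN" sync` instead of the bare command name.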
To others: I also run a version of sidney's script on my server (Ubuntu 10.04 LTS), it works, but with a twist :
- if sync has nothing to do, no mail is sent, as designed
- if sync runs, I do get a mail with the proper title sent at the right address. However the mail body is empty, and it has an attached file called 'noname' (with no extension). If I open that in a text editor I see the script's intended output.
Anybody else seen this ? My guess is it comes from the snapraid progress indicator being logged, such as :
@fpp-sf: 100% correct, thank you very much! I thought it was an issue running a bash as opposed to a sh script under cron. It works perfectly now, thank you.
I just wish I could return the favour and solve your problem for you but I've no idea how to solve yours. Sorry. :-(
Ian.
@fpp-sf: 100% correct, thank you very much! I thought it was an issue running a bash as opposed to a sh script under cron. It works perfectly now, thank you
Glad to be of help… it's a very common issue with cron jobs, I had to do it myself…
I just wish I could return the favour and solve your problem for you but I've no idea how to solve yours. Sorry
No sweat, maybe someone else will chime in :-)
Another strange (and maybe unrelated) thing is that I added some "hdparm" commands at the end of the script, to spin down the drives once the sync is done.
When there is nothing to do (and no mail sent), this works just fine.
When the sync actually runs (and the malformed mail is sent), the commands are executed (and the "Spinning down drives" message appears in the log), but in the morning I always find my drives still running…
Are you using postfix for sending the email? Which email gateway and email client are you using? I'm using gmail for both sending and receiving the emails, and the messages show up in the body no matter how long the text is. Maybe you can try it with gmail first to confirm that it's not a problem with your email gateway or client?
As for the hard disk spin down issue, have you checked if you have other jobs that may be spinning up the drives in the middle of the night? FYI, in Ubuntu, there is an anacron daily job called "standard" that will wake all the disks for its "lost+found" task. The daily job typically runs in the early morning (around 8am on mine) so by the time you check, you will find all the disks up and running.
The '/usr/bin/mail' command, as called in your script, is actually Heirloom-mailx on my system.
The MTA is not Postfix, but something much simpler, sSMTP.
It is configured to send mail through GMail using my account, and works quite well.
By your reasoning, the only possible culprit was mailx. And sure enough, if you read the man page right down to the end, it does mention that it accepts only Unix line endings (LF) for text, and treats anything containing CRs (^M) as binary data (hence the attachment in the mail).
Bingo: the output of a working "snapraid sync" has *lots* of CRs :-)
So mystery solved, thanks for setting me on the right track !
The hdparm mystery is still a mystery, however…
I don't think it can be another cron job, because when sync isn't run (which is often), the disks are *always* idle, and when it runs, they are *always* spinning on the following day.
I do see the daily standard job you mention. Maybe it is failing for some reason (I can't find cron logs anywhere to check), but it's clearly not waking up my disks in the morning…
Good to know you managed to fix one of the issues…
Some other thoughts on your hard disk spin down mystery:
1) write a simple script to poll the disk state at regular intervals to see exactly what time it's waking up (I used this approach when I was setting up my box, and that's how I found out about the standard cron job.)
2) in my setup, when the system sends mail, it writes to syslog which results in disk activity. AFAIK, the mail command runs asynchronously so your hdparm command might be a little too early as there is still disk activity (due to logging by the mail command) going on.
3) IMHO, I would move the hdparm command into an independent generic task. This way, you can have a consistent and more robust way of idling your disks. On my box, I created a script that will check for a combination of triggers before idling the disks so that I have a balance between power saving, disk wear (due to spin up/down) and performance. Some of the conditions I check for are:
a) time since last disk activity
b) active connections to network shares
c) active torrents
Note that my box has 4 hard disks and an SSD and I configure each disk for a specific task (e.g. OS, TV recording, torrent, network share and snapraid parity) so they are spun down independently depending on the activities happening at the time the script runs.
If the above sounds like something you have or would like to have, I will be happy to share the script.
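Suggestion 1) above could be sketched roughly as follows. This is only an illustration, not the script I actually used back then: `drive_state` and `poll_disks` are made-up names, the device names in the usage line are assumptions, and the parsing relies on `hdparm -C` printing a "drive state is:" line (the same line the standby script greps for).

```shell
# Pull the state ("standby", "active/idle", ...) out of `hdparm -C` output
# read from stdin.
drive_state() {
    awk -F: '/drive state/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Log the power state of each given device every <interval> seconds.
# Runs until killed; normally needs root because of hdparm.
poll_disks() {
    local interval="$1"
    shift
    local dev state
    while true; do
        for dev in "$@"; do
            state=$(/sbin/hdparm -C "$dev" 2>/dev/null | drive_state)
            echo "[$(date '+%F %T')] $dev ${state:-unknown}"
        done
        sleep "$interval"
    done
}

# Example (device names are assumptions):
#   poll_disks 300 /dev/sdb /dev/sdc >> /tmp/disk_state.log
```

Comparing the timestamps in the resulting log against your cron schedule should show which job, if any, is waking the disks.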
Some other thoughts on your hard disk spin down mystery:
1) write a simple script to poll the disk state at regular intervals to see exactly what time it's waking up (I used this approach when I was setting up my box, and that's how I found out about the standard cron job.)
Hmmm… why do I never think of those simple hacks myself ? :-)
Thanks for the heads-up !
2) in my setup, when the system sends mail, it writes to syslog which results in disk activity. AFAIK, the mail command runs asynchronously so your hdparm command might be a little too early as there is still disk activity (due to logging by the mail command) going on.
Well, I did think of something like that at one point, and added a 'wait' command after the mailing, and then a 'sleep 60' just for kicks, but it didn't change anything.
Not very likely anyway, as I happen to have the exact same setup as you do - 3 data disks, one parity, and a separate SSD for system and 'always-on' stuff. So syslog & such would happen on the SSD, not the hard disks I'm trying to idle…
3) IMHO, I would move the hdparm command into an independent generic task. This way, you can have a consistent and more robust way of idling your disks. On my box, I created a script that will check for a combination of triggers before idling the disks so that I have a balance between power saving, disk wear (due to spin up/down) and performance.
If the above sounds like something you have or would like to have, I will be happy to share the script.
Actually I tacked the hdparm at the end there because sync can take seconds, minutes or hours, so it's unpredictable.
As it runs in the middle of the night, just after it finishes is the best time to idle the disks until the following evening… or would be, if it worked reliably :-)
I'd love to have a peek at your more sophisticated approach however - and gladly accept your kind offer, as I've been spoiled by Python for too long, and am badly out of practice with shell scripts :-)
It sounds like you are mailing STDOUT. If you limit the mail to STDERR, you'll get everything important, and, almost certainly, no CR's
Yes, I'm using sidney's script above : stdout goes to mail, stderr to the log file.
The approach has merit, as the two yield useful but different information.
One possible solution would be to add an option to snapraid so that sync doesn't print its progress line.
Another would be to use a less brain-dead mail agent.
But really, all it takes is adding
tr '\r' '\n'
(if you want to keep the progress info) or
grep -v '%,'
(if you don't need it), to the line that sends the mail in the script…
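To make that concrete, here is a small sketch of both filters wrapped in functions. The function names are made up for this post. Note that the second one combines the two commands (split on CR first, then drop the lines containing '%,'), which is my own assumption about the most useful behaviour, since the whole CR-separated progress run otherwise counts as one line for grep.

```shell
# Keep the progress info: turn each carriage return into a newline so the
# mail agent sees plain LF-terminated text.
progress_to_lines() {
    tr '\r' '\n'
}

# Drop the progress info: split on CR first, then remove the lines that
# look like progress updates (they contain '%,').
strip_progress() {
    tr '\r' '\n' | grep -v '%,'
}

# Example use on the mail line of the sync script:
#   strip_progress < $TMP_OUTPUT | /usr/bin/mail -s "$EMAIL_SUBJECT_PREFIX Sync Job COMPLETED" "$EMAIL_ADDRESS"
```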
Here's my standby hdd script as requested. A couple of things to note:
1) the script is run by cron on a 1/2 hourly basis
2) the script will automatically scan for all hard disks attached to the system and spin down the idle disks so there is no need to manually configure/add them. You only need to manually set them up if you want to take advantage of the additional checks (like SSD, samba and torrent)
3) I use disk id to uniquely identify my disks. It's a little cryptic to read (luckily for me, all my disks are different models and the model codes happen to be the first part of the disk id, so I can make out which disk is which easily) but it's robust as it doesn't change like UUID, path and disk labels can.
4) the reason why I implemented additional checks on top of disk idle time is because I want to have my cake and eat it - performance when I need it (i.e. no latency due to disk spin up), power-saving when I don't and maximum disk longevity (i.e. no unnecessary spin up/down) at all times.
5) I check for samba connections because in my typical usage session (3-4hrs at night), access to the network share is infrequent so the 30mins disk idle triggers quite often resulting in slow network access (due to the spin up delay) and increased disk wear. By checking for samba connections, I make sure that there are no more active users on the network before I allow the disks to spin down.
6) Likewise, I check for active torrents as some of the torrents I seed/download have really low activities so rather than having the disk spin up/down when they are being seeded/downloaded, I prefer just to let the disk spin all the time.
#!/bin/bash
#######################################################################
# this is a helper script that tries to put the hdd to sleep based on
# disk activity as well as the state of some processes.
#
# $Author: sidney $
# $Revision: 4 $
# $Date: 2011-10-15 22:47:36 +1100 (Sat, 15 Oct 2011) $
# $HeadURL: file:///svnrepo/linuxScripts/standby_hdd.sh $
#
#######################################################################

# enter a list of SSD ids using RegEx syntax
SSD_DISKS="OCZ-VERTEX2"
# enter a list of disks used by samba
SMB_DISKS="@(*WDC_WD20EARS*|*WDC_WD20EARX*)"
# enter a list of disks used by transmission
BT_DISKS="*WDC_WD20EARX*"
# enter a list of disks used by squeezebox
#TODO - implement squeezebox watch

## temp vars
TMP_OUTPUT="/dev/shm/last_diskstats"
SMB_ACTIVE=-1
BT_ACTIVE=-1

if [ ! -f $TMP_OUTPUT ]
then
  cat /proc/diskstats > $TMP_OUTPUT
  echo "[$(date '+%F %T')] Tempfile does not exist, creating"
  exit 0
fi

for DISK in `ls -1 /dev/disk/by-id | grep "scsi" | grep -v "part"`; do
  # skip SSDs
  if [[ $DISK == $SSD_DISKS ]]; then
    echo "[$(date '+%F %T')] Skipping SSD $DISK"
    continue;
  fi

  #lets determine the disk id (i.e. sd?)
  DISK_ID=$(ls -l /dev/disk/by-id/$DISK | cut -d '>' -f2 | cut -d '/' -f3)

  if [ "$(diff /proc/diskstats $TMP_OUTPUT | grep $DISK_ID)" = "" ]; then
    # No disk activity since the last time this script ran
    if [ "$(/sbin/hdparm -C /dev/$DISK_ID | grep 'drive state' | cut -d: -f2 | awk '{ print $1}')" = "standby" ]; then
      echo "[$(date '+%F %T')] $DISK ($DISK_ID) already spun down"
    else
      # Disk is active but before we spin it down, lets check if any known apps/processes might be using it

      # checking disks used by samba
      if [[ $DISK == $SMB_DISKS ]]; then
        if [ $SMB_ACTIVE -lt 0 ]; then
          # smb test not performed yet so lets do it now
          SMB_ACTIVE=$(($(smbstatus -b | wc -l - | cut -d' ' -f1)-4))
          echo "[$(date '+%F %T')] Found $SMB_ACTIVE samba clients connected."
        fi
        if [ $SMB_ACTIVE -gt 0 ]; then
          # there are smb clients connected, so lets do nothing
          echo "[$(date '+%F %T')] There are $SMB_ACTIVE active smb clients connected. Not spinning down $DISK ($DISK_ID)"
          continue;
        fi
      fi

      # checking disks used by transmission
      if [[ $DISK == $BT_DISKS ]]; then
        if [ $BT_ACTIVE -lt 0 ]; then
          # BT test not performed yet so lets do it now
          BT_ACTIVE=`/usr/bin/transmission-remote -n transmission:transmissionBT -l | grep -E 'Idle|Seeding|Downloading' | wc -l - | cut -d' ' -f1`
          echo "[$(date '+%F %T')] Found $BT_ACTIVE active torrents."
        fi
        if [ $BT_ACTIVE -gt 0 ]; then
          # there are active seeding/downloading torrents
          echo "[$(date '+%F %T')] There are $BT_ACTIVE active torrents, therefore NOT spinning down $DISK ($DISK_ID)"
          continue;
        fi
      fi

      # at this point, we believe there won't be any need for the disks so lets spin it down
      echo "\n[$(date '+%F %T')] Spinning down $DISK ($DISK_ID)"
      /sbin/hdparm -y /dev/$DISK_ID | tr -d '\n'
      echo "\n"
    fi
  else
    echo "[$(date '+%F %T')] $DISK ($DISK_ID) is in use"
  fi
done

# store the disk stats for next use
cat /proc/diskstats > $TMP_OUTPUT

exit 0;
Any questions, fire away…
Oh, and one more thing, if you are really paranoid about disk wear and unnecessary disk spin up/down, schedule the snapraid job in anacron's daily task instead of using cron. The reason is because as I mentioned previously, the daily standard job will potentially wake all your disks for the 'lost+found' task (it frequently does on mine) and since snapraid does the same (i.e. wake all the disks), might as well have them run back-to-back instead. :)
jkuehl2, is the auto spin down using hdparm working for you on your n36l/ubuntu rig? It didn't work on mine so I had to write my own manual script. But then again, it could be my disks that are the problem. I'm running all WD green drives.
Here's my standby hdd script as requested. A couple of things to note:
Once again, thanks a lot for sharing this Sidney !
That is some impressive script, I wouldn't trust myself to try and make it work on my own machine :-)
My immediate goal is much simpler than yours : I just want my disks to spin down early in the morning after sync has run, and stay that way until I access them when I get home… after that, we'll see :-)
That disk-by-id stuff is one useful trick, though.
Until now I was using by-label, but these only point to partitions, not disks, whereas by-id can do both.
This may explain some of my problems with hdparm commands not working, I'll switch over and see if it goes better.
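For reference, here is a rough sketch of how a by-id entry can be mapped back to its kernel device name by following the symlink. The function names are made up, it assumes GNU `readlink -f` is available, and it assumes partition entries carry a "-part" suffix in their name (which is what the `grep -v "part"` in the standby script relies on).

```shell
# Print the kernel device name (e.g. sdb) that a /dev/disk/by-id symlink
# points to. Assumes GNU readlink with -f.
disk_device() {
    basename "$(readlink -f "$1")"
}

# List whole-disk entries (no "-part" suffix) with the device each one
# resolves to. The directory argument defaults to /dev/disk/by-id.
list_whole_disks() {
    local dir="${1:-/dev/disk/by-id}"
    local link
    for link in "$dir"/*; do
        [ -e "$link" ] || continue          # empty or missing directory
        case "$link" in
            *-part*) continue ;;            # skip partition entries
        esac
        printf '%s -> %s\n' "$(basename "$link")" "$(disk_device "$link")"
    done
}
```

Running `list_whole_disks` with no argument prints one line per disk, which makes it easy to check that an hdparm command is being pointed at a whole disk rather than a partition.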
Otherwise, it's really fun to see how we separately came to very similar hardware and software setup.
My system disk is also an SSD - and also an OCZ Vertex 2 :-)
My data and parity disks are all Samsung F4s though.
And I also run Samba, Transmission, and Squeezeboxserver… and snapraid, obviously.
Thanks for all the ideas !
Oh, and one more thing, if you are really paranoid about disk wear and unnecessary disk spin up/down, schedule the snapraid job in anacron's daily task instead of using cron. The reason is because as I mentioned previously, the daily standard job will potentially wake all your disks for the 'lost+found' task (it frequently does on mine) and since snapraid does the same (i.e. wake all the disks), might as well have them run back-to-back instead. :)
Again some great advice… it forced me to look into this anacron stuff and understand how it works, thanks :-)
Actually I don't have anacron (probably because I have Ubuntu server LTS which is supposed to be always-on), but standard cron emulates the behaviour.
Default schedules for cron-daily, -weekly and -monthly were all over the map, which explains why I seemed to get random results with my disks from one day to the other…
I have rearranged things so that cron-daily runs last no matter what ; and as cron-daily runs 'standard' last, I've stuck my 'sync then idle' script at the end of that one, using disk-by-id instead of labels.
Now if it doesn't work right tomorrow there'll be some serious explaining to do :-)
Last edit: HedonisticAltruism 2014-05-25
This script has been updated and hosted on github for easy reference. Get it here - https://gist.github.com/bfg100k/87a1bbccf4f15d963ff7
Thank you for sharing that script, very useful.
Thanks, using this now as the base of my crontab.
Thanks for the script, fits my suits perfectly as another happy Snapraid + N36L + Ubuntu User ;-)
@ian-f : maybe the script does not find the snapraid executable when run from the cron environment ?
Run 'which snapraid' in a shell, and replace all occurrences of 'snapraid' with the full path as returned (in my case it is /usr/local/bin/snapraid).
To others: I also run a version of sidney's script on my server (Ubuntu 10.04 LTS), it works, but with a twist :
- if sync has nothing to do, no mail is sent, as designed
- if sync runs, I do get a mail with the proper title sent at the right address. However the mail body is empty, and it has an attached file called 'noname' (with no extension). If I open that in a text editor I see the script's intended output.
Anybody else seen this ? My guess is it comes from the snapraid progress indicator being logged, such as :
Maybe this causes the contents to be considered not "plain text" ?
Once, just once, I ran a test with a sync so small it printed only one line :
… and that got me a normal mail with the report in the body :-)
Any ideas ?
TIA,
fp
* BUMP * … anyone ? :-)
It sounds like you are mailing STDOUT. If you limit the mail to STDERR, you'll get everything important, and, almost certainly, no CR's
Hmmm… why do I never think of those simple hacks myself ? :-)
Thanks for the heads-up !
Well, I did think of something like that at one point, and added a 'wait' command after the mailing, and then a 'sleep 60' just for kicks, but it didn't change anything.
Not very likely anyway, as I happen to have the exact same setup as you do - 3 data disks, one parity, and a separate SSD for system and 'always-on' stuff. So syslog & such would happen on the SSD, not the hard disks I'm trying to idle…
Actually I tacked the hdparm at the end there because sync can take seconds, minutes or hours, so it's unpredictable.
As it runs in the middle of the night, just after it finishes is the best time to idle the disks until the following evening… or would be, if it worked reliably :-)
I'd love to have a peek at your more sophisticated approach however - and gladly accept your kind offer, as I've been spoiled by Python for too long, and am badly out of practice with shell scripts :-)
Yes, I'm using sidney's script above : sdout goes to mail, stderr to the log file.
The approach has merit, as the two yield useful but different information.
One possible solution would be to add an option to snapraid so that sync doesn't print its progress line.
Another would be to use a less brain-dead mail agent.
But really, all it takes is adding
(if you want to keep the progress info) or
(if you don't need it), to the line that sends the mail in the script…
Here's my standby hdd script as requested. A couple of things to note:
1) the script is run by cron on a 1/2 hourly basis
2) the script will automatically scan for all hard disks attached to the system and spin down the idle disks so there is no need to manually configure/add them. You only need to manually set them up if you want to take advantage of the additional checks (like SSD, samba and torrent)
3) I use disk id to uniquely identify my disks. Its alittle cryptic (luckily for me, all my disks are different models and the model codes happen to be the first part of the disk id so I can make out which disk is which easily) to read but its robust as it doesn't change like UUID, path and disk labels can.
4) the reason why I implemented additional checks on top of disk idle time is because I want to have my cake and eat it - performance when I need it (i.e. no latency due to disk spin up), power-saving when I don't and maximum disk longevity (i.e. no unnecessary spin up/down) at all times.
5) I check for samba connections because in my typical usage session (3-4hrs at night), access to the network share is infrequent so the 30mins disk idle triggers quite often resulting in slow network access (due to the spin up delay) and increased disk wear. By checking for samba connections, I make sure that there are no more active users on the network before I allow the disks to spin down.
6) Likewise, I check for active torrents, as some of the torrents I seed/download have really low activity, so rather than having the disk spin up/down while they are being seeded/downloaded, I prefer to just let the disk spin all the time.
Ok, enough talk. Here's the good stuff :)
Any questions, fire away…
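For anyone who just wants the core idea, the idle check described in the notes above can be sketched roughly as follows. This is not the script from this post, only a minimal illustration. It assumes disk activity is read from /proc/diskstats (field 4 is reads completed, field 8 is writes completed) and compared against a snapshot saved by the previous cron run. The function names and the state directory are made up for the example.

```bash
#!/bin/bash
# Print "reads:writes" completed for one kernel device name (e.g. sda).
# The stats file defaults to /proc/diskstats; it is a parameter only so
# the function can be tried against a sample file.
disk_io_signature() {
    local dev=$1 stats=${2:-/proc/diskstats}
    awk -v d="$dev" '$3 == d { print $4 ":" $8 }' "$stats"
}

# Return 0 if the device has done no I/O since the previous call.
# The previous signature is kept in $state_dir/<dev> (hypothetical layout).
# The first call for a device always reports "not idle".
disk_is_idle() {
    local dev=$1 state_dir=$2 stats=${3:-/proc/diskstats}
    local now prev=""
    now=$(disk_io_signature "$dev" "$stats")
    if [ -f "$state_dir/$dev" ]; then
        prev=$(cat "$state_dir/$dev")
    fi
    echo "$now" > "$state_dir/$dev"
    [ -n "$prev" ] && [ "$now" = "$prev" ]
}

# Possible use from a half-hourly cron job (device and path are placeholders):
#   disk_is_idle sdb /var/tmp/hdd-idle && hdparm -y /dev/sdb
```

Run every 30 minutes, this treats a disk as idle only if its counters did not move between two consecutive runs. The extra samba and torrent checks would be additional conditions in front of the hdparm call.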
Oh, and one more thing: if you are really paranoid about disk wear and unnecessary disk spin up/down, schedule the snapraid job in anacron's daily task instead of using cron. The reason is that, as I mentioned previously, the daily standard job will potentially wake all your disks for the 'lost+found' task (it frequently does on mine), and since snapraid does the same (i.e. wakes all the disks), you might as well have them run back-to-back instead. :)
Have you tried setting an explicit spindown timeout? (For one minute; increments occur in 5-second steps.)
"hdparm -S 12"
Also -y sets the drive to standby not sleep. Try upper case -Y or explicit timeout with -S.
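To make the -S arithmetic concrete, here is a small sketch that converts a timeout in seconds into the value hdparm expects. It follows the encoding described in the hdparm man page: 0 disables the timer, 1 to 240 are multiples of 5 seconds, and 241 to 251 are 1 to 11 units of 30 minutes. The helper name is made up, and it rounds up to the next representable timeout.

```bash
#!/bin/bash
# Convert a desired idle timeout (seconds) into an `hdparm -S` value.
# Hypothetical helper, not part of hdparm itself.
spindown_value() {
    local secs=$1 units
    if [ "$secs" -le 0 ]; then
        echo 0                          # 0 = standby timer disabled
    elif [ "$secs" -le 1200 ]; then
        echo $(( (secs + 4) / 5 ))      # 1..240: units of 5 seconds, rounded up
    else
        units=$(( (secs + 1799) / 1800 ))   # 30-minute units, rounded up
        if [ "$units" -gt 11 ]; then
            units=11                    # 251 (5.5 hours) is the largest such value
        fi
        echo $(( 240 + units ))
    fi
}

# Possible use (the disk path is a placeholder):
#   hdparm -S "$(spindown_value 60)" /dev/disk/by-id/ata-EXAMPLE_DISK
```

With this, one minute gives 12, as in the command above, and anything between 20 and 30 minutes gets rounded up to 241 (30 minutes).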
jkuehl2, is the auto spin down using hdparm working for you on your n36l/ubuntu rig? It didn't work on mine so I had to write my own manual script. But then again, it could be my disks that are the problem. I'm running all WD green drives.
Autostarting shell-script which runs whenever the server boots - setting the timeout and the acoustic power management.
HDDs are one HD204 Green from Samsung and four Seagate Greens.
Once again, thanks a lot for sharing this Sidney !
That is some impressive script, I wouldn't trust myself to try and make it work on my own machine :-)
My immediate goal is much simpler than yours : I just want my disks to spin down early in the morning after sync has run, and stay that way until I access them when I get home… after that, we'll see :-)
That disk-by-id stuff is one useful trick, though.
Until now I was using by-label, but these only point to partitions, not disks, whereas by-id can do both.
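As a rough illustration of the difference: under /dev/disk/by-id, partition links carry a `-partN` suffix while whole-disk links do not, so picking out the whole disks is a matter of filtering on the name. The sketch below assumes SATA disks (links starting with `ata-`); the function name is made up, and the directory is a parameter only so it can be tried against a test directory.

```bash
#!/bin/bash
# List whole-disk links in a by-id style directory, skipping partition links.
# Assumes SATA disks, whose links start with "ata-" (an assumption for this sketch).
list_whole_disks() {
    local dir=${1:-/dev/disk/by-id}
    local entry
    for entry in "$dir"/ata-*; do
        [ -e "$entry" ] || continue     # glob matched nothing
        case "$entry" in
            *-part[0-9]*) ;;            # partition link: skip
            *) echo "$entry" ;;         # whole-disk link
        esac
    done
}

# Possible use:
#   for disk in $(list_whole_disks); do hdparm -C "$disk"; done
```

`hdparm -C` only reports the current power state, so the loop in the comment is a harmless way to check that the right disks are being picked up.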
This may explain some of my problems with hdparm commands not working, I'll switch over and see if it goes better.
Otherwise, it's really fun to see how we separately came to very similar hardware and software setup.
My system disk is also an SSD - and also an OCZ Vertex 2 :-)
My data and parity disks are all Samsung F4s though.
And I also run Samba, Transmission, and Squeezeboxserver… and snapraid, obviously.
Thanks for all the ideas !
Again some great advice… it forced me to look into this anacron stuff and understand how it works, thanks :-)
Actually I don't have anacron (probably because I have Ubuntu server LTS which is supposed to be always-on), but standard cron emulates the behaviour.
Default schedules for cron-daily, -weekly and -monthly were all over the map, which explains why I seemed to get random results with my disks from one day to the other…
I have rearranged things so that cron-daily runs last no matter what ; and as cron-daily runs 'standard' last, I've stuck my 'sync then idle' script at the end of that one, using disk-by-id instead of labels.
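For reference, a 'sync then idle' step along these lines could look roughly like the sketch below. It is not the actual script, just a minimal version under a few assumptions: the function name is made up, the disk paths are placeholders, and the two commands are held in variables only so they can be swapped out for a dry run.

```bash
#!/bin/bash
# Commands are overridable (e.g. SNAPRAID=echo HDPARM=echo) for a dry run.
SNAPRAID=${SNAPRAID:-snapraid}
HDPARM=${HDPARM:-hdparm}

# Run the sync, then put every disk given as an argument into standby.
# Disks are spun down even if the sync failed; the sync's exit status is returned.
sync_then_idle() {
    local rc=0 disk
    "$SNAPRAID" sync || rc=$?
    for disk in "$@"; do
        "$HDPARM" -y "$disk" || echo "could not spin down $disk" >&2
    done
    return "$rc"
}

# Possible use (placeholder disk names):
#   sync_then_idle /dev/disk/by-id/ata-EXAMPLE_1 /dev/disk/by-id/ata-EXAMPLE_2
```

An alternative to appending this to an existing daily script is a separate file in /etc/cron.daily: run-parts executes the files there in lexical order, so a name that sorts last (something like `zz-snapraid-sync`, a made-up name) would run after everything else.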
Now if it doesn't work right tomorrow there'll be some serious explaining to do :-)