From: Bruce A. <ba...@gr...> - 2002-11-29 06:37:27
|
> Right now, I can run smartctl and have it work successfully with > out the ataIdentify. I've enclosed a dump of stdout. Must say, I'm > impressed with the app & the level of detail it provides! Cooler than > I thought .... So right now, smartctl is working on MacOSX but needs > some stuff fixed & I need to go through all the options & verify that > they're all ok. Here are some comments about the output (I haven't looked at your sources yet): > [Petes-TiBook:~/Desktop/SMART Project/smartmontools-5.0-45 - MacOSX] pcassidy% ./smartctl -anPe /dev/rdisk0 > smartctl version 5.0-45 Copyright (C) 2002 Bruce Allen > Home page is http://smartmontools.sourceforge.net/ > > Found one! > Error ATA GET HD Identity Failed: Bad file descriptor > Smartctl: Hard Drive Read Identity Failed > > === START OF INFORMATION SECTION === > Device Model: > Serial Number: > Firmware Version: > ATA Version is: 1 > ATA Standard is: Unrecognized. Minor revision code: 0x00 All the rubbish here is because you don't yet have a GET IDENTITY ata command. > SMART is only available in ATA Version 3 Revision 3 or greater. > We will try to proceed in spite of this. > SMART support is: Unavailable - device lacks SMART capability. > Checking to be sure by trying SMART ENABLE command. > SMART appears to work. Continuing. > SMART support is: Available - device has SMART capability. > SMART support is: Enabled > > === START OF ENABLE/DISABLE COMMANDS SECTION === > SMART Enabled. > > === START OF READ SMART DATA SECTION === > SMART overall-health self-assessment test result: PASSED > > General SMART Values: > Off-line data collection status: (0x00) Offline data collection activity was > never started. > Self-test execution status: ( 0) The previous self-test routine completed > without error or no self-test has ever > been run. > Total time to complete off-line > data collection: (34050) seconds. This looks too big by at least a factor of ten. Check that you don't have a byte-ordering problem, or that something is being incorrectly cast. on PPC, is sizeof: char==1 short==2 int==4 long=4 long long=8 double=8 > Offline data collection > capabilities: (0x1b) SMART execute Offline immediate. > Automatic timer ON/OFF support. > Suspend Offline collection upon new > command. > Offline surface scan supported. > Self-test supported. > SMART capabilities: (0x0300) Does not save SMART data before > entering power-saving mode. > Error logging capability: (0x01) Error logging supported. > Short self-test routine > recommended polling time: ( 2) minutes. > Extended self-test routine > recommended polling time: ( 27) minutes. 27 minutes is around 1600 seconds. I would have expected the 34050 above to be around this amount -- not a factor of 20 bigger. > SMART Attributes Data Structure revision number: 4096 Again, this number looks way too big, Hmmm, 2^12, hard to see how bit 12 could accidentally get set though. Ahh -- I bet it's supported to be "16" == 2^4 and somehow you have gotten shifted 8 bits off. Yes, sounds right. Is there some one-byte misalignment here?? > Vendor Specific SMART Attributes with Thresholds: > ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE WHEN_FAILED RAW_VALUE > 1 Raw_Read_Error_Rate 0x0b00 100 100 062 Old_age - 0 > 2 Throughput_Performance 0x0500 100 100 040 Old_age - 0 > 3 Spin_Up_Time 0x0700 131 131 033 Old_age - 1 > 4 Start_Stop_Count 0x1200 097 097 000 Old_age - 5505 > 5 Reallocated_Sector_Ct 0x3300 100 100 005 Old_age - 0 > 7 Seek_Error_Rate 0x0b00 100 100 067 Old_age - 0 > 8 Seek_Time_Performance 0x0500 100 100 040 Old_age - 0 > 9 Power_On_Hours 0x1200 097 097 000 Old_age - 1743 > 10 Spin_Retry_Count 0x1300 100 100 060 Old_age - 0 > 12 Power_Cycle_Count 0x3200 100 100 000 Old_age - 1444 > 191 G-Sense_Error_Rate 0x0a00 100 100 000 Old_age - 0 > 192 Power-Off_Retract_Count 0x3200 099 099 000 Old_age - 278 > 193 Load_Cycle_Count 0x1200 093 093 000 Old_age - 73844 > 194 Temperature_Celsius 0x0200 161 161 000 Old_age - 34 (Lifetime Min/Max 11/54) Is this a recent IBM disk??? [By the way, it's been run really hot at some point in the past -- 54 C!] The fact that all the attributes are Old_age ones looks wrong. I think this is a 8-bit misalignment problem again. > 196 Reallocated_Event_Count 0x3200 100 100 000 Old_age - 0 > 197 Current_Pending_Sector 0x2200 100 100 000 Old_age - 0 > 198 Offline_Uncorrectable 0x0800 100 100 000 Old_age - 0 > 199 UDMA_CRC_Error_Count 0x0a00 200 200 000 Old_age - 0 > > SMART Error Log Version: 1 > No Errors Logged OK, this looks normal. > > SMART Self-test log, version number 256 > Warning - structure revision number does not match spec! Again, I think that this is 8 bits misaligned. Did you remove the attribute "__packed__" from the data structures? If so this may be causing your problems. Another solution is to change all the unsigned char unsigned short from atacmds.h, and change them to __u8 __u16 > No self-tests have been logged As soon as you have the -X and -S commands working, you should do some self-tests to the disk so that we can see what the self-test log looks like. Bottom line: It looks good! I think that one of the data structures in atacmds.h has gotten misaligned because of your compiler/compiler-options/edits. Suggestions to cure it: (1) make sure attribute __packed__ is there (2) verify that sizes are as above (3) if needed change unsigned char to __u8, unsigned short to __u16. We can/should do this for all platforms. Just need to include linux/types.h By the way, if this is an IBM disk, you will find excellent documentation in the IBM OEM manual. But be careful -- the IBM disk is a superset of the ATA smart command set. So some of the commands/data described in that manual will not work for all ATA smart devices. Bruce |
From: Bruce A. <ba...@gr...> - 2002-11-29 07:32:19
|
Hi Peter, Your code is very nice. Here are some comments on your sources in atacmds.c. These are mostly about things that we can both do to make the source read better (eg, I also make changes to come more in line with how you have to do things). ==== COMMENT ONE==== // compute checksum if ((*interface)->SMARTValidateReadData(interface, (ATASMARTData *)data)) checksumwarning("SMART Attribute Data Structure"); In the long run, we should probably use identical checksumming code. It will simplify the code base and reduce the amount of changes. There is no need for you to call SMARTValidateReadData(). The checksum is defined by the ATA standard, so you don't need to call this routine. Let's define this: int checksum(unsigned char *data){ unsigned char chksum=0; int i; for (i=0; i<512; i++) chksum+=data[i]; return chksum; } in atacmds.c. Then for example the code would read like this: // Reads SMART attributes into *data int ataReadSmartValues(int device, struct ata_smart_values *data){ #ifdef MOSX IOATASMARTInterface** interface = (IOATASMARTInterface**)device; if ((*interface)->SMARTReadData(interface, (ATASMARTData *)data)){ syserror("Error SMART Values Read failed"); return -1; } #else unsigned char buf[HDIO_DRIVE_CMD_HDR_SIZE+ATA_SMART_SEC_SIZE]= {WIN_SMART, 0, SMART_READ_VALUES, 1, }; if (ioctl(device,HDIO_DRIVE_CMD,buf)){ syserror("Error SMART Values Read failed"); return -1; } // copy data to user space memcpy(data,buf+HDIO_DRIVE_CMD_HDR_SIZE,ATA_SMART_SEC_SIZE); #endif // verify that checksum vanishes if (checksum(unsigned char *data)) checksumwarning("SMART Attribute Data Structure"); return 0; } I appreciate that you left my code untouched, and I have done something slightly different here (copied to user space BEFORE doing the checksum, as your code does). This is safe, since after finding the checksum value, my code either terminates in checksumwarning() [ so copying to user data space is irrelevant] or just continues on and does the copy anyway. Do you agree that this is a bit simpler? This applies to all the commands where a checksum is calculated. === COMMENT TWO ==== assert(sizeof(int) == sizeof(int *)); I asked you to do this, but am wondering. Would it be better to define a myassert() function that prints out a nice error warning message explaining to the puzzled user/developer what the heck is going on? Should we do: myassert(sizeof(int),"int type",sizeof(void *), "pointer type"); which prints a very understandable warning? On second thought, perhaps this is just not worth the trouble... any developer will figure out what's going on in a few seconds anyway... === COMMENT THREE === If for simplicity you want to merge the ataDisableSmart() ataEnableSmart() functions by adding a second 0/1 or enum on,off argument, that would be fine with me. Likewise for the other enable/disable functions. Again, it would simplify the code base for everyone. === COMMENT FOUR === ataEnableAutoOffline/ataDisableAutoOffline If you look at the comments in the latest version of atacmds.c you'll see an explanation. These are NOT part of the ATA standard, but are well supported and documented by IBM and some others (Maxtor, I think). I suggest you either ask the IOKit people to add them (trivial) or if not, just add an #ifdef MOSX pout("AutoOffline is not in the ATA standard and is not implemented by IOKit/MacOSX/Darwin"); #else existing code #endif === COMMENT FIVE === Just checking -- your code calls ataSmartStatus2, which because of the absent #ifdef HDIO_DRIVE_TASK calls ataSmartStatus(), which calls your version. Right?? [Comment to myself: I really ought to nuke ataSmartStatus() -- it's not used anywhere else -- in which case your code should be in ataSmartStatus2(int device) instead!] === END OF COMMENTS === Bottom line -- it looks great! Bruce |