From: Jeffrey J. K. <bac...@ko...> - 2011-12-22 23:49:29
JP Vossen wrote at about 21:50:29 -0500 on Wednesday, December 21, 2011:
> I'm running Debian Squeeze stock backuppc-3.1.0-9 on a server and I'm
> getting kernel messages [1] and SMART errors [2] about the WD 2TB SATA
> disk. Fine, I RMA'd it and have the new one... Now what? I know I can
> either 'dd' or start fresh. But...
>
> If I start fresh, I know everything will work and be valid, but I
> lose my historical backups when I wipe the bad disk and RMA it.
>
> If I 'ddrescue' BAD --> GOOD, I'll worry about the integrity of the
> BackupPC store. As I understand it, the incoming files are hashed and
> stored, but the store itself is never checked (true?). So when I do
> backups, if an incoming file hash matches a file already in the store,
> the incoming file is "de-duped" and dropped. But what if the file
> actually in the store is corrupt due to the bad disk?

If the hash of a new file matches the hash of an existing pool file,
the contents are then compared, because a hash collision is always
possible: the file hash is a partial-file md5sum based on the first and
last 128K slices plus the file size.

> Am I correct? If so, is there a way to have BackupPC validate that the
> files in the pool actually match their hash and weren't mangled by the disk?

Of course, there is no guarantee that the pool files themselves are not
corrupt. Checking the files against the hash in their pool file name can
rule out some corruption, but if the file size is unchanged and the
corruption is not in the first or last 128K slice, the hash will be
unchanged and the corruption will go undetected.

That being said, I have written several routines to both check and fix
the pool for corruption relative to the partial-file md5sum hash in the
pool file name. Please search the archives, where I have discussed and
posted the code...
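To make that blind spot concrete, here is a minimal Python sketch. It is
only an illustration of the scheme as described above (file size plus the
first and last 128K slices fed to MD5); it is not BackupPC's actual digest
code, which is written in Perl and differs in detail. The names
`partial_digest` and `flip_byte` are made up for this example.

```python
import hashlib

SLICE = 128 * 1024  # the 128K slice size mentioned above


def partial_digest(data: bytes) -> str:
    """Illustrative partial hash: file size plus first and last 128K slices.

    Assumption: a simplified stand-in, not BackupPC's real digest routine.
    """
    md5 = hashlib.md5()
    md5.update(str(len(data)).encode("ascii"))  # the file size is part of the hash
    if len(data) <= 2 * SLICE:
        md5.update(data)            # small file: hash everything
    else:
        md5.update(data[:SLICE])    # first 128K
        md5.update(data[-SLICE:])   # last 128K
    return md5.hexdigest()


def flip_byte(data: bytes, offset: int) -> bytes:
    """Return a copy of data with one byte inverted, simulating disk damage."""
    damaged = bytearray(data)
    damaged[offset] ^= 0xFF
    return bytes(damaged)


original = bytes(range(256)) * 4096              # 1 MiB of sample data
middle = flip_byte(original, len(original) // 2)  # damage outside both slices
head = flip_byte(original, 10)                    # damage inside the first slice

print(partial_digest(original) == partial_digest(middle))  # True: not detected
print(partial_digest(original) == partial_digest(head))    # False: detected
print(original == middle)  # False: a full content comparison does catch it
```

The last line is the reason the content comparison on a hash match matters:
the partial hash alone cannot distinguish the damaged copy from the good one.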

Note that there have been bugs in BackupPC itself and also in various
pool libraries (specifically on arm5 processors) that cause relatively
innocuous errors in the pool file names relative to the actual intended
partial file md5sum hash.

> Any other solution I'm missing?
>
> Thanks,
> JP
> ___________________________________________
> [1] Example kernel errors:
>
> Security Events for kernel
> =-=-=-=-=-=-=-=-=-=-=-=-=-
> kernel: [4020993.728571] end_request: I/O error, dev sda, sector 81203507
> kernel: [4021009.712952] end_request: I/O error, dev sda, sector 81203507
>
> System Events
> =-=-=-=-=-=-=
> kernel: [4020983.471256] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> kernel: [4020983.471290] ata3.00: BMDMA stat 0x25
> kernel: [4020983.471315] ata3.00: failed command: READ DMA
> kernel: [4020983.471347] ata3.00: cmd c8/00:18:33:11:d7/00:00:00:00:00/e4 tag 0 dma 12288 in
> kernel: [4020983.471351]          res 51/40:07:33:11:d7/40:00:28:00:00/e4 Emask 0x9 (media error)
> kernel: [4020983.471424] ata3.00: status: { DRDY ERR }
> kernel: [4020983.471446] ata3.00: error: { UNC }
> kernel: [4020983.501157] ata3.00: configured for UDMA/133
>
>
> [2] Example SMART error:
>
> Error 1704 occurred at disk power-on lifetime: 10149 hours (422 days + 21 hours)
>   When the command that caused the error occurred, the device was
>   active or idle.
>
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 51 40 45 66 01 e0  Error: UNC 64 sectors at LBA = 0x00016645 = 91717
>
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>   -- -- -- -- -- -- -- --  ----------------  --------------------
>   c8 00 40 3f 66 01 e0 08  46d+13:36:50.242  READ DMA
>   ec 00 00 00 00 00 a0 08  46d+13:36:50.233  IDENTIFY DEVICE
>   ef 03 46 00 00 00 a0 08  46d+13:36:50.225  SET FEATURES [Set transfer mode]
>
> ----------------------------|:::======|-------------------------------
> JP Vossen, CISSP            |:::======|      http://bashcookbook.com/
> My Account, My Opinions     |=========|      http://www.jpsdomain.org/
> ----------------------------|=========|-------------------------------
> "Microsoft Tax" = the additional hardware & yearly fees for the add-on
> software required to protect Windows from its own poorly designed and
> implemented self, while the overhead incidentally flattens Moore's Law.
>
> _______________________________________________
> BackupPC-users mailing list
> Bac...@li...
> List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
> Wiki: http://backuppc.wiki.sourceforge.net
> Project: http://backuppc.sourceforge.net/