<?xml version='1.0' encoding='ISO-8859-1'?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" >
<!--
This is DocBook XML that can be rendered into a single HTML page with a
command like 'xmlto html-nochunks <this_file_name>'. It can
also be rendered into multi-page HTML (drop the "-nochunks") or pdf,
ps, txt, etc.
-->
<article id="index">
<articleinfo>
<title>Bad block HOWTO for smartmontools</title>
<author>
<firstname>Bruce</firstname>
<surname>Allen</surname>
<affiliation>
<address>
<email>smartmontools-support@lists.sourceforge.net</email>
</address>
</affiliation>
</author>
<authorinitials>ba</authorinitials>
<author>
<firstname>Douglas</firstname>
<surname>Gilbert</surname>
<affiliation>
<address>
<email>smartmontools-support@lists.sourceforge.net</email>
</address>
</affiliation>
</author>
<authorinitials>dpg</authorinitials>
<pubdate>2007-01-23</pubdate>
<revhistory>
<revision>
<revnumber>1.1</revnumber>
<date>2007-01-23</date>
<authorinitials>dpg</authorinitials>
<revremark>
add sections on ReiserFS and partition table damage
</revremark>
</revision>
<revision>
<revnumber>1.0</revnumber>
<date>2006-11-14</date>
<authorinitials>dpg</authorinitials>
<revremark>
merge BadBlockHowTo.txt and BadBlockSCSIHowTo.txt
</revremark>
</revision>
</revhistory>
<copyright>
<year>2004</year>
<year>2005</year>
<year>2006</year>
<year>2007</year>
<holder>Bruce Allen</holder>
</copyright>
<legalnotice>
<para>
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1
or any later version published by the Free Software Foundation;
with no Invariant Sections, with no Front-Cover Texts, and with
no Back-Cover Texts.
</para>
<para>
For an online copy of the license see
<ulink url="http://www.fsf.org/copyleft/fdl.html">
<literal>www.fsf.org/copyleft/fdl.html</literal></ulink>.
</para>
</legalnotice>
<abstract>
<para>
This article describes what actions might be taken when smartmontools
detects a bad block on a disk. It demonstrates how to identify the file
associated with an unreadable disk sector, and how to force that sector
to reallocate.
</para>
</abstract>
</articleinfo>
<!--
<toc></toc>
-->
<sect1 id="intro">
<title>Introduction</title>
<para>
Handling bad blocks is a difficult problem as it often involves
decisions about losing information. Modern storage devices tend
to handle the simple cases automatically, for example by writing
a disk sector that was read with difficulty to another area on
the media. Even though such a remapping can be done by a disk
drive transparently, there is still a lingering worry about media
deterioration and the disk running out of spare sectors to remap.
</para>
<para>
Can smartmontools help? As the <acronym>SMART</acronym> acronym
<footnote><para>
Self-Monitoring, Analysis and Reporting Technology -> SMART
</para></footnote>
suggests, the <command>smartctl</command> command and the
<command>smartd</command> daemon concentrate on monitoring and analysis.
So apart from changing some reporting settings, smartmontools will not
modify the raw data on a device. Also, smartmontools works only with
physical devices; it knows nothing about partitions or file systems,
so other tools are needed. The job of smartmontools is to alert the user
that something is wrong and that user intervention may be required.
</para>
<para>
When a bad block is reported, one approach is to work out the mapping between
the logical block address used by a storage device and a file or some other
component of a file system using that device. Note that there may be no such
mapping: the bad block may be at a location not currently used by the file
system. A user may want to do this analysis to localize the damage and
minimize the number of replacement files that have to be retrieved from
a backup store. This approach requires knowledge of the file system
involved, and this document uses the Linux ext2/ext3 and ReiserFS file systems
as examples. The type of content may also come into play. For example, if
an area storing video has a corrupted sector, it may be easiest to accept
that a frame or two might be corrupted and instruct the disk not to retry,
as retrying may turn a momentary blank into a 1 second pause (while the
disk retries the faulty sector, often accompanied by a telltale clicking
sound).
</para>
<para>
Another approach is to ignore the upper level consequences (e.g. corrupting
a file, or worse, damaging a file system) and use the facilities offered by
a storage device to repair the damage. The SCSI disk command set is used to
elaborate on this low level approach.
</para>
</sect1>
<sect1 id="rfile">
<title>Repairs in a file system</title>
<para>
This section contains examples of what to do at the file system level
when smartmontools reports a bad block. These examples assume the Linux
operating system and either the ext2/ext3 or ReiserFS file system. The
various Linux commands shown have man pages and the reader is encouraged
to examine these. Of note is the <command>dd</command> command, which is
often used in repair work
<footnote><para>
Starting with GNU coreutils release 5.3.0, the <command>dd</command>
command in Linux includes the options 'iflag=direct' and 'oflag=direct'.
Adding these flags to the <command>dd</command> commands below should be
helpful, because they should avoid any interaction with the block
buffering IO layer in Linux and permit direct reads/writes from the raw
device. Use <command>dd --help</command> to see whether your version of dd
supports these options. If not, the latest code for dd
can be found at <ulink url="http://alpha.gnu.org/gnu/coreutils">
<literal>alpha.gnu.org/gnu/coreutils</literal></ulink>.
</para></footnote>
and has an unusual command line syntax (operands of the form name=value).
</para>
<para>
The authors would like to thank Sergey Vlasov, Theodore Ts'o,
Michael Bendzick, and others for explaining this approach. The authors would
like to add text showing how to do this for other file systems, in
particular XFS and JFS; please email if you can provide this
information.
</para>
<sect2 id="e2_example1">
<title>ext2/ext3 first example</title>
<para>
In this example, the disk is failing self-tests at Logical Block
Address LBA = 0x016561e9 = 23421417. The LBA counts sectors in units
of 512 bytes, and starts at zero.
</para>
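<para>
Since the self-test log reports the LBA in hexadecimal, it has to be
converted before doing any arithmetic. A minimal sketch, assuming a POSIX
shell (<function>lba_hex_to_dec</function> is a hypothetical helper, not
part of smartmontools), relies on <command>printf</command> accepting
C-style <literal>0x</literal> constants for <literal>%d</literal>:
</para>
<programlisting>
```shell
#!/bin/sh
# lba_hex_to_dec HEX: print the decimal value of a hexadecimal LBA such as
# the LBA_of_first_error column of "smartctl -l selftest".
# POSIX printf evaluates 0x-prefixed numeric arguments as C hex constants.
lba_hex_to_dec() {
    printf '%d\n' "$1"
}

lba_hex_to_dec 0x016561e9    # prints 23421417
```
</programlisting>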
<para>
<programlisting>
root]# smartctl -l selftest /dev/hda
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%        217        0x016561e9
</programlisting>
Note that other signs that there is a bad sector on the disk can be
found in the non-zero value of the Current Pending Sector count:
<programlisting>
root]# smartctl -A /dev/hda
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       1
</programlisting>
</para>
<para>
First Step: We need to locate the partition on which this sector of
the disk lives:
<programlisting>
root]# fdisk -lu /dev/hda
Disk /dev/hda: 123.5 GB, 123522416640 bytes
255 heads, 63 sectors/track, 15017 cylinders, total 241254720 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *          63     4209029     2104483+  83  Linux
/dev/hda2         4209030     5269319      530145   82  Linux swap
/dev/hda3         5269320   238227884   116479282+  83  Linux
/dev/hda4       238227885   241248104     1510110   83  Linux
</programlisting>
The partition <filename>/dev/hda3</filename> starts at LBA 5269320 and
extends past the 'problem' LBA. The 'problem' LBA is offset
23421417 - 5269320 = 18152097 sectors into the partition
<filename>/dev/hda3</filename>.
</para>
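<para>
Picking the partition by eye is easy with four partitions, but the lookup
can also be scripted. The following is only a sketch: it assumes the column
layout of <command>fdisk -lu</command> shown above (device name, an optional
<literal>*</literal> in the Boot column, then Start and End in sectors), and
<function>find_partition</function> is a hypothetical helper. On a disk with
an extended partition, both the extended partition and the logical
partition inside it will be printed; the logical one is the one to use.
</para>
<programlisting>
```shell
#!/bin/sh
# find_partition LBA: read "fdisk -lu" output on standard input and print
# each partition whose Start..End sector range contains LBA, followed by
# its starting sector S and the offset LBA - S (all in 512-byte sectors).
find_partition() {
    awk -v lba="$1" '
        $1 ~ /^\/dev\// {
            # Only the bootable partition has "*" in the Boot column,
            # which shifts Start and End one field to the right.
            if ($2 == "*") { start = $3; end = $4 } else { start = $2; end = $3 }
            # Print the partition only if start, lba and end are in order.
            if (lba >= start)
                if (end >= lba)
                    print $1, start, lba - start
        }'
}

# Example (needs root):
#   fdisk -lu /dev/hda | find_partition 23421417
# For the disk shown above this should print "/dev/hda3 5269320 18152097".
```
</programlisting>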
<para>
To verify the type of the file system and the mount point, look in
<filename>/etc/fstab</filename>:
<programlisting>
root]# grep hda3 /etc/fstab
/dev/hda3 /data ext2 defaults 1 2
</programlisting>
You can see that this is an ext2 file system, mounted at
<filename>/data</filename>.
</para>
<para>
Second Step: we need to find the block size of the file system
(normally 4096 bytes for ext2):
<programlisting>
root]# tune2fs -l /dev/hda3 | grep Block
Block count: 29119820
Block size: 4096
</programlisting>
In this case the block size is 4096 bytes.
</para>
<para>
Third Step: we need to determine which file system block contains this
LBA. The formula is:
<programlisting>
b = (int)((L-S)*512/B)
where:
b = File System block number
B = File system block size in bytes
L = LBA of bad sector
S = Starting sector of partition as shown by fdisk -lu
and (int) denotes the integer part.
</programlisting>
In our example, L=23421417, S=5269320, and B=4096. Hence the
'problem' LBA is in block number
<programlisting>
b = (int)(18152097*512/4096) = (int)2269012.125
so b=2269012.
</programlisting>
</para>
<para>
Note: the fractional part of 0.125 indicates that this problem LBA is
actually the second of the eight sectors that make up this file system
block.
</para>
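<para>
The Second and Third Steps can be checked with a little shell arithmetic.
This is a sketch only (<function>lba_to_fsblock</function> is a hypothetical
helper); it assumes 64-bit shell arithmetic, as in bash or dash on current
Linux systems, because the intermediate byte offset does not fit in 32 bits.
Besides b, it prints the index of the bad sector within the block, which
carries the same information as the fractional part discussed in the note
above.
</para>
<programlisting>
```shell
#!/bin/sh
# lba_to_fsblock L S B: apply b = (int)((L-S)*512/B) and also report which
# 512-byte sector inside block b is the bad one (0 = first sector).
#   L = LBA of the bad sector, S = starting sector of the partition,
#   B = file system block size in bytes.
# Shell arithmetic is integer-only, so "/" already truncates like (int).
lba_to_fsblock() {
    bytes=$(( ($1 - $2) * 512 ))          # byte offset into the partition
    echo "$(( bytes / $3 )) $(( bytes % $3 / 512 ))"
}

lba_to_fsblock 23421417 5269320 4096      # prints "2269012 1"
```
</programlisting>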
<para>
Fourth Step: we use debugfs to find the inode that owns this block,
and then the name of the file that uses that inode:
<programlisting>
root]# debugfs
debugfs 1.32 (09-Nov-2002)
debugfs: open /dev/hda3
debugfs: testb 2269012
Block 2269012 not in use
</programlisting>
If the block is not in use, as in the above example, then you can skip
the rest of this step and go straight to the Fifth Step.
</para>
<para>
If, on the other hand, the block is in use, we want to identify
the file that uses it:
</para>
<programlisting>
debugfs: testb 2269012
Block 2269012 marked in use
debugfs: icheck 2269012
Block Inode number
2269012 41032
debugfs: ncheck 41032
Inode Pathname
41032 /S1/R/H/714197568-714203359/H-R-714202192-16.gwf
</programlisting>
<para>
In this example, you can see that the problematic file (with the mount
point included in the path) is:
<filename>/data/S1/R/H/714197568-714203359/H-R-714202192-16.gwf</filename>
</para>
<para>
When working with an ext3 file system, it may happen that the
affected file is the journal itself. If this is the case, the inode
number will be very small (the ext3 journal normally uses inode 8), and
debugfs will not be able to find a file name for it:
<programlisting>
debugfs: testb 2269012
Block 2269012 marked in use
debugfs: icheck 2269012
Block Inode number
2269012 8
debugfs: ncheck 8
Inode Pathname
debugfs:
</programlisting>
</para>
<para>
To get around this situation, we can remove the journal altogether (the file system must be unmounted, or mounted read-only, for this):
<programlisting>
tune2fs -O ^has_journal /dev/hda3
</programlisting>
and then start again at the Fourth Step: this time the bad block should
no longer be in use. If we removed the journal, we must remember to
recreate it at the end of the whole procedure:
<programlisting>
tune2fs -j /dev/hda3
</programlisting>
</para>
<para>
Fifth Step:
<emphasis>NOTE:</emphasis> This last step will <emphasis>permanently
</emphasis> and irretrievably <emphasis>destroy</emphasis> the contents
of the file system block that is damaged: if the block was allocated to
a file, some of the data that is in this file is going to be overwritten
with zeros. You will not be able to recover that data unless you can
replace the file with a fresh or correct version.
</para>
<para>
To force the disk to reallocate this bad block, we write zeros to it
and then sync the disk:
<programlisting>
root]# dd if=/dev/zero of=/dev/hda3 bs=4096 count=1 seek=2269012
root]# sync
</programlisting>
</para>
<para>
Now everything is back to normal: the sector has been reallocated.
Compare the output just below to similar output near the top of this
article:
<programlisting>
root]# smartctl -A /dev/hda
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       1
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       1
</programlisting>
Note: for some disks it may be necessary to update the SMART attribute
values by running <command>smartctl -t offline /dev/hda</command>.
</para>
<para>
We have corrected the first bad block. If more than one block was bad,
we should repeat all the steps for each of the others.
After we do that, the disk will pass its self-tests again:
<programlisting>
root]# smartctl -t long /dev/hda [wait until test completes, then]
root]# smartctl -l selftest /dev/hda
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        239        -
# 2  Extended offline    Completed: read failure       90%        217        0x016561e9
# 3  Extended offline    Completed: read failure       90%        212        0x016561e9
# 4  Extended offline    Completed: read failure       90%        181        0x016561e9
# 5  Extended offline    Completed without error       00%         14        -
# 6  Extended offline    Completed without error       00%          4        -
</programlisting>
</para>
<para>
and no longer shows any offline uncorrectable sectors:
<programlisting>
root]# smartctl -A /dev/hda
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       1
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
</programlisting>
</para>
</sect2>
<sect2 id="e2_example2">
<title>ext2/ext3 second example</title>
<para>
On this drive, the first sign of trouble was this email from smartd:
<programlisting>
To: ballen
Subject: SMART error (selftest) detected on host: medusa-slave166.medusa.phys.uwm.edu
This email was generated by the smartd daemon running on host:
medusa-slave166.medusa.phys.uwm.edu in the domain: master001-nis
The following warning/error was logged by the smartd daemon:
Device: /dev/hda, Self-Test Log error count increased from 0 to 1
</programlisting>
</para>
<para>
Running <command>smartctl -a /dev/hda</command> confirmed the problem:
<programlisting>
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       80%        682        0x021d9f44
</programlisting>
Note that the failing LBA reported is 0x021d9f44 (base 16) = 35495748 (base 10).
<programlisting>
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       3
</programlisting>
</para>
<para>
and one can see above that there are 3 sectors on the list of pending
sectors that the disk can't read but would like to reallocate.
</para>
<para>
The device also shows errors in the SMART error log:
<programlisting>
Error 212 occurred at disk power-on lifetime: 690 hours
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 12 46 9f 1d e2  Error: UNC 18 sectors at LBA = 0x021d9f46 = 35495750

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Timestamp  Command/Feature_Name
  -- -- -- -- -- -- -- --   ---------  --------------------
  25 00 12 46 9f 1d e0 00 2485545.000  READ DMA EXT
</programlisting>
</para>
<para>
Signs of trouble at this LBA may also be found in SYSLOG:
<programlisting>
[root]# grep LBA /var/log/messages | awk '{print $12}' | sort | uniq
LBAsect=35495748
LBAsect=35495750
</programlisting>
</para>
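<para>
Printing the twelfth field with <command>awk</command> depends on the exact
layout of the kernel message. A sketch that instead extracts every
<literal>LBAsect=</literal> value wherever it appears on the line
(<function>syslog_lbas</function> is a hypothetical helper; the
<literal>-o</literal> and <literal>-h</literal> options of
<command>grep</command> are GNU and BSD extensions, not POSIX):
</para>
<programlisting>
```shell
#!/bin/sh
# syslog_lbas [FILE...]: print each distinct LBAsect=N value found in the
# given log files (or on standard input) as a plain number, in numeric order.
syslog_lbas() {
    # -h: no file name prefixes, -o: print only the matching text
    grep -h -o 'LBAsect=[0-9]*' "$@" | cut -d= -f2 | sort -n -u
}

# Example (needs read access to the log):
#   syslog_lbas /var/log/messages
```
</programlisting>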
<para>
So I decided to do a quick check to see how many bad sectors there
really are. Using the bash shell, I checked 70 sectors around the trouble
area:
<programlisting>
[root]# export i=35495730
[root]# while [ $i -lt 35495800 ]
> do echo $i
> dd if=/dev/hda of=/dev/null bs=512 count=1 skip=$i
> let i+=1
> done
&lt;SNIP&gt;
35495734
1+0 records in
1+0 records out
35495735
dd: reading `/dev/hda': Input/output error
0+0 records in
0+0 records out
&lt;SNIP&gt;
35495751
dd: reading `/dev/hda': Input/output error
0+0 records in
0+0 records out
35495752
1+0 records in
1+0 records out
&lt;SNIP&gt;
</programlisting>
</para>
<para>
which shows that the seventeen sectors 35495735-35495751 (inclusive)
are not readable.
</para>
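<para>
The loop above prints every sector number and the full <command>dd</command>
output, which then has to be searched by eye. A slightly tidier sketch,
assuming a POSIX shell (<function>scan_sectors</function> and
<function>to_ranges</function> are hypothetical helpers), prints only the
sectors that cannot be read and then collapses consecutive numbers into
ranges. With GNU dd, adding <literal>iflag=direct</literal> may help, as
explained in the footnote on <command>dd</command> earlier in this article.
</para>
<programlisting>
```shell
#!/bin/sh
# scan_sectors DEV FIRST COUNT: read COUNT 512-byte sectors of DEV, one at a
# time starting at sector FIRST, and print the number of each sector that dd
# fails to read.
scan_sectors() {
    dev=$1; i=$2; end=$(( $2 + $3 ))
    while [ "$i" -lt "$end" ]; do
        # dd exits with a non-zero status when the read fails (e.g. EIO).
        dd if="$dev" of=/dev/null bs=512 count=1 skip="$i" 2>/dev/null ||
            echo "$i"
        i=$(( i + 1 ))
    done
}

# to_ranges: read ascending sector numbers, one per line, and print each run
# of consecutive numbers as FIRST-LAST (an isolated sector is printed alone).
to_ranges() {
    awk 'NR == 1     { s = p = $1; next }
         $1 == p + 1 { p = $1; next }
                     { print (s == p ? s : s "-" p); s = p = $1 }
         END         { if (NR) print (s == p ? s : s "-" p) }'
}

# Example (needs root), covering the same 70 sectors as the loop above:
#   scan_sectors /dev/hda 35495730 70 | to_ranges
```
</programlisting>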
<para>
Next, we identify the files at those locations. The partitioning
information on this disk is identical to the first example above, and
as in that case the problem sectors are on the third partition
<filename>/dev/hda3</filename>. So we have:
<programlisting>
L=35495735 to 35495751
S=5269320
B=4096
</programlisting>
Applying the formula b = (int)((L-S)*512/B) from the first example to both
ends of the range gives b=3778301 to 3778303: these are the three bad blocks
in the file system.
<programlisting>
[root]# debugfs
debugfs 1.32 (09-Nov-2002)
debugfs: open /dev/hda3
debugfs: icheck 3778301
Block Inode number
3778301 45192
debugfs: icheck 3778302
Block Inode number
3778302 45192
debugfs: icheck 3778303
Block Inode number
3778303 45192
debugfs: ncheck 45192
Inode Pathname
45192 /S1/R/H/714979488-714985279/H-R-714979984-16.gwf
debugfs: quit
</programlisting>
Note that the three <command>icheck</command> commands above could also be
combined into a single command, which is very helpful if there are many bad
blocks (thanks to Danie Marais for pointing this out):
<programlisting>
debugfs: icheck 3778301 3778302 3778303
</programlisting>
</para>
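<para>
For longer runs of bad sectors, the block list for a single
<command>icheck</command> request can be generated from the first and last
bad LBA. This is a sketch (<function>blocks_for_sector_range</function> is a
hypothetical helper) using the same formula and the same S and B as above;
<command>debugfs -R</command> executes one request and then exits.
</para>
<programlisting>
```shell
#!/bin/sh
# blocks_for_sector_range FIRST LAST S B: print, separated by spaces, every
# file system block that holds at least one sector in FIRST..LAST, by
# applying b = (int)((L-S)*512/B) to both ends of the range.
blocks_for_sector_range() {
    b=$(( ($1 - $3) * 512 / $4 ))
    last=$(( ($2 - $3) * 512 / $4 ))
    out=$b
    while [ "$b" -lt "$last" ]; do
        b=$(( b + 1 ))
        out="$out $b"
    done
    echo "$out"
}

blocks_for_sector_range 35495735 35495751 5269320 4096
# prints "3778301 3778302 3778303"; the list can then be handed to one
# debugfs request, e.g.:
#   debugfs -R "icheck 3778301 3778302 3778303" /dev/hda3
```
</programlisting>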
<para>
Next, just to confirm that this is really the damaged file:
</para>
<para>
<programlisting>
[root]# md5sum /data/S1/R/H/714979488-714985279/H-R-714979984-16.gwf
md5sum: /data/S1/R/H/714979488-714985279/H-R-714979984-16.gwf: Input/output error
</programlisting>
</para>
<para>
Finally we force the disk to reallocate the three bad blocks:
<programlisting>
[root]# dd if=/dev/zero of=/dev/hda3 bs=4096 count=3 seek=3778301
[root]# sync
</programlisting>
</para>
<para>
Alternatively, working directly with 512-byte sectors on the whole disk, we could probably also use:
<programlisting>
[root]# dd if=/dev/zero of=/dev/hda bs=512 count=17 seek=35495735
</programlisting>
</para>
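<para>
The two <command>dd</command> commands do not overwrite exactly the same
sectors: the first zeroes three whole 4096-byte file system blocks, the
second only the seventeen unreadable sectors. Inverting the formula,
L = S + b*B/512 gives the first sector of file system block b. A small
sketch (<function>fsblock_to_lba</function> is a hypothetical helper,
assuming B is a multiple of 512) shows which sectors each block covers:
</para>
<programlisting>
```shell
#!/bin/sh
# fsblock_to_lba b S B: print the first and last LBA (512-byte sectors) of
# file system block b in a partition starting at sector S, block size B.
fsblock_to_lba() {
    first=$(( $2 + $1 * $3 / 512 ))
    echo "$first $(( first + $3 / 512 - 1 ))"
}

fsblock_to_lba 3778301 5269320 4096    # prints "35495728 35495735"
fsblock_to_lba 3778303 5269320 4096    # prints "35495744 35495751"
```
</programlisting>
<para>
So the block-level command also zeroes sectors 35495728 to 35495734, which
lie in the same (already damaged) file system block just before the
unreadable range.
</para>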
<para>
At this point we now have:
<programlisting>
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
</programlisting>
</para>
<para>
which is encouraging, since the pending sector count is now zero.
Note that the drive's reallocation count has not increased: the
drive may now have confidence in these sectors and have decided not to
reallocate them.
</para>
<para>
A long device self-test, started with
<programlisting>
[root]# smartctl -t long /dev/hda
</programlisting>
and checked about an hour later with
<command>smartctl -l selftest /dev/hda</command>, shows no unreadable
sectors or errors:
<programlisting>
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        692        -
# 2  Extended offline    Completed: read failure       80%        682        0x021d9f44
</programlisting>
</para>
</sect2>
<sect2 id="unassigned">
<title>Unassigned sectors</title>
<para>
This section was written by Kay Diederichs. Even though this section
assumes Linux and the ext2/ext3 file system, the strategy should be
more generally applicable.
</para>
<para>
I read your badblocks-howto and greatly
benefited from it. One thing that's (maybe) missing is that often the
<command>smartctl -t long</command> scan finds a bad sector which is
<emphasis> not</emphasis> assigned to
any file. In that case it does not help to run debugfs, or rather
debugfs reports the fact that no file owns that sector. Furthermore,
it is somewhat laborious to come up with the correct numbers for
debugfs, and debugfs is slow ...
</para>
<para>
So what I suggest when there are
Current_Pending_Sector/Offline_Uncorrectable errors is to create a
huge file on that file system.
<programlisting>
dd if=/dev/zero of=/some/mount/point/hugefile bs=4k
</programlisting>
creates the file (<filename>hugefile</filename> is just an example name
inside the mounted file system). Leave it running until the partition/file
system is full; running it as root also fills the blocks that ext2/ext3
reserves for root. Writing to every free block makes the disk reallocate
those bad sectors which do not belong to a file. Then delete the file.
Check the <command>smartctl -a</command> output after that and make sure
that the sectors are reallocated. If any remain, use the debugfs method.
Of course the usual caveats apply: back it up first, and so on.
</para>
</sect2>
<sect2 id="reiserfs_ex">
<title>ReiserFS example</title>
<para>
This section was written by Joachim Jautz with additions from Manfred
Schwarb.
</para>
<para>
The following problems were reported during a scheduled test:
<programlisting>
smartd[575]: Device: /dev/hda, starting scheduled Offline Immediate Test.
[... 1 hour later ...]
smartd[575]: Device: /dev/hda, 1 Currently unreadable (pending) sectors
smartd[575]: Device: /dev/hda, 1 Offline uncorrectable sectors
</programlisting>
</para>
<para>
[Step 0] The SMART selftest/error log
(see <command>smartctl -l selftest</command>) indicated there was a problem
with block address (i.e. the 512 byte sector at) 58656333. The partition
table (e.g. see <command>sfdisk -luS /dev/hda</command> or
<command>fdisk -ul /dev/hda</command>) indicated that this block was in the
<filename>/dev/hda3</filename> partition which contained a ReiserFS file
system. That partition started at block address 54781650.
</para>
<para>
While doing the initial analysis it may also be useful to take a copy
of the disk attributes returned by <command>smartctl -A /dev/hda</command>,
specifically the values of the "Reallocated_Sector_Ct" and
"Reallocated_Event_Count" attributes for ATA disks (for SCSI disks, the
length of the grown defect list, GLIST). If these have increased by the
end of the procedure, the disk has reallocated one or more sectors.
</para>
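<para>
Comparing these values before and after the repair can be scripted as
well. Here is a minimal sketch (<function>smart_raw</function> is a
hypothetical helper). It assumes the ATA attribute table layout printed by
<command>smartctl -A</command>, where RAW_VALUE is the tenth column; for
attributes whose raw value contains spaces, only its first word is printed.
The different SCSI output format is not handled.
</para>
<programlisting>
```shell
#!/bin/sh
# smart_raw NAME: read "smartctl -A" output on standard input and print the
# RAW_VALUE (10th column) of the attribute whose ATTRIBUTE_NAME is NAME.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Usage sketch (the file name is only an example):
#   smartctl -A /dev/hda > /root/smart-before.txt
#   ... carry out the repair ...
#   cat /root/smart-before.txt | smart_raw Reallocated_Sector_Ct
#   smartctl -A /dev/hda | smart_raw Reallocated_Sector_Ct
```
</programlisting>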
<para>
[Step 1] Get the file system's block size:
<programlisting>
# debugreiserfs /dev/hda3 | grep '^Blocksize'
Blocksize: 4096
</programlisting>
</para>
<para>
[Step 2] Calculate the block number:
<programlisting>
# echo "(58656333-54781650)*512/4096" | bc -l
484335.37500000000000000000
</programlisting>
It is reassuring that the calculated 4 KB damaged block address in
<filename>/dev/hda3</filename> is less than the "Count of blocks on the
device" reported in the full output of <command>debugreiserfs /dev/hda3</command>
(the grep above shows only the block size).
</para>
<para>
[Step 3] Try to get more information about this block. Reading the block
fails, as expected, but at least we can now see that it seems to be unused.
If we do not get the 'Cannot read the block' error, we should
check whether our calculation in [Step 2] was correct ;)
<programlisting>
# debugreiserfs -1 484335 /dev/hda3
debugreiserfs 3.6.19 (2003 http://www.namesys.com)
484335 is free in ondisk bitmap
The problem has occurred looks like a hardware problem.
If you have bad blocks, we advise you to get a new hard drive, because
once you get one bad block that the disk drive internals cannot hide from
your sight, the chances of getting more are generally said to become
much higher (precise statistics are unknown to us), and this disk
drive is probably not expensive enough for you to risk your
time and data on it. If you don't want to follow that
advice then if you have just a few bad blocks, try writing to the
bad blocks and see if the drive remaps the bad blocks (that means
it takes a block it has in reserve and allocates it for use for
of that block number). If it cannot remap the block, use
badblock option (-B) with reiserfs utils to handle
this block correctly.
bread: Cannot read the block (484335): (Input/output error).
Aborted
</programlisting>
So it looks like we have the right (i.e. faulty) block address.
</para>
<para>
[Step 4] Then try to find the affected file by letting tar read every file under the directory of interest
<footnote><para>
Do not use <command>tar -c -f /dev/null</command> or
<command>tar -cO /mydir >/dev/null</command>. GNU tar does not
actually read the files if <filename>/dev/null</filename> is used as
archive path or as standard output, see <command>info tar</command>.
</para></footnote>:
<programlisting>
tar -cO /mydir | cat >/dev/null
</programlisting>
If you do not find any unreadable files, then the block may be free or
located in some metadata of the file system.
</para>
<para>
[Step 5] Try your luck: bang the affected block with
<command>badblocks -n</command> (non-destructive read-write mode; unmount
the file system first). If you are very lucky, the failure is transient and
you can provoke reallocation
<footnote><para>
Important: the block range chosen here is arbitrary, but do not test only a
single block, as bad blocks often come in clusters. Do not make the range too
large either, as this test is probably not entirely free of risk.
</para></footnote>:
<programlisting>
# badblocks -b 4096 -p 3 -s -v -n /dev/hda3 `expr 484335 + 100` `expr 484335 - 100`
</programlisting>
<footnote><para>
The rather awkward `expr 484335 + 100` (note the back quotes) can be replaced
with $((484335+100)) if the bash shell is being used. Similarly the last
argument can become $((484335-100)). Note that <command>badblocks</command>
takes the last block of the range before the first one.
</para></footnote>
</para>
<para>
Check for success with <command>debugreiserfs -1 484335 /dev/hda3</command>.
Otherwise:
</para>
<para>
[Step 6] Perform this step <emphasis>only</emphasis> if Step 5 has failed
to fix the problem: overwrite that block to force reallocation:
<programlisting>
# dd if=/dev/zero of=/dev/hda3 count=1 bs=4096 seek=484335
1+0 records in
1+0 records out
4096 bytes transferred in 0.007770 seconds (527153 bytes/sec)
</programlisting>
</para>
<para>
[Step 7] If you cannot rule out the bad block being in metadata, run
a file system check on the unmounted file system:
<programlisting>
reiserfsck --check /dev/hda3
</programlisting>
This can take a long time, so you had better go for lunch ...
</para>
<para>
[Step 8] Proceed as stated earlier. For example, sync the disk and run a long
self-test, which should now succeed.
</para>
</sect2>
</sect1>
<sect1 id="sdisk">
<title>Repairs at the disk level</title>
<para>
This section first looks at a damaged partition table. It then ignores
the upper-level impact of a bad block and just repairs the underlying
sector so that the defective sector will not cause problems in the future.
</para>
<sect2 id="partition">
<title>Partition table problems</title>
<para>
Some software failures can lead to zeroes or random data being written
to the first block of a disk. For disks that use a DOS-based partitioning
scheme, this will overwrite the partition table, which is found at the
end of the first block. This is a single point of failure: after the
damage, tools like <command>fdisk</command> have no alternate copy to use,
so they report no partitions or a damaged partition table.
</para>
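<para>
To see what a damaged table looks like at the byte level, here is a minimal
Python sketch that decodes the four primary entries of a DOS partition table
from a copy of the first block (for example one saved with
<command>dd if=/dev/hdb of=mbr.img bs=512 count=1</command>). It assumes the
classic MBR layout: four 16-byte entries starting at byte offset 446 and the
0x55 0xAA signature in the last two bytes. The function name is hypothetical
and the sketch ignores the CHS fields.
</para>

```python
SECTOR_SIZE = 512
TABLE_OFFSET = 446           # 0x1BE: four 16-byte partition entries follow
SIGNATURE = b"\x55\xaa"      # stored in bytes 510 and 511 of the first block


def decode_mbr(sector):
    """Return (number, bootable, type_id, start_lba, num_sectors) tuples
    for the non-empty primary entries of a DOS (MBR) first block.

    Raises ValueError when the 0x55AA signature is missing, which is what
    a zeroed or randomly overwritten first block typically looks like.
    """
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected exactly one 512-byte block")
    if sector[510:512] != SIGNATURE:
        raise ValueError("no 0x55AA signature: partition table missing or damaged")
    entries = []
    for i in range(4):
        off = TABLE_OFFSET + 16 * i
        # Entry layout: status(1) CHS first(3) type(1) CHS last(3)
        #               first LBA(4, little-endian) sector count(4, little-endian)
        status = sector[off]
        type_id = sector[off + 4]
        start_lba = int.from_bytes(sector[off + 8:off + 12], "little")
        num_sectors = int.from_bytes(sector[off + 12:off + 16], "little")
        if type_id != 0:                 # type 0 marks an unused entry
            entries.append((i + 1, status == 0x80, type_id, start_lba, num_sectors))
    return entries
```

<para>
For example, <literal>decode_mbr(open("mbr.img", "rb").read(512))</literal>
lists the primary partitions with their start sectors, or raises an error if
the first block no longer carries a partition table.
</para>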
<para>
One utility that may help is
<ulink url="http://www.cgsecurity.org/wiki/TestDisk">
<literal>testdisk</literal></ulink> which can scan a disk looking for
partitions and recreate a partition table if requested.
<footnote><para>
<command>testdisk</command> scans the media for the beginning of file
systems that it recognizes. It can be tricked by data that looks
like the beginning of a file system or an old file system from a
previous partitioning of the media (disk). So care should be taken.
Note that file systems should not overlap, apart from the fact that
logical partitions lie wholly within an extended partition.
Also, if the root partition of a Linux/Unix installation
can be found then the <filename>/etc/fstab</filename> file is a useful
resource for finding the partition numbers of other partitions.
</para></footnote>
</para>
<para>
Programs that create DOS partitions
often place the first partition at logical block address 63. In Linux,
a loop back mount can be attempted at the appropriate offset of a disk
with a damaged partition table. This approach may involve placing the
disk with the damaged partition table in a working computer or perhaps
an external USB enclosure. Assuming the disk with the damaged partition
table is <filename>/dev/hdb</filename>, the following read-only loop back
mount could be tried:
<programlisting>
# mount -r /dev/hdb -o loop,offset=32256 /mnt
</programlisting>
The offset is in bytes, so the number given is 63 * 512 = 32256. If the file
system cannot be identified, then a '-t &lt;fs_type&gt;' option
may be needed (although this is not a good sign). If this mount is
successful, backing up the data straight away is advised.
</para>
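<para>
The offset arithmetic carries over to any start sector. This Python sketch
(the helper name is hypothetical) only builds the read-only loop mount
command line shown above for a partition starting at a given logical block
address, assuming 512-byte sectors.
</para>

```python
SECTOR_SIZE = 512  # bytes per logical block, as assumed in the text


def loop_mount_command(disk, start_lba, mount_point="/mnt", fs_type=None,
                       sector_size=SECTOR_SIZE):
    """Return the argv for a read-only loop mount of the partition that
    starts at start_lba on a disk whose partition table is damaged.

    mount expects the loop offset in bytes, hence start_lba * sector_size.
    """
    offset = start_lba * sector_size
    cmd = ["mount", "-r"]
    if fs_type is not None:
        cmd += ["-t", fs_type]          # only if auto-detection fails
    cmd += [disk, "-o", f"loop,offset={offset}", mount_point]
    return cmd


print(" ".join(loop_mount_command("/dev/hdb", 63)))
# mount -r /dev/hdb -o loop,offset=32256 /mnt
```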
<para>
Only the primary DOS partitions are recorded in the first block of
a disk. The extended DOS partition table is placed elsewhere on
a disk. Again, there is only one copy of it, so it represents another
single point of failure. All DOS partition information can be dumped,
in a form that can later be used to recreate the tables, with the
<command>sfdisk</command> command. Obviously this needs to be done
beforehand, and the resulting file put on other media. Here is how to fetch
the partition table information:
<programlisting>
# sfdisk -dx /dev/hda &gt; my_disk_partition_info.txt
</programlisting>
Then <filename>my_disk_partition_info.txt</filename> should be placed on
other media. If disaster strikes, then the disk with the damaged partition
table(s) can be placed in a working system, let us say the damaged disk is
now at <filename>/dev/hdc</filename>, and the following command restores
the partition table(s):
<programlisting>
# sfdisk -x -O part_block_prior.img /dev/hdc &lt; my_disk_partition_info.txt
</programlisting>
Since the above command is potentially destructive, the <literal>-O</literal>
option makes it save a copy of the block(s) holding the partition table(s) in
<filename>part_block_prior.img</filename> before making any changes. It then
changes the partition tables as indicated by
<filename>my_disk_partition_info.txt</filename>. For what it is worth, the
author did test this on his system!
<footnote><para>
Thanks to Manfred Schwarb for the information about storing partition
table(s) beforehand.
</para></footnote>
</para>
<para>
For creating, destroying, resizing, checking and copying partitions, and
the file systems on them, GNU's
<ulink url="http://www.gnu.org/software/parted">
<literal>parted</literal></ulink> is worth examining.
The <ulink url="http://www.tldp.org/HOWTO/Large-Disk-HOWTO.html">
<literal>Large Disk HOWTO</literal></ulink> is also a useful resource.
</para>
</sect2>
<sect2 id="lvm">
<title>LVM repairs</title>
<para>
This section was written by Frederic BOITEUX. It was titled: "HOW TO
LOCATE AND REPAIR BAD BLOCKS ON AN LVM VOLUME".
</para>
<para>
Smartd reports an error in a short test:
<programlisting>
# smartctl -a /dev/hdb
...
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 66 37383668
</programlisting>
So the disk has a bad block at LBA 37383668.
</para>
<para>
In which physical partition is the bad block?
<programlisting>
# sfdisk -luS /dev/hdb # or 'fdisk -ul /dev/hdb'
Disk /dev/hdb: 9729 cylinders, 255 heads, 63 sectors/track
Units = sectors of 512 bytes, counting from 0
Device Boot Start End #sectors Id System
/dev/hdb1 63 996029 995967 82 Linux swap / Solaris
/dev/hdb2 * 996030 1188809 192780 83 Linux
/dev/hdb3 1188810 156296384 155107575 8e Linux LVM
/dev/hdb4 0 - 0 0 Empty
</programlisting>
It's in the <filename>/dev/hdb3</filename> partition, an LVM2 partition.
Measured from the beginning of the LVM2 partition, the bad block has an offset of
<programlisting>
(37383668 - 1188810) = 36194858
</programlisting>
</para>
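<para>
The lookup above can be sketched in Python. The partition list is copied by
hand from the <command>sfdisk -luS</command> output (start and end sectors,
both inclusive); the function name <literal>locate_lba</literal> is
hypothetical.
</para>

```python
# (device, first sector, last sector) from the `sfdisk -luS /dev/hdb` output
# above; sectors are 512 bytes and counted from 0. The empty hdb4 is omitted.
PARTITIONS = [
    ("/dev/hdb1", 63, 996029),
    ("/dev/hdb2", 996030, 1188809),
    ("/dev/hdb3", 1188810, 156296384),
]


def locate_lba(lba, partitions=PARTITIONS):
    """Return (device, offset) where offset is lba relative to the start
    of the partition that contains it."""
    for device, start, end in partitions:
        if end >= lba >= start:
            return device, lba - start
    raise LookupError(f"LBA {lba} is not inside any listed partition")


print(locate_lba(37383668))   # ('/dev/hdb3', 36194858)
```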
<para>
We now have to find which LVM2 logical partition the block belongs to.
</para>
<para>
In which logical partition is the bad block?
</para>
<para>
<emphasis>IMPORTANT</emphasis>: LVM2 can use different schemes for dividing
its physical partitions into logical ones: linear, striped, contiguous or
not, ... The following example assumes that allocation is linear!
</para>
<para>
The physical partition used by LVM2 is divided into PE (Physical Extent)
units of the same size, starting 'pe_start' 512-byte blocks from
the beginning of the physical partition.
</para>
<para>
The <command>pvdisplay</command> command gives the size of the PE (in KB) of
the LVM partition (field 8 of its colon-separated output):
<programlisting>
# part=/dev/hdb3; pvdisplay -c $part | awk -F: '{print $8}'
4096
</programlisting>
</para>
<para>
To get its size in LBA blocks (512 bytes or 0.5 KB), we multiply this
number by 2: 4096 * 2 = 8192 blocks for each PE.
</para>
<para>
Finding the offset of the first PE (pe_start) from the beginning of the
physical partition is a bit more difficult. If you have a recent LVM2
version, try:
<programlisting>
# pvs -o+pe_start --units s $part
</programlisting>
</para>
<para>
Alternatively, you can look in <filename>/etc/lvm/backup</filename>
(the value there is in 512-byte sectors):
<programlisting>
# grep pe_start $(grep -l $part /etc/lvm/backup/*)
pe_start = 384
</programlisting>
</para>
<para>
Then we find the PE that holds the bad block. Since the first PE starts
pe_start 512-byte blocks into the physical partition, the PE rank is
(bad block offset in the physical partition - pe_start) / sizeof(PE):
<programlisting>
(36194858 - 384) / 8192 = 4418.2708
</programlisting>
</para>
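<para>
A minimal Python sketch of the PE rank calculation, assuming 512-byte
sectors, a PE size in KB as printed by <command>pvdisplay -c</command> and
pe_start in sectors; the function name is hypothetical. Using
<literal>divmod</literal> gives the PE number and the sector offset inside
that PE at once.
</para>

```python
def pe_rank(offset_in_pv, pe_size_kb, pe_start):
    """Return (pe_number, sector_in_pe) for a bad sector of an LVM2 PV.

    offset_in_pv: bad sector offset from the start of the LVM2 partition
    pe_size_kb:   PE size in KB (field 8 of `pvdisplay -c`)
    pe_start:     sectors before the first PE (pvs or /etc/lvm/backup)
    All sector values are 512-byte units; PEs are numbered from 0.
    """
    if pe_start > offset_in_pv:
        raise ValueError("sector lies in the LVM2 metadata area, not in a PE")
    sectors_per_pe = pe_size_kb * 2      # 1 KB = two 512-byte sectors
    return divmod(offset_in_pv - pe_start, sectors_per_pe)


print(pe_rank(36194858, 4096, 384))   # (4418, 2218)
```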
<para>
So we have to find which LVM2 logical partition uses PE
number 4418 (counting starts from 0):
<programlisting>
# lvdisplay --maps |egrep 'Physical|LV Name|Type'
LV Name /dev/WDC80Go/racine
Type linear
Physical volume /dev/hdb3
Physical extents 0 to 127
LV Name /dev/WDC80Go/usr
Type linear
Physical volume /dev/hdb3
Physical extents 128 to 1407
LV Name /dev/WDC80Go/var
Type linear
Physical volume /dev/hdb3
Physical extents 1408 to 1663
LV Name /dev/WDC80Go/tmp
Type linear
Physical volume /dev/hdb3
Physical extents 1664 to 1791
LV Name /dev/WDC80Go/home
Type linear
Physical volume /dev/hdb3
Physical extents 1792 to 3071
LV Name /dev/WDC80Go/ext1
Type linear
Physical volume /dev/hdb3
Physical extents 3072 to 10751
LV Name /dev/WDC80Go/ext2
Type linear
Physical volume /dev/hdb3
Physical extents 10752 to 18932
</programlisting>
</para>
<para>
So the PE #4418 is in the <filename>/dev/WDC80Go/ext1</filename>
LVM logical partition.
</para>
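<para>
Mapping a PE number to a logical volume is a simple range search over the
<command>lvdisplay --maps</command> output. The Python sketch below copies
those ranges by hand and, like the rest of this example, assumes linear
logical volumes that each occupy one contiguous range on
<filename>/dev/hdb3</filename>; the names are hypothetical.
</para>

```python
# PE ranges (inclusive) per logical volume, copied from the
# `lvdisplay --maps` output above; all linear, all on /dev/hdb3.
LV_MAP = [
    ("/dev/WDC80Go/racine", 0, 127),
    ("/dev/WDC80Go/usr", 128, 1407),
    ("/dev/WDC80Go/var", 1408, 1663),
    ("/dev/WDC80Go/tmp", 1664, 1791),
    ("/dev/WDC80Go/home", 1792, 3071),
    ("/dev/WDC80Go/ext1", 3072, 10751),
    ("/dev/WDC80Go/ext2", 10752, 18932),
]


def lv_for_pe(pe, lv_map=LV_MAP):
    """Return (lv_name, first_pe) of the logical volume that uses PE pe."""
    for name, first, last in lv_map:
        if last >= pe >= first:
            return name, first
    raise LookupError(f"PE {pe} is not allocated to any listed LV")


print(lv_for_pe(4418))   # ('/dev/WDC80Go/ext1', 3072)
```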
<para>
Size of the logical block of the file system on <filename>/dev/WDC80Go/ext1</filename>:
</para>
<para>
It's an ext3 file system, so I get it like this:
<programlisting>
# dumpe2fs /dev/WDC80Go/ext1 | grep 'Block size'
dumpe2fs 1.37 (21-Mar-2005)
Block size: 4096
</programlisting>
</para>
<para>
Bad block number for the file system:
</para>
<para>
The logical partition begins on PE 3072, that is at:
<programlisting>
(first PE of the logical partition * sizeof(PE)) + partition offset [pe_start] =
(3072 * 8192) + 384 = 25166208
</programlisting>
512-byte blocks from the start of the physical partition, so the bad block
number for the file system is:
<programlisting>
(36194858 - 25166208) / (sizeof(fs block) / 512)
= 11028650 / (4096 / 512) = 1378581.25
</programlisting>
The fractional part only tells which 512-byte sector within the file system
block is bad (sector 2 of 8, counting from 0), so the bad file system block
is number 1378581.
</para>
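<para>
The chain from the partition offset to the file system block can be written
as one small Python function. This is a sketch under the same assumptions as
above (linear logical volume, 512-byte sectors, PE size in KB, pe_start in
sectors); the function name is hypothetical. The remainder it returns is the
sector inside the file system block.
</para>

```python
SECTOR = 512  # bytes per LBA block


def fs_block_number(offset_in_pv, lv_first_pe, pe_size_kb, pe_start, fs_block_size):
    """Map a bad sector (offset inside the LVM2 partition) to a file system
    block of a linear LV. Returns (fs_block, sector_in_block)."""
    sectors_per_pe = pe_size_kb * 2
    # First sector of the LV, counted from the start of the physical partition.
    lv_start = lv_first_pe * sectors_per_pe + pe_start
    sectors_per_block = fs_block_size // SECTOR
    return divmod(offset_in_pv - lv_start, sectors_per_block)


print(fs_block_number(36194858, 3072, 4096, 384, 4096))   # (1378581, 2)
```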
<para>
Test of the fs bad block:
<programlisting>
dd if=/dev/WDC80Go/ext1 of=block1378581 bs=4096 count=1 skip=1378581
</programlisting>
</para>
<para>
If this dd command succeeds, without any error message on the console or in
syslog, then the block number calculation is probably wrong!
<emphasis>Don't</emphasis> go further; re-check it, and if you don't find
the error, please give up!
</para>
<para>
Search and correction follow the same scheme as for simple
partitions:
<itemizedlist>
<listitem><para>
find possibly impacted files with debugfs (icheck &lt;fs block nb&gt;,
then ncheck &lt;inode nb reported by icheck&gt;).
</para></listitem>
<listitem><para>
reallocate the bad block by writing zeros to it, <emphasis>using the fs block size</emphasis>:
</para></listitem>
</itemizedlist>
</para>
<para>
<programlisting>
dd if=/dev/zero of=/dev/WDC80Go/ext1 count=1 bs=4096 seek=1378581
</programlisting>
</para>
<para>
Et voilà!
</para>
</sect2>
<sect2 id="bb">
<title>Bad block reassignment</title>
<para>
The SCSI disk command set and associated disk architecture are assumed
in this section. SCSI disks have their own logical to physical mapping
allowing a damaged sector (usually carrying 512 bytes of data) to be
remapped irrespective of the operating system, file system or software
RAID being used.
</para>
<para>
The terms <emphasis>block</emphasis> and <emphasis>sector</emphasis> are
used interchangeably, although block tends to get used in higher level or
more abstract contexts such as a <emphasis>logical block</emphasis>.
</para>
<para>
When a SCSI disk is formatted, defective sectors identified during
the manufacturing process (the so called primary list: PLIST),
those found during the format itself (the certification list: CLIST),
those given explicitly to the format command (the DLIST) and optionally
the previous grown list (GLIST) are not used in the logical block
map. The number (and low level addresses) of the unmapped sectors can be
found with the READ DEFECT DATA SCSI command.
</para>
<para>
SCSI disks tend to be divided into zones which have spare sectors and
perhaps spare tracks, to support the logical block address mapping
process. The idea is that if a logical block is remapped, the heads do not
have to move a long way to access the replacement sector. Note that spare
sectors are a scarce resource.
</para>
<para>
Once a SCSI disk format has completed successfully, other problems
may appear over time. These fall into two categories:
<itemizedlist>
<listitem><para>
recoverable: the Error Correction Codes (ECC) detect a problem
but it is small enough to be corrected. Optionally other strategies
such as retrying the access may retrieve the data.
</para></listitem>
<listitem><para>
unrecoverable: try as it may, the disk logic and ECC algorithms
cannot recover the data. This is often reported as a
<emphasis>medium error</emphasis>.
</para></listitem>
</itemizedlist>
</para>
<para>
Other things can go wrong, typically associated with the transport, and
they will be reported using a term other than
<emphasis>medium error</emphasis>. For example a disk may decide a read
operation was successful but a computer's host bus adapter (HBA) checking
the incoming data detects a CRC error due to a bad cable or termination.
</para>
<para>
Depending on the disk vendor, recoverable errors can be ignored. After all,
some disks have up to 68 bytes of ECC above the payload size of 512 bytes,
so why use up spare sectors, which are limited in number
<footnote><para>
Detecting and fixing an error with ECC "on the fly" and not going the further
step and reassigning the block in question may explain why some disks have
large numbers in their read error counter log. Various worried users have
reported large numbers in the "errors corrected without substantial delay"
counter field which is in the "Errors corrected by ECC fast" column in
the <command>smartctl -l error</command> output.
</para></footnote>
?
If the disk can recover the data and does decide to re-allocate (reassign)
a sector, then first it checks the settings of the ARRE and AWRE bits in the
read-write error recovery mode page. Usually these bits are set
<footnote><para>
Often disks inside a hardware RAID have the ARRE and AWRE bits
cleared (disabled) so the RAID controller can do things manually or flag
the disk for replacement.
</para></footnote>
enabling automatic (read or write) re-allocation. The automatic
re-allocation may also fail if the zone (or disk) has run out of spare
sectors.
</para>
<para>
Another consideration with RAIDs, and applications that require a high
data rate without pauses, is that the controller logic may not want a
disk to spend too long trying to recover an error.
</para>
<para>
Unrecoverable errors will cause a <emphasis>medium error</emphasis> sense
key, perhaps with some useful additional sense information. If the extended
background self test includes a full disk read scan, one would expect the
self test log to list the bad block, as shown in the <xref linkend="rfile"/>.
Recent SCSI disks with a periodic background scan should also list
unrecoverable read errors (and some recoverable errors as well). The
advantage of the background scan is that it runs to completion while self
tests will often terminate at the first serious error.
</para>
<para>
SCSI disks expect unrecoverable errors to be fixed manually using the
REASSIGN BLOCKS SCSI command since loss of data is involved. It is possible
that an operating system or a file system could issue the REASSIGN BLOCKS
command itself but the authors are unaware of any examples. The REASSIGN BLOCKS
command will reassign one or more blocks, attempting to (partially ?) recover
the data (a forlorn hope at this stage), fetch an unused spare sector from the
current zone while adding the damaged old sector to the GLIST (hence the
name "grown" list). The contents of the GLIST may not be that interesting
but <command>smartctl</command> prints out the number of entries in the grown
list and if that number grows quickly, the disk may be approaching the end
of its useful life.
</para>
<para>
Here is an alternate brute force technique to consider: if the data on the
SCSI or ATA disk has all been backed up (e.g. is held on the other disks in
a RAID 5 enclosure), then simply reformatting the disk may be the least
cumbersome approach.
</para>
<sect3 id="sexample">
<title>Example</title>
<para>
Given a "bad block", it still may be useful to look at the
<command>fdisk</command> command (if the disk has multiple partitions)
to find out which partition is involved, then use
<command>debugfs</command> (or a similar tool for the file system in
question) to find out which, if any, file or other part of the file system
may have been damaged. This is discussed in the <xref linkend="rfile"/>.
</para>
<para>
Then a program that can execute the REASSIGN BLOCKS SCSI command is
required. In Linux (2.4 and 2.6 series), FreeBSD, Tru64(OSF) and Windows
the author's <command>sg_reassign</command> utility in the sg3_utils
package can be used. Also found in that package is
<command>sg_verify</command> which can be used to check that a block is
readable.
</para>
<para>
Assume that logical block address 1193046 (which is 123456 in hex) is
corrupt
<footnote><para>
In this case the corruption was manufactured by using the WRITE LONG
SCSI command. See <command>sg_write_long</command> in sg3_utils.
</para></footnote>
on the disk at <filename>/dev/sdb</filename>. A long selftest command like
<command>smartctl -t long /dev/sdb</command> may result in log results
like this:
<programlisting>
# smartctl -l selftest /dev/sdb
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
SMART Self-test log
Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ]
Description number (hours)
# 1 Background long Failed in segment - 354 1193046 [0x3 0x11 0x0]
# 2 Background short Completed - 323 - [- - -]
# 3 Background short Completed - 194 - [- - -]
</programlisting>
</para>
<para>
The <command>sg_verify</command> utility can be used to confirm that there
is a problem at that address:
<programlisting>
# sg_verify --lba=1193046 /dev/sdb
verify (10): Fixed format, current; Sense key: Medium Error
Additional sense: Unrecovered read error
Info fld=0x123456 [1193046]
Field replaceable unit code: 228
Actual retry count: 0x008b
medium or hardware error, reported lba=0x123456
</programlisting>
</para>
<para>
Now the GLIST length is checked before the block reassignment:
<programlisting>
# sg_reassign --grown /dev/sdb
>> Elements in grown defect list: 0
</programlisting>
</para>
<para>
And now for the actual reassignment followed by another check of the GLIST
length:
<programlisting>
# sg_reassign --address=1193046 /dev/sdb
# sg_reassign --grown /dev/sdb
>> Elements in grown defect list: 1
</programlisting>
</para>
<para>
The GLIST length has grown by one as expected. If the disk was unable to
recover any data, then the "new" block at lba 0x123456 has vendor specific
data in it. The <command>sg_reassign</command> utility can also do bulk
reassigns, see <command>man sg_reassign</command> for more information.
</para>
<para>
The <command>dd</command> command could be used to read the contents of
the "new" block:
<programlisting>
# dd if=/dev/sdb iflag=direct skip=1193046 of=blk.img bs=512 count=1
</programlisting>
</para>
<para>
and a hex editor
<footnote><para>
Most window managers have a handy calculator that will do hex to
decimal conversions.
</para></footnote>
used to view and potentially change the
<filename>blk.img</filename> file. An altered <filename>blk.img</filename>
file (or <filename>/dev/zero</filename>) could be written back with:
<programlisting>
# dd if=blk.img of=/dev/sdb seek=1193046 oflag=direct bs=512 count=1
</programlisting>
</para>
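<para>
For scripting, the same single-block read and write can be sketched in Python
with <literal>os.pread</literal> and <literal>os.pwrite</literal>. Unlike the
<command>dd</command> commands above, this sketch does not use direct I/O, so
a read may be satisfied from the page cache. The function names are
hypothetical and the block size is assumed to be 512 bytes.
</para>

```python
import os

BLOCK_SIZE = 512  # logical block size assumed for the disk


def read_block(path, lba, block_size=BLOCK_SIZE):
    """Read logical block lba (the equivalent of dd skip=lba count=1)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, block_size, lba * block_size)
    finally:
        os.close(fd)
    if len(data) != block_size:
        raise OSError(f"short read at LBA {lba}")
    return data


def write_block(path, lba, data, block_size=BLOCK_SIZE):
    """Overwrite logical block lba (the equivalent of dd seek=lba count=1)."""
    if len(data) != block_size:
        raise ValueError("data must be exactly one block long")
    fd = os.open(path, os.O_WRONLY)
    try:
        if os.pwrite(fd, data, lba * block_size) != block_size:
            raise OSError(f"short write at LBA {lba}")
        os.fsync(fd)  # push the data out of the page cache to the device
    finally:
        os.close(fd)
```

<para>
For example, run as root, <literal>read_block("/dev/sdb", 0x123456)</literal>
returns the 512 bytes of the reassigned block, and
<literal>write_block("/dev/sdb", 0x123456, bytes(512))</literal> fills it
with zeros.
</para>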
<para>
More work may be needed at the file system level, especially if the
reassigned block held critical file system information such as
a superblock or a directory.
</para>
<para>
Even if a full backup of the disk is available, or the disk has been
"ejected" from a RAID, it may still be worthwhile to reassign the bad
block(s) that caused the problem (or simply format the disk (see
<command>sg_format</command> in the sg3_utils package)) and re-use the
disk later (not unlike the way a replacement disk from a manufacturer
might be used).
</para>
<para>
$Id: badblockhowto.xml 2873 2009-08-11 21:46:20Z dipohl $
</para>
</sect3>
</sect2>
</sect1>
<!--
<appendix id="appendix">
<title>annex a</title>
<sect1 id="what">
<title>what</title>
<para>
dummy
</para>
<para>
$Id: badblockhowto.xml 2873 2009-08-11 21:46:20Z dipohl $
</para>
</sect1>
</appendix>
-->
</article>