Recovering from a dead disk in a Linux software-RAID5 system using mdadm

RAID5 failure

As I wrote quite a while ago, I set up a RAID5 with three
IDE disks at home, which I'm using as backup (yes, I know that
RAID != backup) and storage space.

A few days ago, the RAID was put to a real-life test for the first time, as one of the disks died. Here's what that looked like in dmesg:

raid5: raid level 5 set md1 active with 3 out of 3 devices, algorithm 2
RAID5 conf printout:
 --- rd:3 wd:3
 disk 0, o:1, dev:hda2
 disk 1, o:1, dev:hdg2
 disk 2, o:1, dev:hde2
[...]
hdg: dma_timer_expiry: dma status == 0x21
hdg: DMA timeout error
hdg: 4 bytes in FIFO
hdg: dma timeout error: status=0x50 { DriveReady SeekComplete }
ide: failed opcode was: unknown
hdg: dma_timer_expiry: dma status == 0x21
hdg: DMA timeout error
hdg: 252 bytes in FIFO
hdg: dma timeout error: status=0x50 { DriveReady SeekComplete }
ide: failed opcode was: unknown
hdg: dma_timer_expiry: dma status == 0x21
hdg: DMA timeout error
hdg: 252 bytes in FIFO
hdg: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hdg: DMA disabled
ide3: reset: success
hdg: dma_timer_expiry: dma status == 0x21
hdg: DMA timeout error
hdg: 252 bytes in FIFO
hdg: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hdg: DMA disabled
ide3: reset: success
hdg: status timeout: status=0x80 { Busy }
ide: failed opcode was: 0xea
hdg: drive not ready for command
hdg: lost interrupt
hdg: task_out_intr: status=0x50 { DriveReady SeekComplete }
ide: failed opcode was: unknown
hdg: lost interrupt
hdg: task_out_intr: status=0x50 { DriveReady SeekComplete }
ide: failed opcode was: unknown

That's when I realized that something was horribly wrong.

Not long after that, these messages appeared in dmesg. As you can see, the software RAID automatically detected that a drive had died and removed the faulty disk from the array. I did not lose any data, and the system did not freeze; I could continue working as if nothing had happened (as it should be).

 md: super_written gets error=-5, uptodate=0
 raid5: Disk failure on hdg2, disabling device.
 raid5: Operation continuing on 2 devices.
 RAID5 conf printout:
  --- rd:3 wd:2
  disk 0, o:1, dev:hda2
  disk 1, o:0, dev:hdg2
  disk 2, o:1, dev:hde2
 RAID5 conf printout:
  --- rd:3 wd:2
  disk 0, o:1, dev:hda2
  disk 2, o:1, dev:hde2

This is how you can check the current RAID status:

 $ cat /proc/mdstat
 Personalities : [raid6] [raid5] [raid4] 
 md1 : active raid5 hda2[0] hde2[2] hdg2[3](F)
       584107136 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]

The "U_U" means two of the disks are OK, and one is faulty/removed. The desired state is "UUU", which means all three disks are OK.

The next step is to replace the dead drive with a new one, but first you need to know exactly which physical disk to remove (in my case: hda, hde, or hdg). If you pull the wrong one, you're screwed: the RAID will be dead and all your data will be lost (RAID5 can survive only one dead disk at a time).

The safest way (IMHO) to know which disk to remove is to write down the serial numbers of the disks, e.g. using smartctl, and then check the label on the back of each physical disk for the matching serial number.

 $ smartctl -i /dev/hda | grep Serial
 $ smartctl -i /dev/hde | grep Serial
 $ smartctl -i /dev/hdg | grep Serial

(Ideally you should record the serial numbers before one of the disks dies; a dead drive may no longer respond to smartctl at all.)
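
Before powering down, you can also explicitly drop the failed partition from the array; the kernel had already marked hdg2 as failed for me, so this is mostly about keeping the array metadata tidy. A quick sketch, assuming the failed member is /dev/hdg2:

 $ mdadm --manage /dev/md1 --fail /dev/hdg2     # only needed if it isn't already marked failed
 $ mdadm --manage /dev/md1 --remove /dev/hdg2   # remove the failed member from md1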

Now power down the PC and remove the correct drive. Get a new drive that is at least as big as the one you removed. As this is software RAID, you have quite a lot of flexibility: the new drive doesn't have to be from the same vendor or series, and it doesn't even have to be the same type (e.g. I got a SATA disk instead of another IDE one).

Insert the drive into some other PC in order to partition it correctly (e.g. using fdisk or cfdisk). In my case I needed a 1 GB /boot partition for GRUB; the rest of the drive is another partition of type "Linux raid autodetect" (fd), which the software RAID will then recognize.
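
If you'd rather mirror the layout of an existing member than partition by hand, something like this works (a sketch: /dev/hda is a surviving RAID member and /dev/sda the new disk here, so adjust the device names to whatever they are in the machine you use for partitioning, and review the dump before applying it):

 $ sfdisk -d /dev/hda > partition-table   # dump the partition table of a working member
 $ sfdisk /dev/sda < partition-table      # apply the same layout to the new disk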

Then put the drive into the RAID PC and power it up. After a successful boot (remember, 2 out of 3 disks in a RAID5 are sufficient for a working system) you'll have to add the new drive to the array:

 $ mdadm --manage /dev/md1 --add /dev/sda2
 mdadm: added /dev/sda2

My new SATA drive ended up as /dev/sda, so the RAID partition on it is /dev/sda2, which I added using mdadm. The RAID immediately starts restoring/resyncing all data onto that drive, which can take a while (2-3 hours, depending on the array size and some other factors). You can check the current progress with:

 $ cat /proc/mdstat 
 Personalities : [raid6] [raid5] [raid4] 
 md1 : active raid5 sda2[3] hda2[0] hde2[2]
       584107136 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]
       [>....................]  recovery =  0.1% (473692/292053568) finish=92.3min speed=52632K/sec
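
If you want to keep an eye on the rebuild, or nudge its speed, the following can help (a sketch; the limits are in KB/s per device, and raising the minimum trades interactive I/O performance for a faster resync):

 $ watch -n 60 cat /proc/mdstat                       # refresh the status once a minute
 $ cat /proc/sys/dev/raid/speed_limit_min             # current lower bound for the resync speed
 $ echo 50000 > /proc/sys/dev/raid/speed_limit_min    # raise it (as root) if the resync crawls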

As soon as this process is finished you'll see this in dmesg:

 md: md1: recovery done.
 RAID5 conf printout:
  --- rd:3 wd:3
  disk 0, o:1, dev:hda2
  disk 1, o:1, dev:sda2
  disk 2, o:1, dev:hde2

In /proc/mdstat you'll see "UUU" again, which means your RAID is fully functional and redundant (with three disks) again. Yay.

 $ cat /proc/mdstat
 Personalities : [raid6] [raid5] [raid4] 
 md1 : active raid5 sda2[1] hda2[0] hde2[2]
       584107136 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
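
If you keep an mdadm.conf (/etc/mdadm/mdadm.conf or /etc/mdadm.conf, depending on the distribution), this is also a good moment to check that it still matches the array; mdadm can print a suitable ARRAY line for you (review it before merging it into the existing file):

 $ mdadm --detail --scan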

Btw, another utility you might find useful is hddtemp, which can check the temperature of the drives. You should take care that they don't get too hot, especially if the RAID runs 24/7.

 $ hddtemp /dev/hda
 /dev/hda: SAMSUNG HD300LD: 38 °C
 $ hddtemp /dev/hde
 /dev/hde: SAMSUNG HD300LD: 44 °C
 $ hddtemp /dev/sda
 /dev/sda: SAMSUNG HD322HJ: 32 °C
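
If hddtemp isn't available, the drives' SMART attributes usually include a temperature reading as well, e.g.:

 $ smartctl -A /dev/hda | grep -i temperature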

Comments

Great tutorial, easy and very helpful

Thanks for this tutorial, I was looking for this information on several different occasions and never found anything simple enough for me. Tonight I found this, I made some notes and added it to my bookmarks. Good stuff!

need help determining the partition size of working disks.

I am the lucky person who took over the maintenance of an existing RAID5 array. Recently I realized that sda was dead. Obviously I have to replace the disk with a new working one. The problem is, I need to partition the new disk before I can integrate it into the RAID5 volume. I believe that the partitions on the new disk have to be exactly the same as on the other disks. But how do I find the partition information (size and type) on the other working disks?
Thanks in advance.

*fdisk*

There are a number of tools for doing that, including fdisk, cfdisk, or sfdisk, I think.
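
For example (assuming /dev/sdb is one of the surviving members; adjust to your setup), fdisk can print the size and type of its partitions:

 $ fdisk -l /dev/sdb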

Whooww you are great

Shit, when I first read this it took me a while to understand it. Now I do, and I respect your method. It can be very handy in the future if this ever happens to me (hope not).

The dangers of RAID

Oftentimes a drive will fail in a redundant system, but the user will be oblivious to the failure because there is another working disk. This is sort of what RAID is supposed to do, but the remaining disk isn't supposed to be going it alone.
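
This is exactly what mdadm's monitor mode is for; something along these lines (the mail address is just a placeholder) will send you a mail when a member fails or the array degrades:

 $ mdadm --monitor --scan --daemonise --mail=root@localhost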

partitioning and fans and the evilosity of raid 5

re: partitioning:

just wondering why you think you need to put the new drive into another computer to partition it properly? you can partition it in the RAID machine just before running the mdadm command.

in fact, i've found it very handy to do so - you can query the partition tables of the other drives in the raid array, and ensure that you create the correct sized partition on the new drive.

even if the dead drive took your only copy of the /boot partition with it (RAID-1 is useful for /boot!), you can still do it in the same machine: either by taking advantage of the fact that hotswap means you don't have to reboot, or by rebooting to a rescue CD with support for RAID and whatever drivers are necessary to use the drives.

re fans:

if your hot-swap bays are individual (each hot-swap unit only fits one drive bay) then that means they have tiny little 40mm fans at best. IMO & IME, that's next to useless. and noisy. you'd be better off replacing the hot swap bays with one of the 3-in-2 (fits 3x3.5" drives in 2 5.25" bays), 4-in-3, or 5-in-3 hot swap bays (e.g. from Addonics). they come with 80mm fans on the back of the unit. much more air-flow, and a lot less noise.

or just get rid of the tiny little fans on the 1-drive bays and use sticky tape or something similarly temporary (you still need access to the bays, after all) to affix an 80mm or 120mm fan to your computer case in front of the drives.

finally, RAID-5:

I've given up on raid-5. i won't even use it in a real machine room with raised floors and air-conditioning (i use raid-6 there, when i need more than raid-1). twice i've lost all my data due to two drives dying at about the same time. the first time happened when a 2nd drive died during the re-sync operation to add a replacement drive to the array. the second was when, a few months after deciding to give raid-5 another try at home after the first incident, two drives died within hours of each other overnight. by the time i woke up and noticed, it was too late to do anything about it.

neither of these scenarios is unlikely or uncommon. a raid resync operation WILL find a fault on the source drive(s) if there's one there to be found. and drives bought at the same time, from the same batch, are likely to have roughly the same lifespan when they are used identically (they'll also have the same bugs/flaws).

to make things worse - in both cases, i only had backups of the really important stuff, and lost a fair bit of stuff that i didn't think was important enough to include in my backup schedule.

never again.

i've gone back to using RAID-1 for everything important. costs me twice as much for disk space (since i have to buy two drives to get the space of 1), but it's worth it.

and i run regular rsync backups to other machines. each of the machines on my home network acts as an rsync backup repository for at least one of the other machines.

hdd temps

Your /dev/hde runs quite hot, since 50 °C is considered the maximum recommended operating temperature for hard drives. This can seriously shorten its lifespan and may have caused the other drive's untimely crash.
Was hde under heavy load when you ran hddtemp? If not, you might consider improving the cooling, e.g. by placing a case fan in front of the drives or mounting them in a special frame.

hdd temps

Thanks for the hint! Not sure if that drive was too hot before it died, but as this is a RAID, I'd assume the load on all drives should be approximately the same. I'll monitor the temperatures for a longer period of time to see if one of them is constantly hotter than the others. However, all drives are in a removable drive bay, each of which has a fan, so in theory it shouldn't be much of a problem.

monitoring disk temperatures

munin has two good plugins for graphing hdd temperatures. one that uses hddtemp, and one that uses smartctl. both are good.

it's interesting to see the temperature vary according to patterns of usage and, of course, hour of the day. you can see the effect of long-running disk-thrashing programs - e.g. there's a small temperature spike every morning when cron runs updatedb for locate. and transcoding several hours' worth of mythtv recordings also makes a noticeable difference.

my disks tend to run about 10-15 deg above ambient. it's summer here in melbourne and it's going to be 40C for the rest of the week. time to turn the AC on.