29/09/2013

Recovering from a major RAID5 with 2 disks failing out of 4

Yes, the title is a bit unrealistic, because if you have 2 disks failing out of 4, you are out of business. But I was lucky. One disk was failing on WRITE (at least that is what S.M.A.R.T. was saying). The other one was failing on READ (well, I didn't need S.M.A.R.T. to tell me anything, it just failed!). So, how did I manage? (And believe me, this was a long, sleepless night, full of events.) First, I took the disk that was failing WRITE commands and put it in another computer where I had some extra, unused disks (well, I just have them hanging around...). There, I copied the disk using dd (ah, you need Linux for this to work! Is there anything else but Linux for this kind of job anyway?):
dd if=/dev/sd${OLD} of=/dev/sd${NEW} # replace ${OLD} and ${NEW} with the drive letters that fit your case
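
If the source drive also throws read errors during the copy, a more forgiving variant can help. This is only a sketch assuming GNU coreutils dd (the bs value and the conv flags are my additions, not what I actually ran), or GNU ddrescue if you have it installed:
dd if=/dev/sd${OLD} of=/dev/sd${NEW} bs=64K conv=noerror,sync   # keep going on read errors, pad unreadable blocks with zeros
ddrescue /dev/sd${OLD} /dev/sd${NEW} rescue.log                 # ddrescue maps the bad areas and retries them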

And here is where I got lucky! I managed to copy the whole drive onto the new drive. But I wasn't that lucky after all. When I put the drive back in the array: BOOHOO! My RAID5 now had 2 good drives (one of them was the new one), but the third one was a SPARE!!! So, I still couldn't assemble the array. But there is a catch: I could rebuild the array without touching the data... Here is how it was looking before:
cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]       
md0 : inactive sdb3[4](S) sda3[0] sdc3[2]
      XXXXXXXXXX blocks

As you can see, sdb3 is seen as a SPARE! Now, that is wrong, and it shouldn't be. So, I had to take a risk: declare the whole array layout as wrong, re-create it, and hope that the SPARE actually still held the data needed to get me out of this stalled state. So, here is how I activated the SPARE:
mdadm --stop /dev/md0 # Stop the md0 array
mdadm -Cv /dev/md0 --assume-clean --level=5 --layout=left-symmetric --chunk=64 --raid-devices=4 /dev/sda3 /dev/sdb3 /dev/sdc3 missing # same device order as the original array; remember, sdb was the false SPARE, and 'missing' stands in for the GONE sdd :)
[...CUT...]
Continue creating array? yes # I answered here with yes...
mdadm: array /dev/md0 started.
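
A word of caution about this step: mdadm --create rewrites the superblocks, so the level, layout, chunk size and device order must match the original array exactly, and --assume-clean tells md the data is already consistent, so it will not start an initial resync over it. If you are about to try the same thing, record the old superblocks first; a minimal sketch (mdadm --examine is standard, the grep is just a convenience):
mdadm --examine /dev/sda3 /dev/sdb3 /dev/sdc3   # note the Raid Level, Layout, Chunk Size and each device's slot
mdadm --examine /dev/sda3 | grep -E 'Level|Layout|Chunk'   # quick view of the values you must pass to --create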

After that, I stopped, took a break (or rather a deep breath) and then looked at the array status:
cat /proc/mdstat                                                                                                                     
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] 
md0 : active raid5 sdc3[2] sdb3[1] sda3[0]
      4391961792 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

You can see that the last one is missing (marked as "_"). A better way to look at it is via:
mdadm --detail /dev/md0                                                                                                              
/dev/md0:
        Version : [...CUT...]
  Creation Time : [...CUT...]
     Raid Level : raid5
     Array Size : [...CUT...]
  Used Dev Size : [...CUT...]
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : [...CUT...]
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : [...CUT...]
         Events : [...CUT...]

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       2       8       35        2      active sync   /dev/sdc3
       3       0        0        3      removed

Next, all you need to do is add another clean drive (no partition table; you can wipe it out with dd if=/dev/zero of=/dev/sd${CLEAN} bs=512 count=8) and then let the md0 array know about its new partition like this:
mdadm -a /dev/md0 /dev/sdd3
mdadm: added /dev/sdd3

Now, if you have problems with the partition table on the new drive, you can replicate it from one of the active drives (in my case, /dev/sda as the master and /dev/sdd as the target) like this:
sfdisk -d /dev/sda | sfdisk /dev/sdd
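
After replicating the partition table, re-run the mdadm -a command above. If you want to double-check first that the new partition is at least as large as the active members, blockdev (from util-linux) reports the sizes; a quick sketch:
blockdev --getsz /dev/sda3   # size of an active member, in 512-byte sectors
blockdev --getsz /dev/sdd3   # the new partition must not be smaller than that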

Once the new partition is added, you should see something like this:
mdadm --detail /dev/md0 
/dev/md0:
        Version : [...CUT...]
  Creation Time : [...CUT...]
     Raid Level : raid5
     Array Size : [...CUT...]
  Used Dev Size : [...CUT...]
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : [...CUT...]
          State : clean, degraded, recovering
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 64K

 Rebuild Status : 0% complete

           UUID : [...CUT...]
         Events : 0.8

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       2       8       35        2      active sync   /dev/sdc3
       4       8       51        3      spare rebuilding   /dev/sdd3
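
While the rebuild runs in the background, you can keep an eye on it from /proc/mdstat and, if it crawls, nudge the kernel-wide speed limits. This is just a sketch of the standard md tunables (the 50000 value is an arbitrary example, not something from my session):
watch -n 5 cat /proc/mdstat                       # live progress, ETA and current speed
cat /proc/sys/dev/raid/speed_limit_min            # current floor for resync/rebuild speed, in KB/s
echo 50000 > /proc/sys/dev/raid/speed_limit_min   # example: raise the floor if the rebuild is too slow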

And if you want to get closer to the kernel, you can always look under sysfs like this:
cd /sys/block/md0/md
ls -al
[...CUT...]

You can trigger a consistency check (echo check > sync_action), watch the progress, etc...
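A few of the entries in that directory are handy while the array is recovering; a minimal sketch using the standard md sysfs attributes (run from /sys/block/md0/md as above):
cat sync_action      # idle, check, resync or recover
cat sync_completed   # sectors done / sectors total for the current action
cat degraded         # how many members the array is still missing
cat mismatch_cnt     # inconsistencies found by the last 'check'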

Well, that is all, folks. I hope it gives other lost souls (or the unlucky ones with damaged RAID5 disks) a good idea of how to recover their data. Drop a comment if you have questions or additions...