Sunday 21 April 2019

linux - Degraded RAID5 and no md superblock on one of the remaining drives


This is actually on a QNAP TS-509 NAS. The RAID is basically a Linux RAID.


The NAS was configured as RAID 5 with 5 drives (/dev/md0 with /dev/sd[abcde]3). At some point /dev/sde failed and the drive was replaced. While the array was rebuilding (and before the rebuild completed), the NAS rebooted itself and /dev/sdc dropped out of the array. Now the array can't start because, in effect, 2 drives have dropped out. I disconnected /dev/sde hoping that /dev/md0 could resume in degraded mode, but no luck. Further investigation shows that /dev/sdc3 has no md superblock. The data should be good, since the array has been unable to assemble ever since /dev/sdc dropped out.


All the searches I have done show how to reassemble the array assuming 1 bad drive. But I think I just need to restore the superblock on /dev/sdc3. That should bring the array up in degraded mode, which will allow me to back up the data and then proceed with the rebuild by adding /dev/sde.


Any help would be greatly appreciated.


mdstat does not show /dev/md0


# cat /proc/mdstat 
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md5 : active raid1 sdd2[2](S) sdc2[3](S) sdb2[1] sda2[0]
530048 blocks [2/2] [UU]

md13 : active raid1 sdd4[3] sdc4[2] sdb4[1] sda4[0]
458880 blocks [5/4] [UUUU_]
bitmap: 40/57 pages [160KB], 4KB chunk

md9 : active raid1 sdd1[3] sdc1[2] sdb1[1] sda1[0]
530048 blocks [5/4] [UUUU_]
bitmap: 33/65 pages [132KB], 4KB chunk

mdadm shows /dev/md0 is still there


# mdadm --examine --scan
ARRAY /dev/md9 level=raid1 num-devices=5 UUID=271bf0f7:faf1f2c2:967631a4:3c0fa888
ARRAY /dev/md5 level=raid1 num-devices=2 UUID=0d75de26:0759d153:5524b8ea:86a3ee0d
spares=2
ARRAY /dev/md0 level=raid5 num-devices=5 UUID=ce3e369b:4ff9ddd2:3639798a:e3889841
ARRAY /dev/md13 level=raid1 num-devices=5 UUID=7384c159:ea48a152:a1cdc8f2:c8d79a9c

With /dev/sde removed, here is the mdadm examine output showing sdc3 has no md superblock


# mdadm --examine /dev/sda3
/dev/sda3:
Magic : a92b4efc
Version : 00.90.00
UUID : ce3e369b:4ff9ddd2:3639798a:e3889841
Creation Time : Sat Dec 8 15:01:19 2012
Raid Level : raid5
Used Dev Size : 1463569600 (1395.77 GiB 1498.70 GB)
Array Size : 5854278400 (5583.08 GiB 5994.78 GB)
Raid Devices : 5
Total Devices : 4
Preferred Minor : 0

Update Time : Sat Dec 8 15:06:17 2012
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 1
Spare Devices : 0
Checksum : d9e9ff0e - correct
Events : 0.394

Layout : left-symmetric
Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8        3        0      active sync   /dev/sda3

   0     0       8        3        0      active sync   /dev/sda3
   1     1       8       19        1      active sync   /dev/sdb3
   2     2       8       35        2      active sync   /dev/sdc3
   3     3       8       51        3      active sync   /dev/sdd3
   4     4       0        0        4      faulty removed
[~] # mdadm --examine /dev/sdb3
/dev/sdb3:
Magic : a92b4efc
Version : 00.90.00
UUID : ce3e369b:4ff9ddd2:3639798a:e3889841
Creation Time : Sat Dec 8 15:01:19 2012
Raid Level : raid5
Used Dev Size : 1463569600 (1395.77 GiB 1498.70 GB)
Array Size : 5854278400 (5583.08 GiB 5994.78 GB)
Raid Devices : 5
Total Devices : 4
Preferred Minor : 0

Update Time : Sat Dec 8 15:06:17 2012
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 1
Spare Devices : 0
Checksum : d9e9ff20 - correct
Events : 0.394

Layout : left-symmetric
Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       19        1      active sync   /dev/sdb3

   0     0       8        3        0      active sync   /dev/sda3
   1     1       8       19        1      active sync   /dev/sdb3
   2     2       8       35        2      active sync   /dev/sdc3
   3     3       8       51        3      active sync   /dev/sdd3
   4     4       0        0        4      faulty removed
[~] # mdadm --examine /dev/sdc3
mdadm: No md superblock detected on /dev/sdc3.
[~] # mdadm --examine /dev/sdd3
/dev/sdd3:
Magic : a92b4efc
Version : 00.90.00
UUID : ce3e369b:4ff9ddd2:3639798a:e3889841
Creation Time : Sat Dec 8 15:01:19 2012
Raid Level : raid5
Used Dev Size : 1463569600 (1395.77 GiB 1498.70 GB)
Array Size : 5854278400 (5583.08 GiB 5994.78 GB)
Raid Devices : 5
Total Devices : 4
Preferred Minor : 0

Update Time : Sat Dec 8 15:06:17 2012
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 1
Spare Devices : 0
Checksum : d9e9ff44 - correct
Events : 0.394

Layout : left-symmetric
Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       51        3      active sync   /dev/sdd3

   0     0       8        3        0      active sync   /dev/sda3
   1     1       8       19        1      active sync   /dev/sdb3
   2     2       8       35        2      active sync   /dev/sdc3
   3     3       8       51        3      active sync   /dev/sdd3
   4     4       0        0        4      faulty removed

fdisk output shows the /dev/sdc3 partition is still there.


[~] # fdisk -l

Disk /dev/sdx: 128 MB, 128057344 bytes
8 heads, 32 sectors/track, 977 cylinders
Units = cylinders of 256 * 512 = 131072 bytes

Device Boot Start End Blocks Id System
/dev/sdx1 1 8 1008 83 Linux
/dev/sdx2 9 440 55296 83 Linux
/dev/sdx3 441 872 55296 83 Linux
/dev/sdx4 873 977 13440 5 Extended
/dev/sdx5 873 913 5232 83 Linux
/dev/sdx6 914 977 8176 83 Linux

Disk /dev/sda: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sda1 * 1 66 530113+ 83 Linux
/dev/sda2 67 132 530145 82 Linux swap / Solaris
/dev/sda3 133 182338 1463569695 83 Linux
/dev/sda4 182339 182400 498015 83 Linux

Disk /dev/sda4: 469 MB, 469893120 bytes
2 heads, 4 sectors/track, 114720 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/sda4 doesn't contain a valid partition table

Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 * 1 66 530113+ 83 Linux
/dev/sdb2 67 132 530145 82 Linux swap / Solaris
/dev/sdb3 133 182338 1463569695 83 Linux
/dev/sdb4 182339 182400 498015 83 Linux

Disk /dev/sdc: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdc1 1 66 530125 83 Linux
/dev/sdc2 67 132 530142 83 Linux
/dev/sdc3 133 182338 1463569693 83 Linux
/dev/sdc4 182339 182400 498012 83 Linux

Disk /dev/sdd: 2000.3 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdd1 1 66 530125 83 Linux
/dev/sdd2 67 132 530142 83 Linux
/dev/sdd3 133 243138 1951945693 83 Linux
/dev/sdd4 243139 243200 498012 83 Linux

Disk /dev/md9: 542 MB, 542769152 bytes
2 heads, 4 sectors/track, 132512 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md9 doesn't contain a valid partition table

Disk /dev/md5: 542 MB, 542769152 bytes
2 heads, 4 sectors/track, 132512 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md5 doesn't contain a valid partition table

Answer



Ouch!



"All the searches I have done show how to reassemble the array assuming 1 bad drive."



That is because RAID5 does not work with more than one failed drive. You cannot guarantee recovery of all data with two drives missing. In fact, if both drives are fully inaccessible, recovery will fail: the data simply is not there anymore.
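To make that concrete, here is a toy sketch of the XOR parity RAID5 relies on. The byte values are invented for illustration and stand for one stripe across four data members plus parity; real arrays rotate parity across the drives, but the arithmetic is the same.

```shell
#!/bin/sh
# Toy RAID5 stripe: four data bytes plus one XOR parity byte.
# The values are arbitrary and only serve the illustration.
d0=23; d1=190; d2=7; d3=64
parity=$(( d0 ^ d1 ^ d2 ^ d3 ))

# One member lost (d2): XOR of the survivors and the parity gives it back.
rebuilt=$(( d0 ^ d1 ^ d3 ^ parity ))
echo "rebuilt d2 = $rebuilt (original was $d2)"

# Two members lost (d1 and d2): the survivors only yield d1 XOR d2.
# Many different (d1, d2) pairs produce that same value, so neither
# byte can be recovered on its own.
combined=$(( d0 ^ d3 ^ parity ))
echo "d1 XOR d2 = $combined"
```

With one member gone there is exactly one unknown and one equation. With two gone there are two unknowns and still only one equation.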


Two notes:



  1. I wrote "fully inaccessible", as in a dead disk or a drive removed from the system, not just a single bad sector.

  2. The usual rant that RAID is not a backup. If a RAID fails, you just have to keep the system up until 5 PM, back up the files changed since the last backup (using an incremental backup), and then you can either try a lengthy rebuild or recreate the RAID and restore from backup. Obviously a home user does things slightly differently, but the same problem persists when doing a RAID5 rebuild and hitting a URE (unrecoverable read error).


(Also see this canonical post on Server Fault, and these two posts on Super User.)


In your case I see these options:



  1. Send the drives to a professional data recovery lab. This is really expensive.

  2. Give up and restore from an old backup.

  3. Try to mount the RAID arrays with two drives missing.


Before you try option 3: make a backup of the drives. Place them in another system and copy them with dd or ddrescue. Keep those images. If things fail, you can restore the current situation from them (read: things will not get worse).
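A minimal sketch of the imaging step, assuming GNU ddrescue is installed on the other system. The destination directory /mnt/backup is hypothetical, and the device names are the ones from the question; they may well differ once the drives sit in another machine. It images only the RAID data partitions rather than whole drives, which is my shortcut, and it only prints the commands so you can check them before running anything.

```shell
#!/bin/sh
# Print (do not run) one ddrescue command per surviving RAID member.
# DEST is a hypothetical directory on a separate disk with enough free space.
DEST=${DEST:-/mnt/backup}

image_cmd() {
    # $1 = source partition, e.g. /dev/sda3 (names taken from the question)
    name=$(basename "$1")
    # -d  : direct disc access, bypassing the kernel cache
    # -r3 : retry bad sectors up to three times
    # The .map file lets an interrupted copy resume where it stopped.
    echo "ddrescue -d -r3 $1 $DEST/$name.img $DEST/$name.map"
}

for part in /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3; do
    image_cmd "$part"
done
```

Review the printed lines, then run them one at a time as root. Never point the output file at one of the source drives.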


You can then try to recover either on the NAS itself or on the system where you stored the images. In the latter case, make a working copy of the images and use loopback devices. If you have sufficient disk space, this is the preferred way, though you need a place with twice the capacity of your entire NAS free.
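The working-copy route could look roughly like this. The paths /mnt/backup and /mnt/work and the image names (sda3.img and so on) are assumptions of mine, not something from your setup. Again the sketch only prints the commands.

```shell
#!/bin/sh
# Print the commands that copy each pristine image to a work area and
# attach the copy to a loop device. Both directories are hypothetical.
IMAGES=${IMAGES:-/mnt/backup}
WORK=${WORK:-/mnt/work}

loop_cmds() {
    # $1 = image base name, e.g. sda3
    # --sparse=always keeps runs of zeros from consuming disk space.
    echo "cp --sparse=always $IMAGES/$1.img $WORK/$1.img"
    # --find picks the first free loop device, --show prints its name.
    echo "losetup --find --show $WORK/$1.img"
}

for name in sda3 sdb3 sdc3 sdd3; do
    loop_cmds "$name"
done
```

Note which /dev/loopN each losetup call reports. Those names take the place of the /dev/sdX3 partitions in any later mdadm command, and the originals in the images directory stay untouched.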


Next read this rather lengthy blog at http://blog.al4.co.nz/2011/03/recovering-a-raid5-mdadm-array-with-two-failed-devices/.


The essential steps in it are:
mdadm --create /dev/md1 --level=5 --raid-devices=5 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 missing


That would mark drive 5 as missing. I selected that one because I have no idea what state it is in after a partial rebuild.
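The device names in that command come from the blog post, not from your NAS. Adapted to the --examine output in the question (metadata 0.90, 64K chunk, left-symmetric layout, members sda3 to sdd3 in slots 0 to 3), it might look like the sketch below. Treat every value as an assumption to verify against your own output. If you work on images, substitute the loop devices, in the same order.

```shell
#!/bin/sh
# Print a re-create command matching the geometry reported by
# `mdadm --examine` in the question. Nothing is executed here.
# Members must be listed in RaidDevice order 0..3; slot 4 stays "missing".
MEMBERS=${MEMBERS:-"/dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3"}

create_cmd() {
    # --assume-clean  : extra precaution so no resync is started
    # --metadata=0.90 : same superblock version as the old array
    # --chunk=64      : chunk size in KiB, as reported by --examine
    echo "mdadm --create /dev/md0 --assume-clean --metadata=0.90" \
         "--level=5 --raid-devices=5 --chunk=64 --layout=left-symmetric" \
         "$MEMBERS missing"
}

create_cmd
```

The explicit flags matter because newer mdadm versions default to a different metadata version and chunk size, and a mismatch in either, or in the member order, gives you an array full of garbage. The command rewrites the superblocks on the listed members, which is another reason to have the images first. Expect mdadm to warn that the partitions already belong to an array and to ask for confirmation.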


With a bit of luck you can now mount it as a degraded array. Copy all data off it, then delete the array and rebuild it. The copy may hang. In that case reboot, skip a few files, and continue. It is far from perfect, but if recovery is too expensive and you have no backups, this might be the only way.
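A rough sketch of the copy step, assuming the filesystem sits directly on /dev/md0 (I do not know whether your QNAP firmware puts anything in between) and that rsync is available. The mount point, the target directory and the exclude pattern are all hypothetical.

```shell
#!/bin/sh
# Print the commands for a read-only mount and a restartable copy.
# MNT and TARGET are hypothetical paths; TARGET must be on another disk.
MNT=${MNT:-/mnt/recovery}
TARGET=${TARGET:-/mnt/rescued}

copy_cmds() {
    # $1 = optional rsync exclude pattern for a file that makes the copy hang
    echo "mkdir -p $MNT $TARGET"
    # Mount read-only so nothing is written to the degraded array.
    echo "mount -o ro /dev/md0 $MNT"
    if [ -n "$1" ]; then
        echo "rsync -a --partial --exclude=$1 $MNT/ $TARGET/"
    else
        echo "rsync -a --partial $MNT/ $TARGET/"
    fi
}

copy_cmds
```

Because rsync skips files that already exist at the target with the same size and timestamp, rerunning it after a reboot continues roughly where it stopped. Calling `copy_cmds 'some/bad/file'` prints a variant that steps over a file that keeps hanging.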

