Linux: Repairing a software raid on a Sunday morning.
The trouble started when emails came in reporting first a “Fail” and later a “DegradedArray” event for a software raid running on a Debian Linux server. The logs revealed that an NVMe SSD had died, but let us go through it step by step.
Luckily for us the server was still operational and we were able to ssh into it. So the first thing we did was to check the raid status:
% cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 nvme1n1p3
      465370432 blocks super 1.2 [2/1] [_U]
      bitmap: 4/4 pages [16KB], 65536KB chunk

md0 : inactive nvme1n1p1(S)
      33520640 blocks super 1.2

md1 : active (auto-read-only) raid1 nvme1n1p2
      1046528 blocks super 1.2 [2/1] [_U]
We can see above that only one drive (nvme1n1) is in use and the other one (nvme0n1) is missing. Also, of our three groups one is inactive (md0). If we had checked earlier we might have seen something like this:
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 nvme1n1p3
      465370432 blocks super 1.2 [2/1] [_U]
      bitmap: 4/4 pages [16KB], 65536KB chunk

md0 : active (auto-read-only) raid1 nvme1n1p1
      33520640 blocks super 1.2 [2/1] [_U]
      resync=PENDING

md1 : active raid1 nvme1n1p2
      1046528 blocks super 1.2 [2/1] [_U]
This means that the md0 group was still active at that point but had been put into read-only mode and was waiting for a re-sync.
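The degraded state in /proc/mdstat can also be spotted programmatically: a healthy two-disk raid1 shows [UU], while an underscore marks a missing member. A minimal sketch (run against a canned sample string here for illustration; on a real system you would read /proc/mdstat instead):

```shell
#!/bin/sh
# Minimal sketch: detect degraded md arrays by looking for a "_" inside the
# [UU]-style status brackets of /proc/mdstat. A canned sample string is used
# here; on a real system, replace it with the contents of /proc/mdstat.
mdstat_sample='md2 : active raid1 nvme1n1p3
      465370432 blocks super 1.2 [2/1] [_U]'

# The bracket expression only matches status fields like [_U] or [U_],
# not counters like [2/1] or personality names like [raid1].
if printf '%s\n' "$mdstat_sample" | grep -qE '\[[U_]*_[U_]*\]'; then
    echo "degraded array detected"
fi
```

Such a check is handy in a cron job or monitoring probe, as a complement to the mail notifications mdadm sends.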
Running nvme list (you may need to install the nvme-cli package first) should list all NVMe drives:
Node          Generic     SN              Model                        Namespace  Usage                  Format        FW Rev
------------- ----------- --------------- ---------------------------- ---------- ---------------------- ------------- --------
/dev/nvme1n1  /dev/ng1n1  XXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXXXXXXXXXX   1          XXXXX GB / XXXXXX GB   512 B + 0 B   XXXXXXXX
In our case it listed just one drive instead of two. But because disks usually do not dissolve into thin air, we checked the logs:
% sudo dmesg | grep nvme
nvme nvme0: I/O 1012 (I/O Cmd) QID 5 timeout, aborting
nvme nvme0: I/O 1012 QID 5 timeout, reset controller
nvme nvme0: I/O 8 QID 0 timeout, reset controller
nvme nvme0: Device not ready; aborting reset, CSTS=0x1
nvme nvme0: Abort status: 0x371
nvme nvme0: Device not ready; aborting reset, CSTS=0x1
nvme nvme0: Removing after probe failure status: -19
nvme nvme0: Device not ready; aborting reset, CSTS=0x1
nvme0n1: detected capacity change from 1000215216 to 0
md/raid1:md2: Disk failure on nvme0n1p3, disabling device.
md/raid1:md0: Disk failure on nvme0n1p1, disabling device.
md/raid1:md1: Disk failure on nvme0n1p2, disabling device.
Well, bingo! Looks like one of our disks died. This meant replacing it with a new one (or asking our hosting provider to do so). After shutting down, replacing the hardware and booting up again, we could get to work bringing the new disk into our software raid.
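Before rebuilding, it can be worth confirming that the surviving drive itself is healthy, since the rebuild will read every block of it. A hedged example using nvme-cli (the device name matches our setup; adjust it to yours):

```shell
# Check the SMART data of the surviving drive before rebuilding onto the
# new one; a rebuild reads the whole surviving disk, so it should be healthy.
# "critical_warning" should be 0 and "media_errors" should stay low.
sudo nvme smart-log /dev/nvme1n1 | grep -Ei 'critical_warning|percentage_used|media_errors'
```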
Step 1: copy partition table
The first thing we need to do is to copy the partition table from the intact drive to the new one. Our drives were partitioned with the MBR scheme by our hosting provider, therefore we needed to issue the following command:
% sudo sfdisk -d /dev/nvme1n1 | sudo sfdisk /dev/nvme0n1
This dumps the partition table of drive nvme1n1 and writes it onto drive nvme0n1, which is just what we wanted. Please note that you will need different commands if your drives are partitioned with GPT!
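For GPT-partitioned drives, the equivalent copy can be done with sgdisk from the gdisk package. A sketch, assuming the same device names as above (healthy drive nvme1n1, new drive nvme0n1):

```shell
# GPT equivalent of the sfdisk copy above (sgdisk ships in the "gdisk" package).
# Replicate the partition table of the healthy drive onto the new one ...
sudo sgdisk /dev/nvme1n1 -R /dev/nvme0n1
# ... then randomize the disk and partition GUIDs on the new drive so they
# do not collide with the ones on the source drive.
sudo sgdisk -G /dev/nvme0n1
```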
Also please be aware that in our case the “first” drive (nvme0) died and the “second” one stayed healthy. Depending on your setup, you’ll likely have different drive names, so don’t just copy-paste commands blindly from this article!
Step 2: repair the raid
In our case we have three groups to repair (md0, md1, md2) of which one (md0) is completely inactive.
Re-activate and repair an inactive group
First we need to activate the inactive group md0 with the following command, which allows us to proceed with repairing it:
% sudo mdadm --manage /dev/md0 --run
Once it is active we can simply add the corresponding partition of the new drive to it:
% sudo mdadm /dev/md/0 -a /dev/nvme0n1p1
cat /proc/mdstat should now show something like this:
md0 : active raid1 nvme0n1p1 nvme1n1p1
      33520640 blocks super 1.2 [2/1] [_U]
      [=============>.......]  recovery = 66.8% (22415360/33520640) finish=0.8min speed=206159K/sec
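To double-check that the new partition was really accepted into the array, mdadm can print a more verbose view than /proc/mdstat (exact wording varies by mdadm version):

```shell
# Show the detailed state of md0; while the rebuild runs, the newly added
# partition is typically listed with the state "spare rebuilding".
sudo mdadm --detail /dev/md0
```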
Repair an active group
For active raid groups (in our case md1 and md2) we don’t need the extra activation step. Simply adding the correct partitions of the new drive to the respective groups triggers the recovery:
% sudo mdadm /dev/md/1 -a /dev/nvme0n1p2
% sudo mdadm /dev/md/2 -a /dev/nvme0n1p3
Remember to check the progress of the recovery via cat /proc/mdstat. It should contain information like this:
md2 : active raid1 nvme0n1p3 nvme1n1p3
      465370432 blocks super 1.2 [2/1] [_U]
      [============>........]  recovery = 61.8% (287942272/465370432) finish=14.2min speed=206976K/sec
      bitmap: 4/4 pages [16KB], 65536KB chunk
For larger partitions this can take a while. Please proceed only after the recovery has finished completely!
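Instead of re-running cat /proc/mdstat by hand, the wait can be scripted: a rebuild is still running as long as the file contains a recovery or resync progress line. A minimal sketch, demonstrated on a canned sample line rather than the real file:

```shell
#!/bin/sh
# Sketch: a rebuild is still in progress as long as /proc/mdstat contains a
# "recovery" or "resync" line. A sample string stands in for the real file;
# on a live system, read /proc/mdstat instead.
sample='      [============>........]  recovery = 61.8% (287942272/465370432) finish=14.2min'

if printf '%s\n' "$sample" | grep -qE 'recovery|resync'; then
    echo "rebuild still in progress"
fi
# On a real system you could poll until the rebuild is done, e.g.:
#   while grep -qE 'recovery|resync' /proc/mdstat; do sleep 30; done
```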
Step 3: update the boot loader
Because we replaced a disk we need to update the bootloader: it is not yet installed on the new drive, and if we ever had to replace the remaining old drive we would be stuck without a bootloader. ;-)
First we need to update the device map for the GRUB bootloader:
% sudo grub-mkdevicemap -n
Afterwards we install it onto the new drive (Please note that we use nvme0n1 and not simply nvme0!):
% sudo grub-install /dev/nvme0n1
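On Debian it can also be worth regenerating the initramfs before rebooting, so the early boot environment picks up the current mdadm configuration. In our case the array UUIDs did not change, so this is a cautious extra step rather than a strict requirement:

```shell
# Regenerate the initramfs for the running kernel so it contains the current
# mdadm configuration; with unchanged array UUIDs this is purely a safety net.
sudo update-initramfs -u
```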
And that was it! Now cross your fingers and reboot the machine.