❓ What?

Failing drives may report errors in one of the parameters of the SMART log, which can be obtained with:

smartctl -a <drive>

Those errors are visible in the SMART Error Log, in the OSD logs (obtainable with journalctl -fu ceph-osd@<osd-num>.service), in the kernel log (dmesg -T), or in a combination of the above. A short SMART self-test (smartctl -t short <drive>) can also be run to further confirm that the drive is dying, but this is often unnecessary.
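For instance, the checks above could look like the following, assuming the suspect drive is /dev/sdb and it backs osd.12 (both names are placeholders, substitute your own):

smartctl -l error /dev/sdb                     # print only the SMART Error Log
smartctl -t short /dev/sdb                     # start a short self-test
smartctl -l selftest /dev/sdb                  # read the self-test results afterwards
journalctl -u ceph-osd@12.service | grep -iE 'error|fail'   # scan the OSD daemon log
dmesg -T | grep -i sdb                         # look for kernel I/O errors on the device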

❔ Why?

Over time, a Ceph cluster can have drives that are either completely non-functional or emitting read/write errors because of sustained use or manufacturing defects. In that case, the drive must be replaced to preserve the configured data redundancy and sustained performance.

🎤 How?

When there’s a failed drive/OSD, there are two situations:

  • The drive is dying but still has data available.

  • The disk is dead and no data can be retrieved from it.

  • In both cases, the disk can be marked out of the Ceph cluster with ceph osd out <osd.num> or ceph osd reweight <osd.num> 0 (both operations are equivalent), as shown in the command sketches after this list.

  • Ceph then drains the dying OSD and moves its data, or, if the disk is dead, rebuilds the data from replicas or erasure-code parity onto other OSDs on the same node. This may cause a nearfull OSD situation.

  • To prevent such a situation, the trick here is to zero the CRUSH weight with ceph osd crush reweight <osd.num> 0. This makes sure that the data is redistributed to OSDs on all the nodes, i.e. throughout the CRUSH map. See Difference Between OSD Reweight and CRUSH Reweight.

  • Wait until the OSD has been drained (0 PGs) if the disk is still alive. The subcommands ceph osd ok-to-stop and ceph osd safe-to-destroy can be run to make sure that the OSD can be stopped and destroyed without affecting data redundancy. If the disk is dead, it can be replaced immediately.

  • The OSD can be destroyed and recreated thereafter.
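
As a minimal sketch of the steps above, still assuming the failing disk backs osd.12: marking the OSD out and zeroing its CRUSH weight looks like this.

ceph osd out osd.12                  # equivalent to: ceph osd reweight 12 0
ceph osd crush reweight osd.12 0     # redistribute data across the whole CRUSH map, not just this node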
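
Watching the drain and verifying that the OSD can be stopped and destroyed safely:

ceph osd df tree | grep osd.12       # the PGS column should reach 0 once drained
ceph osd ok-to-stop osd.12           # the daemon can be stopped without losing availability
ceph osd safe-to-destroy osd.12      # the OSD can be destroyed without affecting redundancy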
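
Finally, one possible destroy-and-recreate sequence using ceph-volume (the device, OSD id, and exact tooling are assumptions that depend on the deployment):

systemctl stop ceph-osd@12.service                  # stop the daemon once it is safe
ceph osd destroy osd.12 --yes-i-really-mean-it      # keeps the OSD id free for reuse
ceph-volume lvm zap /dev/sdb --destroy              # wipe the old disk (skip if physically replacing it)
ceph-volume lvm create --osd-id 12 --data /dev/sdb  # recreate the OSD on the new disk with the same id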

👓 References