Drives that are failing might emit errors in one of the parameters of the SMART log. Those errors are visible in the SMART Error Log, in the logs of the affected OSD (journalctl -fu ceph-osd@<osd-num>.service), in the kernel log (dmesg -T), or in a combination of all of the above. A short smartctl self-test (smartctl -t short <drive>) can also be run to further confirm that the drive is dying, but this is often not necessary.
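For example, assuming the suspect device is /dev/sdX and the affected OSD id is <osd-num> (both placeholders), the relevant information could be gathered like this:

    # Dump the full SMART data, including the SMART Error Log
    smartctl -a /dev/sdX
    # Follow the logs of the OSD backed by that drive
    journalctl -fu ceph-osd@<osd-num>.service
    # Check the kernel log with human-readable timestamps
    dmesg -T
    # Optionally run a short self-test and read its result a few minutes later
    smartctl -t short /dev/sdX
    smartctl -l selftest /dev/sdX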
Over time, a Ceph cluster can end up with drives that are either completely non-functional or that emit read/write errors because of sustained use or manufacturing defects. In that case, the drive must be replaced to preserve the configured data redundancy and sustained performance.
When there’s a failed drive/OSD, there are two situations:
The drive is dying but still has data available.
The disk is dead and no data can be retrieved from it.
In both cases, the disk can be marked out of the Ceph cluster with ceph osd out <osd.num> or ceph osd reweight <osd.num> 0 (both are equivalent operations).
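For instance, assuming the failing OSD is osd.12 (a placeholder id), either of the following marks it out:

    ceph osd out osd.12
    # osd reweight takes the numeric id and a weight between 0.0 and 1.0
    ceph osd reweight 12 0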
However, marking the OSD out only changes its override reweight and leaves its CRUSH weight (and that of its host) untouched, so the evacuated data tends to land on the remaining OSDs of the same node. To prevent that, the trick here is to set ceph osd crush reweight <osd.num> 0. This makes sure that the data is distributed to other OSDs on all the nodes, throughout the crushmap. See Difference Between OSD Reweight and CRUSH Reweight.
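A minimal sketch, again using the placeholder osd.12:

    # Drain the OSD by removing its weight from the CRUSH map
    ceph osd crush reweight osd.12 0
    # Watch the rebalance progress
    ceph -s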
If the disk is still alive, wait until the OSD has been drained (0 PGs). The safe-to-destroy subcommand can be run to make sure that the OSD can be stopped and destroyed without affecting data redundancy. If the disk is dead, it can be replaced immediately.
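For example, with the placeholder osd.12:

    # Reports whether the OSD can be removed without reducing data redundancy
    ceph osd safe-to-destroy osd.12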
The OSD can be destroyed and recreated thereafter.
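One possible sequence for a ceph-volume based deployment, assuming osd.12 lives on /dev/sdX (both placeholders; orchestrated setups such as cephadm or Rook have their own replacement workflows):

    # Stop the daemon and mark the OSD destroyed while keeping its id reusable
    systemctl stop ceph-osd@12.service
    ceph osd destroy 12 --yes-i-really-mean-it
    # Wipe the replacement (or surviving) device and recreate the OSD with the same id
    ceph-volume lvm zap /dev/sdX --destroy
    ceph-volume lvm create --osd-id 12 --data /dev/sdX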