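ceph osd reweight: ranges between 0 - 1; 0 for an osd that’s out, 1 for in
ceph osd crush reweight: value set to the disk size in TiB
ceph osd reweight: decides data placement locally on the node
ceph osd crush reweight: decides data placement across the CRUSH Map
ceph osd reweight: custom value does not persist across osd recreation
ceph osd crush reweight: custom value persists across all scenarios
ceph osd reweight: accepts both osd.<num> and <num> formats
ceph osd crush reweight: accepts only the osd.<num> format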
Assumptions
The examples in this note operate on osd.18. This is what the cluster looks like before running the commands:
The REWEIGHT column here indicates the value that will be changed by ceph osd reweight
The WEIGHT column here indicates the value that will be changed by ceph osd crush reweight
The host’s weight is the sum of the weights of its individual OSDs.
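The host weight can be reproduced by hand as a quick sanity check. The sketch below is plain arithmetic: the helper name sum_weights and the three sample weights are made up for illustration and are not taken from the cluster above.

```shell
# Sum OSD CRUSH weights read from stdin, one per line (hypothetical helper).
sum_weights() {
    awk '{ total += $1 } END { printf "%.5f\n", total }'
}

# Three hypothetical OSDs on one host; prints 1.00000.
printf '%s\n' 0.5 0.25 0.25 | sum_weights
```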
The Difference
ceph osd reweight
Ranges from 0 to 1, where 0 is the equivalent of being marked out and 1 is the equivalent of being marked in. If an OSD has been marked in, it’ll have a reweight value of 1. We can confirm this with a small experiment.
Mark the OSD out, check the osd reweight
Mark the OSD in, check the osd reweight
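A minimal sketch of this experiment as a pair of shell functions. It assumes the `ceph osd tree` column layout of recent releases (ID, CLASS, WEIGHT, TYPE NAME, STATUS, REWEIGHT, PRI-AFF). The function names osd_weights and out_in_check are hypothetical helpers, not part of the ceph CLI.

```shell
# Print the WEIGHT and REWEIGHT columns for one OSD, e.g. "osd.18".
# Assumes OSD rows look like: ID CLASS WEIGHT NAME STATUS REWEIGHT PRI-AFF.
osd_weights() {
    ceph osd tree | awk -v name="$1" '$4 == name { print $3, $6 }'
}

# Mark an OSD out and back in, printing its weights after each step.
# Takes the numeric OSD id.
out_in_check() {
    ceph osd out "$1"
    osd_weights "osd.$1"
    ceph osd in "$1"
    osd_weights "osd.$1"
}

# Usage (against a real cluster): out_in_check 18
```

If the behaviour described here holds, the second column printed should read 0 after the first step and 1 after the second.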
It forces CRUSH to move (1 - reweight) times the data that would otherwise have lived on this drive. However, it does not change the weight of the host, so it only causes data movement within the host, not across the CRUSH map. This might cause a nearfull-osd situation, as the displaced data is allocated to the other OSDs on the same host.
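To put a number on the (1 - reweight) factor, here is a small arithmetic sketch. The helper name data_moved and the 2 TiB figure are illustrative assumptions. It only evaluates the formula stated above and does not query a cluster.

```shell
# Data CRUSH moves off an OSD: (1 - reweight) * data.
# $1: reweight (0 to 1), $2: data that would otherwise live on the OSD.
data_moved() {
    awk -v r="$1" -v d="$2" 'BEGIN { printf "%.2f\n", (1 - r) * d }'
}

# Hypothetical OSD that would hold 2 TiB, reweight 0.5; prints 1.00.
data_moved 0.5 2
```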
When a custom reweight is set (e.g. 0.5), it persists through the OSD being marked out and marked in. We can confirm this with a small experiment as well.
Set a custom weight on the OSD: 0.5
Mark it out and in, check the osd reweight
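The two steps above can be sketched as one shell function. The name reweight_roundtrip is a hypothetical helper. The ceph subcommands it calls (osd reweight, osd out, osd in, osd tree) are the ones discussed in this note.

```shell
# Set a custom reweight, mark the OSD out and in, then show the tree.
# $1: numeric OSD id, $2: reweight between 0 and 1.
reweight_roundtrip() {
    ceph osd reweight "$1" "$2"   # set the custom reweight
    ceph osd out "$1"
    ceph osd in "$1"
    ceph osd tree                 # inspect the REWEIGHT column for the OSD
}

# Usage (against a real cluster): reweight_roundtrip 18 0.5
```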
A reweight does not persist across wiping and recreation of the OSD using the same OSD ID.
Before wiping
After wiping and creating the OSD with the same OSD ID
One claim about restarts reads: “Restarting the cluster will wipe out osd reweight and osd reweight-by-utilization, but osd crush reweight settings are persistent.”
It’s not clear what a cluster restart implies here, as the osd reweight persists across restarts of the MONs and MGRs (both simultaneous and rolling).
Even in the case of shutting down all the nodes (which is very unlikely in a production cluster), the reweight persists.
ceph osd crush reweight
A value that’s generally the size of the disk in TiB and dictates how much data CRUSH places on this particular disk.
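As a rough illustration of “disk size in TiB”, the sketch below converts a raw size in bytes to TiB. The helper name bytes_to_tib and the 4 TB example disk are assumptions for illustration. The weight Ceph actually assigns may differ slightly, since it depends on the usable size of the device.

```shell
# Convert a size in bytes to TiB (1 TiB = 1024^4 = 1099511627776 bytes).
bytes_to_tib() {
    awk -v b="$1" 'BEGIN { printf "%.5f\n", b / 1099511627776 }'
}

# A hypothetical 4 TB (4 * 10^12 bytes) disk; prints 3.63798.
bytes_to_tib 4000000000000
```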
The CRUSH weight of an OSD contributes to the CRUSH weight of the node, so any change to the CRUSH weight causes data movement across the whole cluster.
When a custom crush weight is set (e.g. 0), it persists across all scenarios:
osd out / osd in:
wiping the OSD and recreating it with the same OSD ID:
Before wiping
After wiping and recreation with same OSD ID
a “cluster restart” (simultaneous / rolling restarts of MONs and MGRs, or all nodes shut down at once and started again)
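Setting the custom crush weight used in these scenarios can be sketched as follows. Because ceph osd crush reweight accepts only the osd.<num> format, the sketch first normalises the argument. Both function names, osd_name and set_crush_weight, are hypothetical helpers rather than ceph commands.

```shell
# Normalise "18" or "osd.18" to "osd.18" (hypothetical helper).
osd_name() {
    case "$1" in
        osd.*) printf '%s\n' "$1" ;;
        *)     printf 'osd.%s\n' "$1" ;;
    esac
}

# Set a custom CRUSH weight on an OSD.
# $1: OSD as <num> or osd.<num>, $2: new CRUSH weight (e.g. 0).
set_crush_weight() {
    ceph osd crush reweight "$(osd_name "$1")" "$2"
}

# Usage (against a real cluster): set_crush_weight 18 0
```

Running ceph osd tree after each scenario and reading the WEIGHT column is one way to check that the value has stuck.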