Overview

The Ceph MDS standby-replay daemon is affected by a now-fixed memory leak that causes its memory consumption to grow gradually over time [1].

This is fixed in Ceph versions 17.2.8 [2] and 18.2.4 [3].
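
To decide whether a given cluster still needs a workaround, the running version can be compared against the fixed releases listed above. The following is a minimal sketch, not an official tool: the helper names version_ge and has_fix are hypothetical, the comparison relies on GNU sort -V, and release series other than 17.x and 18.x are deliberately reported as "unknown" because this note does not cover them.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of Ceph.

# Succeed when version $1 is greater than or equal to version $2.
# Relies on version-aware sorting (GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Exit status: 0 = fix included, 1 = fix missing, 2 = series not covered here.
has_fix() {
    case "$1" in
        17.*) version_ge "$1" 17.2.8 ;;
        18.*) version_ge "$1" 18.2.4 ;;
        *)    return 2 ;;
    esac
}
```

The version string to pass in can be taken from the output of `ceph versions`, which reports the versions of the running daemons. For example, `has_fix 17.2.7 || echo "upgrade or apply a workaround"` would print the message.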

Workarounds

  1. Memory-Based Restart:

    • Monitor MDS memory usage
    • Restart the MDS when it reaches a specific memory threshold
    • Can be automated using earlyoom
  2. Disable Standby-Replay:

    ceph fs set <fsname> allow_standby_replay false
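
The memory-based restart workaround can also be scripted directly instead of relying on earlyoom. The sketch below assumes a systemd-managed MDS on Linux; the unit name ceph-mds@mds-a.service, the 16 GiB threshold, and the function names are placeholders chosen for illustration and must be adapted to the actual deployment (cephadm-managed clusters use different unit names).

```shell
#!/bin/sh
# Assumptions (hypothetical values, adjust for your deployment):
MDS_UNIT="${MDS_UNIT:-ceph-mds@mds-a.service}"   # systemd unit of the MDS
LIMIT_KIB="${LIMIT_KIB:-16777216}"               # restart threshold: 16 GiB in KiB

# Print the resident set size (KiB) of a PID, read from /proc/<pid>/status.
rss_kib() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Succeed when the first number is strictly greater than the second.
over_limit() {
    [ "$1" -gt "$2" ]
}

# Check the unit once; restart it if its main process exceeds the limit.
check_once() {
    pid=$(systemctl show --property=MainPID --value "$MDS_UNIT")
    # MainPID is 0 (or empty) when the unit is not running.
    if [ -z "$pid" ] || [ "$pid" = "0" ]; then
        return 0
    fi
    rss=$(rss_kib "$pid")
    [ -n "$rss" ] || return 0
    if over_limit "$rss" "$LIMIT_KIB"; then
        systemctl restart "$MDS_UNIT"
    fi
}

# Intended to be called periodically, e.g. from cron or a systemd timer:
# check_once
```

Running a single check per invocation, rather than looping inside the script, leaves scheduling to cron or a systemd timer and keeps the script easy to test by hand.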

Bug Reference

This issue is tracked in Ceph's issue tracker [1]. The bug affects multiple versions of Ceph. Affected clusters need either one of the workarounds above or an upgrade to a version that contains the fix.

Footnotes

  1. Ceph Issue #48673 - MDS Memory Leak

  2. https://tracker.ceph.com/issues/63675

  3. https://tracker.ceph.com/issues/63676