* refs/pull/29581/head:
os/bluestore: do not set osd_memory_target default from cgroup limit
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Mark Nelson <mnelson@redhat.com>
* refs/pull/29422/head:
qa/tasks/mgr/dashboard/test_health: update schema
doc/rados/operations/monitoring: document muting health alerts
qa/standalone/mon/health-mutes: add tests
doc/rados/operations/health-checks: document MON_DISK_{LOW,CRIT,BIG}
doc/rados/operations/health-checks: document OSD_NO_DOWN_OUT_INTERVAL
doc/rados/operations/health-checks: document AUTH_BAD_CAPS
doc/rados/operations/health-checks: document PG_SLOW_SNAP_TRIMMING
doc/rados/operations/health-checks: document MGR_DOWN
mon/HealthCheck: check mutes based on count, not parsing the summary string
mon/health_checks: associate a count with health_alert_t
mon/HealthMonitor: simplify health alert dump
mon/PGMap: use nice timespan for PG stuck warnings
mon/HealthMonitor: allow muted alert counts to decrease but not increase
mon/PGMap: fix summary form for bluestore health alerts
doc/rados/operations/health-alerts: document BLUESTORE_NO_COMPRESSION
mon/PGMap: fix summary form for POOL_APP_NOT_ENABLED
mon/HealthMonitor: persist summary for non-sticky mutes
mon/HealthMonitor: move get_health_status()
mon/HealthMonitor: automatically clear non-sticky mutes when alert clears
mon/HealthMonitor: add gather_all_health_checks helper
mon/HealthMonitor: add sticky flag to mutes
mon/HealthMonitor: expire mutes based on ttl
mon: apply mutes to health [detail]
mon/HealthMonitor: implement mute and unmute commands
mon/HealthMonitor: maintain list of mutes
mon: refactor/simplify health [detail]
mon/health_checks: format 'health summary' with a colon
mon/health_checks: drop dump_summary_compat
Reviewed-by: Neha Ojha <nojha@redhat.com>
This was introduced by #27754. The explicit device lists were cast to
sets but other parts of the code were not updated accordingly. To avoid
touching all code places, only cast to sets for the disjoint test and keep
lists otherwise.
Fixes: https://tracker.ceph.com/issues/41292
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
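The fix described above (cast to sets only for the disjoint test, keep
lists everywhere else) can be sketched roughly as follows. This is a
minimal illustration, not the code from the patch: the function name
devices_overlap and the device paths are hypothetical.

```python
def devices_overlap(data_devices, db_devices):
    """Return True if the two device lists share at least one entry."""
    # Cast to a set only for the disjoint test; the callers' lists are
    # left untouched, so list-only operations keep working elsewhere.
    return not set(data_devices).isdisjoint(db_devices)


# Hypothetical device lists, used only for illustration.
data_devices = ["/dev/sda", "/dev/sdb"]
db_devices = ["/dev/nvme0n1"]

overlap = devices_overlap(data_devices, db_devices)
first_data_device = data_devices[0]  # indexing still works on the list
```

Keeping the conversion local to the test means code that relies on list
ordering or indexing does not need to change.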
mgr/dashboard: Fixes 'defaultBuilder' is not a function
Reviewed-by: Ricardo Dias <rdias@suse.com>
Reviewed-by: Stephan Müller <smueller@suse.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>
Reviewed-by: Tiago Melo <tmelo@suse.com>
Also fix the 'checks' field, which is a list of objects, not strings. (The
test doesn't notice because it's empty.)
Signed-off-by: Sage Weil <sage@redhat.com>
I think someday the docs for how health alerts work (here) and the
enumeration of all actual alerts should be restructured. For now this
is the simplest place to fit this!
Signed-off-by: Sage Weil <sage@redhat.com>
Make sure mute and unmute work. Make sure sticky is sticky. Make sure
counts can go down, but if they go up the mute clears.
Signed-off-by: Sage Weil <sage@redhat.com>
This is more explicit and robust, and works with the PG warnings, which
don't conform to the "%d ..." form that the other messages do.
Signed-off-by: Sage Weil <sage@redhat.com>
0 means this is a singleton. Otherwise, we can sum this up, either
via merge() or get_or_add().
We always structure this so the count goes toward zero (more healthy), so
if a value is too low, then we count how much too low it is.
Signed-off-by: Sage Weil <sage@redhat.com>
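The count convention described above can be sketched roughly as follows.
This is a minimal Python illustration, not the C++ health_alert_t code:
the Alert class, merge_alerts, and shortfall are hypothetical names, and
summing counts on merge is an assumption based on the description.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical stand-in for an alert that carries a count."""
    summary: str
    count: int = 0  # 0 means the alert is a singleton


def merge_alerts(a: Alert, b: Alert) -> Alert:
    """Combine two reports of the same alert by summing their counts."""
    return Alert(summary=a.summary, count=a.count + b.count)


def shortfall(actual: int, minimum: int) -> int:
    """Count how far below the minimum a value is; 0 once it is healthy."""
    return max(0, minimum - actual)


merged = merge_alerts(Alert("osds down", 2), Alert("osds down", 3))
too_few = shortfall(actual=5, minimum=8)
```

Counting the shortfall rather than the raw value keeps every count moving
toward zero as the cluster gets healthier.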
Use dump() member instead of duplicating! The only reason we had this
before was because the detail portion was optional.
Signed-off-by: Sage Weil <sage@redhat.com>
If the summary starts with a digit, parse a count.
If the count goes up, clear the mute.
If the count goes down, update the mute so that we ratchet the threshold
down.
Signed-off-by: Sage Weil <sage@redhat.com>
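The ratchet described above can be sketched roughly as follows. This is a
minimal Python illustration, not the monitor's C++ code: parse_count and
update_mute are hypothetical names, and keeping the mute unchanged when
the summary has no leading count is an assumption.

```python
import re
from typing import Optional


def parse_count(summary: str) -> Optional[int]:
    """Parse a leading integer from the summary, or return None."""
    match = re.match(r"\d+", summary)
    return int(match.group()) if match else None


def update_mute(muted_count: int, summary: str) -> Optional[int]:
    """Return the new muted count, or None if the mute should be cleared."""
    count = parse_count(summary)
    if count is None:
        # Assumption: with no leading count there is nothing to compare.
        return muted_count
    if count > muted_count:
        return None  # the alert got worse: clear the mute
    return count  # same or lower: ratchet the threshold down


after_improvement = update_mute(5, "3 osds down")
after_regression = update_mute(3, "4 osds down")
```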
- de-escalate severity
- mark mutes in structured output
- note mutes in summary text output
- mark mutes in detail text output
Signed-off-by: Sage Weil <sage@redhat.com>