RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2025-01-09 20:52:09 +00:00

Author	SHA1	Message	Date
David Zafman	661996d434	mgr: Warn when too many reads are repaired on an OSD Include test case Configurable by setting mon_osd_warn_num_repaired (default 10) Ignore new health warning with random eio injection test Fixes: https://tracker.ceph.com/issues/41564 Signed-off-by: David Zafman <dzafman@redhat.com>	2020-06-16 17:45:27 -07:00
Igor Fedotov	deb0af6347	doc/rados/operations/health-checks: document bluestore spurious read errors alert. Signed-off-by: Igor Fedotov <ifedotov@suse.com>	2020-04-14 12:08:30 +03:00
Josh Durgin	772d7c1d3c	mgr/pg_autoscaler: add warning when target bytes and ratio are both set Signed-off-by: Josh Durgin <jdurgin@redhat.com>	2020-02-10 10:08:36 +08:00
Josh Durgin	d62c121ee3	mgr/pg_autoscaler: remove target ratio warning Since the ratios are normalized, they cannot exceed 1.0 or overcommit combined with target_bytes. Signed-off-by: Josh Durgin <jdurgin@redhat.com>	2020-02-10 10:08:36 +08:00
Tsung-Ju Lii	253cb9903e	doc/rados/operations: fix OSD_OUT_OF_ORDER_FULL fullness ordering Signed-off-by: Tsung-Ju Lii <usefulalgorithm@gmail.com>	2019-11-13 17:43:48 +08:00
Sage Weil	6e46b1c0e5	osd/OSDMap: health alert for non-power-of-two pg_num Fixes: https://tracker.ceph.com/issues/41647 Signed-off-by: Sage Weil <sage@redhat.com>	2019-09-24 09:26:33 -05:00
Sage Weil	2a1b58b5ac	doc/rados/operations/monitoring: document muting health alerts I think someday the docs for how health alerts work (here) and the enumeration of all actual alerts should be restructured. For now this is the simplest place to fit this! Signed-off-by: Sage Weil <sage@redhat.com>	2019-08-14 20:40:08 -05:00
Sage Weil	95b8e9fa0d	doc/rados/operations/health-checks: document MON_DISK_{LOW,CRIT,BIG} Signed-off-by: Sage Weil <sage@redhat.com>	2019-08-14 20:40:08 -05:00
Sage Weil	dd5e985614	doc/rados/operations/health-checks: document OSD_NO_DOWN_OUT_INTERVAL Signed-off-by: Sage Weil <sage@redhat.com>	2019-08-14 20:40:08 -05:00
Sage Weil	0eba993fad	doc/rados/operations/health-checks: document AUTH_BAD_CAPS Signed-off-by: Sage Weil <sage@redhat.com>	2019-08-14 20:40:08 -05:00
Sage Weil	7e9ba0a1c1	doc/rados/operations/health-checks: document PG_SLOW_SNAP_TRIMMING The mitigation steps are weak, but it's not clear what concrete guidance to provide. Signed-off-by: Sage Weil <sage@redhat.com>	2019-08-14 20:40:08 -05:00
Sage Weil	078ef210d5	doc/rados/operations/health-checks: document MGR_DOWN Signed-off-by: Sage Weil <sage@redhat.com>	2019-08-14 20:40:08 -05:00
Sage Weil	1b6745efb4	doc/rados/operations/health-alerts: document BLUESTORE_NO_COMPRESSION Signed-off-by: Sage Weil <sage@redhat.com>	2019-08-14 20:40:08 -05:00
Sage Weil	f011c13547	Merge PR #29292 into master * refs/pull/29292/head: os/bluestore: warn on no per-pool omap os/bluestore: fsck: warning (not error) by default on no per-pool omap os/bluestore: fsck: int64_t for error count os/bluestore: default size of 1 TB for testing os/bluestore: behave if we do set PGMETA and PERPOOL flags os/bluestore: do not set both PGMETA_OMAP and PERPOOL_OMAP os/bluestore: fsck: only generate 1 error per omap_head os/bluestore: make fsck repair convert to per-pool omap os/bluestore: teach fsck to tolerate per-pool omap os/bluestore: ondisk format change to 3 for per-pool omap mon/PGMap: add data/omap breakouts for 'df detail' view osd/osd_types: separate get_{user,allocated}_bytes() into data and omap variants mon/PGMap: fix stored_raw calculation mon/PGMap: add in actual omap usage into per-pool stats osd: report per-pool omap support via store_statfs_t os/bluestore: set per_pool_omap key on mkfs osd/osd_types: count per-pool omap capable OSDs os/bluestore: report omap_allocated per-pool os/bluestore: add pool prefix to omap keys kv/KeyValueDB: take key_prefix for estimate_prefix_size() os/bluestore: fix manual omap key manipulation to use Onode::get_omap_key() os/bluestore: make omap key helpers Onode methods os/bluestore: add Onode::get_omap_prefix() helper os/bluestore: change _do_omap_clear() args Reviewed-by: Josh Durgin <jdurgin@redhat.com>	2019-08-09 10:40:45 -05:00
Sage Weil	b8501164ef	os/bluestore: warn on no per-pool omap Signed-off-by: Sage Weil <sage@redhat.com>	2019-08-09 08:21:18 -05:00
Neha Ojha	c9d2833b25	Merge pull request #29425 from aclamk/wip-bluestore-monitor-allocations [bluestore][tools] Inspect allocations in bluestore Reviewed-by: Josh Durgin <jdurgin@redhat.com> Reviewed-by: Igor Fedotov <ifedotov@suse.com> Reviewed-by: Neha Ojha <nojha@redhat.com>	2019-08-07 11:37:34 -07:00
Adam Kupczyk	713f9b4d09	doc/rados/operations/health-checks: document BlueStore fragmentation and BlueFS space available features Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>	2019-08-07 19:18:21 +02:00
Sage Weil	143e1f0469	mgr/telemetry: force re-opt-in if the report contents change Signed-off-by: Sage Weil <sage@redhat.com>	2019-07-31 20:33:19 -05:00
Sage Weil	c885ee7f0c	mgr/crash: raise RECENT_CRASH warning for recent (new) crashes Signed-off-by: Sage Weil <sage@redhat.com>	2019-07-19 09:43:04 -05:00
David Zafman	fa698e18e1	mon: Improve health status for backfill_toofull and recovery_toofull Treat backfill_toofull as a warning condition because it can resolve itself. Includes test case for PG_BACKFILL_FULL Includes test case for recovery_toofull / PG_RECOVERY_FULL Fixes: https://tracker.ceph.com/issues/39555 Signed-off-by: David Zafman <dzafman@redhat.com>	2019-06-20 02:22:01 +00:00
Xie Xingguo	302d7bcdd8	Merge pull request #27735 from xiexingguo/wip-device-class-noout osd: revamp {noup,nodown,noin,noout} related commands Reviewed-by: Sage Weil <sage@redhat.com>	2019-06-05 14:17:06 +08:00
xie xingguo	a3b0dc29b9	doc: refresh {noup,nodown,noin,noout} changes Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>	2019-05-30 10:52:38 +08:00
zjh	94237d3693	osd: Better error message when OSD count is less than osd_pool_default_size Fixes: http://tracker.ceph.com/issues/38617 Signed-off-by: zjh <jhzeng93@foxmail.com>	2019-04-28 20:09:13 +08:00
Sage Weil	c2190c1ff8	Merge PR #27519 into master * refs/pull/27519/head: doc/rados/operations/health-checks: document new bluestore warnings os/bluestore: alert on fm/bdev size mismatch os/bluestore: introduce legacy statfs alert Reviewed-by: Sage Weil <sage@redhat.com>	2019-04-16 14:31:49 -05:00
Sage Weil	b29495954b	doc/rados/operations/health-checks: document new bluestore warnings Signed-off-by: Sage Weil <sage@redhat.com>	2019-04-15 17:42:48 +03:00
Sage Weil	9aa9893b8f	osd/OSDMap: raise OSD_FLAGS health alert for crush node flags, too Signed-off-by: Sage Weil <sage@redhat.com>	2019-04-12 11:10:35 -05:00
Vangelis Tasoulas	24131fc59a	doc: Update documentation for the MANY_OBJECTS_PER_PG warning The current documentation for the MANY_OBJECTS_PER_PG warning states that the threshold can be raised to silence the health warning by adjusting the mon_pg_warn_max_object_skew config option on the monitors. It seems that this is not true (at least) since the luminous times, and this option should be adjusted on the managers. I encountered this problem and I spent quite some time injecting the mon_pg_warn_max_object_skew to the monitors, added the option to ceph.conf and restarted the monitors several times, but the warning was not going away. I had to download the code to see what's happening and I found out this: $ git grep -A 3 mon_pg_warn_max_object_skew src/common/options.cc src/common/options.cc:1480: Option("mon_pg_warn_max_object_skew", Option::TYPE_FLOAT, Option::LEVEL_ADVANCED) src/common/options.cc-1481- .set_default(10.0) src/common/options.cc-1482- .set_description("max skew few average in objects per pg") src/common/options.cc-1483- .add_service("mgr"), After I restarted the ceph-mgr service, the warning went away. Signed-off-by: Vangelis Tasoulas <vangelis@tasoulas.net>	2019-04-05 19:53:35 +02:00
Sage Weil	242ef7824d	doc/rados/operations: document BLUEFS_SPILLOVER Signed-off-by: Sage Weil <sage@redhat.com>	2019-04-02 11:13:31 -05:00
Ashish Singh	7108e6a3c7	doc: Fix incorrect mention of 'osd_deep_mon_scrub_interval' Fixed the incorrect mention of 'osd_deep_mon_scrub_interval' in health-checks.rst. Changed it to 'osd_deep_scrub_interval'. Fixes: https://tracker.ceph.com/issues/38310 Signed-off-by: Ashish Singh <assingh@redhat.com>	2019-02-21 12:10:41 +05:30
David Zafman	6a9895b97a	mon: Fix scrub health warning handling and change config to a ratio Make this mon_warn code clearer since it involves 2 values Code used mon scrub interval instead of pg scrub interval Rename config values to include _pg_ and ratio to make it more clear Fix scrub warning handling use per-pool intervals when specified Fixes: http://tracker.ceph.com/issues/37264 Signed-off-by: David Zafman <dzafman@redhat.com>	2019-01-23 16:49:33 -08:00
Sage Weil	b5e5ee6f40	Merge PR #25849 into master * refs/pull/25849/head: qa/suites/rados/upgrade: one mon per node, and enable-msgr2 at end qa/rados/thrash-old-clients: avoid msgr2 mon: make bootstrap rank check more robust mon: clean up probe debug output a bit msg/async: use v1 for v1 <-> [v2,v1] peers msg/async/AsyncMessenger: drop single-use _send_to mon/HealthMonitor: raise MON_MSGR2_NOT_ENABLED if mons not bound to msgr2 doc/rados/operations/health-checks: document MON_* health warnings mon/MonMapMonitor: add 'mon enable-msgr2' command mon: respawn if rank addr changes mon/MonMap: calc_addr_mons() after setting rank addrvec Reviewed-by: Ricardo Dias <rdias@suse.com>	2019-01-17 11:04:30 -06:00
Sage Weil	6ba8db68cd	mon/HealthMonitor: raise MON_MSGR2_NOT_ENABLED if mons not bound to msgr2 If the ms_bind_msgr2 option is enabled, and all mons are nautilus, raise a health alert if any mons aren't bound to msgr2 addresses. Whitelist tests that mon_bind_addrvec=false or mon_bind_msgr2=false. Signed-off-by: Sage Weil <sage@redhat.com>	2019-01-15 10:42:29 -06:00
Sage Weil	57c4795c00	doc/rados/operations/health-checks: document MON_* health warnings Signed-off-by: Sage Weil <sage@redhat.com>	2019-01-15 10:42:29 -06:00
Sage Weil	94620be57c	Merge PR #25273 into master * refs/pull/25273/head: doc/rados/operations/health-checks: Add LARGE_OMAP_OBJECTS Reviewed-by: Sage Weil <sage@redhat.com>	2019-01-12 05:56:41 -06:00
Brad Hubbard	522a21ec62	doc/rados/operations/health-checks: Add LARGE_OMAP_OBJECTS Document LARGE_OMAP_OBJECTS health check Signed-off-by: Brad Hubbard <bhubbard@redhat.com>	2019-01-12 12:16:47 +10:00
Sage Weil	f490fd0130	doc/rados/operations: document autoscaler and its health warnings Signed-off-by: Sage Weil <sage@redhat.com>	2018-12-18 13:30:54 -06:00
Bryan Stillwell	791b00daa1	doc: Multiple spelling fixes I ran a lot of the docs through aspell and found a number of spelling problems. Signed-off-by: Bryan Stillwell <bstillwell@godaddy.com>	2018-08-09 14:51:25 -06:00
Sage Weil	7ab8675fdf	doc/rados/operations/health-checks: document DEVICE_HEALTH* messages Signed-off-by: Sage Weil <sage@redhat.com>	2018-07-31 14:08:53 -05:00
John Spray	191cce74e1	doc: note new mgr module error codes Signed-off-by: John Spray <john.spray@redhat.com>	2018-01-24 13:08:21 -05:00
Kefu Chai	f5f2ced624	mgr/PGMap: drop REQUEST_{SLOW,STUCK} HEALTH_WARNs in mimic SLOW_OPS unifies both of them since mimic Signed-off-by: Kefu Chai <kchai@redhat.com>	2017-11-23 17:41:47 +08:00
Sage Weil	027672b777	doc/rados/operations/health-checks: fix TOO_MANY_PGS discussion Fiddling with pgp_num doesn't help with TOO_MANY_PGS. Signed-off-by: Sage Weil <sage@redhat.com>	2017-09-14 16:01:14 -04:00
Alfredo Deza	3ec44df21d	doc/rados/operations use new ref labels in health-checks Signed-off-by: Alfredo Deza <adeza@redhat.com>	2017-08-16 08:20:01 -04:00
Alfredo Deza	5a3da3acaf	doc/rados/operations use new ref label in health-checks Signed-off-by: Alfredo Deza <adeza@redhat.com>	2017-08-16 08:20:01 -04:00
Alfredo Deza	d8932d62bf	doc/rados/operations use the new ref label for crush map tunables Signed-off-by: Alfredo Deza <adeza@redhat.com>	2017-08-16 08:20:00 -04:00
Patrick Donnelly	81be13b34c	doc: remove duplicate CephFS health check doc These are documented in doc/cephfs/health-messages.rst. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>	2017-08-04 12:28:38 -07:00
Kefu Chai	f273712e1b	doc: document bluestore compression settings Signed-off-by: Kefu Chai <kchai@redhat.com>	2017-08-02 16:42:08 +08:00
Sage Weil	0afffa5c58	Merge pull request #16611 from liewegas/wip-doc-health doc/rados/operations/health-checks: osd section Reviewed-by: Josh Durgin <jdurgin@redhat.com> Reviewed-by: Kefu Chai <kchai@redhat.com> Reviewed-by: John Spray <john.spray@redhat.com> Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>	2017-08-01 08:26:24 -05:00
Sage Weil	dbb1dd33e6	doc/rados/operations/health-checks: add PG health check commentary Include a link to pg-repair.rst, although there is no content there yet. Signed-off-by: Sage Weil <sage@redhat.com>	2017-08-01 09:25:42 -04:00
Sage Weil	6bac77e960	doc/rados/operations/health-checks: osd section First paragraph: explain what the error means. Second or later paragraph: describe steps to fix or mitigate. Signed-off-by: Sage Weil <sage@redhat.com>	2017-08-01 09:25:41 -04:00
Kefu Chai	2670d244fd	doc: various fixes - radosgw/s3/bucketops.rst: fix Malformed table. - operations/health-checks.rst: Title underline too short - rbd/rados-rbd-cmds.rst: Title underline too short - rados/operations/index.rst: include health-checks in toc Signed-off-by: Kefu Chai <kchai@redhat.com>	2017-08-01 17:31:36 +08:00

52 Commits