feature: Health warnings on long network ping times, add "dump_osd_network" to get a report
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Executed ceph-conf --dump-all on a freshly installed v14.2.2 (nautilus)
cluster. Compared the global defaults to the keys/values specified in
mon-config-ref.rst. Checked options.cc to make sure the obsolete keys
are no longer used.
Fixes: https://tracker.ceph.com/issues/41516
Signed-off-by: James McClune <jmcclune@mcclunetechnologies.net>
Signed-off-by: Myna <mynaramana@gmail.com>
(cherry picked from commit a20ba26721)
Note: This documentation fix was merged to nautilus via
https://github.com/ceph/ceph/pull/29191 without first being merged to master.
This commit forward-ports the fix to master.
osd: scrub error on big objects; make bluestore refuse to start on big objects
Reviewed-by: David Zafman <dzafman@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
I think someday the docs for how health alerts work (here) and the
enumeration of all actual alerts should be restructured. For now this
is the simplest place to fit this!
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/29292/head:
os/bluestore: warn on no per-pool omap
os/bluestore: fsck: warning (not error) by default on no per-pool omap
os/bluestore: fsck: int64_t for error count
os/bluestore: default size of 1 TB for testing
os/bluestore: behave if we *do* set PGMETA and PERPOOL flags
os/bluestore: do not set both PGMETA_OMAP and PERPOOL_OMAP
os/bluestore: fsck: only generate 1 error per omap_head
os/bluestore: make fsck repair convert to per-pool omap
os/bluestore: teach fsck to tolerate per-pool omap
os/bluestore: ondisk format change to 3 for per-pool omap
mon/PGMap: add data/omap breakouts for 'df detail' view
osd/osd_types: separate get_{user,allocated}_bytes() into data and omap variants
mon/PGMap: fix stored_raw calculation
mon/PGMap: add in actual omap usage into per-pool stats
osd: report per-pool omap support via store_statfs_t
os/bluestore: set per_pool_omap key on mkfs
osd/osd_types: count per-pool omap capable OSDs
os/bluestore: report omap_allocated per-pool
os/bluestore: add pool prefix to omap keys
kv/KeyValueDB: take key_prefix for estimate_prefix_size()
os/bluestore: fix manual omap key manipulation to use Onode::get_omap_key()
os/bluestore: make omap key helpers Onode methods
os/bluestore: add Onode::get_omap_prefix() helper
os/bluestore: change _do_omap_clear() args
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
* refs/pull/29337/head:
mon: enable telemetry module by default
mgr/telemetry: force re-opt-in if the report contents change
mgr/telemetry: less noise in the log
mgr/telemetry: wake up serve on config change
mgr/telemetry: track telemetry report revisions
Reviewed-by: Neha Ojha <nojha@redhat.com>
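A minimal sketch of the "force re-opt-in if the report contents change" rule from the telemetry commits above. It assumes a report revision number that is bumped whenever the report contents change, plus a stored record of the revision the user last opted in to. `kReportRevision`, `TelemetryConfig`, `opt_in()` and `may_send_report()` are hypothetical names, not the API of the (Python) mgr/telemetry module.

```cpp
// Hypothetical revision of the current report format; assumed to be bumped
// whenever the set of fields sent in the report changes.
constexpr int kReportRevision = 2;

struct TelemetryConfig {
  bool enabled = false;        // user has turned telemetry on
  int last_opt_revision = 0;   // report revision the user last agreed to
};

// Opting in records which report revision the user agreed to send.
void opt_in(TelemetryConfig& cfg) {
  cfg.enabled = true;
  cfg.last_opt_revision = kReportRevision;
}

// A report may only be sent if the user opted in to the current contents;
// once the revision is bumped, an older opt-in no longer counts.
bool may_send_report(const TelemetryConfig& cfg) {
  return cfg.enabled && cfg.last_opt_revision >= kReportRevision;
}
```

With this rule, an existing opt-in to revision 1 stops reports as soon as the revision becomes 2, until `opt_in()` is called again.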
doc: pg_num should always be a power of two
Reviewed-By: Jan Fajerski <jfajerski@suse.com>
Reviewed-By: Sage Weil <sage@redhat.com>
Reviewed-By: Abhishek Lekshmanan <abhishek@suse.com>
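A small C++ sketch of the power-of-two rule for pg_num: a check for a candidate value, plus a helper that rounds it up to the next power of two. Both function names are hypothetical and are not part of Ceph.

```cpp
#include <cstdint>

// True if n is a non-zero power of two (exactly one bit set).
bool is_power_of_two(uint32_t n) {
  return n != 0 && (n & (n - 1)) == 0;
}

// Round n up to the next power of two; returns n itself if it already is one.
// Hypothetical helper; assumes 1 <= n <= 2^31 so the result fits in uint32_t.
uint32_t round_up_pow2(uint32_t n) {
  uint32_t p = 1;
  while (p < n) {
    p <<= 1;  // double until we reach or pass n
  }
  return p;
}
```

For example, a requested pg_num of 100 would be flagged, and `round_up_pow2(100)` suggests 128 instead.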
like
```
ceph osd pool set <pool-name> crush_rule <rule-name>
```
where `<rule-name>` is a string instead of a number.
Signed-off-by: Kefu Chai <kchai@redhat.com>
* refs/pull/29034/head:
doc/mgr/crash: document missing commands, options
qa/suites/rados/singleton/all/test-crash: whitelist RECENT_CRASH
qa/suites/rados/mgr/tasks/insights: whitelist RECENT_CRASH
qa/tasks/mgr/test_insights: crash module now rejects bad crash reports
mgr/telemetry: fix remote into crash do_ls()
mgr/crash: don't make these methods static
mgr/BaseMgrModule: handle unicode health detail strings
mgr/crash: verify timestamp is valid
qa/suites/mgr: whitelist RECENT_CRASH
mgr/crash: remove unused var
mgr/crash: remove unused import 'six'
qa/workunits/rados/test_crash: health check
mgr/crash: improve validation on post
mgr/crash: automatically prune old crashes after a year
mgr/crash: raise RECENT_CRASH warning for recent (new) crashes
mgr/crash: add 'crash ls-new'
mgr/crash: add option and serve infra
mgr/crash: keep copy of crashes in memory
mgr/pg_autoscaler: adjust style to match built-in tables
mgr/crash: make 'crash ls' a nice table with a NEW column
mgr/crash: nicely format 'crash info' output
mgr/crash: add 'crash archive <id>', 'crash archive-all' commands
Reviewed-by: Neha Ojha <nojha@redhat.com>
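A rough C++ sketch of two rules from the crash-module list above: crashes older than about a year are pruned, and recent crashes that have not been archived feed a RECENT_CRASH-style warning. The two-week warning window, `CrashInfo`, `prune_old_crashes()` and `count_recent_new()` are assumptions for illustration; the real logic lives in the Python mgr/crash module.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

using Clock = std::chrono::system_clock;

// Hypothetical in-memory record of one crash report.
struct CrashInfo {
  std::string crash_id;
  Clock::time_point timestamp;
  bool archived = false;  // set by 'crash archive <id>' or 'crash archive-all'
};

// Assumed windows: warn about unarchived crashes from the last two weeks,
// and drop crashes older than roughly one year.
constexpr std::chrono::hours kWarnRecentInterval{24 * 14};
constexpr std::chrono::hours kRetainInterval{24 * 365};

// Remove crashes older than the retention interval.
void prune_old_crashes(std::vector<CrashInfo>& crashes, Clock::time_point now) {
  crashes.erase(std::remove_if(crashes.begin(), crashes.end(),
                               [&](const CrashInfo& c) {
                                 return now - c.timestamp > kRetainInterval;
                               }),
                crashes.end());
}

// Count crashes that would trigger the warning: recent and not yet archived.
std::size_t count_recent_new(const std::vector<CrashInfo>& crashes,
                             Clock::time_point now) {
  return static_cast<std::size_t>(
      std::count_if(crashes.begin(), crashes.end(), [&](const CrashInfo& c) {
        return !c.archived && now - c.timestamp <= kWarnRecentInterval;
      }));
}
```

Archiving a crash clears it from the warning count without deleting it, which matches the intent of the separate 'crash archive' commands.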
Include hardware details and update language for modern tools.
Fixes: http://tracker.ceph.com/issues/39620
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
doc/rados/configuration: update to be in sync with ConfUtils changes
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
osd: add hdd and ssd variants for osd_recovery_max_active
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Mark Nelson <mnelson@redhat.com>
Treat backfill_toofull as a warning condition because it can resolve itself.
Includes test case for PG_BACKFILL_FULL
Includes test case for recovery_toofull / PG_RECOVERY_FULL
Fixes: https://tracker.ceph.com/issues/39555
Signed-off-by: David Zafman <dzafman@redhat.com>
- be specific about stopped OSDs
- add missing '--no-mon-config' option
- fix indentation of the here-document delimiter
- use $host variable in for loop
Signed-off-by: Hannes von Haugwitz <hannes@vonhaugwitz.com>
to use strict priority ordering.
The new "mclock_opclass/mclock_client" queue basically prioritizes
operations based on the class they belong to. The priority property
of an operation, if lower than a specific value (64, by default),
will get ignored and hence all operations from the same class will
be treated fairly in a FIFO fashion (but still limited by the total
IOPS or bandwidth available for the corresponding class).
To reduce the performance impact, a more general strategy would be to
enforce limits on the IOPS or bandwidth available to the background
recovery (or backfill) operation class. However, this would also end up
blocking client operations that are currently waiting on degraded
objects that must be recovered first.
We hereby grant recovery operations of this kind a higher priority
to force them to use strict priority ordering, which should remain
effective once we switch to the new "mclock_opclass/mclock_client"
queue.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
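A simplified C++ model of the priority cutoff described above: operations at or above the cutoff (64 in the text) are served first, in strict priority order, while everything below it has its priority ignored. The single FIFO below stands in for the mClock per-class scheduling, which this sketch omits. `Op` and `CutoffQueue` are hypothetical names, not Ceph's queue classes.

```cpp
#include <deque>
#include <functional>
#include <map>
#include <optional>
#include <string>

struct Op {
  std::string name;
  unsigned priority;
};

class CutoffQueue {
 public:
  explicit CutoffQueue(unsigned cutoff = 64) : cutoff_(cutoff) {}

  void enqueue(const Op& op) {
    if (op.priority >= cutoff_) {
      strict_[op.priority].push_back(op);  // FIFO within one priority level
    } else {
      normal_.push_back(op);  // priority value is ignored below the cutoff
    }
  }

  std::optional<Op> dequeue() {
    if (!strict_.empty()) {
      auto it = strict_.begin();  // highest priority first (std::greater order)
      Op op = it->second.front();
      it->second.pop_front();
      if (it->second.empty()) strict_.erase(it);
      return op;
    }
    if (normal_.empty()) return std::nullopt;
    Op op = normal_.front();  // stand-in for mClock scheduling of the rest
    normal_.pop_front();
    return op;
  }

 private:
  unsigned cutoff_;
  std::map<unsigned, std::deque<Op>, std::greater<unsigned>> strict_;
  std::deque<Op> normal_;
};
```

In this model, giving the blocking recovery operations a priority at or above the cutoff is what lets them jump ahead of queued client work instead of being throttled with the rest of their class.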
The Luminous release notes tell users to ensure that rbd clients have
the ability to blacklist other client users; this is provided by
"profile rbd", which this change now documents explicitly in the user
management documentation.
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
Use OSD_POOL_PRIORITY_MAX and OSD_POOL_PRIORITY_MIN constants
Scale legacy priorities if they exceed the maximum
Signed-off-by: David Zafman <dzafman@redhat.com>
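The commit says that legacy pool priorities above the maximum are scaled, but not how. A minimal sketch, assuming OSD_POOL_PRIORITY_MAX is 10 and OSD_POOL_PRIORITY_MIN is its negation, and assuming a proportional rescale relative to the largest configured priority. `scale_legacy_priorities()` is hypothetical and is not Ceph's code.

```cpp
#include <algorithm>
#include <vector>

// Assumed values for the constants named in the commit.
constexpr int OSD_POOL_PRIORITY_MAX = 10;
constexpr int OSD_POOL_PRIORITY_MIN = -OSD_POOL_PRIORITY_MAX;

// Hypothetical helper: if any legacy pool priority is above the maximum,
// scale every positive priority by the same factor so the largest lands on
// OSD_POOL_PRIORITY_MAX. Integer division keeps the order non-decreasing,
// though nearby values may merge. Anything still out of range is clamped.
void scale_legacy_priorities(std::vector<int>& prios) {
  if (prios.empty()) return;
  int max_prio = *std::max_element(prios.begin(), prios.end());
  for (int& p : prios) {
    if (max_prio > OSD_POOL_PRIORITY_MAX && p > 0) {
      p = p * OSD_POOL_PRIORITY_MAX / max_prio;
    }
    p = std::clamp(p, OSD_POOL_PRIORITY_MIN, OSD_POOL_PRIORITY_MAX);
  }
}
```

Scaling all pools by one factor, rather than clamping each value on its own, keeps pools that used to differ in priority from all collapsing onto the maximum.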
Add the missing `max_change`, `max_osds`, and `--no-increasing` parameters to `reweight-by-utilization` and `test-reweight-by-utilization`. Minor adjustments to wording.
Signed-off-by: Anthony D'Atri <anthony.datri@gmail.com>
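A simplified, hypothetical model of what the `max_change`, `max_osds` and `--no-increasing` parameters control in `reweight-by-utilization`. Overloaded OSDs get their override weight lowered, underloaded OSDs may get it raised unless increasing is disabled, each step is capped at `max_change`, and at most `max_osds` OSDs change. The struct, the function, the use of a ratio (rather than a percentage) for the overload threshold, and the "below average" test for underload are all assumptions; this is not Ceph's implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct OsdUsage {
  int id;
  double utilization;  // fraction of the OSD's capacity in use
  double reweight;     // override weight in [0.0, 1.0]
};

// oload is a ratio here (e.g. 1.2 means 120% of average utilization).
// Returns the number of OSDs whose weight changed.
std::size_t reweight_by_utilization(std::vector<OsdUsage>& osds, double oload,
                                    double max_change, std::size_t max_osds,
                                    bool no_increasing) {
  if (osds.empty()) return 0;
  double avg = 0.0;
  for (const auto& o : osds) avg += o.utilization;
  avg /= static_cast<double>(osds.size());

  // Visit the OSDs furthest from the average first.
  std::vector<OsdUsage*> order;
  for (auto& o : osds) order.push_back(&o);
  std::sort(order.begin(), order.end(), [avg](const OsdUsage* a, const OsdUsage* b) {
    return std::abs(a->utilization - avg) > std::abs(b->utilization - avg);
  });

  std::size_t changed = 0;
  for (OsdUsage* o : order) {
    if (changed >= max_osds) break;  // max_osds caps how many OSDs change
    if (o->utilization <= 0.0) continue;
    bool overloaded = o->utilization > avg * oload;
    bool underloaded = !no_increasing && o->utilization < avg;
    if (!overloaded && !underloaded) continue;
    // Weight that would bring this OSD to the average, limited by max_change.
    double target = o->reweight * avg / o->utilization;
    double w = std::clamp(target, o->reweight - max_change, o->reweight + max_change);
    w = std::clamp(w, 0.0, 1.0);
    if (w != o->reweight) {
      o->reweight = w;
      ++changed;
    }
  }
  return changed;
}
```

In this model `test-reweight-by-utilization` would correspond to running the same computation on a copy of the weights without applying the result.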
osd_pool_default_pg_autoscale_mode is the correct parameter for setting
the default placement-group autoscale mode.
Signed-off-by: Changcheng Liu <changcheng.liu@intel.com>
The current documentation for the MANY_OBJECTS_PER_PG warning
states that the threshold can be raised to silence the health
warning by adjusting the mon_pg_warn_max_object_skew config
option on the monitors. This does not appear to have been true
since at least Luminous: the option has to be adjusted on
the managers.
I ran into this problem and spent quite some time injecting
mon_pg_warn_max_object_skew into the monitors, adding the option
to ceph.conf and restarting the monitors several times, but the
warning did not go away. I had to download the code to see what
was happening, and found this:
```
$ git grep -A 3 mon_pg_warn_max_object_skew src/common/options.cc
src/common/options.cc:1480: Option("mon_pg_warn_max_object_skew", Option::TYPE_FLOAT, Option::LEVEL_ADVANCED)
src/common/options.cc-1481- .set_default(10.0)
src/common/options.cc-1482- .set_description("max skew few average in objects per pg")
src/common/options.cc-1483- .add_service("mgr"),
```
After I restarted the ceph-mgr service, the warning went away.
Signed-off-by: Vangelis Tasoulas <vangelis@tasoulas.net>
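A minimal sketch of the kind of comparison mon_pg_warn_max_object_skew controls. A pool is flagged when its objects per PG exceed the cluster-wide average by more than the skew factor (default 10.0 per the options.cc excerpt), and a value of 0 disables the check. `PoolStats` and `pools_with_object_skew()` are hypothetical; the real check runs in the mgr and may apply further thresholds not modeled here.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct PoolStats {
  std::string name;
  uint64_t num_objects;
  uint32_t pg_num;
};

// Returns the pools whose objects-per-PG ratio exceeds max_object_skew
// times the cluster-wide average objects per PG.
std::vector<std::string> pools_with_object_skew(const std::vector<PoolStats>& pools,
                                                double max_object_skew) {
  std::vector<std::string> flagged;
  uint64_t total_objects = 0;
  uint64_t total_pgs = 0;
  for (const auto& p : pools) {
    total_objects += p.num_objects;
    total_pgs += p.pg_num;
  }
  if (max_object_skew <= 0 || total_pgs == 0 || total_objects == 0) return flagged;

  double avg = static_cast<double>(total_objects) / static_cast<double>(total_pgs);
  for (const auto& p : pools) {
    if (p.pg_num == 0) continue;
    double per_pg = static_cast<double>(p.num_objects) / p.pg_num;
    if (per_pg / avg > max_object_skew) flagged.push_back(p.name);
  }
  return flagged;
}
```

For example, two pools with 100 objects in 100 PGs each and a "hot" pool with 30000 objects in 8 PGs give the hot pool a ratio of roughly 26, so it is flagged at the default skew of 10 but not at 30.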
Added a note to the dashboard documentation about the requirement for
the latest ceph-iscsi version 3. Added some doc references and replaced
some URLs in the iSCSI docs with reST labels.
Signed-off-by: Lenz Grimmer <lgrimmer@suse.com>
config-ref: add a note on current scheduler settings.
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>