Commit Graph

125450 Commits

Author SHA1 Message Date
Patrick Donnelly
2cd3494771 qa: update mds_pre_upgrade to no longer stop standbys
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-30 16:28:54 -07:00
Patrick Donnelly
8e0b9bcad6 qa: update mds_pre_upgrade to disable standby-replay
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-30 16:28:54 -07:00
Patrick Donnelly
295971b9c6 qa: add tests for compat manipulation and upgrade
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-30 16:28:54 -07:00
Patrick Donnelly
54b649af9b doc: remove deprecated compat commands
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-30 16:28:54 -07:00
Patrick Donnelly
efb70f2b33 doc: update MDS upgrade procedure
This is possible now that CompatSet changes to the FSMap no longer cause
old MDS to suicide.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-30 16:28:54 -07:00
Patrick Donnelly
58eaa237b0 mon,mds: use per-MDS compat to inform replacement
This diff makes the following changes:

- FSMap::compat is now just a "default compat" of currently unknown
  utility. It is used when constructing a new file system but does
  not really have any effect or current use.

- The `mds compat *` CLI commands are deprecated. They manipulate
  the default compat which has no useful effect.

- Each MDS sends its compat to the mons in its beacon. This is from
  MDSMap::get_compat_set_all() at MDS boot. This CompatSet does not
  change for the duration of the MDS lifetime.

- Mons record each MDS compat in the FSMap to inform standby failover.
  An MDS is only promoted if it is compatible with the file system
  compat.

- Mons upgrade (merge) the file system compat when (a) the number of
  *in* MDS is 1 (effected by max_mds=1) and (b) the mons are promoting a
  standby with a new compat. A file system is never upgraded when there
  is more than 1 rank to prevent two MDS with incompatible compat.

- A suite of `fs compat` commands exist to manipulate the file system
  compat. These exist mostly for testing.

The consequence of these changes is that the upgrade procedure for MDS
no longer requires turning off all MDS but rank 0 before performing any
upgrades. Previously, a CompatSet change would cause every incompatible
MDS receiving the new MDSMap to suicide. Now the monitors no longer
assign an incompatible MDS to a file system, and they enforce an upgrade
procedure if incompatibilities exist.
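
The promotion and upgrade rules listed in this message can be sketched in a few lines of Python. This is a simplified model, not the monitor code: a compat is reduced to a set of incompat feature names, `can_promote` and `maybe_upgrade_fs` are hypothetical helpers, the feature names are made up, and the compatibility test (the standby must support every feature the file system requires) is an assumption about what "compatible" means here.

```python
def can_promote(fs_incompat: frozenset, mds_incompat: frozenset) -> bool:
    """A standby is eligible only if it supports every feature the
    file system requires (assumed meaning of 'compatible')."""
    return fs_incompat <= mds_incompat


def maybe_upgrade_fs(fs_incompat: frozenset, mds_incompat: frozenset,
                     num_in: int) -> frozenset:
    """Merge the standby's compat into the file system compat only when
    exactly one rank is in (max_mds=1); otherwise leave it unchanged."""
    if num_in == 1 and can_promote(fs_incompat, mds_incompat):
        return fs_incompat | mds_incompat
    return fs_incompat


# Made-up feature names, for illustration only.
fs = frozenset({"base", "dirfrag"})
old_mds = frozenset({"base"})
new_mds = frozenset({"base", "dirfrag", "newer_feature"})
```

With these values, `old_mds` is never promoted, and the file system compat only grows to include `newer_feature` when a single rank is in.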

Fixes: https://tracker.ceph.com/issues/49720
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-30 16:28:54 -07:00
Patrick Donnelly
8cdc36c89d mon: do not update inline incompat except via mds
The MDS_FEATURE_INCOMPAT_INLINE feature indicates that an MDS knows how
to read/write inline data and that the file system may have it. The
separate setting for inline_data protects this file system feature.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-30 16:28:53 -07:00
Patrick Donnelly
4a10b6016f mds: add MDSMap method for creating null MDSMap
It's not necessary to distribute a CompatSet with the null mdsmap. We
only need to communicate that the MDS is not part of any map.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-30 16:28:53 -07:00
Patrick Donnelly
0256ae010f mds: only update beacon epoch if newer
This is a defensive programming change. We don't want the beacon epoch
to ever go backwards.
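
A guard of this kind amounts to a monotonic update. The Python sketch below is illustrative only; `update_beacon_epoch` is a hypothetical name, not the function touched by this commit.

```python
def update_beacon_epoch(current: int, received: int) -> int:
    """Return the epoch to keep: advance only when the received epoch is newer."""
    return received if received > current else current
```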

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-30 16:28:53 -07:00
Patrick Donnelly
b8ad8a8c82 mds: harden standby_mds lookup
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-30 16:28:53 -07:00
Patrick Donnelly
31c8edd603 mon/FSCommands: accept generic ostream rather than stringstream
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-30 16:28:53 -07:00
Patrick Donnelly
56b36e69fa include: add less verbose CompatSet dump
For printing in `fs dump`.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-30 16:28:53 -07:00
Patrick Donnelly
5da02036ee include: add dump operator for Feature
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-30 16:28:53 -07:00
Patrick Donnelly
d179fa2bc0 include: add const qualifier to appropriate CompatSet methods
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-30 16:28:53 -07:00
Patrick Donnelly
5ae7b9202b Merge PR #42513 into master
* refs/pull/42513/head:
	qa: multifs already enabled as default

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-30 14:03:36 -07:00
Patrick Donnelly
54fea240de Merge PR #42499 into master
* refs/pull/42499/head:
	client: make sure only to update dir dist from auth mds

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-30 14:02:33 -07:00
Patrick Donnelly
c99a5e56a6 Merge PR #42201 into master
* refs/pull/42201/head:
	qa: fold frag confs into conf/mds.yaml

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
2021-07-30 14:00:19 -07:00
Neha Ojha
bd309c2ee8 Merge pull request #42133 from sseshasa/wip-persist-osd-iops-cap-mclock
osd: Add mechanism to avoid running OSD bench on every OSD init when mclock_scheduler is enabled

Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-07-30 13:14:03 -07:00
Daniel Gryniewicz
8e947ccc83 Merge pull request #42550 from dang/wip-dang-zipper-writer
Zipper Writer API

Reviewed-by: cbodley@redhat.com
2021-07-30 15:12:21 -04:00
Daniel Gryniewicz
33d7954c50 Merge pull request #31454 from soumyakoduri/dbstore
rgw/Zipper: DB Backend store

Reviewed-by: dang@redhat.com
Reviewed-by: amaredia@redhat.com
2021-07-30 14:00:30 -04:00
Daniel Gryniewicz
38c2c646da RGW - Zipper - Proper Writer API
With the implementation of DBStore, it was determined that the API used
for writing in Zipper was too tied to RADOS.  Implement a clean writing
API named Writer.

Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
2021-07-30 12:47:32 -04:00
Daniel Gryniewicz
ea33bc4fad Merge pull request #42266 from dang/wip-dang-zipper-raw_obj
Wip dang zipper raw obj

Reviewed-by: Soumya Koduri <skoduri@redhat.com>
2021-07-30 10:06:56 -04:00
Sridhar Seshasayee
464e9ea6c0 qa/standalone/misc: ver-health.sh: Increase wait_for_health_string() timeout
Modified test cases:

1. ver-health.sh:
  a. TEST_check_version_health_1():
    To avoid intermittent timeouts observed in wait_for_health_string(),
    increase the wait time to 20 secs.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Sridhar Seshasayee
33d2a2c93b qa/standalone/scrub: Force a subset of scrub tests to use "wpq" scheduler
The following tests in the test files mentioned below use the
"osd_scrub_sleep" option to introduce delays during scrubbing to help
determine scrubbing states, validate reservations during scrubbing, etc.
This works when using the "wpq" scheduler.

But when the "mclock_scheduler" is enabled, "osd_scrub_sleep" is
disabled and overridden to 0. This is done to delegate the scheduling of
the background scrubs to the "mclock_scheduler" based on the set QoS
parameters. As a result, the checks that verify the scrub states,
reservations, etc. fail because scrubs complete very quickly and the
window to check them is very short. This affects a small subset of
scrub tests, listed below:

1. osd-scrub-dump.sh -> TEST_recover_unexpected()
2. osd-scrub-repair.sh -> TEST_auto_repair_bluestore_tag()
3. osd-scrub-test.sh -> TEST_scrub_abort(), TEST_deep_scrub_abort()

Only for the above tests, until there's a reliable way to query scrub
states with "--osd-scrub-sleep" set to 0, the "osd_op_queue" config
option is set to "wpq".

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Sridhar Seshasayee
f658ff3511 qa/standalone/erasure-code: Modify erasure-code tests for mclock scheduler
Modified test cases:

1. test-erasure-eio.sh:
  a. TEST_ec_backfill_unfound():
    - Set osd_mclock_profile to high_recovery_ops profile.
    - Increase the wait for backfill_unfound timeout to 240 secs.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Sridhar Seshasayee
bdf36cf045 qa/standalone/osd-backfill: Modify backfill tests for mclock scheduler
Modified test cases:

1. osd-backfill-prio.sh:
  Set osd_op_queue = wpq for all tests since mclock doesn't
  consider recovery priority as part of its scheduling algorithm.

2. osd-backfill-space.sh:
  Set osd_mclock_profile to high_recovery_ops and increase the wait
  for backfills timeout to 1200 secs for the following tests:
  - TEST_backfill_test_simple()
  - TEST_backfill_test_multi()
  - TEST_backfill_test_sametarget()
  - TEST_backfill_multi_partial()
  - TEST_ec_backfill_simple()
  - TEST_ec_backfill_multi()
  - SKIP_TEST_ec_backfill_multi_partial()

3. osd-backfill-stats.sh:
  - TEST_backfill_ec_down_all_out():
    Set osd_mclock_profile to high_recovery_ops and increase the wait
    for recovery timeout to 240 secs.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Sridhar Seshasayee
2c577040cb qa/standalone/osd: Modify osd tests for mclock scheduler
Modified test cases:
1. osd-recovery-prio.sh:
   Set osd_op_queue = wpq for all tests since mclock
   doesn't consider recovery priority as part of its
   scheduling algorithm.

2. osd-recovery-stats.sh:
   a. TEST_recovery_undersized():
     - Set osd_mclock_profile to high_recovery_ops profile.
     - Increase wait for recovery timeout to 300 secs.

3. osd-rep-recov-eio.sh:
   a. TEST_rep_backfill_unfound():
     - Set osd_mclock_profile to high_recovery_ops profile.
     - Increase wait for backfill_unfound to 360 secs.

4. repeer-on-acting-back.sh:
   a. TEST_repeer_on_down_act():
     - Set osd_mclock_profile to high_recovery_ops profile.
       (To improve the test duration)

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Sridhar Seshasayee
5a85a6a035 qa/standalone: Modify ceph-helpers.sh tests for mclock scheduler
List of changes:

1. Remove the enforcement to use osd_op_queue=wpq when an osd is brought
   up in the following functions:
   - run_osd()
   - run_osd_filestore() and
   - activate_osd()

2. New functions:
   - get_op_scheduler() - Get the current osd_op_queue for an osd.

3. Modified test cases:
   - test_run_osd() - Add check for osd_max_backfill count.
     The mclock scheduler overrides the count to 1000.

4. New test cases:
   - test_activate_osd_after_mark_down()
   - test_get_op_scheduler()

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Sridhar Seshasayee
8725a10880 osd: Add a new config option to forcibly run OSD benchmark on init
The new config option "osd_mclock_force_run_benchmark_on_init" is
introduced to allow a user to force run the OSD benchmark test on every
OSD boot-up even if the historical data about the OSD's iops capacity is
available on the MON config store. The 'force_run_benchmark' flag is set
to the value indicated by the new config option.

By default this new config option is set to false.

The utility of this option is to help refresh the OSD iops capacity
when the underlying device's performance characteristics have changed
significantly. In such cases, the OSD can be restarted with this option
enabled temporarily. Once the new iops capacity is updated to the MON
store, this option can be removed from the OSD's start-up config.

Fixes: https://tracker.ceph.com/issues/51464
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Sridhar Seshasayee
10f8b79ca3 osd: Add mechanism to avoid running OSD benchmark on every OSD boot-up
Use "mon_cmd_set_config()" to store the OSD's max iops capacity to
the MON store during the first bring-up. Don't run the OSD benchmark
test on subsequent boot-ups if a previously persisted iops capacity is
available on the MON store and is different from the default iops
capacity.

Add the 'force_run_benchmark' flag to force a run of the benchmark
in case the default iops capacity cannot be determined.
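
The boot-time decision described here can be sketched as a small predicate. This is a Python model under stated assumptions, not the OSD code: `should_run_benchmark` and its parameters are hypothetical names, `None` stands for "nothing persisted on the MON store", and the numbers in the usage note are illustrative.

```python
from typing import Optional


def should_run_benchmark(persisted_iops: Optional[float],
                         default_iops: float,
                         force_run_benchmark: bool = False) -> bool:
    """Decide whether to run the OSD benchmark at boot.

    Skip it only when a persisted capacity exists and differs from the
    default; run it on first bring-up, when the stored value still equals
    the default, or when the force flag is set.
    """
    if force_run_benchmark:
        return True
    if persisted_iops is None:
        return True
    return persisted_iops == default_iops
```

For example, `should_run_benchmark(450.0, 315.0)` is false because a measured value is already stored, while `should_run_benchmark(None, 315.0)` is true on first bring-up.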

Fixes: https://tracker.ceph.com/issues/51464
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Sridhar Seshasayee
9438e5a4b6 common/config: Add methods to return the default value of a config option
Add wrapper method "get_val_default()" to the ConfigProxy class that takes
the config option key to look up. This method in turn calls another method
with the same name, added to the md_config_t class, that does the actual work
of searching for the config option. If the option is valid, _get_val_default()
is used to get the default value. Otherwise, the wrapper method returns
std::nullopt.
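
The wrapper-and-delegate shape described above looks roughly like the following. This is a Python model of the C++ design, with assumptions labeled: the two classes are stand-ins for md_config_t and ConfigProxy, the schema is a plain dictionary with a made-up option, and `None` plays the role of std::nullopt.

```python
from typing import Dict, Optional


class MdConfig:
    """Stand-in for md_config_t: owns the option schema and does the lookup."""

    def __init__(self, schema: Dict[str, str]):
        self._schema = schema  # option key -> default value (hypothetical)

    def get_val_default(self, key: str) -> Optional[str]:
        # Unknown option: report "no value" instead of raising.
        if key not in self._schema:
            return None
        return self._schema[key]


class ConfigProxy:
    """Stand-in for ConfigProxy: a thin wrapper that forwards the call."""

    def __init__(self, config: MdConfig):
        self._config = config

    def get_val_default(self, key: str) -> Optional[str]:
        return self._config.get_val_default(key)


proxy = ConfigProxy(MdConfig({"example_option": "42"}))
```

Callers check for `None` before using the result, the same way C++ callers would test the returned optional.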

Fixes: https://tracker.ceph.com/issues/51464
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Sridhar Seshasayee
1fca4bdfd4 osd: Add method to store config option key/value on the MON store
Add method mon_cmd_set_config() to save a config option key and
value to the MON store. The ConfigMonitor command 'config set' is
used to achieve this.

A corresponding get method is unnecessary since any config option
found on the MON store is loaded during OSD boot-up and set using
the md_config_t::set_mon_vals() method. Therefore, the existing
versions of ConfigProxy::get_val() method are sufficient to get
the latest value for the config option.

Fixes: https://tracker.ceph.com/issues/51464
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Kefu Chai
13c2a0e948 Merge pull request #42308 from jtlayton/wip-51644
osd: don't assert on zero-length OP_ZERO request

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2021-07-30 19:03:19 +08:00
Kefu Chai
4dec9ae97a Merge pull request #42523 from mgfritch/cephadm-fsid-validate
cephadm: validate `fsid` command arg

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Adam King <adking@redhat.com>
2021-07-30 19:01:32 +08:00
Kefu Chai
7c32665f60 Merge pull request #42528 from liewegas/fix-51816
mon/LogMonitor: fix crash when cluster log file is not writeable

Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-07-30 19:00:30 +08:00
Kefu Chai
f81d8810a2 Merge pull request #42538 from dsavineau/issue_51902
cephadm: don't use ctx.fsid for clean_cgroup

Reviewed-by: Adam King <adking@redhat.com>
2021-07-30 18:59:05 +08:00
Kefu Chai
7224c3af80 Merge pull request #42558 from tchaikov/wip-crimson-cleanup
crimson/os: cleanups for building with Clang

Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
2021-07-30 16:49:02 +08:00
Kefu Chai
2144038aed crimson/os: do not capture unused variable
Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-07-30 14:58:00 +08:00
Kefu Chai
5f0d7cd415 crimson/os: reference this explicitly
to silence a false alarm from Clang that `this` is not used.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-07-30 14:58:00 +08:00
Kefu Chai
79f0a2b5c6 crimson/os: do not capture labels
structured bindings do not define variables, so we cannot capture them
without defining variables in the capture list.

in this change, instead of using a map<> for defining labels, just
create labels on the fly.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-07-30 14:58:00 +08:00
Kefu Chai
aefa811cfe Merge pull request #42556 from tchaikov/wip-fair-mutex
common: add ceph::fair_mutex

Reviewed-by: Xiubo Li <xiubli@redhat.com>
2021-07-30 14:30:09 +08:00
Kefu Chai
59144a3fd0 Merge pull request #42539 from cyx1231st/wip-seastore-cache-metrics-2
crimson/os/seastore/cache: refine metrics

Reviewed-by: Samuel Just <sjust@redhat.com>
2021-07-30 13:22:13 +08:00
Kefu Chai
8c07345d33 common: add ceph::fair_mutex
a mutex which enqueues and wakes up the waiters in FIFO order, to
ensure the fairness of the mutex.
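
ceph::fair_mutex itself is C++. The Python sketch below only illustrates FIFO wake-up using a ticket scheme, which is an assumption about one way to get this behavior and not a description of the commit's implementation. `FairMutex` and its attribute names are hypothetical.

```python
import threading


class FairMutex:
    """Ticket-based mutex: waiters are served strictly in arrival order."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_ticket = 0   # ticket handed to the next arriving thread
        self._now_serving = 0   # ticket currently allowed to hold the mutex

    def acquire(self):
        with self._cond:
            my_ticket = self._next_ticket
            self._next_ticket += 1
            # Block until every earlier ticket holder has released.
            self._cond.wait_for(lambda: self._now_serving == my_ticket)

    def release(self):
        with self._cond:
            self._now_serving += 1
            # Wake all waiters; only the one holding the next ticket proceeds.
            self._cond.notify_all()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc):
        self.release()
```

Waking every waiter on release is simple but not the cheapest choice; it is used here because it keeps the ordering logic in one predicate.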

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-07-30 13:01:20 +08:00
Yingxin Cheng
c89d9f6a96 crimson/os/seastore: reassign extent_types_t values and remove extent_type_to_index()
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2021-07-30 09:42:22 +08:00
Yingxin Cheng
a059ac1e27 crimson/os/seastore/cache: misc cleanup to metrics
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2021-07-30 09:42:22 +08:00
Yingxin Cheng
dedd14e185 crimson/os/seastore/cache: remove derived metrics
Only keep the basic metrics to minimize the total number of metrics.

Derived metrics can be numerous according to different needs and can be
confusing with labels.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2021-07-30 09:42:22 +08:00
Yingxin Cheng
08a95d07b7 crimson/os/seastore/cache: remove counter labels
Do not label metrics by counter type, which could be confusing.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2021-07-30 09:42:22 +08:00
Yingxin Cheng
38b01895ee crimson/os/seastore/cache: cleanup, replace unordered_map by array
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2021-07-30 09:42:05 +08:00
Ilya Dryomov
c22e44895d Merge pull request #40965 from rokj/patch-3
doc: mention copying keyrings and adjust node names in manual deployment example

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2021-07-29 23:54:44 +02:00
Soumya Koduri
5e74a83a97 rgw/dbstore: Fix library link issues
Now that rgw_common is no longer linked with the rgw_a library (commit 7b61667),
dbstore (rgw_sal_dbstore) should be linked directly to rgw_common.

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
2021-07-29 22:09:27 +05:30