125411 Commits

Author SHA1 Message Date
Sridhar Seshasayee
8725a10880 osd: Add a new config option to forcibly run OSD benchmark on init
The new config option "osd_mclock_force_run_benchmark_on_init" is
introduced to allow a user to force run the OSD benchmark test on every
OSD boot-up even if the historical data about the OSD's iops capacity is
available on the MON config store. The 'force_run_benchmark' flag is set
to the value indicated by the new config option.

By default this new config option is set to false.

The utility of this option is to help refresh the OSD iops capacity
when the underlying device's performance characteristics have changed
significantly. In such cases, the OSD can be restarted with this option
enabled temporarily. Once the new iops capacity is updated to the MON
store, this option can be removed from the OSD's start-up config.

Fixes: https://tracker.ceph.com/issues/51464
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Sridhar Seshasayee
10f8b79ca3 osd: Add mechanism to avoid running OSD benchmark on every OSD boot-up
Use "mon_cmd_set_config()" to store the OSD's max iops capacity to
the MON store during the first bring-up. Don't run the OSD benchmark
test on subsequent boot-ups if a previously persisted iops capacity is
available on the MON store and is different from the default iops
capacity.

Add the 'force_run_benchmark' flag to force a run of the benchmark
in case the default iops capacity cannot be determined.

Fixes: https://tracker.ceph.com/issues/51464
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
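The boot-up decision described by the two OSD benchmark commits above (8725a10880 and 10f8b79ca3) can be sketched roughly as follows. This is a minimal Python model of the stated rules, not the OSD's C++ code: the function name, parameter names and iops values are hypothetical.

```python
from typing import Optional


def should_run_benchmark(persisted_iops: Optional[float],
                         default_iops: Optional[float],
                         force_run: bool = False) -> bool:
    """Decide whether the OSD benchmark should run on this boot-up."""
    if force_run:
        # Models osd_mclock_force_run_benchmark_on_init being enabled.
        return True
    if default_iops is None:
        # The default capacity cannot be determined, so force a run.
        return True
    if persisted_iops is None:
        # First bring-up: nothing has been stored on the MON store yet.
        return True
    # A stored value equal to the default is treated as "not measured yet".
    return persisted_iops == default_iops


# Hypothetical values, for illustration only.
print(should_run_benchmark(None, 315.0))
print(should_run_benchmark(21500.0, 315.0))
print(should_run_benchmark(21500.0, 315.0, force_run=True))
```

The force flag is checked first so that a changed device can be re-measured even when a persisted value exists.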
Sridhar Seshasayee
9438e5a4b6 common/config: Add methods to return the default value of a config option
Add wrapper method "get_val_default()" to the ConfigProxy class that takes
the config option key to search. This method in turn calls another method
with the same name, added to the md_config_t class, that does the actual work
of searching for the config option. If the option is valid, _get_val_default()
is used to get the default value. Otherwise, the wrapper method returns
std::nullopt.

Fixes: https://tracker.ceph.com/issues/51464
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
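The delegation described in commit 9438e5a4b6 above might look like this when modeled in Python, with None standing in for std::nullopt. The class bodies and the schema dictionary are assumptions made for illustration; only the method names come from the commit message.

```python
from typing import Dict, Optional


class MdConfig:
    """Hypothetical stand-in for md_config_t: owns the option schema."""

    def __init__(self, schema: Dict[str, str]):
        self._schema = schema  # option name -> default value

    def _get_val_default(self, key: str) -> str:
        # Assumes the key has already been validated by the caller.
        return self._schema[key]

    def get_val_default(self, key: str) -> Optional[str]:
        # Search for the option; unknown options yield None (std::nullopt).
        if key not in self._schema:
            return None
        return self._get_val_default(key)


class ConfigProxy:
    """Hypothetical stand-in for ConfigProxy: a thin wrapper."""

    def __init__(self, config: MdConfig):
        self._config = config

    def get_val_default(self, key: str) -> Optional[str]:
        return self._config.get_val_default(key)


proxy = ConfigProxy(MdConfig({"example_option": "42"}))
print(proxy.get_val_default("example_option"))
print(proxy.get_val_default("no_such_option"))
```

Returning an optional value lets the caller distinguish an unknown option from an option whose default happens to be empty.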
Sridhar Seshasayee
1fca4bdfd4 osd: Add method to store config option key/value on the MON store
Add method mon_cmd_set_config() to save config option key and
value to the MON store. The ConfigMonitor command, 'config set' is
used to achieve this.

A corresponding get method is unnecessary since any config option
found on the MON store is loaded during OSD boot-up and set using
the md_config_t::set_mon_vals() method. Therefore, the existing
versions of ConfigProxy::get_val() method are sufficient to get
the latest value for the config option.

Fixes: https://tracker.ceph.com/issues/51464
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
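As a rough illustration of commit 1fca4bdfd4 above, a 'config set' request could be assembled as a JSON command like the one below. The field names ("who", "name", "value") mirror the arguments of the `ceph config set <who> <name> <value>` CLI and are an assumption here, as are the helper name and the example option and value; the commit message does not show the actual payload.

```python
import json


def build_config_set_command(who: str, key: str, value: str) -> str:
    """Build a JSON 'config set' command string (hypothetical helper)."""
    return json.dumps({
        "prefix": "config set",  # the ConfigMonitor command named in the commit
        "who": who,              # assumed field: target entity, e.g. "osd.0"
        "name": key,             # assumed field: config option key
        "value": value,          # assumed field: value to persist
    })


# Illustrative values only.
print(build_config_set_command("osd.0", "osd_mclock_max_capacity_iops_ssd", "21500"))
```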
Kefu Chai
13c2a0e948
Merge pull request #42308 from jtlayton/wip-51644
osd: don't assert on zero-length OP_ZERO request

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2021-07-30 19:03:19 +08:00
Kefu Chai
4dec9ae97a
Merge pull request #42523 from mgfritch/cephadm-fsid-validate
cephadm: validate `fsid` command arg

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Adam King <adking@redhat.com>
2021-07-30 19:01:32 +08:00
Kefu Chai
7c32665f60
Merge pull request #42528 from liewegas/fix-51816
mon/LogMonitor: fix crash when cluster log file is not writeable

Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-07-30 19:00:30 +08:00
Kefu Chai
f81d8810a2
Merge pull request #42538 from dsavineau/issue_51902
cephadm: don't use ctx.fsid for clean_cgroup

Reviewed-by: Adam King <adking@redhat.com>
2021-07-30 18:59:05 +08:00
Kefu Chai
7224c3af80
Merge pull request #42558 from tchaikov/wip-crimson-cleanup
crimson/os: cleanups for building with Clang

Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
2021-07-30 16:49:02 +08:00
Kefu Chai
2144038aed crimson/os: do not capture unused variable
Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-07-30 14:58:00 +08:00
Kefu Chai
5f0d7cd415 crimson/os: reference this explicitly
to silence a false alarm from Clang that `this` is not used.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-07-30 14:58:00 +08:00
Kefu Chai
79f0a2b5c6 crimson/os: do not capture labels
structured bindings do not define variables, so we cannot capture them
without defining variables in the capture list.

in this change, instead of using a map<> for defining labels, just
create labels on the fly.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-07-30 14:58:00 +08:00
Kefu Chai
aefa811cfe
Merge pull request #42556 from tchaikov/wip-fair-mutex
common: add ceph::fair_mutex

Reviewed-by: Xiubo Li <xiubli@redhat.com>
2021-07-30 14:30:09 +08:00
Kefu Chai
59144a3fd0
Merge pull request #42539 from cyx1231st/wip-seastore-cache-metrics-2
crimson/os/seastore/cache: refine metrics

Reviewed-by: Samuel Just <sjust@redhat.com>
2021-07-30 13:22:13 +08:00
Kefu Chai
8c07345d33 common: add ceph::fair_mutex
a mutex which enqueues and wakes up the waiters in FIFO order, to
ensure the fairness of the mutex.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-07-30 13:01:20 +08:00
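One common way to get the FIFO wake-up order that commit 8c07345d33 above describes is a ticket lock. The sketch below shows that technique in Python; it is an illustration of the idea under that assumption, not the code of ceph::fair_mutex, and the class and attribute names are hypothetical.

```python
import threading


class FairMutex:
    """Ticket-based mutex: waiters acquire the lock in arrival order."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_ticket = 0   # ticket handed to the next arriving waiter
        self._now_serving = 0   # ticket currently allowed to hold the lock

    def acquire(self):
        with self._cond:
            ticket = self._next_ticket
            self._next_ticket += 1
            # Sleep until every earlier ticket has released the lock.
            while ticket != self._now_serving:
                self._cond.wait()

    def release(self):
        with self._cond:
            self._now_serving += 1
            # Wake all waiters; only the one holding the next ticket proceeds.
            self._cond.notify_all()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc_info):
        self.release()


mutex = FairMutex()
with mutex:
    pass  # critical section
```

Unlike a plain mutex, where any waiter may win the race after an unlock, the ticket number fixes the order at arrival time, at the cost of waking all waiters on each release.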
Yingxin Cheng
c89d9f6a96 crimson/os/seastore: reassign extent_types_t values and remove extent_type_to_index()
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2021-07-30 09:42:22 +08:00
Yingxin Cheng
a059ac1e27 crimson/os/seastore/cache: misc cleanup to metrics
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2021-07-30 09:42:22 +08:00
Yingxin Cheng
dedd14e185 crimson/os/seastore/cache: remove derived metrics
Only keep the basic metrics to minimize the total number of metrics.

Derived metrics can be numerous according to different needs and can be
confusing with labels.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2021-07-30 09:42:22 +08:00
Yingxin Cheng
08a95d07b7 crimson/os/seastore/cache: remove counter labels
Do not label metrics by counter type which could be confusing.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2021-07-30 09:42:22 +08:00
Yingxin Cheng
38b01895ee crimson/os/seastore/cache: cleanup, replace unordered_map by array
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2021-07-30 09:42:05 +08:00
Ilya Dryomov
c22e44895d
Merge pull request #40965 from rokj/patch-3
doc: mention copying keyrings and adjust node names in manual deployment example

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2021-07-29 23:54:44 +02:00
Kefu Chai
6436cc5e13
Merge pull request #42432 from tchaikov/wip-mon-crush-cleanup
mon: let CrushWrapper::get_validated_type_id() return an optional<>

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2021-07-29 23:40:03 +08:00
Dimitri Savineau
9d7e1422a2
Merge pull request #42524 from guits/cv_wait_destroy_tests
ceph-volume/tests: retry when destroying osd
2021-07-29 09:42:15 -04:00
Ernesto Puerta
321bf26628
Merge pull request #42515 from rhcs-dashboard/decouple-unit-tests-from-build-dir
mgr/dashboard: backend unit tests: decouple from build dir

Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
2021-07-29 14:47:24 +02:00
Kefu Chai
e8da264486
Merge pull request #42516 from tchaikov/wip-win32-snappy
win32_deps_build.sh: bump snappy version to 1.1.9

Reviewed-by: Nathan Cutler <ncutler@suse.com>
2021-07-29 17:15:16 +08:00
Rok Jaklič
bcad5d9822 doc: adding missing command. changed node naming.
Signed-off-by: Rok Jaklič <rokj@rasca.net>
2021-07-29 09:49:13 +02:00
Yingxin Cheng
3674945d30 crimson/os/seastore/cache: cleanup, rename to get_by_src()
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2021-07-29 14:58:18 +08:00
Yingxin Cheng
c7d58c05f4 crimson/os/seastore: measure cache hit ratio by src
Remove excessive amount of cache hit/access metrics by extent type.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2021-07-29 14:58:18 +08:00
Yingxin Cheng
78bf4744f6 crimson/os/seastore: measure committed efforts by extent
In order to cross-check the writes at segment manager level, and
evaluate the write amplification from each sub-component.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2021-07-29 14:58:12 +08:00
Kefu Chai
1d5b07a092 win32_deps_build.sh: only clone the tip of required tag
no need to clone the whole repo, just clone the tip of the specified
tag. this saves bandwidth, disk IO and precious time.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-07-29 12:13:26 +08:00
Kefu Chai
b04248172d win32_deps_build.sh: bump snappy version to 1.1.9
in snappy, commit 26102a0c66175bc39edbf484c994a21902e986dc
fixes the SNAPPY_VERSION generation, and this commit is included in
v1.1.8 and v1.1.9.

also, v1.1.9 changed the following function signature, and more
importantly, this change is not backward compatible:

<   bool GetUncompressedLength(Source* source, uint32* result);
---
>   bool GetUncompressedLength(Source* source, uint32_t* result);

see also, https://tracker.ceph.com/issues/50934

so we check SNAPPY_VERSION to tell if we should use `uint32_t` or
`uint32`.

in this change, the snappy version used to build the win32 client is bumped
to the latest stable version, v1.1.9, to include the fix of
SNAPPY_VERSION. this paves the way for fixing https://tracker.ceph.com/issues/50934

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-07-29 12:13:26 +08:00
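The SNAPPY_VERSION check mentioned in commit b04248172d above can be illustrated with a little arithmetic. The sketch assumes that SNAPPY_VERSION packs the version as (major << 16) | (minor << 8) | patch, and that v1.1.9 and later take `uint32_t*`, as the title of commit 4c13a798dc implies. The function names are hypothetical, and the real check is a C++ preprocessor test, not Python.

```python
def snappy_version(major: int, minor: int, patch: int) -> int:
    """Pack a version triple the way SNAPPY_VERSION is assumed to."""
    return (major << 16) | (minor << 8) | patch


def length_type_for(version: int) -> str:
    """Pick the integer type name for GetUncompressedLength()."""
    if version >= snappy_version(1, 1, 9):
        return "uint32_t"
    return "uint32"


print(hex(snappy_version(1, 1, 9)))
print(length_type_for(snappy_version(1, 1, 8)))
print(length_type_for(snappy_version(1, 1, 9)))
```

Such a comparison is only meaningful when SNAPPY_VERSION is generated correctly, which is why the commit bumps snappy to a release that contains the version fix.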
Nathan Cutler
4c13a798dc compression/snappy: use uint32_t to be compatible with 1.1.9
The snappy project made the following change in snappy.h between version 1.1.8
and 1.1.9:

<   bool GetUncompressedLength(Source* source, uint32* result);
---
>   bool GetUncompressedLength(Source* source, uint32_t* result);

This causes Ceph to FTBFS with snappy 1.1.9.

Thanks to Chris Denice for bringing this to our attention via Redmine.

Fixes: https://tracker.ceph.com/issues/50934
Signed-off-by: Nathan Cutler <ncutler@suse.com>
2021-07-29 12:13:26 +08:00
Brad Hubbard
434b325c40
Merge pull request #42442 from badone/wip-insights-reports-non-persistent-storage
Don't persist report data

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-07-29 09:19:32 +10:00
Dimitri Savineau
3907ce7d6e cephadm: don't use ctx.fsid for clean_cgroup
The clean_cgroup method assumes that ctx.fsid is set. While this is
true for the bootstrap command, it isn't set for the adopt or deploy
commands (and maybe others).

This causes the adopt command to fail:

Traceback (most recent call last):
  File "/sbin/cephadm", line 8301, in <module>
    main()
  File "/sbin/cephadm", line 8289, in main
    r = ctx.func(ctx)
  File "/sbin/cephadm", line 1764, in _default_image
    return func(ctx)
  File "/sbin/cephadm", line 5091, in command_adopt
    command_adopt_ceph(ctx, daemon_type, daemon_id, fsid)
  File "/sbin/cephadm", line 5299, in command_adopt_ceph
    osd_fsid=osd_fsid)
  File "/sbin/cephadm", line 2884, in deploy_daemon_units
    clean_cgroup(ctx, unit_name)
  File "/sbin/cephadm", line 2724, in clean_cgroup
    if not ctx.fsid:
  File "/sbin/cephadm", line 155, in __getattr__
    return super().__getattribute__(name)
AttributeError: 'CephadmContext' object has no attribute 'fsid'

Since we already have the fsid value in deploy_daemon_units (which calls
clean_cgroup) then we can pass the fsid value directly.

This fixes a regression introduced by 1fee255

Fixes: https://tracker.ceph.com/issues/51902

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-07-28 16:52:05 -04:00
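The failure and the fix from commit 3907ce7d6e above can be reproduced in miniature. In this sketch the Context class is a hypothetical stand-in for CephadmContext, the function bodies are placeholders (the commit message does not show the real cgroup cleanup), and the `_before`/`_after` suffixes are added only to contrast the two versions.

```python
class Context:
    """Hypothetical stand-in for CephadmContext: attributes exist only if set."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)


def clean_cgroup_before(ctx, unit_name):
    # Reads ctx.fsid, which bootstrap sets but adopt and deploy do not.
    if not ctx.fsid:
        raise ValueError("fsid is required")
    return (ctx.fsid, unit_name)  # placeholder for the real cleanup


def clean_cgroup_after(ctx, fsid, unit_name):
    # The fsid is passed in explicitly instead of being read from ctx.
    if not fsid:
        raise ValueError("fsid is required")
    return (fsid, unit_name)  # placeholder for the real cleanup


def deploy_daemon_units(ctx, fsid, unit_name):
    # The caller already has the fsid, so it hands it down directly.
    return clean_cgroup_after(ctx, fsid, unit_name)


adopt_ctx = Context()  # no fsid attribute, as with the adopt command
try:
    clean_cgroup_before(adopt_ctx, "example.service")
except AttributeError as err:
    print("old behaviour:", err)
print("new behaviour:", deploy_daemon_units(adopt_ctx, "example-fsid", "example.service"))
```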
Neha Ojha
847835aa7f
Merge pull request #42527 from ceph/ljflores-patch-3
doc/mgr/telemetry: fix formatting problem

Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-07-28 13:00:32 -07:00
Patrick Donnelly
665b36de4e Merge PR #42349 into master
* refs/pull/42349/head:
	mon/MDSMonitor: propose if FSMap struct_v is too old
	mon/MDSMonitor: give a proper error message if FSMap struct_v is too old
	mds/FSMap: use DECODE_OLDEST to gate FSMap version
	qa: add tests for fs dump of epoch and trimming
	qa: add file system support for dumping epoch
	mon/MDSMonitor: return mon_mds_force_trim_to even if equal to current epoch
	mon: add debugging for trimming methods
	mon: fix debug spacing
	qa: add nofs upgrade suite

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Ramana Raja <rraja@redhat.com>
2021-07-28 10:45:08 -07:00
Patrick Donnelly
3c00047f02 Merge PR #42199 into master
* refs/pull/42199/head:
	mds: add debugging when rejecting mksnap with EPERM

Reviewed-by: Milind Changire <mchangir@redhat.com>
2021-07-28 10:36:35 -07:00
Patrick Donnelly
4f0f51e4cb Merge PR #41025 into master
* refs/pull/41025/head:
	qa: wait pgs to be clean before using the pools
	qa: ignore PG_RECOVERY_FULL and PG_DEGRADED for mds-full
	qa: wait more time since there are many more pgs than before
	qa: do not multiply the full ratio twice
	qa: do not raise for kclient for _fsync test
	qa: use the pg autoscale mode to calculate the pg_num
	qa: set the object_size to 1M
	qa: move the is_full() to parent class

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-28 10:34:12 -07:00
Patrick Donnelly
ca905165f7 Merge PR #38388 into master
* refs/pull/38388/head:
	mds: check rejoin_ack_gather before enter rejoin_gather_finish

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Sidharth Anupkrishnan <sanupkri@redhat.com>
2021-07-28 10:30:31 -07:00
Kefu Chai
985ce536d5
Merge pull request #42453 from sebastian-philipp/githubmap-rh
.githubmap: Update Sebastian Wagner's mapping

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: David Galloway <dgallowa@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-07-29 01:01:03 +08:00
Kefu Chai
04446b967c
Merge pull request #42501 from ybwang0211/doc-cap
doc/man: add missing right parenthesis in manpage.

Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-07-29 00:59:10 +08:00
Kefu Chai
cdc960e0eb
Merge pull request #42495 from hjwsm1989/wip-51842
crush: cancel upmaps with up set size != pool size

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
2021-07-29 00:32:00 +08:00
Kefu Chai
afb2633cb8
Merge pull request #42511 from adk3798/shutil-copy-exception
cephadm: don't fail hard on SameFileError during shutil.copy

Reviewed-by: Michael Fritch <mfritch@suse.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-07-29 00:28:53 +08:00
Kefu Chai
b148620337
Merge pull request #40337 from ideepika/wip-bugzilla-1857447
mon/PGMap: remove DIRTY field in `ceph df detail` when cache tiering is not in use

Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-07-29 00:21:49 +08:00
Kefu Chai
656ea3c171
Merge pull request #42508 from cybozu/kv-rocksdbstore-enrich-debug-message
kv/RocksDBStore: enrich debug message

Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-07-29 00:20:09 +08:00
Kefu Chai
5311c2ffb5
Merge pull request #42500 from ybwang0211/doc-list-get-attr
tools/rados: improve the usage message of {get,set}omapheader

Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-07-29 00:19:51 +08:00
Sage Weil
addbb8997d mon/LogMonitor: fix crash when cluster log file is not writeable
If we are in this block, then p == channel_fds.end() and p->first is not
valid.

Also, no need to populate channel_fds with an fd of -1.

Fixes: https://tracker.ceph.com/issues/51816
Signed-off-by: Sage Weil <sage@newdream.net>
2021-07-28 11:45:19 -04:00
Laura Flores
1734647008
doc/mgr/telemetry: fix formatting problem
There was strange bolding and bullet point placement due to a missing new line in the perf description.

Signed-off-by: Laura Flores <lflores@redhat.com>
2021-07-28 10:11:17 -05:00
Kefu Chai
ff3161f8a5
Merge pull request #42502 from tchaikov/wip-bloomfilter-cleanups
include/intarith, common/bloom_filter: add popcount() and cleanups

Reviewed-by: Sage Weil <sage@redhat.com>
2021-07-28 22:56:52 +08:00
Guillaume Abrioux
38882161cc ceph-volume/tests: retry when destroying osd
Sometimes, it can happen that the osds being destroyed in those tests
are not yet marked as 'down' for some reason. Let's add some retries on
those tasks to avoid CI failures.

Fixes: https://tracker.ceph.com/issues/51903

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-07-28 16:46:33 +02:00
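The retry idea in commit 38882161cc above, expressed as a generic Python helper. This is an analogous sketch rather than the test suite's own task definitions, which the commit message does not show; the helper name, the retry count and the delay are assumptions.

```python
import time


def retry(action, retries=5, delay=0.0):
    """Call action() until it succeeds or `retries` attempts are used up."""
    last_error = None
    for _ in range(retries):
        try:
            return action()
        except Exception as err:  # broad on purpose: any failure is retried
            last_error = err
            time.sleep(delay)
    if last_error is None:
        raise ValueError("retries must be at least 1")
    raise last_error


# Hypothetical flaky operation: fails until the OSD is reported as down.
state = {"calls": 0}


def destroy_osd():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("osd is not down yet")
    return "destroyed"


print(retry(destroy_osd))
```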