Commit Graph

124188 Commits

Author SHA1 Message Date
Igor Fedotov
967c5354ed
Merge pull request #41575 from ifed01/wip-ifed-fix-no-osd-daemonperf
common/PriorityCache: low perf counters priorities for submodules.

Reviewed-by: Mark Nelson <mnelson@redhat.com>
2021-06-10 23:32:00 +03:00
Abutalib Aghayev
13cd140be9 os/bluestore: Fix the size of the block in the Allocator base class to avoid
the confusing log message about the block size.

Signed-off-by: Abutalib Aghayev <agayev@psu.edu>
2021-06-10 15:59:45 -04:00
Adam King
54055381fd cephadm: use gpg key for add-repo on ubuntu/debian
We were using the ascii version of the gpg key which
was marked as an unsupported filetype by apt-get which
caused apt-get to not make use of the repo source we
were adding.

Additionally, added something to make sure we update the
package list after adding the source and key.

Fixes: https://tracker.ceph.com/issues/44972
Fixes: https://tracker.ceph.com/issues/45009

Signed-off-by: Adam King <adking@redhat.com>
2021-06-10 15:30:00 -04:00
Matt Benjamin
2132a5f714 rados: increase osd_max_write_op_reply_len default to 64 bytes
Agreed in #ceph-devel on 6/10.  The current controlling
rationale is that the default value should be sufficient to
marshall a SHA-512 checksum.
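
The size rationale can be checked with a minimal Python sketch using the standard `hashlib` module. The variable `reply_len_default` is only an illustrative name holding the value from the commit subject; it is not read from Ceph.

```python
import hashlib

# Illustrative value copied from the commit subject (assumption: bytes).
reply_len_default = 64

digest = hashlib.sha512(b"example payload").digest()

# A SHA-512 digest is 512 bits, i.e. 64 bytes, so it fits the new default.
print(len(digest))
print(len(digest) <= reply_len_default)
```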

Fixes: https://tracker.ceph.com/issues/51166

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
2021-06-10 10:27:58 -04:00
Sage Weil
140653be0b cephadm: set TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
This used to live in /etc/{sysconfig,defaults}/ceph, but that does not
apply inside the container.
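
For reference, the value above works out to 128 MiB. A short Python check of the arithmetic (the names `MIB` and `thread_cache_bytes` are illustrative only):

```python
MIB = 1024 * 1024

# Value from the commit subject.
thread_cache_bytes = 134217728

# Integer division shows how many whole MiB the setting represents.
print(thread_cache_bytes // MIB)
```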

Signed-off-by: Sage Weil <sage@newdream.net>
2021-06-10 08:39:27 -04:00
Kefu Chai
7afd38f846 tasks/ceph_manager: ignore EACCES when waiting for quorum
mon_tick_interval is 5 seconds by default. monitors update their
rotating keys every mon_tick_interval. before the monitors form a
quorum, the auth requests from clients are put into the wait list.
these requests are re-enqueued once the monitors form a quorum. but
there is a small window, up to mon_tick_interval long, in which the
monitors are not yet able to serve the auth requests even though they
claim to be able to serve requests. if these re-enqueued requests
happen to be served in this window, and if cephx is enabled, they will
be greeted with errors like

handle_auth_bad_method server allowed_methods [2] but i only support [2]

in the case of ceph cli, the error would look like:

[errno 13] RADOS permission denied (error connecting to the cluster)

so, to address this issue, the EACCES error is ignored when waiting
for a quorum.
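
The idea described above can be sketched in Python as follows. This is a simplified illustration and not the teuthology implementation: `CommandError`, `wait_for_quorum`, and the `get_quorum` callback are hypothetical names introduced for the example.

```python
import errno
import time


class CommandError(Exception):
    """Hypothetical error type carrying the exit status of a failed command."""

    def __init__(self, exitstatus):
        super().__init__(f"command failed with status {exitstatus}")
        self.exitstatus = exitstatus


def wait_for_quorum(get_quorum, expected, tries=10, sleep=1.0):
    """Poll get_quorum() until it returns the expected monitor names.

    EACCES failures are treated as transient, because monitors may reject
    auth requests for a short time after forming a quorum.
    """
    for _ in range(tries):
        try:
            if sorted(get_quorum()) == sorted(expected):
                return True
        except CommandError as e:
            if e.exitstatus != errno.EACCES:
                raise  # any other failure is a real error
        time.sleep(sleep)
    return False


# Usage: a fake status query that is denied twice before succeeding.
calls = {"n": 0}


def flaky_quorum():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise CommandError(errno.EACCES)
    return ["a", "b", "c"]


result = wait_for_quorum(flaky_quorum, ["c", "b", "a"], sleep=0)
print(result, calls["n"])
```

Only EACCES is swallowed; any other exit status still propagates, so genuine failures are not hidden by the retry loop.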

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-10 20:29:50 +08:00
Kefu Chai
3908c1f4cd tasks/ceph_manager: use safe_while() to refactor the wait for quorum
for better readability

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-10 20:29:50 +08:00
Kefu Chai
1a2976d8ed tools/ceph_monstore_tool: s/BOOST_SCOPE_EXIT/make_scope_guard/
more consistent this way.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-10 20:29:50 +08:00
Kefu Chai
99af14fab0 tools/ceph_monstore_tool: use make_scope_guard() for cleanup
for better readability

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-10 20:29:50 +08:00
Kefu Chai
5d431ce54d doc/rados/troubleshooting: highlight bash script with bash lexer
for better reading experience.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-10 20:29:50 +08:00
Matt Benjamin
b4f83c6223
Merge pull request #41585 from pritha-srivastava/wip-rgw-sts-session-policy-eval
rgw/sts: correcting the evaluation of session policies
2021-06-10 07:56:52 -04:00
Matt Benjamin
29820ebafe
Merge pull request #41735 from pritha-srivastava/wip-rgw-sts-ops-log-updates
rgw/sts: adding role name and role session to ops log.
2021-06-10 07:56:31 -04:00
Sebastian Wagner
b0b6429db1 cephadm: Upgrade to mypy 0.901
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
2021-06-10 13:28:10 +02:00
Sebastian Wagner
23599d9183 pybind/mgr: Upgrade to mypy 0.901
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
2021-06-10 13:28:10 +02:00
Sebastian Wagner
ee020ed4ff
Merge pull request #41653 from zdover23/wip-doc-cephadm-serve-man-setting-a-limit-2021-06-02
doc/cephadm: enriching "setting a limit"

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
2021-06-10 12:52:17 +02:00
Sebastian Wagner
3afc13278e
Merge pull request #41693 from sebastian-philipp/cephadm-devenv-bootstrap-mount
doc/dev/cephadm: cephadm bootstrap --shared_ceph_folder

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
2021-06-10 12:13:40 +02:00
Kefu Chai
5475ef7843 ceph-monstore-tool: use a large enough paxos/{first,last}_committed
so the rebuild paxos transaction won't be overwritten by the ones
created before recovery completes.

when the quorum is recovering, the leader will collect the paxos
transactions from peons. if the quorum accepts the proposal for setting
the fingerprint, the peon will update the monitor with the paxos
transaction with a newer "last_committed" than the one created using
update_paxos() in ceph_monstore_tool.cc. the latter "last_committed" is
always 0.

so, to avoid this extra paxos proposal obsoleting the "rebuilding" paxos
transaction, we use a large enough number for {first,last}_committed.

Fixes: http://tracker.ceph.com/issues/38219
Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-10 10:43:46 +08:00
Liu-Chunmei
2e45f16458
Merge pull request #41741 from liu-chunmei/seastore-fixe-read-invalid
crimson/seastore: fix OTree read invalid extent
2021-06-09 17:17:07 -07:00
Michael Fritch
37ff72ac9b
cephadm: validate --fsid during bootstrap
Signed-off-by: Michael Fritch <mfritch@suse.com>
2021-06-09 18:02:26 -06:00
Patrick Donnelly
26605723cf
qa: update cephfs-shell distro to ubuntu 20.04
18.04 is no longer built.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-09 16:51:49 -07:00
chunmei-liu
04ae69fdd1 crimson/seastore: fix OTree read invalid extent
Signed-off-by: chunmei-liu <chunmei.liu@intel.com>
2021-06-09 15:34:16 -07:00
Sage Weil
64281bb394 Merge PR #41229 into master
* refs/pull/41229/head:
	qa/suites/upgrade/pacific-x/stress-split: add

Reviewed-by: Yuri Weinstein <yweins@redhat.com>
2021-06-09 15:55:25 -04:00
Kefu Chai
75540f8760
Merge pull request #41729 from tchaikov/wip-std-filesystem-gcc-8
*: stop using <experimental/filesystem> as an alternative

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Adam C. Emerson <aemerson@redhat.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2021-06-09 23:41:47 +08:00
Kefu Chai
33b3e7cc22
Merge pull request #41791 from rzarzynski/wip-crimson-monc-no_auth_req_when_inactive
crimson/monc: don't serve auth requests without active mon connection.

Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-06-09 23:40:21 +08:00
Neha Ojha
c88bfc8bdd
Merge pull request #41782 from sseshasa/wip-fix-standalone-test
qa/standalone: Use osd op queue = wpq in activate_osd() within ceph-helpers.sh.

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-06-09 08:28:05 -07:00
Kefu Chai
254c1bc7d0 common/strtol: do not check for existence of <charconv>
<charconv> is available since GCC-8, see https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2017

> Elementary string conversions	P0067R5	11.1 (integral types supported since 8.1)	__has_include(<charconv>), __cpp_lib_to_chars >= 201611

since we always have access to GCC-8.1 and up, there is no need to
detect the existence of <charconv> anymore.

also, because GCC-11 introduced support for floating-point types,
update the comment to reflect the change.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-09 23:26:00 +08:00
Kefu Chai
756a3512ce *: always include <filesystem>
since there is no need to be compatible with GCC older than GCC-8,
there is no need to use <experimental/filesystem> as an alternative to
<filesystem> anymore.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-09 23:26:00 +08:00
Kefu Chai
bfd911d88a os/bluestore/bluestore_tool: do not use boost::filesystem as alternative
reverts 9dedabde52

since there is no need to be compatible with GCC older than GCC-8,
there is no need to use boost::filesystem as an alternative to
std::filesystem anymore.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-09 23:25:22 +08:00
Kefu Chai
d2b0382ec6 cmake: stop detecting <experimental/filesystem>
since we've dropped support for GCC older than v8.0, there is no need
to detect <experimental/filesystem>

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-09 23:25:22 +08:00
Kefu Chai
b6e6d15a28 cmake: require GCC-8.1 and up
for better C++17 support, for instance better std::filesystem
support.

the reason why 8.1 is required is that ubuntu focal provides GCC-8.1,
and RHEL/CentOS8 provides GCC-8.4.1. so we only test the build on
GCC-8.1 and up so far.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-09 23:25:22 +08:00
Kamoltat
4b00f1c2bd pybind/mgr/progress: Disregard unreported pgs
The global recovery event progress calculation only
takes into account pgs with `reported_epoch < start_epoch_of_event`,
but sometimes the pgs don't get moved before or after the creation
of the global recovery event. This can result in a bug
where the global event gets stuck forever unless another
event specifically makes the stuck pgs move and update
their `reported_epoch`.

Therefore, we decided to disregard pgs that are in active+clean state
but have `reported_epoch < start_epoch_of_event`.
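
The rule in the last paragraph can be sketched in Python as follows. This is an illustration under assumptions, not the progress module's code: the function `pgs_to_track` and the dictionary keys `pgid`, `state`, and `reported_epoch` are hypothetical stand-ins for the real PG stats structure.

```python
def pgs_to_track(pg_stats, start_epoch):
    """Return the ids of PGs that still count toward the recovery event."""
    tracked = []
    for pg in pg_stats:
        unreported = pg["reported_epoch"] < start_epoch
        if unreported and pg["state"] == "active+clean":
            # Already healthy and not updated since the event started:
            # waiting on it could stall the event, so disregard it.
            continue
        tracked.append(pg["pgid"])
    return tracked


# Usage with made-up PG stats and an event that started at epoch 10.
pgs = [
    {"pgid": "1.0", "state": "active+clean", "reported_epoch": 5},
    {"pgid": "1.1", "state": "active+recovering", "reported_epoch": 5},
    {"pgid": "1.2", "state": "active+clean", "reported_epoch": 12},
]
tracked = pgs_to_track(pgs, start_epoch=10)
print(tracked)
```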

Fixes: https://tracker.ceph.com/issues/49988

Signed-off-by: Kamoltat <ksirivad@redhat.com>
2021-06-09 15:11:32 +00:00
Sage Weil
a260fe186c Merge PR #41740 into master
* refs/pull/41740/head:
	mgr/nfs: do not depend on cephadm.utils

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-09 10:51:34 -04:00
Radoslaw Zarzynski
da3e4d9291 crimson/monc: don't serve auth requests without active mon connection.
It's yet another racing issue which happens when auth request
handling is performed during the `active_con` reset sequence.
It caused the following `nullptr` dereference at Sepia:

```
DEBUG 2021-06-09 10:27:24,059 [shard 0] ms - [osd.6(client) v2:172.21.15.170:6809/33397 >> client.? -@39840] GOT AuthRequestFrame: method=2, p
referred_modes={2, 1}, payload_len=174
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-4977-g65cb255e/rpm/el8/BUILD/ceph-17.0.0-4977-g65cb255e/src/crimson/mon/MonClient.cc:595:26: runtime error: member call on null pointer of type 'struct Connection'
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-4977-g65cb255e/rpm/el8/BUILD/ceph-17.0.0-4977-g65cb255e/src/crimson/mon/MonClient.cc:178:11: runtime error: member access within null pointer of type 'struct Connection'
Segmentation fault on shard 0.
Backtrace:
 0# 0x0000563F9C00395F in ceph-osd
 1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
 2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
 3# 0x00007F4A064D0B20 in /lib64/libpthread.so.0
 4# crimson::mon::Connection::get_keys() in ceph-osd
 5# crimson::mon::Client::handle_auth_request(seastar::shared_ptr<crimson::net::Connection>, seastar::lw_shared_ptr<AuthConnectionMeta>, bool, unsigned int, ceph::buffer::v15_2_0::list const&, ceph::buffer::v15_2_0::list*) in ceph-osd
 6# crimson::net::ProtocolV2::_handle_auth_request(ceph::buffer::v15_2_0::list&, bool) in ceph-osd
 7# 0x0000563F9D007B39 in ceph-osd
 8# 0x0000563F9D008C45 in ceph-osd
 9# 0x0000563F95FF8D70 in ceph-osd
10# 0x0000563FA1A560BF in ceph-osd
11# 0x0000563FA1A5B600 in ceph-osd
12# 0x0000563FA1C0D66B in ceph-osd
13# 0x0000563FA176B0EA in ceph-osd
14# 0x0000563FA177520E in ceph-osd
15# main in ceph-osd
16# __libc_start_main in /lib64/libc.so.6
17# _start in ceph-osd
Fault at location: 0xb0
```

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2021-06-09 14:36:36 +00:00
Patrick Donnelly
0f505dc299
qa: update scrub start code to use comma sep scrubopts
The documentation specifies this in [1] and yet we were using (I
believe) an older syntax:

    ceph tell mds.foo:0 scrub start / recursive force

instead of

    ceph tell mds.foo:0 scrub start / recursive,force

Oddly the former works at least as recently as in [2]:

    2021-06-03T07:11:42.071 DEBUG:teuthology.orchestra.run.smithi025:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mds.1:0 scrub start / recursive force
    ...
    2021-06-03T07:11:42.268 INFO:teuthology.orchestra.run.smithi025.stdout:{
    2021-06-03T07:11:42.268 INFO:teuthology.orchestra.run.smithi025.stdout:    "return_code": 0,
    2021-06-03T07:11:42.268 INFO:teuthology.orchestra.run.smithi025.stdout:    "scrub_tag": "cf7a74b2-3eb2-4657-9274-ea504b1ebf8f",
    2021-06-03T07:11:42.269 INFO:teuthology.orchestra.run.smithi025.stdout:    "mode": "asynchronous"
    2021-06-03T07:11:42.269 INFO:teuthology.orchestra.run.smithi025.stdout:}

[1] https://docs.ceph.com/en/latest/cephfs/scrub/
[2] /ceph/teuthology-archive/pdonnell-2021-06-03_03:40:33-fs-wip-pdonnell-testing-20210603.020013-distro-basic-smithi/6148097/teuthology.log

Fixes: https://tracker.ceph.com/issues/51146
See-also: https://tracker.ceph.com/issues/51145
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-09 07:23:05 -07:00
Daniel Gryniewicz
7fe1451ea5
Merge pull request #41761 from dang/wip-dang-setattrs
RGW - Don't move attrs before setting them
2021-06-09 09:46:58 -04:00
Sebastian Wagner
b1d6f7fa50 python-common: Upgrade to mypy 0.901
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
2021-06-09 12:56:31 +02:00
Sebastian Wagner
1f6b4744b5 qa: Upgrade to mypy 0.901
mypy 0.9 now requires stub packages

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
2021-06-09 12:53:21 +02:00
Sebastian Wagner
20e02db29e mgr/rook: OSD create: Fix broken list-comprehension
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
2021-06-09 12:39:58 +02:00
Sebastian Wagner
3fab28a55f src,qa: Upgrade to mypy 0.812
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
2021-06-09 12:39:58 +02:00
Kefu Chai
997208da91 mon/MonCommands: remove obsolete mds commands
* "mds dump",
  "mds getmap",
  "mds stop",
  "mds set_max_mds",
  "mds set",
  "mds rmfailed",
  "mds add_data_pool",
  "mds rm_data_pool",
  "mds remove_data_pool":
  the commands above were marked "OBSOLETE" back in
  a8fc92933b, which was included in v13.0.1,
* "mds tell" was marked obsolete in
  e0d1127205, which was included in v12.0.2,
* "mds deactivate" was marked obsolete in
  c7bd6f02c7, which was included in v13.1.0,
* "mds newfs" was marked obsolete in
  072c41e349, which was included in v12.0.2

so according to our command retirement policy proposed by
https://ceph-users.ceph.narkive.com/iUh4e0nj/rfc-deprecating-ceph-tool-commands

> Once two major releases go by, the command will then enter the OBSOLETE
> period. This would be one major release, during which the command would
> no longer work although still acknowledged. A simple message down the
> lines of 'This command is now obsolete; please check the docs' would
> suffice to inform the user.

since the next release will be v17, it's been long enough to retire
these OBSOLETE commands.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-09 18:17:05 +08:00
Kefu Chai
1f6b2a4bc3 mon/MonCommands: remove obsolete mon commands
the "scrub" command was marked obsolete in
e9a5ce0897, which was included in
v15.1.0. the next release will be v17, so it's been long enough to retire
this command.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-09 17:54:23 +08:00
Sridhar Seshasayee
94826eaadc qa/standalone: Use osd op queue = wpq in activate_osd()
This change is a follow-up to commit
b6e9c0903d that set the scheduler to wpq in
run_osd() and run_osd_filestore(). In addition, activate_osd() too has to
set the scheduler type to 'wpq' in order to be consistent and avoid test
failures.

The above is a temporary measure until all the standalone tests are
modified to run well with the mclock_scheduler.

Fixes: https://tracker.ceph.com/issues/51074
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-06-09 15:02:58 +05:30
Pritha Srivastava
ea61dd2c54 rgw/sts: adding role name and role session to ops log.
Also adding authentication type for all ops.

Fixes: https://tracker.ceph.com/issues/51152

Signed-off-by: Pritha Srivastava <prsrivas@redhat.com>
2021-06-09 14:39:10 +05:30
Kefu Chai
09f3c21f62
Merge pull request #41727 from yanqiang-ux/handle_error_ret_FillInVerifyExtent
osd: set r only if succeed in FillInVerifyExtent

Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-06-09 17:03:05 +08:00
dengchl01
70c6a1dba0 mgr/mgr_module:delete invalid judgment
Signed-off-by: dengchl01 <dengchl01@inspur.com>
2021-06-09 16:59:51 +08:00
yanqiang-ux
127745161f osd: set r only if succeed in FillInVerifyExtent
When a read fails, ret can be taken as the data length in FillInVerifyExtent, which should be avoided.
This may cause errors in crc repair or read retry because of the wrong data length. In my case, we use FillInVerifyExtent for EC reads;
when we hit -EIO, we try crc repair, which needs to read data from other shards according to the data length.
I hit an assert in ECBackend.cc (line 2288, ceph_assert(range.first != range.second)), although the master branch does not seem to support EC crc repair.
In short, reusing the read op may cause unpredictable errors.

Fixes: https://tracker.ceph.com/issues/51115
Signed-off-by: yanqiang-ux <yanqiang_ux@163.com>
2021-06-09 16:44:06 +08:00
Kefu Chai
13666263ab
Merge pull request #41718 from tchaikov/wip-doc-ci-deps
doc/dev: add continuous-integration

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2021-06-09 16:36:01 +08:00
Kefu Chai
3729788e97 doc/dev: add continuous-integration
for noting down the architecture of our CI, and how we prepare
some of the build dependencies.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-09 16:23:44 +08:00
Ernesto Puerta
6465b9a254
Merge pull request #41123 from rhcs-dashboard/host-addr-and-labels
mgr/dashboard: Include Network address and labels on Host Creation form

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
2021-06-09 10:23:34 +02:00
Ernesto Puerta
2f48f7d049
Merge pull request #41548 from rhcs-dashboard/revert-npm
Revert "mgr/dashboard: Generate NPM dependencies manifest"

Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
2021-06-09 10:08:16 +02:00